Github user jlhitt commented on the issue:

    https://github.com/apache/spark/pull/14762

@srowen and @rxin, sorry for the delay in getting this data to you. Let me know if you have any questions.

To check for regressions, we ran tests on a 2-chip Broadwell E5 v4 server (10 cores per chip). We focused on a single node so that any performance regression would not be obscured by multi-node scaling issues. All tests were run at two sizes, with the larger being 10x the smaller.

1) A variety of in-memory Spark SQL queries using DataFrames. These included full table scans, range scans, SQL with subselects, joins, etc.
2) Spark SQL using the .cube.orderby methods.
3) Spark SQL doing Pivot, based on the following blog post: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-spark.html. On the larger problem, Pivot was almost 4% faster with the fix.
4) We also looked at the overall run time of both builds. Same conclusion as the detailed timings.

If you'd like to review the spark.conf, please let us know.

Spark SQL test:
--------------------------
* 2 executors (EXI), 20 spark.executor.cores (EXC), 40 shuffle partitions (SHP)
* SF is the test size<br/>
  "SF 10" was run with 100 iterations to reduce system variance<br/>
  "SF 100" was run with 100 iterations to reduce system variance; it is 10x larger than "SF 10"
* X6-2: 2 Broadwell (E5 v4) chips, each chip 10-core (20 hyperthreads/vCPUs)
* OFFHEAP false

<table>
<tr>
<th>SYS</th><th>EXI</th><th>EXC</th><th>SHP</th><th>Q1</th><th>Q2</th><th>Q3</th><th>Q4</th><th>Q5</th><th>Q6</th><th>Cube</th><th>Pivot</th><th>Size</th><th>Version</th><th>times</th>
</tr>
<tr>
<td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.07</td><td>0.14</td><td>1.1</td><td>0.9</td><td>1.1</td><td>1.1</td><td>2.6</td><td>2.3</td><td>10</td><td>091916-base</td><td>100 iterations</td>
</tr>
<tr>
<td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.07</td><td>0.13</td><td>1.1</td><td>0.9</td><td>1.1</td><td>1.1</td><td>2.8</td><td>2.4</td><td>10</td><td>091916-fixes</td><td>100 iterations</td>
</tr>
<tr>
<td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.19</td><td>0.79</td><td>8.5</td><td>6.5</td><td>8.2</td><td>8.2</td><td>18.2</td><td>19.1</td><td>100</td><td>091916-base</td><td>100 iterations</td>
</tr>
<tr>
<td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.19</td><td>0.80</td><td>8.2</td><td>6.1</td><td>8.4</td><td>8.4</td><td>18.0</td><td>18.4</td><td>100</td><td>091916-fixes</td><td>100 iterations</td>
</tr>
</table>
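
For anyone who wants to check the per-test deltas, here is a minimal Python sketch that computes the relative change of the 091916-fixes build against 091916-base for the SF 100 rows. The numbers are copied from the table; the `percent_change` helper and the dictionary layout are illustrative assumptions, not part of the test harness that produced the data.

```python
# Timings copied from the SF 100 rows of the table (units as reported there).
base = {"Q1": 0.19, "Q2": 0.79, "Q3": 8.5, "Q4": 6.5,
        "Q5": 8.2, "Q6": 8.2, "Cube": 18.2, "Pivot": 19.1}
fixes = {"Q1": 0.19, "Q2": 0.80, "Q3": 8.2, "Q4": 6.1,
         "Q5": 8.4, "Q6": 8.4, "Cube": 18.0, "Pivot": 18.4}


def percent_change(base, fixes):
    """Percent change of fixes relative to base; negative means fixes is faster."""
    return {name: (fixes[name] - base[name]) / base[name] * 100.0
            for name in base}


changes = percent_change(base, fixes)
for name, pct in changes.items():
    print(f"{name:>5}: {pct:+.1f}%")
```

With the values as reported, the Pivot entry comes out a little under 4% faster, which matches the summary for the larger problem size.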