Github user jlhitt commented on the issue:
https://github.com/apache/spark/pull/14762
@srowen and @rxin, sorry for the delay in getting this data to you. Let me
know if you have any questions.
To check for regressions, we ran tests on a 2-chip Broadwell E5 v4 server
with 10 cores per chip.
We focused on a single node so that any performance regression would not be
obscured by multi-node scaling issues. All tests were run at two sizes, the
larger being 10x the smaller.
1) A variety of in-memory Spark SQL queries using DataFrames.
These included full table scans, range scans, SQL with subselects,
joins, etc.
2) Spark SQL using the .cube and .orderBy methods
3) Spark SQL doing Pivot, based on the following blog post:
https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-spark.html
On the larger problem, it was almost 4% faster with the fix.
4) We also looked at the overall run time of both. Same conclusion as the
detailed timings.
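
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of a timing loop. It is not the harness we used; `time_query` and the `run_query` callable are hypothetical names, and the callable stands in for whatever DataFrame action is being measured.

```python
import statistics
import time


def time_query(run_query, iterations=100):
    """Call run_query `iterations` times; return (mean, stdev) in seconds.

    `run_query` is a hypothetical zero-argument callable that triggers the
    work being measured (for example, a DataFrame action such as collect()).
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    # stdev needs at least two samples, so fall back to 0.0 for a single run.
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread
```

Averaging over many iterations is what keeps run-to-run system variance from hiding or faking a small regression.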
If you'd like to review the spark.conf, please let us know.
Spark SQL test:
--------------------------
* 2 executors (EXI), 20 spark.executor.cores (EXC), 40 shuffle partitions (SHP)
* SF is the test size<br/>
"SF 10" was run with 100 iterations to reduce system variance<br/>
"SF 100" was run with 100 iterations to reduce system variance and is 10x
larger than "SF 10"
* X6-2: 2 Broadwell (E5 v4) chips, each with 10 cores (20 hyperthreads/vCPUs)
* OFFHEAP false
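
As a rough illustration of how the settings above might be passed to `spark-submit`, the sketch below builds the `--conf` arguments. The helper name `spark_submit_conf` is hypothetical, and the mapping of EXI, SHP, and OFFHEAP to `spark.executor.instances`, `spark.sql.shuffle.partitions`, and `spark.memory.offHeap.enabled` is an assumption; the actual spark.conf is not reproduced here.

```python
def spark_submit_conf(executors=2, executor_cores=20,
                      shuffle_partitions=40, offheap=False):
    """Return a list of spark-submit arguments for the listed settings.

    The property names are standard Spark configuration keys, but which
    keys the original spark.conf used is an assumption.
    """
    conf = {
        "spark.executor.instances": executors,             # EXI (assumed mapping)
        "spark.executor.cores": executor_cores,            # EXC
        "spark.sql.shuffle.partitions": shuffle_partitions,  # SHP (assumed mapping)
        "spark.memory.offHeap.enabled": str(offheap).lower(),  # OFFHEAP (assumed mapping)
    }
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args


if __name__ == "__main__":
    print(" ".join(spark_submit_conf()))
```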
<table>
<tr>
<th>SYS</th><th>EXI</th><th>EXC</th><th>SHP</th><th>Q1</th><th>Q2</th><th>Q3</th>
<th>Q4</th><th>Q5</th><th>Q6</th><th>Cube</th><th>Pivot</th><th>Size</th><th>Version</th><th>times</th>
</tr>
<tr>
<td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.07</td><td>0.14</td><td>1.1</td>
<td>0.9</td><td>1.1</td><td>1.1</td><td>2.6</td><td>2.3</td><td>10</td>
<td>091916-base</td><td>100 iterations</td>
</tr>
<tr>
<td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.07</td><td>0.13</td><td>1.1</td>
<td>0.9</td><td>1.1</td><td>1.1</td><td>2.8</td><td>2.4</td><td>10</td>
<td>091916-fixes</td><td>100 iterations</td>
</tr>
<tr>
<td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.19</td><td>0.79</td><td>8.5</td>
<td>6.5</td><td>8.2</td><td>8.2</td><td>18.2</td><td>19.1</td><td>100</td>
<td>091916-base</td><td>100 iterations</td>
</tr>
<tr>
<td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.19</td><td>0.80</td><td>8.2</td>
<td>6.1</td><td>8.4</td><td>8.4</td><td>18.0</td><td>18.4</td><td>100</td>
<td>091916-fixes</td><td>100 iterations</td>
</tr>
</table>
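
To make the base-versus-fixes comparison easier to read, the following sketch computes the relative change per column from the figures in the table. The names `RESULTS`, `percent_change`, and `changes` are illustrative only; negative values mean the fixes build was faster.

```python
COLUMNS = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Cube", "Pivot"]

# Timings copied from the table above, keyed by test size (SF).
RESULTS = {
    10: {
        "base": [0.07, 0.14, 1.1, 0.9, 1.1, 1.1, 2.6, 2.3],
        "fixes": [0.07, 0.13, 1.1, 0.9, 1.1, 1.1, 2.8, 2.4],
    },
    100: {
        "base": [0.19, 0.79, 8.5, 6.5, 8.2, 8.2, 18.2, 19.1],
        "fixes": [0.19, 0.80, 8.2, 6.1, 8.4, 8.4, 18.0, 18.4],
    },
}


def percent_change(base, fixes):
    """Relative change of fixes versus base, in percent (negative = faster)."""
    return (fixes - base) / base * 100.0


def changes(size):
    """Map each column name to its percent change for the given test size."""
    run = RESULTS[size]
    return {
        name: percent_change(b, f)
        for name, b, f in zip(COLUMNS, run["base"], run["fixes"])
    }


if __name__ == "__main__":
    for size in sorted(RESULTS):
        for name, delta in changes(size).items():
            print(f"SF {size:>3} {name:<5} {delta:+.1f}%")
```

With timings reported to only one or two digits, small percentage differences at SF 10 are within rounding, so the SF 100 rows are the more meaningful comparison.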