[GitHub] spark issue #14762: [SPARK-16962][CORE][SQL] Fix misaligned record accesses ...

jlhitt Sun, 25 Sep 2016 07:29:11 -0700

Github user jlhitt commented on the issue:

    https://github.com/apache/spark/pull/14762
  
    @srowen and @rxin, sorry for the delay in getting this data to you. Let me 
know if you have any questions.
    
    To check for regressions, we ran tests on a 2-chip Broadwell E5 v4 server 
with 10 cores per chip.
    We focused on a single node so that any performance regression would not 
be obscured by multi-node scaling issues. All tests were run at two sizes, 
with the larger being 10x the smaller.
    
    1) A variety of in-memory Spark SQL queries using DataFrames.
        These included full table scans, range scans, SQL with subselects, 
joins, etc.
    2) Spark SQL using the .cube and .orderBy methods.
    3) Spark SQL doing a pivot, based on the following blog post:
        https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-spark.html
        On the larger problem, it was almost 4% faster with the fix.
    4) We also looked at the overall run time of both builds. It led to the 
same conclusion as the detailed timings.
    
    If you'd like to review the spark.conf, please let us know.
    
    Spark SQL test:
    --------------------------
    * 2 executors (EXI), 20 spark.executor.cores (EXC), 40 shuffle partitions (SHP)
    * SF is the test size (the Size column)<br/>
      "SF 10" was run with 100 iterations to reduce system variance<br/>
      "SF 100" was run with 100 iterations to reduce system variance; it is 10x 
larger than "SF 10"
    * X6-2 (SYS): 2 Broadwell (E5 v4) chips, each 10-core (20 hyperthreads/vCPUs)
    * OFFHEAP false
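
For anyone trying to reproduce the setup, the settings listed above could be written as `spark-submit` arguments roughly as follows. This is only a sketch: the actual spark.conf was not posted, so the mapping from EXI, EXC, SHP and OFFHEAP to these Spark property names is an assumption, and the helper names are hypothetical.

```python
# Assumed mapping from the abbreviations in the list above to Spark property
# names. The real spark.conf used for the runs was not posted.
BENCHMARK_CONF = {
    "spark.executor.instances": 2,          # EXI (assumed property)
    "spark.executor.cores": 20,             # EXC
    "spark.sql.shuffle.partitions": 40,     # SHP (assumed property)
    "spark.memory.offHeap.enabled": False,  # OFFHEAP false (assumed property)
}


def to_conf_value(value):
    """Render a Python value the way Spark properties are usually written."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def spark_submit_args(conf):
    """Turn a dict of properties into a flat list of --conf key=value arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={to_conf_value(value)}"]
    return args


if __name__ == "__main__":
    print(" ".join(spark_submit_args(BENCHMARK_CONF)))
```

Any other settings used in the runs (memory sizes, serializer, and so on) are not covered here because they were not listed.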
    
    <table>
    <tr>
    <th>SYS</th><th>EXI</th><th>EXC</th><th>SHP</th><th>Q1</th><th>Q2</th><th>Q3</th><th>Q4</th><th>Q5</th><th>Q6</th><th>Cube</th><th>Pivot</th><th>Size</th><th>Version</th><th>times</th>
    </tr>
    <tr>
    <td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.07</td><td>0.14</td><td>1.1</td><td>0.9</td><td>1.1</td><td>1.1</td><td>2.6</td><td>2.3</td><td>10</td><td>091916-base</td><td>100 iterations</td>
    </tr>
    <tr>
    <td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.07</td><td>0.13</td><td>1.1</td><td>0.9</td><td>1.1</td><td>1.1</td><td>2.8</td><td>2.4</td><td>10</td><td>091916-fixes</td><td>100 iterations</td>
    </tr>
    <tr>
    <td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.19</td><td>0.79</td><td>8.5</td><td>6.5</td><td>8.2</td><td>8.2</td><td>18.2</td><td>19.1</td><td>100</td><td>091916-base</td><td>100 iterations</td>
    </tr>
    <tr>
    <td>X6-2</td><td>2</td><td>20</td><td>40</td><td>0.19</td><td>0.80</td><td>8.2</td><td>6.1</td><td>8.4</td><td>8.4</td><td>18.0</td><td>18.4</td><td>100</td><td>091916-fixes</td><td>100 iterations</td>
    </tr>
    </table>
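
To make the base-versus-fixes comparison easier to read, the relative change per column can be computed from the table. The following is a minimal sketch: the timings are copied from the table as reported, while the function and variable names are hypothetical and not part of the original test harness.

```python
# Timings copied from the table above, in the units reported there.
COLUMNS = ["Q1", "Q2", "Q3", "Q4", "Q5", "Q6", "Cube", "Pivot"]

TIMINGS = {
    (10, "091916-base"): [0.07, 0.14, 1.1, 0.9, 1.1, 1.1, 2.6, 2.3],
    (10, "091916-fixes"): [0.07, 0.13, 1.1, 0.9, 1.1, 1.1, 2.8, 2.4],
    (100, "091916-base"): [0.19, 0.79, 8.5, 6.5, 8.2, 8.2, 18.2, 19.1],
    (100, "091916-fixes"): [0.19, 0.80, 8.2, 6.1, 8.4, 8.4, 18.0, 18.4],
}


def percent_change(base, fixed):
    """Change of `fixed` relative to `base`, in percent; negative means faster."""
    return (fixed - base) / base * 100.0


def compare(size):
    """Per-column percent change between the base and fixes builds at one size."""
    base = TIMINGS[(size, "091916-base")]
    fixed = TIMINGS[(size, "091916-fixes")]
    return {name: percent_change(b, f) for name, b, f in zip(COLUMNS, base, fixed)}


if __name__ == "__main__":
    for size in (10, 100):
        for name, delta in compare(size).items():
            print(f"SF {size:>3} {name:<5} {delta:+6.1f}%")
```

For the Pivot column at SF 100 this gives about -3.7%, which matches the "almost 4% faster" figure quoted for the pivot test.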



