[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

viirya Mon, 26 Dec 2016 00:58:05 -0800

Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/14452
  
    I would like to share some numbers from running this on my local cluster
    (5 nodes, on YARN, 8GB).
    
    After this change:
    
    q2: 9955 ms
    q11: 23490 ms
    q39a: 11226 ms
    q39b: 11303 ms
    q47: 21878 ms
    q57: 19661 ms
    q59: 10803 ms
    q65: 9949 ms
    q74: 20290 ms
    q75: 21561 ms
    
    Before this change:
    
    q2: 9414 ms
    q11: 29764 ms
    q39a: 12961 ms
    q39b: 11578 ms
    q47: 46424 ms
    q57: 34798 ms
    q59: 10420 ms
    q65: 10259 ms
    q74: 29574 ms
    q75: 27767 ms
    
    q64 causes an out-of-memory error on the cluster. These queries are CTE queries.
    
    The data is generated with spark-sql-perf using scaleFactor = 1 to make 
these queries work on my cluster.
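
To make the before/after timings listed above easier to compare, here is a minimal sketch that computes the per-query ratio (before / after). The numbers are copied from the lists above; the names `before_ms`, `after_ms`, and `speedups` are illustrative and not part of the original comment. The post does not say how many runs each number comes from, so small differences (for example q2 or q59) may just be noise.

```python
# Timings in milliseconds, copied from the "after" and "before" lists above.
after_ms = {
    "q2": 9955, "q11": 23490, "q39a": 11226, "q39b": 11303, "q47": 21878,
    "q57": 19661, "q59": 10803, "q65": 9949, "q74": 20290, "q75": 21561,
}
before_ms = {
    "q2": 9414, "q11": 29764, "q39a": 12961, "q39b": 11578, "q47": 46424,
    "q57": 34798, "q59": 10420, "q65": 10259, "q74": 29574, "q75": 27767,
}


def speedups(before, after):
    """Return {query: before / after}; a value above 1.0 means the patched run was faster."""
    return {q: before[q] / after[q] for q in before}


ratios = speedups(before_ms, after_ms)

# Print queries from largest to smallest ratio.
for q, r in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{q:>5}: {before_ms[q]:>6} ms -> {after_ms[q]:>6} ms  ({r:.2f}x)")
```

Read this way, q47 and q57 show the largest gains, while q2 and q59 are slightly slower after the change.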



