[GitHub] spark pull request: [SPARK-8077][SQL] Optimization for TreeNodes w...

MickDavies Sun, 07 Jun 2015 05:18:07 -0700

GitHub user MickDavies reopened a pull request:

    https://github.com/apache/spark/pull/6673


    [SPARK-8077][SQL] Optimization for TreeNodes with large numbers of children

    For example, large IN clauses.
    
    Large IN clauses are parsed very slowly. For example, the SQL below (10K items in the IN list) takes 45-50s to parse.
    
    s"""SELECT * FROM Person WHERE ForeName IN ('${(1 to 10000).map("n" + _).mkString("','")}')"""
    
    This is principally due to TreeNode, which repeatedly calls contains on children. Here children is a List with 10K elements, so each call is a linear scan, and in effect parsing a large IN clause is O(N^2).
    Using a lazily initialised Set built from children for these contains checks reduces parse time to around 2.5s.
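The quadratic cost and the Set-based fix can be reproduced with a small Scala sketch. It is a simplified model, not Spark code: `Node`, `isChildSlow`, `isChildFast`, `containsChild` and `countChildren` are hypothetical names, and nodes here use reference equality (real Catalyst nodes are case classes with structural equality, which makes each comparison more expensive, not less).

```scala
// Hypothetical, simplified stand-in for a Catalyst-style tree node; not Spark's actual TreeNode.
class Node(val name: String, val children: Seq[Node]) {
  // Linear scan of the children Seq on every call: O(N) each time for a List.
  def isChildSlow(candidate: Node): Boolean = children.contains(candidate)

  // Built once, on first use, then reused: O(1) average membership checks afterwards.
  lazy val containsChild: Set[Node] = children.toSet
  def isChildFast(candidate: Node): Boolean = containsChild.contains(candidate)
}

// Simulates a tree walk that asks, for each of N arguments, whether it is a child.
// N arguments x O(N) List.contains = O(N^2) overall; with the Set it is O(N).
def countChildren(args: Seq[Node], isChild: Node => Boolean): Int = args.count(isChild)

// Times a block and prints elapsed milliseconds (wall clock, no JIT warm-up: rough numbers only).
def timed[A](label: String)(body: => A): A = {
  val start = System.nanoTime()
  val result = body
  println(f"$label%-4s ${(System.nanoTime() - start) / 1e6}%.1f ms")
  result
}

// Roughly the shape of IN ('n1', ..., 'n10000'): one parent with 10,000 leaf children in a List.
val leaves: List[Node] = (1 to 10000).map(i => new Node(s"n$i", Nil)).toList
val in = new Node("In", leaves)

val slowCount = timed("List")(countChildren(leaves, in.isChildSlow))
val fastCount = timed("Set")(countChildren(leaves, in.isChildFast))
```

Absolute timings depend on the machine and JVM warm-up; what matters is that the List path grows quadratically with the number of IN items while the Set path grows linearly. Making the Set lazy means nodes whose children are never queried pay nothing extra.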

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/MickDavies/spark SPARK-8077

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6673.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6673
    
----
commit e6be8beb72936bb457343e6c9bd0dfddeede040f
Author: Michael Davies <[email protected]>
Date:   2015-06-05T18:02:15Z

    SPARK-8077: Optimization for  TreeNodes with large numbers of children
    
    For example large IN clauses

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

