GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/17541

    [SPARK-20229][SQL] add semanticHash to QueryPlan

    ## What changes were proposed in this pull request?
    
    Like `Expression`, `QueryPlan` should also have a `semanticHash` method, 
so that plans can be put into a hash map and looked up quickly. This PR 
refactors `QueryPlan` to follow `Expression` and puts all the normalization 
logic in `QueryPlan.canonicalized`, which makes `semanticHash` 
straightforward to implement.
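    
    As a rough illustration of the idea (a toy model, not Spark's actual 
classes), the sketch below derives both `sameResult` and `semanticHash` from a 
single `canonicalized` form. The `Plan`, `Scan` and `Filter` types and the 
normalization rules (dropping aliases, sorting filter conditions) are 
assumptions made up for this example.

```scala
// Toy model only: Plan, Scan and Filter are hypothetical stand-ins,
// not the real QueryPlan hierarchy.
sealed trait Plan {
  // Strip details that do not affect the result.
  def canonicalized: Plan

  // Two plans are semantically equal if their canonical forms are equal.
  def sameResult(other: Plan): Boolean = canonicalized == other.canonicalized

  // Hash the same canonical form, so the hash agrees with sameResult.
  def semanticHash: Int = canonicalized.hashCode()
}

case class Scan(table: String, alias: Option[String] = None) extends Plan {
  // Assumption for this sketch: an alias is purely cosmetic.
  def canonicalized: Plan = Scan(table, None)
}

case class Filter(conditions: Seq[String], child: Plan) extends Plan {
  // Assumption for this sketch: conditions are conjunctive, so order is irrelevant.
  def canonicalized: Plan = Filter(conditions.sorted, child.canonicalized)
}

object SemanticHashDemo {
  def main(args: Array[String]): Unit = {
    val a = Filter(Seq("x > 1", "y < 2"), Scan("t", Some("t1")))
    val b = Filter(Seq("y < 2", "x > 1"), Scan("t", Some("t2")))
    println(a == b)                           // false: structurally different
    println(a.sameResult(b))                  // true: same canonical form
    println(a.semanticHash == b.semanticHash) // true: hash follows the canonical form
  }
}
```

    Because both methods go through `canonicalized`, any normalization rule 
only has to be written once.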
    
    Follow-up: improve `CacheManager` to use this `semanticHash` to speed up 
plan lookup, instead of iterating over all cached plans.
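    
    A minimal sketch of what such a lookup could look like, assuming the 
cache is keyed by the canonical form of a plan. `Relation` and `PlanCache` are 
hypothetical names for this example and do not reflect how `CacheManager` is 
implemented.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a plan; the alias is assumed not to affect the result.
final case class Relation(table: String, alias: Option[String]) {
  def canonicalized: Relation = copy(alias = None)
}

// Hypothetical cache keyed by the canonical form of a plan.
// A lookup is a single hash-map probe rather than a scan over every
// cached plan with a pairwise sameResult comparison.
final class PlanCache[V] {
  private val entries = mutable.HashMap.empty[Relation, V]

  def put(plan: Relation, value: V): Unit =
    entries.update(plan.canonicalized, value)

  def lookup(plan: Relation): Option[V] =
    entries.get(plan.canonicalized)
}

object PlanCacheDemo {
  def main(args: Array[String]): Unit = {
    val cache = new PlanCache[String]
    cache.put(Relation("t", Some("a")), "cached data for t")
    // Same table under a different alias hits the same entry.
    println(cache.lookup(Relation("t", Some("b"))))
    // A different table misses.
    println(cache.lookup(Relation("u", None)))
  }
}
```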
    
    ## How was this patch tested?
    
    Existing tests. The `semanticHash` method does not need a separate test: 
it is built on the same `QueryPlan.canonicalized` logic, so once the existing 
tests prove `sameResult` is correct, `semanticHash` is covered as well.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark plan-semantic

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17541.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17541
    
----
commit 02f4a020fc4e91929782ae293895f7e1c5977d72
Author: Wenchen Fan <[email protected]>
Date:   2017-04-05T17:53:42Z

    add semanticHash to QueryPlan

----


