GitHub user concretevitamin opened a pull request:

    https://github.com/apache/spark/pull/1238

    [SQL] [WIP] Prototype implementation of size estimations for Catalyst 
logical plans.

    The idea is that every Catalyst logical plan exposes an Estimates 
class, which provides estimates of various statistics for that plan. 
See the implementations in `ParquetRelation` and `MetastoreRelation`.
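
    As a rough illustration of the shape of such an interface (not the code 
in this patch), the sketch below assumes a single statistic, `sizeInBytes`, and 
a default that sums the children's estimates. The names `EstimatesSketch`, 
`FileRelation`, `Filter`, and the summing default are all hypothetical.

```scala
// A minimal sketch under stated assumptions; every name here is hypothetical
// and none of it is taken from the actual patch.
object EstimatesSketch {
  // The statistics a plan can report. Only size is modeled here.
  trait Estimates {
    def sizeInBytes: Long
  }

  trait LogicalPlan {
    def children: Seq[LogicalPlan]

    // Assumed default: a node is estimated as the sum of its children.
    def estimates: Estimates = new Estimates {
      def sizeInBytes: Long = children.map(_.estimates.sizeInBytes).sum
    }
  }

  // A leaf relation that knows its own physical size, standing in for
  // relations such as ParquetRelation or MetastoreRelation.
  case class FileRelation(bytesOnDisk: Long) extends LogicalPlan {
    val children: Seq[LogicalPlan] = Nil
    override def estimates: Estimates = new Estimates {
      def sizeInBytes: Long = bytesOnDisk
    }
  }

  // A unary operator that simply inherits the default estimate.
  case class Filter(child: LogicalPlan) extends LogicalPlan {
    val children: Seq[LogicalPlan] = Seq(child)
  }
}
```

    With this shape, leaf relations override the estimate and every other 
operator gets a usable default without extra code.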
    
    This patch also includes one usage of the estimation interface: it uses 
physical table sizes from the interface to convert an equi-join to a broadcast 
join (when doing so is beneficial, as determined by a size threshold).
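
    The threshold decision could be sketched roughly as follows. This is an 
illustration under assumptions, not the rule in this patch: the strategy names, 
the `chooseStrategy` helper, and the tie-breaking (prefer broadcasting the 
smaller side) are hypothetical.

```scala
// A minimal sketch under stated assumptions; names and tie-breaking are
// hypothetical and not taken from the actual patch.
object BroadcastJoinSketch {
  sealed trait JoinStrategy
  case object BroadcastLeft extends JoinStrategy   // ship the left table to all workers
  case object BroadcastRight extends JoinStrategy  // ship the right table to all workers
  case object ShuffledJoin extends JoinStrategy    // fall back to a shuffle-based join

  // Broadcast the smaller side if it fits under the threshold;
  // otherwise keep the shuffle-based join.
  def chooseStrategy(leftBytes: Long, rightBytes: Long, thresholdBytes: Long): JoinStrategy =
    if (rightBytes <= thresholdBytes && rightBytes <= leftBytes) BroadcastRight
    else if (leftBytes <= thresholdBytes) BroadcastLeft
    else ShuffledJoin
}
```

    For example, with a 10 MB threshold, joining a 1 GB table with a 2 MB 
table on the right would pick `BroadcastRight` in this sketch, while two 1 GB 
tables would fall back to `ShuffledJoin`.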
    
    This PR still needs some cleanup, but I want to put it out here first 
to gather feedback on the interface and high-level approach.
    
    TODOs:  
    - [ ] Add a separate test suite & improve test coverage.
    - [ ] Support `ParquetRelation` in the broadcast hash join (BHJ) optimization described above.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/concretevitamin/spark estimates

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1238.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1238
    
----
commit 54671bb8003629220375343427259cdecb8003ce
Author: Zongheng Yang <[email protected]>
Date:   2014-06-24T23:32:27Z

    Prototype impl of estimations for Catalyst logical plans.
    
    - Also add simple size-getters for ParquetRelation and
      MetastoreRelation.
    - Add a rule to auto-convert equi-joins to BroadcastHashJoin, if a table
      has smaller size, based on the above getter (for MetastoreRelation).

----

