GitHub user piaozhexiu opened a pull request:

    https://github.com/apache/spark/pull/7216

    [SPARK-6910] [SQL] Support for pushing predicates down to metastore for 
partition pruning

    This PR supersedes my old one #6921. Since my patch has changed quite a 
bit, I am opening a new PR to make it easier to review.
    
    The changes include:
    * Implement `toMetastoreFilter()` function in `HiveShim` that takes 
`Seq[Expression]` and converts them into a filter string for Hive metastore.
     * This function matches all the `AttributeReference` + 
`BinaryComparisonOp` + `Integral/StringType` patterns in `Seq[Expression]` and 
folds them into a string.
    * Change `hiveQlPartitions` field in `MetastoreRelation` to 
`getHiveQlPartitions()` function that takes a filter string parameter.
    * Call `getHiveQlPartitions()` in `HiveTableScan` with a filter string.
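
To illustrate the conversion described in the list above, here is a minimal, self-contained sketch. It is not the code in this patch: `Expr`, `Attr`, `IntLit`, `StrLit`, `Compare`, and `Other` are hypothetical stand-ins for Catalyst expressions, and the filter syntax (comparisons joined with `and`, string values in double quotes) is an assumption about what the Hive metastore accepts. Predicates that do not match the supported pattern are skipped rather than pushed down.

```scala
object MetastoreFilterSketch {
  // Hypothetical, simplified stand-ins for Catalyst expressions.
  // These are NOT Spark's real classes.
  sealed trait Expr
  case class Attr(name: String) extends Expr
  case class IntLit(value: Long) extends Expr
  case class StrLit(value: String) extends Expr
  case class Compare(op: String, left: Expr, right: Expr) extends Expr
  case class Other(description: String) extends Expr

  // Comparison operators this sketch is willing to push down (assumed set).
  private val supportedOps = Set("=", "<", "<=", ">", ">=")

  // Wrap a string value in double quotes (assumed quoting convention).
  private def quote(v: String): String = "\"" + v + "\""

  // Fold every "attribute <op> integral/string literal" predicate into one
  // filter string. Anything else is silently dropped, so the caller must
  // still evaluate the full predicate on the partitions that come back.
  def toMetastoreFilter(predicates: Seq[Expr]): String =
    predicates.collect {
      case Compare(op, Attr(name), IntLit(v)) if supportedOps(op) =>
        s"$name $op $v"
      // Skip string values containing a quote instead of guessing at escaping.
      case Compare(op, Attr(name), StrLit(v))
          if supportedOps(op) && !v.contains("\"") =>
        s"$name $op ${quote(v)}"
    }.mkString(" and ")
}
```

Because unsupported predicates are only dropped from the filter string, the metastore may return more partitions than strictly needed, but never fewer; an empty result string means no pruning is requested.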
    
    However, predicate pushdown is disabled in some cases:
    
    Case | Predicate pushdown
    ------- | -----------------------------
    Hive integral and string types | Yes
    Hive varchar type | No
    Hive 0.13 and newer | Yes
    Hive 0.12 and older | No
    convertMetastoreParquet=false | Yes
    convertMetastoreParquet=true | No
    
    With `convertMetastoreParquet=true`, predicates are not pushed down 
because the conversion happens in an `Analyzer` rule 
(`HiveMetastoreCatalog.ParquetConversions`). At that point, `HiveTableScan` 
has not run yet, so predicates are not available. From reading the source 
code, I think converting the entire Hive table with all of its partitions 
into `ParquetRelation` is intentional, because the `ParquetRelation` can then 
be cached and reused for any query against that table. Please correct me if 
I am wrong.
    
    cc @marmbrus 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/piaozhexiu/spark SPARK-6910-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7216.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7216
    
----
commit dce7abf40003ad36675034e9467a5003f7e551ad
Author: Cheolsoo Park <[email protected]>
Date:   2015-07-03T19:51:03Z

    Predicate pushdown into Hive metastore

----

