GitHub user piaozhexiu opened a pull request:

    https://github.com/apache/spark/pull/6921

    [SPARK-6910] [SQL] [WIP] Support for pushing predicates down to metastore 
for partition pruning

    @marmbrus per our email conversation, here is a PR that implements the idea I outlined in the JIRA.
    * `Analyzer` captures predicates if an `unresolved relation` is followed by a `filter expression` and pushes them down into `HiveMetastoreCatalog`.
    * `HiveMetastoreCatalog` extracts partition predicates whose types are 
string and integral and constructs a string representation for them.
    * Hive client invokes `getPartitionsByFilter(Table tbl, String filter)` 
with the filter string.
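    
    As a rough illustration of the second and third steps, the sketch below renders equality predicates on partition columns into a single filter string. It is not the code in this PR: `PartitionFilterSketch`, `EqualsPredicate`, and `toFilterString` are hypothetical names, and the `col = "value"` / `and` syntax is an assumption about what the metastore filter parser accepts.
    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Hypothetical helper, not the code in this PR: turns simple equality
    // predicates on partition columns into one metastore filter string.
    public class PartitionFilterSketch {

      // One `column = value` predicate. The value is a String or an integral number.
      static final class EqualsPredicate {
        final String column;
        final Object value;

        EqualsPredicate(String column, Object value) {
          this.column = column;
          this.value = value;
        }
      }

      // Joins the predicates with " and ". String values are double-quoted
      // (assumed filter syntax); integral values are written as-is.
      // Values of any other type are skipped, i.e. not pushed down.
      static String toFilterString(List<EqualsPredicate> predicates) {
        List<String> parts = new ArrayList<>();
        for (EqualsPredicate p : predicates) {
          if (p.value instanceof String) {
            parts.add(p.column + " = \"" + p.value + "\"");
          } else if (p.value instanceof Long || p.value instanceof Integer) {
            parts.add(p.column + " = " + p.value);
          }
        }
        return String.join(" and ", parts);
      }

      public static void main(String[] args) {
        String filter = toFilterString(Arrays.asList(
            new EqualsPredicate("ds", "2015-06-19"),
            new EqualsPredicate("hr", 12L)));
        System.out.println(filter); // ds = "2015-06-19" and hr = 12
      }
    }
    ```
    The sketch does not escape quotes inside string values; a real implementation would have to handle that before handing the string to `getPartitionsByFilter`.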
    
    In addition, I now have a better understanding of the limitations of Hive's `getPartitionsByFilter(Table tbl, String filter)` function.
    * Good news: it works with integral types as well as string.
    * Bad news: for integral types, it only works with the equals and not-equals operators.
    
    The following is where the *"filtering is supported only on partition keys of type string"* error comes from in Hive:
    ```java
    // metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
    public String getFilterPushdownParam(Table table, int partColIndex) throws MetaException {
      boolean isIntegralSupported = doesOperatorSupportIntegral(operator);
      String colType = table.getPartitionKeys().get(partColIndex).getType();
      // Can only support partitions whose types are string, or maybe integers
      if (!colType.equals(serdeConstants.STRING_TYPE_NAME)
          && (!isIntegralSupported || !isIntegralType(colType))) {
        throw new MetaException("Filtering is supported only on partition keys of type " +
            "string" + (isIntegralSupported ? ", or integral types" : ""));
      }
      boolean isStringValue = value instanceof String;
      if (!isStringValue && (!isIntegralSupported || !(value instanceof Long))) {
        throw new MetaException("Filtering is supported only on partition keys of type " +
            "string" + (isIntegralSupported ? ", or integral types" : ""));
      }
      return isStringValue ? (String) value : Long.toString((Long) value);
    }
    ```
    As can be seen, one of the following conditions must be met:
    * Partition key is string type; or
    * Partition key is integral type, and the operator supports integral types (i.e., `doesOperatorSupportIntegral()` returns true).
    
    But looking at the `doesOperatorSupportIntegral()` function, only the equals and not-equals operators support integral types:
    ```java
    // metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
    private static boolean doesOperatorSupportIntegral(Operator operator) {
      // TODO: for SQL-based filtering, this could be amended if we added casts.
      return (operator == Operator.EQUALS)
          || (operator == Operator.NOTEQUALS)
          || (operator == Operator.NOTEQUALS2);
    }
    ```
    This is not good because `<`, `>`, `<=`, and `>=`, which are very common in practice, are not supported for integral partition keys.
    
    In this PR, I restricted predicate pushdown to `=`. With that restriction, 
all the unit tests pass now.
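    
    The restriction can be pictured as a small eligibility check applied to each predicate before it is pushed down. This is only a sketch under stated assumptions, not the PR's implementation: `PushdownRuleSketch`, `Op`, and `canPushDown` are hypothetical names, and the set of integral type names (`tinyint`, `smallint`, `int`, `bigint`) is assumed rather than taken from the PR.
    ```java
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical rule, not the code in this PR: a predicate is pushed down
    // only if it uses `=` and the partition column is string or integral.
    public class PushdownRuleSketch {

      enum Op { EQ, NEQ, LT, LE, GT, GE }

      // Assumed Hive type names for string and integral partition columns.
      static final Set<String> PUSHABLE_TYPES = new HashSet<>(
          Arrays.asList("string", "tinyint", "smallint", "int", "bigint"));

      static boolean canPushDown(Op op, String partitionColumnType) {
        return op == Op.EQ && PUSHABLE_TYPES.contains(partitionColumnType);
      }

      public static void main(String[] args) {
        System.out.println(canPushDown(Op.EQ, "int"));    // true
        System.out.println(canPushDown(Op.LT, "int"));    // false: only `=` is pushed down
        System.out.println(canPushDown(Op.EQ, "double")); // false: not string or integral
      }
    }
    ```
    Predicates that fail the check would simply stay in the query plan and be evaluated by Spark after the partitions are fetched.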

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/piaozhexiu/spark SPARK-6910

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6921.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6921
    
----
commit 945a882db58191ef3ee8eec1197cc4bf3866770d
Author: Cheolsoo Park <[email protected]>
Date:   2015-06-19T20:56:22Z

    Implement predicate pushdown for hive metastore catalog

----

