Russell Alexander Spitzer created SPARK-10978:
-------------------------------------------------

             Summary: Allow PrunedFilterScan to eliminate predicates from 
further evaluation
                 Key: SPARK-10978
                 URL: https://issues.apache.org/jira/browse/SPARK-10978
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 1.5.0, 1.4.0, 1.3.0
            Reporter: Russell Alexander Spitzer
             Fix For: 1.6.0


Currently PrunedFilteredScan allows implementors to push down predicates to an 
underlying data source. This is done solely as an optimization, since the 
predicate is reapplied on the Spark side as well. This allows for 
bloom-filter-like operations, but it forces a redundant scan for those sources 
that can perform accurate pushdowns.

In addition, it makes it difficult for underlying sources to accept queries 
that reference non-existent columns in order to provide ancillary 
functionality. In our case we allow a Solr query to be passed in via a 
non-existent solr_query column. Since this column is not returned, nothing 
passes when Spark re-applies the filter on "solr_query".

Suggestion on the ML from [~marmbrus] 
{quote}
We have to try and maintain binary compatibility here, so probably the easiest 
thing to do here would be to add a method to the class.  Perhaps something like:

def unhandledFilters(filters: Array[Filter]): Array[Filter] = filters

By default, this could return all filters so behavior would remain the same, 
but specific implementations could override it.  There is still a chance that 
this would conflict with existing methods, but hopefully that would not be a 
problem in practice.
{quote}
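
To illustrate the proposal, here is a minimal, self-contained Scala sketch. 
The Filter hierarchy and scan trait below are simplified stand-ins, not the 
real org.apache.spark.sql.sources types; ExactEqualityScan is a hypothetical 
source that can evaluate equality predicates exactly, so it reports only the 
remaining filters back for Spark-side re-evaluation:

```scala
// Simplified stand-ins for Spark's org.apache.spark.sql.sources filter types.
sealed trait Filter
case class EqualTo(attribute: String, value: Any) extends Filter
case class GreaterThan(attribute: String, value: Any) extends Filter

trait PrunedFilteredScan {
  // Default: claim nothing is handled, so Spark re-applies every filter
  // and existing implementations keep their current behavior.
  def unhandledFilters(filters: Array[Filter]): Array[Filter] = filters
}

// A hypothetical source that evaluates equality predicates accurately
// (e.g. a key lookup) reports only the filters it cannot guarantee,
// letting Spark skip the redundant re-evaluation of EqualTo.
class ExactEqualityScan extends PrunedFilteredScan {
  override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
    filters.filterNot(_.isInstanceOf[EqualTo])
}

object Demo extends App {
  val filters: Array[Filter] =
    Array(EqualTo("id", 42), GreaterThan("ts", 1000))
  val remaining = new ExactEqualityScan().unhandledFilters(filters)
  println(remaining.mkString(", "))  // only the GreaterThan filter survives
}
```

Because the default implementation returns its input unchanged, sources that 
do not override the method behave exactly as today, which preserves binary 
compatibility as suggested above.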



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
