[GitHub] spark pull request: SPARK-1487 [SQL] Support record filtering via ...

AndreSchumacher Fri, 25 Apr 2014 22:15:27 -0700

Github user AndreSchumacher commented on a diff in the pull request:

    https://github.com/apache/spark/pull/511#discussion_r12022983
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
    @@ -175,12 +175,35 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
             InsertIntoParquetTable(relation, planLater(child), overwrite=true)(sparkContext) :: Nil
           case logical.InsertIntoTable(table: ParquetRelation, partition, child, overwrite) =>
             InsertIntoParquetTable(table, planLater(child), overwrite)(sparkContext) :: Nil
    -      case PhysicalOperation(projectList, filters, relation: ParquetRelation) =>
    -        // TODO: Should be pushing down filters as well.
    +      case PhysicalOperation(projectList, filters: Seq[Expression], relation: ParquetRelation) => {
    +        val remainingFilters =
    +          if (sparkContext.conf.getBoolean(ParquetFilters.PARQUET_FILTER_PUSHDOWN_ENABLED, true)) {
    +            filters.filter {
    +              // Note: filters cannot be pushed down to Parquet if they contain more complex
    +              // expressions than simple "Attribute cmp Literal" comparisons. Here we remove
    +              // all filters that have been pushed down. Note that a predicate such as
    +              // "A AND B" can result in "A" being pushed down.
    --- End diff --
    
    Good point, bad example. That's why I initially didn't treat ANDs at all
    when creating the filters from the expressions. But then I thought one could
    have expressions such as (A AND B) OR C, which should probably be handled in
    the planner and turned into (A OR C) AND (B OR C), but currently are not.
    Please correct me if I am wrong. It may be that the parser doesn't currently
    allow this kind of filter expression with '(' and ')', although nothing
    speaks against them, I guess.
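
    The rewrite mentioned above, from (A AND B) OR C to (A OR C) AND (B OR C),
    could be sketched roughly as follows. This is only an illustration on a toy
    expression type: `Expr`, `Pred`, `And`, `Or` and the `Distribute` object are
    hypothetical stand-ins, not Catalyst's classes, and nothing here reflects how
    the planner currently behaves.

```scala
// Toy expression tree. These are NOT Catalyst's classes, only stand-ins for illustration.
sealed trait Expr
case class Pred(name: String) extends Expr
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

object Distribute {
  // Rewrites the tree so that no Or has an And beneath it, e.g.
  // (A AND B) OR C  becomes  (A OR C) AND (B OR C).
  def distribute(e: Expr): Expr = e match {
    case And(l, r) => And(distribute(l), distribute(r))
    case Or(l, r) =>
      (distribute(l), distribute(r)) match {
        // An And on the left side: distribute the right side over both halves.
        case (And(a, b), c) => And(distribute(Or(a, c)), distribute(Or(b, c)))
        // An And on the right side: distribute the left side over both halves.
        case (a, And(b, c)) => And(distribute(Or(a, b)), distribute(Or(a, c)))
        // No And on either side: nothing to distribute.
        case (a, b) => Or(a, b)
      }
    case p: Pred => p
  }

  // Flattens the top-level ANDs into a list of conjuncts, so that each one
  // could be considered for pushdown on its own.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other => Seq(other)
  }
}
```

    After such a rewrite, each conjunct can be inspected separately. Note that
    distributing OR over AND can grow the expression considerably for deeply
    nested inputs, so a real planner rule would probably want a size limit.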



