Github user AndreSchumacher commented on a diff in the pull request:
https://github.com/apache/spark/pull/511#discussion_r12022983
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala ---
@@ -175,12 +175,35 @@ abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
         InsertIntoParquetTable(relation, planLater(child), overwrite=true)(sparkContext) :: Nil
       case logical.InsertIntoTable(table: ParquetRelation, partition, child, overwrite) =>
         InsertIntoParquetTable(table, planLater(child), overwrite)(sparkContext) :: Nil
-      case PhysicalOperation(projectList, filters, relation: ParquetRelation) =>
-        // TODO: Should be pushing down filters as well.
+      case PhysicalOperation(projectList, filters: Seq[Expression], relation: ParquetRelation) => {
+        val remainingFilters =
+          if (sparkContext.conf.getBoolean(ParquetFilters.PARQUET_FILTER_PUSHDOWN_ENABLED, true)) {
+            filters.filter {
+              // Note: filters cannot be pushed down to Parquet if they contain more complex
+              // expressions than simple "Attribute cmp Literal" comparisons. Here we remove
+              // all filters that have been pushed down. Note that a predicate such as
+              // "A AND B" can result in "A" being pushed down.
--- End diff --
Good point, bad example. That is why I initially did not handle ANDs at all
when creating the filters from the expressions. But then I thought one could
have expressions such as (A AND B) OR C, which the planner should probably
rewrite as (A OR C) AND (B OR C), though currently it does not. Please correct
me if I am wrong. It may be that the parser does not currently allow this kind
of filter expression with '(' and ')', although I see no reason why it should
not.
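
To make the rewrite concrete, here is a rough sketch of distributing OR over
AND and then splitting the result into top-level conjuncts that could each be
considered for pushdown. It is only an illustration: Pred, And, Or and
CnfSketch are hypothetical stand-ins defined here, not Catalyst's Expression
classes, and this is not what the pull request implements.

```scala
// Hypothetical mini expression tree (an assumption for illustration only;
// these are NOT Catalyst's classes).
sealed trait Expr
case class Pred(name: String) extends Expr          // atomic predicate, e.g. "a < 5"
case class And(left: Expr, right: Expr) extends Expr
case class Or(left: Expr, right: Expr) extends Expr

object CnfSketch {
  // Rewrite an expression into a conjunction of disjunctions by
  // distributing OR over AND, bottom-up.
  def toCnf(e: Expr): Expr = e match {
    case And(l, r) => And(toCnf(l), toCnf(r))
    case Or(l, r)  => distribute(toCnf(l), toCnf(r))
    case p: Pred   => p
  }

  // Build the CNF of (l OR r); both arguments must already be in CNF.
  private def distribute(l: Expr, r: Expr): Expr = (l, r) match {
    case (And(a, b), c) => And(distribute(a, c), distribute(b, c))
    case (a, And(b, c)) => And(distribute(a, b), distribute(a, c))
    case _              => Or(l, r)
  }

  // Flatten nested top-level ANDs into a flat list of conjuncts.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }
}
```

With this, CnfSketch.toCnf(Or(And(Pred("A"), Pred("B")), Pred("C"))) yields
And(Or(Pred("A"), Pred("C")), Or(Pred("B"), Pred("C"))), and splitConjuncts
returns the two disjunctions separately. Note that distributing this way can
grow the expression exponentially in the worst case, so a real planner rule
would probably need a size limit.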