[jira] [Comment Edited] (SPARK-21652) Optimizer cannot reach a fixed point on certain queries
[ https://issues.apache.org/jira/browse/SPARK-21652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16614664#comment-16614664 ]

Pengfei Chang edited comment on SPARK-21652 at 9/14/18 11:09 AM:
-----------------------------------------------------------------

Hi, after this change there are some cases that are not supported. For example:

{code}
select * from test1, test2, test3
where test1.a = test2.a and test1.a = test3.a and test3.a = 1;
{code}

The filter a = 1 will only be pushed down to test1 and test3, not to test2. Is this the expected behavior?

was (Author: pfchang):
Hi, after this change there are some cases that are not supported. For example: select * from test1, test2, test3 where test1.a = test2.a and test1.a and test3.a and test3.a = 1; The filter a = 1 will only be pushed down to test1 and test3, not to test2. Is this the expected behavior?
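For reference, a minimal spark-shell sketch of the scenario in the comment above. The table names (test1, test2, test3) and column a come from the comment; the sample data is made up for illustration, and the pushdown outcome noted in the trailing comment is only what the comment claims:

{code}
// Hypothetical repro of the comment's scenario; data is illustrative only.
import spark.implicits._

Seq(1, 2).toDF("a").write.saveAsTable("test1")
Seq(1, 2).toDF("a").write.saveAsTable("test2")
Seq(1, 2).toDF("a").write.saveAsTable("test3")

spark.sql("""
  SELECT * FROM test1, test2, test3
  WHERE test1.a = test2.a AND test1.a = test3.a AND test3.a = 1
""").explain(true)
// Per the comment, the optimized plan shows the inferred filter (a = 1)
// on test1 and test3 but not on test2.
{code}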
> Optimizer cannot reach a fixed point on certain queries
> --------------------------------------------------------
>
>                 Key: SPARK-21652
>                 URL: https://issues.apache.org/jira/browse/SPARK-21652
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.2.0
>            Reporter: Anton Okolnychyi
>            Assignee: Xiao Li
>            Priority: Major
>             Fix For: 2.3.0
>
> The optimizer cannot reach a fixed point on the following query:
> {code}
> Seq((1, 2)).toDF("col1", "col2").write.saveAsTable("t1")
> Seq(1, 2).toDF("col").write.saveAsTable("t2")
> spark.sql("SELECT * FROM t1, t2 WHERE t1.col1 = 1 AND 1 = t1.col2 AND t1.col1 = t2.col AND t1.col2 = t2.col").explain(true)
> {code}
> At some point during the optimization, InferFiltersFromConstraints infers a new constraint '(col2#33 = col1#32)' that is appended to the join condition, then PushPredicateThroughJoin pushes it down, ConstantPropagation replaces '(col2#33 = col1#32)' with '1 = 1' based on other propagated constraints, ConstantFolding replaces '1 = 1' with 'true', and BooleanSimplification finally removes this predicate. However, InferFiltersFromConstraints will again infer '(col2#33 = col1#32)' on the next iteration, and the process continues until the iteration limit is reached.
> See below for more details:
> {noformat}
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.InferFiltersFromConstraints ===
> !Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))   Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))
>  :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))   :- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
>  :  +- Relation[col1#32,col2#33] parquet   :  +- Relation[col1#32,col2#33] parquet
>  +- Filter ((1 = col#34) && isnotnull(col#34))   +- Filter ((1 = col#34) && isnotnull(col#34))
>     +- Relation[col#34] parquet      +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin ===
> !Join Inner, ((col2#33 = col1#32) && ((col1#32 = col#34) && (col2#33 = col#34)))   Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> !:- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))   :- Filter (col2#33 = col1#32)
> !:  +- Relation[col1#32,col2#33] parquet   :  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))
> !+- Filter ((1 = col#34) && isnotnull(col#34))   :     +- Relation[col1#32,col2#33] parquet
> !   +- Relation[col#34] parquet   +- Filter ((1 = col#34) && isnotnull(col#34))
> !                                    +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.CombineFilters ===
>  Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))   Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))
> !:- Filter (col2#33 = col1#32)   :- Filter (((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33))) && (col2#33 = col1#32))
> !:  +- Filter ((isnotnull(col1#32) && isnotnull(col2#33)) && ((col1#32 = 1) && (1 = col2#33)))   :  +- Relation[col1#32,col2#33] parquet
> !:     +- Relation[col1#32,col2#33] parquet   +- Filter ((1 = col#34) && isnotnull(col#34))
> !+- Filter ((1 = col#34) && isnotnull(col#34))      +- Relation[col#34] parquet
> !   +- Relation[col#34] parquet
>
> === Applying Rule org.apache.spark.sql.catalyst.optimizer.ConstantPropagation ===
>  Join Inner, ((col1#32 = col#34) && (col2#33 = col#34))   Join Inner, ((col1#32 = col#34) &
> {noformat}
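To make the non-termination concrete, here is a deliberately tiny Scala sketch. It is not Spark's actual RuleExecutor: the plan is reduced to a set of predicate strings, batch is a stand-in for the rule sequence described in the report, and maxIterations = 100 mirrors the optimizer's default iteration limit:

{code}
// Toy model of the oscillation: one pass re-infers the predicate, the
// next pass folds it away, so consecutive plans are never equal and the
// loop only stops at the iteration limit. NOT Spark's RuleExecutor.
object FixedPointDemo {
  // A "plan" is reduced to its set of predicates for illustration.
  type Plan = Set[String]

  // One optimizer batch: if the inferred predicate is present, it gets
  // constant-folded away (ConstantPropagation / ConstantFolding /
  // BooleanSimplification); if absent, InferFiltersFromConstraints
  // re-derives it from col1 = 1 and 1 = col2.
  def batch(p: Plan): Plan =
    if (p.contains("col2 = col1")) p - "col2 = col1"
    else p + "col2 = col1"

  def main(args: Array[String]): Unit = {
    val maxIterations = 100
    var plan: Plan = Set("col1 = 1", "1 = col2")
    var i = 0
    var reachedFixedPoint = false
    while (!reachedFixedPoint && i < maxIterations) {
      val next = batch(plan)
      reachedFixedPoint = next == plan // never true: the plan oscillates
      plan = next
      i += 1
    }
    println(s"stopped after $i iteration(s); fixed point: $reachedFixedPoint")
    // Prints: stopped after 100 iteration(s); fixed point: false
  }
}
{code}

Because consecutive plans are never equal, raising the iteration limit cannot help; the cycle itself has to be broken.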
[jira] [Comment Edited] (SPARK-21652) Optimizer cannot reach a fixed point on certain queries
[ https://issues.apache.org/jira/browse/SPARK-21652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126917#comment-16126917 ]

Takeshi Yamamuro edited comment on SPARK-21652 at 8/15/17 8:08 AM:
-------------------------------------------------------------------

User 'maropu' has created a pull request for this issue:
https://github.com/apache/spark/pull/18882

was (Author: maropu):
The pr is: https://github.com/apache/spark/pull/18882
[jira] [Comment Edited] (SPARK-21652) Optimizer cannot reach a fixed point on certain queries
[ https://issues.apache.org/jira/browse/SPARK-21652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16116936#comment-16116936 ]

Anton Okolnychyi edited comment on SPARK-21652 at 8/7/17 8:06 PM:
------------------------------------------------------------------

Yes, disabling the constraint propagation helps because `InferFiltersFromConstraints` will not apply. I found several known issues regarding the performance of `InferFiltersFromConstraints`, but what about the logic of `ConstantPropagation` in the above example? Should it replace predicates such as `(a = b)` with `(1 = 1)` even if it is semantically correct?

was (Author: aokolnychyi):
Yes, disabling the constraint propagation helps because `InferFiltersFromConstraints` will not apply. I found several issues regarding the performance of InferFiltersFromConstraints, but what about the logic of `ConstantPropagation` in the above example? Should it replace predicates such as `(a = b)` with `(1 = 1)` even if it is semantically correct?
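To illustrate the rewrite in question, here is a hand-rolled Scala sketch of the substitution. It is not Catalyst's ConstantPropagation: expressions are modeled as plain string pairs, and the literal check is simplified to digits only:

{code}
// Illustrative only: with col1 = 1 and 1 = col2 known, the remaining
// attribute-to-attribute equality col1 = col2 becomes 1 = 1.
object ConstantPropagationDemo {
  type Conjunct = (String, String) // a binary equality, left = right

  def isLiteral(s: String): Boolean = s.forall(_.isDigit)

  def propagate(conjuncts: Seq[Conjunct]): Seq[Conjunct] = {
    // Collect attribute -> literal bindings from equality-with-literal conjuncts.
    val constants: Map[String, String] = conjuncts.collect {
      case (a, lit) if !isLiteral(a) && isLiteral(lit) => a -> lit
      case (lit, a) if isLiteral(lit) && !isLiteral(a) => a -> lit
    }.toMap
    // Substitute bindings into attribute-to-attribute equalities only.
    conjuncts.map {
      case (l, r) if !isLiteral(l) && !isLiteral(r) =>
        (constants.getOrElse(l, l), constants.getOrElse(r, r))
      case other => other
    }
  }

  def main(args: Array[String]): Unit = {
    println(propagate(Seq("col1" -> "1", "1" -> "col2", "col1" -> "col2")))
    // List((col1,1), (1,col2), (1,1)): `col1 = col2` became `1 = 1`, which
    // ConstantFolding reduces to `true` and BooleanSimplification removes,
    // letting InferFiltersFromConstraints re-infer it on the next iteration.
  }
}
{code}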