[ https://issues.apache.org/jira/browse/SPARK-25212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800201#comment-16800201 ]
Jose Sanchez commented on SPARK-25212: -------------------------------------- The issue here is that when catalyst runs the *Operator Optimization before Inferring Filters* batch, it removes the conditions from the Join Inner, so the *Check Cartesian Products* batch detects a cartesian product: {code} Unable to find source-code formatter for language: shell. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml19/03/18 06:23:40 TRACE BaseSessionStateBuilder$$anon$2: === Applying Rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation === !Join Inner, (value1#3 = value3#10) Join Inner, (value1#3 = 1) :- Project [value#1 AS value1#3] :- Project [value#1 AS value1#3] : +- LocalRelation [value#1] : +- LocalRelation [value#1] +- Project [value#6 AS value2#8, 1 AS value3#10] +- Project [value#6 AS value2#8, 1 AS value3#10] +- LocalRelation [value#6] +- LocalRelation [value#6] 19/03/18 06:23:40 TRACE BaseSessionStateBuilder$$anon$2: === Applying Rule org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin === !Join Inner, (value1#3 = 1) Join Inner !:- Project [value#1 AS value1#3] :- Filter (value1#3 = 1) !: +- LocalRelation [value#1] : +- Project [value#1 AS value1#3] !+- Project [value#6 AS value2#8, 1 AS value3#10] : +- LocalRelation [value#1] ! +- LocalRelation [value#6] +- Project [value#6 AS value2#8, 1 AS value3#10] ! +- LocalRelation [value#6] ... === Result of Batch Operator Optimization before Inferring Filters === !Join Inner, (value1#3 = value3#10) Join Inner :- Project [value#1 AS value1#3] :- Project [value#1 AS value1#3] !: +- LocalRelation [value#1] : +- Filter (value#1 = 1) !+- Project [value2#8, 1 AS value3#10] : +- LocalRelation [value#1] ! +- Project [value#6 AS value2#8] +- Project [value#6 AS value2#8, 1 AS value3#10] ! +- LocalRelation [value#6] +- LocalRelation [value#6] {code} It makes sense because the join condition has a literal, which is propagated as a Filter. I had a similar use case where a literal ends up as a *JOIN* condition, I guess this does not happen very frequently, but it has been fixed in Spark 2.4.0 by accident with SPARK-25212, thanks to the Batch *LocalRelation early* which collapses the literal into the *LocalRelation*, so none of the rules above are triggered. I wonder, is it worth to backport SPARK-25212 to Spark 2.3 to "fix" this issue? as I said, looks like this is not a common use case, but Spark 2.4.0 has a different outcome for these situations. > Support Filter in ConvertToLocalRelation > ---------------------------------------- > > Key: SPARK-25212 > URL: https://issues.apache.org/jira/browse/SPARK-25212 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 2.3.1 > Reporter: Bogdan Raducanu > Assignee: Bogdan Raducanu > Priority: Major > Fix For: 2.4.0 > > > ConvertToLocalRelation can make short queries faster but currently it only > supports Project and Limit. > It can be extended with other operators such as Filter. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org