[ 
https://issues.apache.org/jira/browse/SPARK-25212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800201#comment-16800201
 ] 

Jose Sanchez commented on SPARK-25212:
--------------------------------------

The issue here is that when catalyst runs the *Operator Optimization before 
Inferring Filters* batch, it removes the conditions from the Join Inner, so the 
*Check Cartesian Products* batch detects a cartesian product:
{code}
Unable to find source-code formatter for language: shell. Available languages 
are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, 
groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, 
php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, 
yaml19/03/18 06:23:40 TRACE BaseSessionStateBuilder$$anon$2:
=== Applying Rule org.apache.spark.sql.catalyst.optimizer.FoldablePropagation 
===
!Join Inner, (value1#3 = value3#10)                 Join Inner, (value1#3 = 1)
:- Project [value#1 AS value1#3]                   :- Project [value#1 AS 
value1#3]
:  +- LocalRelation [value#1]                      : +- LocalRelation [value#1]
+- Project [value#6 AS value2#8, 1 AS value3#10]   +- Project [value#6 AS 
value2#8, 1 AS value3#10]
   +- LocalRelation [value#6]                         +- LocalRelation [value#6]

19/03/18 06:23:40 TRACE BaseSessionStateBuilder$$anon$2:
=== Applying Rule 
org.apache.spark.sql.catalyst.optimizer.PushPredicateThroughJoin ===
!Join Inner, (value1#3 = 1)                         Join Inner
!:- Project [value#1 AS value1#3]                   :- Filter (value1#3 = 1)
!:  +- LocalRelation [value#1]                      : +- Project [value#1 AS 
value1#3]
!+- Project [value#6 AS value2#8, 1 AS value3#10]   : +- LocalRelation [value#1]
!   +- LocalRelation [value#6]                      +- Project [value#6 AS 
value2#8, 1 AS value3#10]
!                                                      +- LocalRelation 
[value#6]

...

=== Result of Batch Operator Optimization before Inferring Filters ===
!Join Inner, (value1#3 = value3#10)      Join Inner
:- Project [value#1 AS value1#3]        :- Project [value#1 AS value1#3]
!:  +- LocalRelation [value#1]           : +- Filter (value#1 = 1)
!+- Project [value2#8, 1 AS value3#10]   : +- LocalRelation [value#1]
!   +- Project [value#6 AS value2#8]     +- Project [value#6 AS value2#8, 1 AS 
value3#10]
!      +- LocalRelation [value#6]           +- LocalRelation [value#6]
{code}
 

It makes sense because the join condition has a literal, which is propagated as 
a Filter. I had a similar use case where a literal ends up as a *JOIN* 
condition, I guess this does not happen very frequently, but it has been fixed 
in Spark 2.4.0 by accident with SPARK-25212, thanks to the Batch *LocalRelation 
early* which collapses the literal into the *LocalRelation*, so none of the 
rules above are triggered.

I wonder, is it worth to backport SPARK-25212 to Spark 2.3 to "fix" this issue? 
as I said, looks like this is not a common use case, but Spark 2.4.0 has a 
different outcome for these situations.

> Support Filter in ConvertToLocalRelation
> ----------------------------------------
>
>                 Key: SPARK-25212
>                 URL: https://issues.apache.org/jira/browse/SPARK-25212
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Bogdan Raducanu
>            Assignee: Bogdan Raducanu
>            Priority: Major
>             Fix For: 2.4.0
>
>
> ConvertToLocalRelation can make short queries faster but currently it only 
> supports Project and Limit.
> It can be extended with other operators such as Filter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to