GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/22284

    [SPARK-25278][SQL] Avoid duplicated Exec nodes when the same logical plan 
appears in the query

    ## What changes were proposed in this pull request?
    
    In the planner, we collect the placeholders which need to be substituted in 
the query execution plan and, once we have planned them, we substitute each 
placeholder with its actual physical plan.
    
    In this second phase, we rely on the `==` comparison, i.e. the `equals` 
method. This means that if two placeholder plans, which are different 
instances, have the same attributes (so that they are equal according to the 
`equals` method), both of them match whenever either one is looked up. So, in 
such a situation, the first substitution replaces both of them with the first 
of the two newly generated plans, and the second substitution replaces nothing.
    
    This usually does no harm to the execution of the query itself, as the two 
plans are identical. But since both positions now hold the same instance, its 
local variables are shared (which is unexpected). This causes issues for the 
metrics collected: the same node is executed twice, so its metrics are 
accumulated twice, wrongly.
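
    A minimal sketch of the metrics problem, assuming a hypothetical 
`CountingExec` node with a per-instance counter (this is not Spark's metric 
API, only an illustration of shared state on a reused instance):

```scala
// Hypothetical physical node that keeps a per-instance metric.
// Illustration only; these are not Spark classes.
final class CountingExec(rows: Seq[Int]) {
  var numOutputRows: Long = 0L

  def execute(): Seq[Int] = {
    numOutputRows += rows.size // each execution adds to this instance's counter
    rows
  }
}

object SharedMetricSketch {
  // Executes both sides of a (hypothetical) binary operator and reports their metrics.
  def runBoth(left: CountingExec, right: CountingExec): (Long, Long) = {
    left.execute()
    right.execute()
    (left.numOutputRows, right.numOutputRows)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(1, 2, 3)

    // Same instance on both sides: the counter is accumulated twice.
    val shared = new CountingExec(rows)
    println(runBoth(shared, shared)) // (6,6)

    // One instance per occurrence: each counter reflects a single execution.
    println(runBoth(new CountingExec(rows), new CountingExec(rows))) // (3,3)
  }
}
```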
    
    The PR proposes to use the `eq` method when checking which placeholder 
needs to be substituted; thus, in the previous situation, both of the 
different physical nodes which are created (one for each time the logical plan 
appears in the query plan) are used and the metrics are collected properly for 
each of them.
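
    The difference between the two comparisons can be sketched as follows. 
The `Plan`, `Placeholder`, `Exec` and `Join` types and the `substitute` 
helper are hypothetical stand-ins, not the planner's actual code; the only 
assumption carried over is that placeholders are compared structurally by 
`==` and by reference by `eq`:

```scala
// Hypothetical stand-ins for plan nodes; not Spark's actual classes.
sealed trait Plan
final case class Placeholder(name: String) extends Plan // case class: structural equals
final class Exec(val name: String) extends Plan         // plain class: reference equals
final case class Join(left: Plan, right: Plan) extends Plan

object SubstitutionSketch {
  // Replaces every node that `same` considers to be `target` with `replacement`.
  def substitute(plan: Plan, target: Placeholder, replacement: Plan,
                 same: (Plan, Plan) => Boolean): Plan = plan match {
    case p if same(p, target) => replacement
    case Join(l, r) =>
      Join(substitute(l, target, replacement, same),
           substitute(r, target, replacement, same))
    case other => other
  }

  // Plans each placeholder in turn, creating one new Exec node per placeholder.
  def planAll(root: Plan, placeholders: Seq[Placeholder],
              same: (Plan, Plan) => Boolean): Plan =
    placeholders.foldLeft(root) { (acc, ph) =>
      substitute(acc, ph, new Exec(ph.name), same)
    }

  // True when both children of a Join are the very same instance.
  def sharesChildren(plan: Plan): Boolean = plan match {
    case Join(l, r) => l eq r
    case _          => false
  }

  def main(args: Array[String]): Unit = {
    val a = Placeholder("scan")
    val b = Placeholder("scan") // equal to `a`, but a different instance

    // With `==`, the first pass replaces both placeholders with one Exec.
    println(sharesChildren(planAll(Join(a, b), Seq(a, b), _ == _))) // true

    // With `eq`, each placeholder gets its own Exec.
    println(sharesChildren(planAll(Join(a, b), Seq(a, b), _ eq _))) // false
  }
}
```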
    
    ## How was this patch tested?
    
    Added a unit test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-25278

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22284.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22284
    
----
commit e945bf109f7e7df8683c14ae557d21d05e980efa
Author: Marco Gaido <marcogaido91@...>
Date:   2018-08-30T14:25:08Z

    [SPARK-25278][SQL] Avoid duplicated Exec nodes when the same logical plan 
appears in the query

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
