[GitHub] spark pull request #21432: [SPARK-24373][SQL] Add AnalysisBarrier to Relatio...

mgaido91 Fri, 25 May 2018 07:11:36 -0700

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/21432


    [SPARK-24373][SQL] Add AnalysisBarrier to RelationalGroupedDataset's child

    ## What changes were proposed in this pull request?
    
    When we create a `RelationalGroupedDataset`, we set its child to the 
`logicalPlan` of the `DataFrame` we need to aggregate. Since that `logicalPlan` 
is already analyzed, it should not be analyzed again. However, it is analyzed 
again when the new aggregate plan built on top of it is analyzed.
    
    In most cases the current behavior is likely harmless, but in other cases 
re-analyzing an already analyzed plan can change it, since analysis is not 
idempotent. This can cause issues like the one described in the JIRA 
(failing to find a cached plan).
    
    This PR wraps the `logicalPlan` that is used as the child of 
`RelationalGroupedDataset` in an `AnalysisBarrier`.
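    
    To illustrate the idea, here is a toy sketch in Scala. It does not use 
Spark's actual classes: `Plan`, `Relation`, `Aggregate`, `Barrier` and 
`BarrierSketch` are hypothetical names, and the analyzer is made non-idempotent 
on purpose. It only shows how a barrier node can keep an already analyzed child 
unchanged, so that a lookup keyed on the analyzed plan still succeeds.

```scala
// Toy model only: these are NOT Spark's classes; all names are hypothetical.
sealed trait Plan
case class Relation(name: String, timesAnalyzed: Int = 0) extends Plan
case class Aggregate(child: Plan) extends Plan
case class Barrier(child: Plan) extends Plan

object BarrierSketch {
  // A deliberately non-idempotent analyzer: every pass rewrites each Relation it reaches.
  def analyze(plan: Plan): Plan = plan match {
    case Relation(name, n) => Relation(name, n + 1)
    case Aggregate(child)  => Aggregate(analyze(child))
    case b: Barrier        => b // do not descend: the subtree is already analyzed
  }

  // Remove barriers so that plans can be compared, e.g. against a cache of analyzed plans.
  def stripBarriers(plan: Plan): Plan = plan match {
    case Barrier(child)   => stripBarriers(child)
    case Aggregate(child) => Aggregate(stripBarriers(child))
    case r: Relation      => r
  }

  // Return the child of an Aggregate, or the plan itself for any other node.
  def childOf(plan: Plan): Plan = plan match {
    case Aggregate(child) => child
    case other            => other
  }

  // The plan of the "DataFrame" after its first (and intended only) analysis.
  val analyzedChild: Plan = analyze(Relation("t"))
  // Stand-in for a cache keyed on analyzed plans.
  val cached: Set[Plan] = Set(analyzedChild)

  // Without a barrier, the child is analyzed a second time and no longer matches the cache.
  val unprotected: Plan = analyze(Aggregate(analyzedChild))
  // With a barrier, the child survives the second pass unchanged.
  val guarded: Plan = stripBarriers(analyze(Aggregate(Barrier(analyzedChild))))

  def main(args: Array[String]): Unit = {
    println(cached.contains(childOf(unprotected))) // false
    println(cached.contains(childOf(guarded)))     // true
  }
}
```

    In this sketch the unprotected aggregate ends up with a child that differs 
from the cached plan, while the guarded one keeps the original child.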
    
    ## How was this patch tested?
    
    Added a unit test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-24373

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21432
    
----
commit 361fee8082b3401128ea13be82e878a987bc9b61
Author: Marco Gaido <marcogaido91@...>
Date:   2018-05-25T13:59:49Z

    [SPARK-24373][SQL] Add AnalysisBarrier to RelationalGroupedDataset's child

----

