[GitHub] spark pull request #14797: [SPARK-17230] [SQL] Should not pass optimized que...

davies Wed, 24 Aug 2016 14:21:04 -0700

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/14797


    [SPARK-17230] [SQL] Should not pass optimized query into QueryExecution in 
DataFrameWriter

    ## What changes were proposed in this pull request?
    
    Some analyzer rules make assumptions about logical plans, and the
    optimizer may break these assumptions. We should not pass an optimized
    query plan into QueryExecution (where it will be analyzed again);
    otherwise we may hit some weird bugs.
    
    For example, we have a rule for decimal calculation that promotes the
    precision before binary operations and uses PromotePrecision as a
    placeholder to indicate that the rule should not be applied twice. But an
    Optimizer rule removes this placeholder, which breaks the assumption; the
    rule is then applied twice and produces a wrong result.
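
The failure mode described above can be illustrated with a small toy model. This is only a sketch, not Spark code: the classes `Col`, `Widen`, `Promote`, and `Add` and the functions `analyze`, `optimize`, and `precision` are hypothetical stand-ins, and the "widen by one digit" behavior is an assumption chosen to keep the example short. `Promote` plays the role of the PromotePrecision placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Col:
    name: str
    precision: int


@dataclass(frozen=True)
class Widen:
    """Hypothetical cast-like node that raises precision by `extra` digits."""
    child: object
    extra: int


@dataclass(frozen=True)
class Promote:
    """Hypothetical placeholder meaning 'this operand was already promoted'."""
    child: object


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def analyze(expr):
    """Toy analyzer rule: widen each operand of Add unless it is marked."""
    if isinstance(expr, Add):
        def promote(operand):
            if isinstance(operand, Promote):
                return operand  # placeholder present: do not promote again
            return Promote(Widen(operand, 1))
        return Add(promote(expr.left), promote(expr.right))
    return expr


def optimize(expr):
    """Toy optimizer rule: strip the Promote placeholder everywhere."""
    if isinstance(expr, Promote):
        return optimize(expr.child)
    if isinstance(expr, Add):
        return Add(optimize(expr.left), optimize(expr.right))
    if isinstance(expr, Widen):
        return Widen(optimize(expr.child), expr.extra)
    return expr


def precision(expr):
    """Precision of an expression in this toy model."""
    if isinstance(expr, Col):
        return expr.precision
    if isinstance(expr, Widen):
        return precision(expr.child) + expr.extra
    if isinstance(expr, Promote):
        return precision(expr.child)
    return max(precision(expr.left), precision(expr.right))


plan = Add(Col("a", 10), Col("b", 10))
analyzed_once = analyze(plan)
# Re-analyzing an analyzed plan is harmless while the placeholder is present.
reanalyzed = analyze(analyzed_once)
# Re-analyzing after the optimizer stripped the placeholder promotes twice.
analyzed_after_optimize = analyze(optimize(analyzed_once))
```

In this model the rule is only idempotent as long as the placeholder survives; once the optimizer strips it, a second analysis pass widens the operands again, which mirrors the wrong-result scenario described in the paragraph above.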
    
    Ideally, we should make all the analyzer rules idempotent, but that may
    require a lot of effort to double-check them one by one (and may not be
    easy).
    
    An easier approach is to never feed an optimized plan into the Analyzer.
    This PR fixes the case for RunnableCommand: these commands are optimized,
    and during execution the passed `query` is also passed into QueryExecution
    again. This PR makes these `query` plans not part of the children, so they
    will not be optimized and analyzed again.
    
    Right now, we do not know whether a logical plan is optimized or not. We
    could introduce a flag for that and make sure an optimized logical plan
    will not be analyzed again.
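
The flag idea mentioned above could look roughly like the following. This is a hypothetical sketch, not part of this PR or of Spark: the `Plan` class, its `optimized` field, and the choice to skip analysis silently (rather than raise an error) are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Plan:
    """Hypothetical logical plan that remembers which phases it went through."""
    description: str
    analyzed: bool = False
    optimized: bool = False


def analyze(plan):
    """Toy analyzer: leaves plans that are already optimized untouched."""
    if plan.optimized:
        return plan  # assumption: skip instead of failing
    return replace(plan, description=f"analyzed({plan.description})", analyzed=True)


def optimize(plan):
    """Toy optimizer: marks the plan so later analysis can detect it."""
    return replace(plan, description=f"optimized({plan.description})", optimized=True)


optimized_plan = optimize(analyze(Plan("query")))
# A second analysis pass is now a no-op for the optimized plan.
analyzed_again = analyze(optimized_plan)
```

Whether such a guard should skip analysis or fail loudly is a design choice; failing would surface callers that pass optimized plans around, while skipping keeps existing call sites working.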
    
    ## How was this patch tested?
    
    Added regression tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark fix_writer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14797.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14797
    
----
commit 9e1c555ebbab62c73f0b5a39617e5cd284720851
Author: Davies Liu <[email protected]>
Date:   2016-08-24T21:06:17Z

    fix DataFrameWriter

commit 25c451a67aeb912c0d215548e49c1a661d308b14
Author: Davies Liu <[email protected]>
Date:   2016-08-24T21:07:00Z

    Merge branch 'master' of github.com:apache/spark into fix_writer

----


