GitHub user DonnyZone opened a pull request:

    https://github.com/apache/spark/pull/18920

    [SPARK-19471][SQL] AggregationIterator does not initialize the generated result projection before using it

    ## What changes were proposed in this pull request?
    
    We recently hit the NPE described in SPARK-19471 in our production environment:
    https://issues.apache.org/jira/browse/SPARK-19471
    
    This issue can be reproduced by the following examples:
    `import org.apache.spark.sql.functions.{collect_list, rand, sum}

    val df = spark.createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4))).toDF("x", "y")

    // HashAggregate: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key = false
    df.groupBy("x").agg(rand(), sum("y")).show()

    // ObjectHashAggregate: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key = false
    df.groupBy("x").agg(rand(), collect_list("y")).show()

    // SortAggregate: SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key = false && SQLConf.USE_OBJECT_HASH_AGG.key = false
    df.groupBy("x").agg(rand(), collect_list("y")).show()`
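
    For anyone who wants to run the reproduction outside spark-shell, here is a minimal standalone sketch. The object name `Spark19471Repro`, the `local[1]` master and the literal config strings are my assumptions: I believe `spark.sql.codegen.wholeStage` and `spark.sql.execution.useObjectHashAggregateExec` are the values behind the two `SQLConf` keys named above, but please check them against your Spark version. On a build that already contains the fix, the three queries should simply print their results.

    ```scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{collect_list, rand, sum}

    // Hypothetical driver object; only the three queries come from the PR description.
    object Spark19471Repro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .master("local[1]")
          .appName("SPARK-19471-repro")
          .getOrCreate()

        val df = spark.createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4))).toDF("x", "y")

        // Assumed string form of SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key
        spark.conf.set("spark.sql.codegen.wholeStage", "false")
        df.groupBy("x").agg(rand(), sum("y")).show()          // HashAggregate path
        df.groupBy("x").agg(rand(), collect_list("y")).show() // ObjectHashAggregate path

        // Assumed string form of SQLConf.USE_OBJECT_HASH_AGG.key
        spark.conf.set("spark.sql.execution.useObjectHashAggregateExec", "false")
        df.groupBy("x").agg(rand(), collect_list("y")).show() // SortAggregate path

        spark.stop()
      }
    }
    ```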
    
    This PR is based on PR-16820 (https://github.com/apache/spark/pull/16820) and adds test cases for all aggregation paths. We would like to push it forward.
    
    > When AggregationIterator generates result projection, it does not call 
the initialize method of the Projection class. This will cause a runtime 
NullPointerException when the projection involves nondeterministic expressions.
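    
    To illustrate the failure mode described in the quote, here is a toy model of the "initialize before use" contract. Everything in it (`ToyExpression`, `ToyRand`, `ToyLiteral`, `ToyProjection`, `ProjectionInitDemo`) is a hypothetical stand-in written for this description, not Spark's actual classes; it only assumes that a nondeterministic expression creates its internal state in `initialize` and that the projection forwards that call to its expressions.

    ```scala
    import scala.util.{Random, Try}

    // Hypothetical stand-ins for illustration only; these are NOT Spark's classes.
    trait ToyExpression {
      def initialize(partitionIndex: Int): Unit = () // deterministic expressions need no setup
      def eval(): Any
    }

    // Mimics a nondeterministic expression such as rand(): its generator
    // is only created when initialize() is called.
    final class ToyRand(seed: Long) extends ToyExpression {
      private var rng: Random = null // stays null until initialize() runs
      override def initialize(partitionIndex: Int): Unit = {
        rng = new Random(seed + partitionIndex)
      }
      override def eval(): Any = rng.nextDouble() // throws NullPointerException if rng is null
    }

    final class ToyLiteral(value: Any) extends ToyExpression {
      override def eval(): Any = value
    }

    // Mimics a result projection: initialize() must be forwarded to every expression.
    final class ToyProjection(exprs: Seq[ToyExpression]) {
      def initialize(partitionIndex: Int): Unit = exprs.foreach(_.initialize(partitionIndex))
      def apply(): Seq[Any] = exprs.map(_.eval())
    }

    object ProjectionInitDemo {
      private def buildProjection(): ToyProjection =
        new ToyProjection(Seq(new ToyLiteral("1"), new ToyRand(42L)))

      // Buggy pattern: the projection is used without being initialized.
      def evalWithoutInit(): Try[Seq[Any]] = Try(buildProjection().apply())

      // Fixed pattern: initialize with the partition index before the first use.
      def evalWithInit(partitionIndex: Int): Seq[Any] = {
        val projection = buildProjection()
        projection.initialize(partitionIndex)
        projection.apply()
      }
    }
    ```

    In this toy, `ProjectionInitDemo.evalWithoutInit()` fails with a `NullPointerException`, while `ProjectionInitDemo.evalWithInit(0)` returns a two-element row. The change proposed here follows the second pattern: initialize the generated result projection before it is first used.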
    
    ## How was this patch tested?
    
    Unit tests.
    Also verified in a production environment.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DonnyZone/spark Branch-spark-19471

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18920.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18920
    
----
commit b932d2f3a6741a8ef052cbd8087f4b0836c617d6
Author: donnyzone <[email protected]>
Date:   2017-08-11T13:00:00Z

    spark-19471

----

