GitHub user DonnyZone opened a pull request:
https://github.com/apache/spark/pull/18920
[SPARK-19471][SQL]AggregationIterator does not initialize the generated
result projection before using it
## What changes were proposed in this pull request?
We recently hit the NullPointerException (NPE) described in the following
JIRA issue in our production environment:
https://issues.apache.org/jira/browse/SPARK-19471
This issue can be reproduced by the following examples:
    import org.apache.spark.sql.functions.{collect_list, rand, sum}

    val df = spark.createDataFrame(Seq(("1", 1), ("1", 2), ("2", 3), ("2", 4))).toDF("x", "y")
    // HashAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
    df.groupBy("x").agg(rand(), sum("y")).show()
    // ObjectHashAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false
    df.groupBy("x").agg(rand(), collect_list("y")).show()
    // SortAggregate, SQLConf.WHOLESTAGE_CODEGEN_ENABLED.key=false &&
    // SQLConf.USE_OBJECT_HASH_AGG.key=false
    df.groupBy("x").agg(rand(), collect_list("y")).show()
This PR is based on PR-16820 (https://github.com/apache/spark/pull/16820),
with test cases for all aggregation paths. We would like to push it forward.
> When AggregationIterator generates result projection, it does not call
the initialize method of the Projection class. This will cause a runtime
NullPointerException when the projection involves nondeterministic expressions.
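
The failure mode can be illustrated with a minimal, self-contained sketch. The names below (`SketchProjection`, `RandProjection`, `InitDemo`) are hypothetical stand-ins, not Spark's actual classes; the sketch only assumes that a nondeterministic projection holds state that is created in `initialize` and is null before that call.

```scala
// Hypothetical, simplified model of a projection with an initialize step.
// These are NOT Spark's real classes; they only mimic the contract.
trait SketchProjection {
  def initialize(partitionIndex: Int): Unit
  def apply(row: Seq[Any]): Seq[Any]
}

// Models a projection containing a nondeterministic expression such as rand():
// its random generator exists only after initialize() has been called.
final class RandProjection(seed: Long) extends SketchProjection {
  private var rng: java.util.Random = null

  override def initialize(partitionIndex: Int): Unit = {
    rng = new java.util.Random(seed + partitionIndex)
  }

  // Appends a random value to the row; dereferences rng, so it fails
  // with a NullPointerException if initialize() was skipped.
  override def apply(row: Seq[Any]): Seq[Any] = row :+ rng.nextDouble()
}

object InitDemo {
  // Runs the projection with or without the initialize call and reports
  // whether it succeeded or hit the NPE.
  def project(initFirst: Boolean): Either[String, Seq[Any]] = {
    val projection = new RandProjection(42L)
    if (initFirst) projection.initialize(0)
    try Right(projection(Seq("1", 3)))
    catch { case _: NullPointerException => Left("NPE") }
  }
}
```

In this model, `InitDemo.project(initFirst = false)` hits the NPE, while `InitDemo.project(initFirst = true)` returns the projected row. The fix follows the same idea: initialize the generated result projection before it is first used.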
## How was this patch tested?
Unit tests.
Verified in our production environment.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/DonnyZone/spark Branch-spark-19471
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18920.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18920
----
commit b932d2f3a6741a8ef052cbd8087f4b0836c617d6
Author: donnyzone
Date: 2017-08-11T13:00:00Z
spark-19471
----