GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/15796
[SPARK-18125][SQL][Branch-2.0] Fix a compilation error in codegen due to splitExpression
## What changes were proposed in this pull request?
Backport to branch 2.0.
As reported in the JIRA, the Java code generated by codegen sometimes fails
to compile.
Code snippet to reproduce it:
    // Paste into spark-shell (e.g. via :paste), which predefines `sc` and imports spark.implicits._
    case class Route(src: String, dest: String, cost: Int)
    case class GroupedRoutes(src: String, dest: String, routes: Seq[Route])
    val ds = sc.parallelize(Array(
      Route("a", "b", 1),
      Route("a", "b", 2),
      Route("a", "c", 2),
      Route("a", "d", 10),
      Route("b", "a", 1),
      Route("b", "a", 5),
      Route("b", "c", 6))
    ).toDF.as[Route]
    val grped = ds.map(r => GroupedRoutes(r.src, r.dest, Seq(r)))
      .groupByKey(r => (r.src, r.dest))
      .reduceGroups { (g1: GroupedRoutes, g2: GroupedRoutes) =>
        GroupedRoutes(g1.src, g1.dest, g1.routes ++ g2.routes)
      }.map(_._2)
    // Datasets are lazy: an action is needed to trigger code generation
    grped.collect()
The problem is that `ReferenceToExpressions` evaluates its children into local
variables, and the code for the result expression is then generated to read
those variables. In the case above, the result expression's code is too long,
so `CodegenContext.splitExpressions` splits it into separate methods. Those
methods cannot access the caller's local variables, so the generated code
fails to compile.
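To make the scoping problem concrete, here is a toy Scala model. It is not Spark's code: `splitIntoMethods` is a hypothetical, much simplified stand-in for `CodegenContext.splitExpressions`, and the Java-like strings it emits are only illustrative. The caller declares a local `childValue` (standing in for an evaluated child of `ReferenceToExpressions`); once the result code is packed into helper methods, each helper refers to `childValue` even though it is declared only in the caller.

```scala
// Toy model (NOT Spark's implementation) of why splitting generated code into
// helper methods breaks references to the caller's local variables.
import scala.collection.mutable.ArrayBuffer

object SplitScopeDemo {

  // Hypothetical, simplified stand-in for CodegenContext.splitExpressions:
  // packs code snippets into helper methods whose bodies stay under maxChars.
  // Returns (helper method definitions, calls to place in the caller).
  def splitIntoMethods(snippets: Seq[String], maxChars: Int): (Seq[String], Seq[String]) = {
    val groups = ArrayBuffer(Vector.empty[String])
    for (s <- snippets) {
      val current = groups.last
      if (current.nonEmpty && (current :+ s).map(_.length).sum > maxChars)
        groups += Vector(s) // start a new helper method
      else
        groups(groups.size - 1) = current :+ s
    }
    val defs = groups.zipWithIndex.map { case (body, idx) =>
      s"private void apply_$idx(InternalRow i) {\n  ${body.mkString("\n  ")}\n}"
    }
    val calls = groups.indices.map(idx => s"apply_$idx(i);")
    (defs.toSeq, calls)
  }

  // Helper methods that use `local` without declaring it. In Java these would
  // not compile, because a caller's local variable is not in scope there.
  def methodsUsingUndeclared(defs: Seq[String], local: String): Seq[String] =
    defs.filter(d => d.contains(local) && !d.contains(s"int $local ="))

  def main(args: Array[String]): Unit = {
    // The caller evaluates a child expression into a LOCAL variable...
    val callerPrologue = "int childValue = i.getInt(0);"
    // ...and the long result-expression code reads it.
    // (out0..out3 are assumed to be class fields, so only childValue matters.)
    val resultSnippets = (0 until 4).map(k => s"out$k = childValue + $k;")
    val (defs, calls) = splitIntoMethods(resultSnippets, maxChars = 40)

    println("void eval(InternalRow i) {")
    println("  " + callerPrologue)
    calls.foreach(c => println("  " + c))
    println("}")
    defs.foreach(println)
  }
}
```

Running `main` prints an `eval` method that declares `childValue`, followed by helper methods that use `childValue` without declaring it; as real Java, those helpers would not compile, which is the class of error described above.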
## How was this patch tested?
Jenkins tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 fix-codege-compilation-error-2.0
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/15796.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #15796
----
commit 0358dff2a44276484bab1f66def9e34f06e904ee
Author: Liang-Chi Hsieh
Date: 2016-11-07T15:16:43Z
Backport SPARK-18125 to branch 2.0.
----