GitHub user viirya reopened a pull request:
https://github.com/apache/spark/pull/18931
[SPARK-21717][SQL][WIP] Decouple consume functions of physical operators in
whole-stage codegen
## What changes were proposed in this pull request?
It was observed in SPARK-21603 that whole-stage codegen suffers performance
degradation when the generated functions are too long to be optimized by the
JIT compiler.
We currently produce a single function that incorporates the generated code
from all physical operators in a whole-stage pipeline. The generated function
can therefore grow beyond the size threshold above which the JIT no longer
optimizes it.
This patch attempts to decouple the row-consuming logic of each physical
operator, so that rows are no longer processed by one giant function.
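To make the idea concrete, here is a minimal hand-written sketch of the two
shapes of generated code. It is not actual Spark output: the class name, the
method names (`processFused`, `filterConsume`, `projectConsume`,
`aggConsume`) and the toy filter/project/aggregate pipeline are all
hypothetical. For context, HotSpot by default does not JIT-compile methods
whose bytecode exceeds 8000 bytes, which is the kind of threshold a fused
function can cross.

```java
// Hypothetical illustration only; not code emitted by Spark.
final class DecoupledConsumeSketch {

    // Fused style: every operator's row logic is inlined into one loop body,
    // so the method grows with the number of operators in the stage.
    static long processFused(int[] rows) {
        long sum = 0;
        for (int value : rows) {
            // Filter operator: skip odd values.
            if (value % 2 != 0) continue;
            // Project operator: multiply by 10.
            long projected = (long) value * 10;
            // Aggregate operator: running sum.
            sum += projected;
        }
        return sum;
    }

    // Decoupled style: each operator's consume logic lives in its own small
    // method, so no single method has to hold the whole pipeline.
    private long sum;

    long processDecoupled(int[] rows) {
        sum = 0;
        for (int value : rows) {
            filterConsume(value);
        }
        return sum;
    }

    private void filterConsume(int value) {
        // Inside a separate method, `continue` is unavailable; an early
        // return skips the row instead.
        if (value % 2 != 0) return;
        projectConsume(value);
    }

    private void projectConsume(int value) {
        aggConsume((long) value * 10);
    }

    private void aggConsume(long projected) {
        sum += projected;
    }

    public static void main(String[] args) {
        int[] rows = {1, 2, 3, 4};
        System.out.println(processFused(rows));                                 // 60
        System.out.println(new DecoupledConsumeSketch().processDecoupled(rows)); // 60
    }
}
```

Both variants compute the same result; the difference is only in how the
bytecode is distributed across methods. Control flow that refers to the outer
loop, such as `continue` or a stop check, has to be handled differently once
the logic moves into its own method, which is presumably what some of the
commits below deal with.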
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-21717
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18931.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18931
----
commit 05274e7ad4c74e6241b5a05a9365c475f0c3c0a3
Author: Liang-Chi Hsieh <[email protected]>
Date: 2017-08-13T06:06:10Z
Decouple consume functions of physical operators in whole-stage codegen.
commit e0e7a6ecc957b4659db9b0367ef32d09537b32fd
Author: Liang-Chi Hsieh <[email protected]>
Date: 2017-08-13T07:43:17Z
shouldStop is called outside consume().
commit 413707dd0c31a15514f00aea9addca77fe1dd2ce
Author: Liang-Chi Hsieh <[email protected]>
Date: 2017-08-13T10:52:28Z
Fix the condition and the case of using continue in consume.
commit 0bb8c0ec70243e75f5593ca83788e830e9e4bc25
Author: Liang-Chi Hsieh <[email protected]>
Date: 2017-08-13T10:57:45Z
More comment.
commit 6d600d5eb4a275eb6bc72ccf353d2d1ded03635f
Author: Liang-Chi Hsieh <[email protected]>
Date: 2017-08-13T14:17:01Z
Fix aggregation.
----