GitHub user inouehrs opened a pull request:
https://github.com/apache/spark/pull/13539
[SPARK-15795] [SQL] Enable more optimizations in whole stage codegen when isNull is a compile-time constant
## What changes were proposed in this pull request?
Whole stage codegen often creates an `isNull` variable initialized with the
constant _false_, such as
`boolean mapelements_isNull = false || false;`
If there are no further assignments to this `isNull` variable, whole stage
codegen can perform more optimizations by treating `isNull` as a compile-time
constant.
In the example below, which is generated for a Dataset map operation,
`mapelements_isNull` defined at line 115 can be treated as a compile-time
constant (false).
By treating it as a constant, whole stage codegen eliminates the
`zeroOutNullBytes` call at line 119 and the if-statement at line 121.
Beyond improving the readability of the generated code, eliminating
`zeroOutNullBytes` yields a performance advantage, since that call is
difficult for the Java JIT compiler to remove.
without this patch
```
/* 107 */ // CONSUME: Project [id#0L AS l#3L]
/* 108 */ // CONSUME: DeserializeToObject l#3: bigint, obj#16: bigint
/* 109 */ // CONSUME: MapElements <function1>, obj#17: bigint
/* 110 */ // CONSUME: SerializeFromObject [input[0, bigint, true] AS value#18L]
/* 111 */ // <function1>.apply
/* 112 */ Object mapelements_obj = ((Expression) references[1]).eval(null);
/* 113 */ scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
/* 114 */
/* 115 */ boolean mapelements_isNull = false || false;
/* 116 */ final long mapelements_value = mapelements_isNull ? -1L : (Long) mapelements_value1.apply(range_value);
/* 117 */
/* 118 */ // CONSUME: WholeStageCodegen
/* 119 */ serializefromobject_rowWriter.zeroOutNullBytes();
/* 120 */
/* 121 */ if (mapelements_isNull) {
/* 122 */ serializefromobject_rowWriter.setNullAt(0);
/* 123 */ } else {
/* 124 */ serializefromobject_rowWriter.write(0, mapelements_value);
/* 125 */ }
/* 126 */ append(serializefromobject_result);
```
with this patch
```
/* 107 */ // CONSUME: Project [id#0L AS l#3L]
/* 108 */ // CONSUME: DeserializeToObject l#3: bigint, obj#9: bigint
/* 109 */ // CONSUME: MapElements <function1>, obj#10: bigint
/* 110 */ // CONSUME: SerializeFromObject [input[0, bigint, true] AS value#11L]
/* 111 */ // <function1>.apply
/* 112 */ Object mapelements_obj = ((Expression) references[1]).eval(null);
/* 113 */ scala.Function1 mapelements_value1 = (scala.Function1) mapelements_obj;
/* 114 */
/* 115 */ final boolean mapelements_isNull = false || false;
/* 116 */ final long mapelements_value = mapelements_isNull ? -1L : (Long) mapelements_value1.apply(range_value);
/* 117 */
/* 118 */ // CONSUME: WholeStageCodegen
/* 119 */ serializefromobject_rowWriter.write(0, mapelements_value);
/* 120 */ append(serializefromobject_result);
```
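The idea behind the patch can be sketched, in a deliberately simplified and hypothetical form, as a code generator that checks whether the `isNull` expression is the string literal `"false"` before emitting null-handling code. The names below (`writeField`, the emitted `rowWriter` calls) are illustrative only, not Spark's actual codegen API:

```java
// Hypothetical sketch: when the isNull code fragment is known to be the
// constant "false", skip emitting zeroOutNullBytes and the null branch.
public class NullCheckElimSketch {
    static String writeField(String isNullCode) {
        if (isNullCode.equals("false")) {
            // isNull is a compile-time constant: emit the unconditional write only.
            return "rowWriter.write(0, value);";
        }
        // isNull may be assigned at runtime: emit the full null-handling code.
        return "rowWriter.zeroOutNullBytes();\n"
             + "if (" + isNullCode + ") {\n"
             + "  rowWriter.setNullAt(0);\n"
             + "} else {\n"
             + "  rowWriter.write(0, value);\n"
             + "}";
    }

    public static void main(String[] args) {
        System.out.println(writeField("false"));
        System.out.println(writeField("mapelements_isNull"));
    }
}
```

This mirrors the before/after output above: the constant case collapses to a single `write` call, while the runtime case keeps `zeroOutNullBytes` and the null check.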
## How was this patch tested?
By unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/inouehrs/spark dev_nullcheck_opt
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/13539.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #13539
----
commit fb6f3b5a6c5fb80adc249fbb54a8c2ed884c7dbb
Author: Hiroshi Inoue <[email protected]>
Date: 2016-06-06T17:31:04Z
enable null check elimination based on generated code
commit 32f158ce5da29f9562c9aa3b4751d2241c4898ca
Author: Hiroshi Inoue <[email protected]>
Date: 2016-06-06T19:41:32Z
Merge branch 'apache/master' into dev_nullcheck_opt
commit 60f582dc3e75db6ff5fe642b692f22f5d7bc7ab2
Author: Hiroshi Inoue <[email protected]>
Date: 2016-06-07T01:53:33Z
make definition of isNull final
----