GitHub user kiszk opened a pull request:
https://github.com/apache/spark/pull/11984
[SPARK-14138][SQL] Fix generated SpecificColumnarIterator code that can exceed
the JVM method size limit for cached DataFrames
## What changes were proposed in this pull request?
This PR reduces the Java bytecode size of the methods in
```SpecificColumnarIterator``` using two approaches:
1. Generate and call a ```getTYPEColumnAccessor()``` method for each type that
is actually used when instantiating accessors
2. Group long runs of method calls (more than 4000) into separate helper methods
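The second approach can be sketched outside of Spark's codegen as follows. This is a minimal, hypothetical illustration (the class and method names are mine, not Spark's): one oversized method body is split into fixed-size helpers so that no single compiled method exceeds the JVM's 64KB-per-method bytecode limit.

```java
// Hypothetical sketch of approach 2 (names are illustrative, not Spark's):
// instead of emitting thousands of statements into one method body, the
// code generator groups them into fixed-size helper methods so no single
// method's bytecode exceeds the JVM's 64KB-per-method limit.
public class ChunkedInit {
    static final int NUM_COLUMNS = 4000;
    final int[] accessorIds = new int[NUM_COLUMNS];

    // Each helper stands in for a bounded run of generated statements
    // (e.g. 1000 accessor initializations per method).
    private void initGroup0() { for (int i = 0;    i < 1000; i++) accessorIds[i] = i; }
    private void initGroup1() { for (int i = 1000; i < 2000; i++) accessorIds[i] = i; }
    private void initGroup2() { for (int i = 2000; i < 3000; i++) accessorIds[i] = i; }
    private void initGroup3() { for (int i = 3000; i < 4000; i++) accessorIds[i] = i; }

    // The top-level method shrinks to a handful of calls, each compiling
    // to a small, fixed amount of bytecode.
    public void init() {
        initGroup0();
        initGroup1();
        initGroup2();
        initGroup3();
    }
}
```

The real patch emits straight-line generated statements rather than loops, but the grouping principle is the same: the per-method statement count is bounded, so the bytecode size of each method is bounded too.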
## How was this patch tested?
Added a new unit test to ```InMemoryColumnarQuerySuite```
Here is the generated code:
```java
/* 033 */   private org.apache.spark.sql.execution.columnar.CachedBatch batch = null;
/* 034 */
/* 035 */   private org.apache.spark.sql.execution.columnar.IntColumnAccessor accessor;
/* 036 */   private org.apache.spark.sql.execution.columnar.IntColumnAccessor accessor1;
/* 037 */
/* 038 */   public SpecificColumnarIterator() {
/* 039 */     this.nativeOrder = ByteOrder.nativeOrder();
/* 040 */     this.mutableRow = new MutableUnsafeRow(rowWriter);
/* 041 */   }
/* 042 */
/* 043 */   public void initialize(Iterator input, DataType[] columnTypes, int[] columnIndexes,
/* 044 */     boolean columnNullables[]) {
/* 045 */     this.input = input;
/* 046 */     this.columnTypes = columnTypes;
/* 047 */     this.columnIndexes = columnIndexes;
/* 048 */   }
/* 049 */
/* 050 */
/* 051 */   private org.apache.spark.sql.execution.columnar.IntColumnAccessor getIntColumnAccessor(int idx) {
/* 052 */     byte[] buffer = batch.buffers()[columnIndexes[idx]];
/* 053 */     return new org.apache.spark.sql.execution.columnar.IntColumnAccessor(ByteBuffer.wrap(buffer).order(nativeOrder));
/* 054 */   }
/* 055 */
/* 056 */
/* 057 */
/* 058 */
/* 059 */
/* 060 */
/* 061 */   public boolean hasNext() {
/* 062 */     if (currentRow < numRowsInBatch) {
/* 063 */       return true;
/* 064 */     }
/* 065 */     if (!input.hasNext()) {
/* 066 */       return false;
/* 067 */     }
/* 068 */
/* 069 */     batch = (org.apache.spark.sql.execution.columnar.CachedBatch) input.next();
/* 070 */     currentRow = 0;
/* 071 */     numRowsInBatch = batch.numRows();
/* 072 */     accessor = getIntColumnAccessor(0);
/* 073 */     accessor1 = getIntColumnAccessor(1);
/* 074 */
/* 075 */     return hasNext();
/* 076 */   }
/* 077 */
/* 078 */   public InternalRow next() {
/* 079 */     currentRow += 1;
/* 080 */     bufferHolder.reset();
/* 081 */     rowWriter.zeroOutNullBytes();
/* 082 */     accessor.extractTo(mutableRow, 0);
/* 083 */     accessor1.extractTo(mutableRow, 1);
/* 084 */     unsafeRow.setTotalSize(bufferHolder.totalSize());
/* 085 */     return unsafeRow;
/* 086 */   }
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/kiszk/spark SPARK-14138
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11984.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11984
----
commit ab67d33787e568245c9e2ab30e51b471f21fa2ed
Author: Kazuaki Ishizaki <[email protected]>
Date: 2016-03-27T04:15:06Z
make code size of hasNext() smaller by preparing get*Acceessor() methods
group a lot of calls into a method
commit fea2a524bbd5b1d0d285e02e6eda590d1f7d67e3
Author: Kazuaki Ishizaki <[email protected]>
Date: 2016-03-27T04:15:38Z
add test case
----
---