subject:"spark git commit\: \[SPARK\-23598\]\[SQL\] Make methods in BufferedRowIterator public to avoid runtime error for a large query"

spark git commit: [SPARK-23598][SQL] Make methods in BufferedRowIterator public to avoid runtime error for a large query

2018-03-13 Thread hvanhovell

Repository: spark
Updated Branches:
  refs/heads/branch-2.3 33ba8db8d -> f3efbfa4b


[SPARK-23598][SQL] Make methods in BufferedRowIterator public to avoid runtime 
error for a large query

## What changes were proposed in this pull request?

This PR fixes runtime error regarding a large query when a generated code has 
split classes. The issue is `append()`, `stopEarly()`, and other methods are 
not accessible from split classes that are not subclasses of 
`BufferedRowIterator`.
This PR fixes this issue by making them `public`.

Before applying the PR, we see the following exception by running the attached 
program with `CodeGenerator.GENERATED_CLASS_SIZE_THRESHOLD=-1`.
```
  test("SPARK-23598") {
// When set -1 to CodeGenerator.GENERATED_CLASS_SIZE_THRESHOLD, an 
exception is thrown
val df_pet_age = Seq((8, "bat"), (15, "mouse"), (5, "horse")).toDF("age", 
"name")
df_pet_age.groupBy("name").avg("age").show()
  }
```

Exception:
```
19:40:52.591 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
19:41:32.319 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in 
stage 0.0 (TID 0)
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.shouldStop()Z from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1$agg_NestedClass1
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1$agg_NestedClass1.agg_doAggregateWithKeys$(generated.java:203)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:160)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:616)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
...
```

Generated code (line 195 calles `stopEarly()`).
```
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ // codegenStageId=1
/* 006 */ final class GeneratedIteratorForCodegenStage1 extends 
org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */   private Object[] references;
/* 008 */   private scala.collection.Iterator[] inputs;
/* 009 */   private boolean agg_initAgg;
/* 010 */   private boolean agg_bufIsNull;
/* 011 */   private double agg_bufValue;
/* 012 */   private boolean agg_bufIsNull1;
/* 013 */   private long agg_bufValue1;
/* 014 */   private agg_FastHashMap agg_fastHashMap;
/* 015 */   private org.apache.spark.unsafe.KVIterator 
agg_fastHashMapIter;
/* 016 */   private org.apache.spark.unsafe.KVIterator agg_mapIter;
/* 017 */   private 
org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap agg_hashMap;
/* 018 */   private org.apache.spark.sql.execution.UnsafeKVExternalSorter 
agg_sorter;
/* 019 */   private scala.collection.Iterator inputadapter_input;
/* 020 */   private boolean agg_agg_isNull11;
/* 021 */   private boolean agg_agg_isNull25;
/* 022 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder[] 
agg_mutableStateArray1 = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder[2];
/* 023 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] 
agg_mutableStateArray2 = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[2];
/* 024 */   private UnsafeRow[] agg_mutableStateArray = new UnsafeRow[2];
/* 025 */
/* 026 */   public GeneratedIteratorForCodegenStage1(Object[] references) {
/* 027 */ this.references = references;
/* 028 */   }
/* 029 */
/* 030 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 031 */ partitionIndex = index;
/* 032 */ this.inputs = inputs;
/* 033 */
/* 034 */ agg_fastHashMap = new 
agg_FastHashMap(((org.apache.spark.sql.execution.aggregate.HashAggregateExec) 
references[0] /* plan */).getTaskMemoryManager(), 
((org.apache.spark.sql.execution.aggregate.HashAggregateExec) references[0] /* 
plan */).getEmptyAggregationBuffer());
/* 035 */ agg_h

spark git commit: [SPARK-23598][SQL] Make methods in BufferedRowIterator public to avoid runtime error for a large query

2018-03-13 Thread hvanhovell

Repository: spark
Updated Branches:
  refs/heads/master 918fb9bee -> 1098933b0


[SPARK-23598][SQL] Make methods in BufferedRowIterator public to avoid runtime 
error for a large query

## What changes were proposed in this pull request?

This PR fixes runtime error regarding a large query when a generated code has 
split classes. The issue is `append()`, `stopEarly()`, and other methods are 
not accessible from split classes that are not subclasses of 
`BufferedRowIterator`.
This PR fixes this issue by making them `public`.

Before applying the PR, we see the following exception by running the attached 
program with `CodeGenerator.GENERATED_CLASS_SIZE_THRESHOLD=-1`.
```
  test("SPARK-23598") {
// When set -1 to CodeGenerator.GENERATED_CLASS_SIZE_THRESHOLD, an 
exception is thrown
val df_pet_age = Seq((8, "bat"), (15, "mouse"), (5, "horse")).toDF("age", 
"name")
df_pet_age.groupBy("name").avg("age").show()
  }
```

Exception:
```
19:40:52.591 WARN org.apache.hadoop.util.NativeCodeLoader: Unable to load 
native-hadoop library for your platform... using builtin-java classes where 
applicable
19:41:32.319 ERROR org.apache.spark.executor.Executor: Exception in task 0.0 in 
stage 0.0 (TID 0)
java.lang.IllegalAccessError: tried to access method 
org.apache.spark.sql.execution.BufferedRowIterator.shouldStop()Z from class 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1$agg_NestedClass1
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1$agg_NestedClass1.agg_doAggregateWithKeys$(generated.java:203)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(generated.java:160)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$11$$anon$1.hasNext(WholeStageCodegenExec.scala:616)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
...
```

Generated code (line 195 calles `stopEarly()`).
```
/* 001 */ public Object generate(Object[] references) {
/* 002 */   return new GeneratedIteratorForCodegenStage1(references);
/* 003 */ }
/* 004 */
/* 005 */ // codegenStageId=1
/* 006 */ final class GeneratedIteratorForCodegenStage1 extends 
org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */   private Object[] references;
/* 008 */   private scala.collection.Iterator[] inputs;
/* 009 */   private boolean agg_initAgg;
/* 010 */   private boolean agg_bufIsNull;
/* 011 */   private double agg_bufValue;
/* 012 */   private boolean agg_bufIsNull1;
/* 013 */   private long agg_bufValue1;
/* 014 */   private agg_FastHashMap agg_fastHashMap;
/* 015 */   private org.apache.spark.unsafe.KVIterator 
agg_fastHashMapIter;
/* 016 */   private org.apache.spark.unsafe.KVIterator agg_mapIter;
/* 017 */   private 
org.apache.spark.sql.execution.UnsafeFixedWidthAggregationMap agg_hashMap;
/* 018 */   private org.apache.spark.sql.execution.UnsafeKVExternalSorter 
agg_sorter;
/* 019 */   private scala.collection.Iterator inputadapter_input;
/* 020 */   private boolean agg_agg_isNull11;
/* 021 */   private boolean agg_agg_isNull25;
/* 022 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder[] 
agg_mutableStateArray1 = new 
org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder[2];
/* 023 */   private 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[] 
agg_mutableStateArray2 = new 
org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter[2];
/* 024 */   private UnsafeRow[] agg_mutableStateArray = new UnsafeRow[2];
/* 025 */
/* 026 */   public GeneratedIteratorForCodegenStage1(Object[] references) {
/* 027 */ this.references = references;
/* 028 */   }
/* 029 */
/* 030 */   public void init(int index, scala.collection.Iterator[] inputs) {
/* 031 */ partitionIndex = index;
/* 032 */ this.inputs = inputs;
/* 033 */
/* 034 */ agg_fastHashMap = new 
agg_FastHashMap(((org.apache.spark.sql.execution.aggregate.HashAggregateExec) 
references[0] /* plan */).getTaskMemoryManager(), 
((org.apache.spark.sql.execution.aggregate.HashAggregateExec) references[0] /* 
plan */).getEmptyAggregationBuffer());
/* 035 */ agg_hashM

spark git commit: [SPARK-23598][SQL] Make methods in BufferedRowIterator public to avoid runtime error for a large query

spark git commit: [SPARK-23598][SQL] Make methods in BufferedRowIterator public to avoid runtime error for a large query

2 matches

Site Navigation

Mail list logo

Footer information