cxzl25 commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r901783862
##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
}
}
}
+
+ test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {
Review Comment:
I tested it locally with JDK 11 and it runs successfully.
```bash
setjdk 1.11
build/mvn -Dtest=none \
  -DwildcardSuites=org.apache.spark.sql.execution.datasources.orc.OrcV1QuerySuite \
  test
```

The GitHub Actions (GA) run failed because the write path hit an OOM, which
should have nothing to do with JDK 11.
```java
2022-06-16T14:30:19.8285352Z Caused by: java.lang.OutOfMemoryError: Java heap space
2022-06-16T14:30:19.8285963Z at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.allocateBuffer(BytesColumnVector.java:300)
2022-06-16T14:30:19.8286885Z at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.ensureValPreallocated(BytesColumnVector.java:218)
2022-06-16T14:30:19.8287675Z at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:182)
2022-06-16T14:30:19.8288377Z at org.apache.orc.mapred.OrcMapredRecordWriter.setBinaryValue(OrcMapredRecordWriter.java:87)
2022-06-16T14:30:19.8289257Z at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:235)
2022-06-16T14:30:19.8289956Z at org.apache.orc.mapred.OrcMapredRecordWriter.setStructValue(OrcMapredRecordWriter.java:133)
2022-06-16T14:30:19.8290654Z at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:248)
2022-06-16T14:30:19.8291438Z at org.apache.orc.mapred.OrcMapredRecordWriter.setListValue(OrcMapredRecordWriter.java:162)
2022-06-16T14:30:19.8292127Z at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:256)
2022-06-16T14:30:19.8292824Z at org.apache.orc.mapreduce.OrcMapreduceRecordWriter.write(OrcMapreduceRecordWriter.java:73)
2022-06-16T14:30:19.8293554Z at org.apache.spark.sql.execution.datasources.orc.OrcOutputWriter.write(OrcOutputWriter.scala:56)
2022-06-16T14:30:19.8294523Z at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
```
Unlike PR #34284, this test does not seem able to shrink the buffer memory it
uses: it needs a relatively large amount of memory to write the ORC data in
order to ensure test coverage.
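
To illustrate why coverage here seems to need a large write, below is a minimal sketch. It assumes the buffer grows by doubling an `int` capacity; that is a simplification for illustration and not the actual `BytesColumnVector` code. `BufferGrowthSketch`, `grow`, and the 16 KiB starting size are hypothetical. Under that assumption, the overflow branch is only reachable once the capacity is already about 1 GiB.

```java
public class BufferGrowthSketch {
    // Hypothetical helper: doubles a positive int capacity until it can hold
    // `required` bytes. Returns -1 when the next doubling would no longer fit
    // in a 32-bit int (the "overflow" case in this sketch).
    static int grow(int capacity, long required) {
        long next = capacity; // track in a long so the overflow is detectable
        while (next < required) {
            next *= 2;
            if (next > Integer.MAX_VALUE) {
                return -1;
            }
        }
        return (int) next;
    }

    public static void main(String[] args) {
        int start = 16 * 1024; // assumed initial buffer size, for illustration only

        // Exactly 1 GiB still fits: 2^14 doubles up to 2^30.
        System.out.println(grow(start, 1L << 30));       // prints 1073741824

        // One byte more needs 2^31, which exceeds Integer.MAX_VALUE.
        System.out.println(grow(start, (1L << 30) + 1)); // prints -1
    }
}
```

If the real growth logic behaves similarly, a test has to push roughly 1 GiB through a single buffer before the overflow path is exercised, which would be consistent with the heap pressure seen in the GA run.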