[GitHub] [spark] cxzl25 commented on a diff in pull request #36787: [SPARK-39387][FOLLOWUP][TESTS] Add a test case for HIVE-25190

GitBox Mon, 20 Jun 2022 08:16:28 -0700


cxzl25 commented on code in PR #36787:
URL: https://github.com/apache/spark/pull/36787#discussion_r901783862



##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala:
##########
@@ -832,6 +832,18 @@ abstract class OrcQuerySuite extends OrcQueryTest with SharedSparkSession {
       }
     }
   }
+
+  test("SPARK-39387: BytesColumnVector should not throw RuntimeException due to overflow") {

Review Comment:
   I tested it locally with JDK 11 and it runs successfully.
   ```bash
   setjdk 1.11
   build/mvn -Dtest=none -DwildcardSuites=org.apache.spark.sql.execution.datasources.orc.OrcV1QuerySuite test
   ```
   
![image](https://user-images.githubusercontent.com/3898450/174632802-10abcf43-d1df-4b1b-a8ac-097f240338d6.png)
   
   
   
   The GA (GitHub Actions) failure happened because the write path ran out of memory (OOM), which should have nothing to do with JDK 11.
   ```java
   2022-06-16T14:30:19.8285352Z Caused by: java.lang.OutOfMemoryError: Java heap space
   2022-06-16T14:30:19.8285963Z         at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.allocateBuffer(BytesColumnVector.java:300)
   2022-06-16T14:30:19.8286885Z         at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.ensureValPreallocated(BytesColumnVector.java:218)
   2022-06-16T14:30:19.8287675Z         at org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:182)
   2022-06-16T14:30:19.8288377Z         at org.apache.orc.mapred.OrcMapredRecordWriter.setBinaryValue(OrcMapredRecordWriter.java:87)
   2022-06-16T14:30:19.8289257Z         at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:235)
   2022-06-16T14:30:19.8289956Z         at org.apache.orc.mapred.OrcMapredRecordWriter.setStructValue(OrcMapredRecordWriter.java:133)
   2022-06-16T14:30:19.8290654Z         at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:248)
   2022-06-16T14:30:19.8291438Z         at org.apache.orc.mapred.OrcMapredRecordWriter.setListValue(OrcMapredRecordWriter.java:162)
   2022-06-16T14:30:19.8292127Z         at org.apache.orc.mapred.OrcMapredRecordWriter.setColumn(OrcMapredRecordWriter.java:256)
   2022-06-16T14:30:19.8292824Z         at org.apache.orc.mapreduce.OrcMapreduceRecordWriter.write(OrcMapreduceRecordWriter.java:73)
   2022-06-16T14:30:19.8293554Z         at org.apache.spark.sql.execution.datasources.orc.OrcOutputWriter.write(OrcOutputWriter.scala:56)
   2022-06-16T14:30:19.8294523Z         at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.write(FileFormatDataWriter.scala:175)
   ```
   
   Unlike PR #34284, this test does not seem to be able to shrink its buffer memory: it needs a relatively large amount of memory when writing to ORC in order to keep the test coverage.
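
   For context on why the buffer has to grow so large before the overflow path is exercised, here is a minimal sketch of the kind of `int` overflow that can occur when a byte buffer is grown by doubling. It is not the actual `BytesColumnVector` code: the class name, method names, and the `MAX_BUFFER` cap are hypothetical and chosen only for illustration.

   ```java
   public class BufferGrowthSketch {
       // Hypothetical cap on the buffer size; not a constant taken from Hive or ORC.
       static final int MAX_BUFFER = Integer.MAX_VALUE - 8;

       // Naive growth: doubling in int arithmetic wraps to a negative value at 1 GiB.
       static int naiveNextSize(int current) {
           return current * 2;
       }

       // Safer growth: compute in long, honor the required size, then clamp to the cap.
       static int safeNextSize(int current, int required) {
           long wanted = Math.max((long) current * 2, (long) required);
           return (int) Math.min(wanted, (long) MAX_BUFFER);
       }

       public static void main(String[] args) {
           int oneGiB = 1 << 30;
           // Doubling 1 GiB overflows int and yields a negative size.
           System.out.println("naive: " + naiveNextSize(oneGiB));
           // The long-based version stays positive and is clamped to the cap.
           System.out.println("safe:  " + safeNextSize(oneGiB, oneGiB + 1));
       }
   }
   ```

   Under these assumptions, the overflow only shows up once the buffer is already around 1 GiB, which is consistent with the test needing a large heap to reach that path.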
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

