Github user dongjoon-hyun commented on a diff in the pull request:
https://github.com/apache/spark/pull/20511#discussion_r168833177
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
---
@@ -160,6 +160,15 @@ abstract class OrcSuite extends OrcTest with
BeforeAndAfterAll {
}
}
}
+
+ test("SPARK-23340 Empty float/double array columns raise EOFException") {
+ Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF()).foreach { df =>
+ withTempPath { path =>
--- End diff ---
Ur, I think you are still conflating two things.
First of all, we have five ORC readers. We didn't explicitly check the `ORC MR reader` and
`ORC Vectorized Copy`; we usually test 1, 2, and 3.
1. Hive Serde
2. Hive OrcFileFormat
3. Apache ORC Vectorized Wrapper
4. Apache ORC Vectorized Copy
5. Apache ORC MR
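For context, the choice among these reader paths is driven by SQL configuration. A minimal sketch, assuming the Spark 2.3-era config keys `spark.sql.orc.impl` and `spark.sql.orc.enableVectorizedReader` (treat the exact key names and values as assumptions, not a definitive mapping):

```scala
// Hive-based readers (1, 2) vs. Apache ORC readers (3-5)
spark.conf.set("spark.sql.orc.impl", "hive")    // Hive OrcFileFormat path
spark.conf.set("spark.sql.orc.impl", "native")  // Apache ORC path

// With the native impl, toggle vectorization (3/4) vs. the MR reader (5);
// even when enabled, non-atomic schemas fall back to the MR path.
spark.conf.set("spark.sql.orc.enableVectorizedReader", "true")
```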
In this PR, we already added tests for 1, 2, and 3; 3 is the vectorized wrapper reader.
1. Hive Serde: `HiveOrcQuerySuite.test("SPARK-23340 Empty float/double
array columns raise EOFException")`
2. Hive OrcFileFormat: `OrcSourceSuite` <= `OrcSuite.test("SPARK-23340
Empty float/double array columns raise EOFException")`
3. Apache ORC Vectorized Wrapper: `HiveOrcSourceSuite` <=
`OrcSuite.test("SPARK-23340 Empty float/double array columns raise
EOFException")`
Second, this test schema includes complex types, so configuration 3 (the vectorized
wrapper reader) will also fall back to the ORC MR reader path, i.e., case 5. Please note
that Apache Spark supports vectorization only for `Atomic` types, in both Parquet and ORC.
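For readers following along, a completed version of the truncated test above might look like the following sketch, assuming the `OrcTest` helpers `withTempPath` and `checkAnswer` visible in the suite (the write/read round-trip body is an illustration, not the exact committed code):

```scala
test("SPARK-23340 Empty float/double array columns raise EOFException") {
  // Each DataFrame has a single row holding an empty array column,
  // which is the case that triggered the EOFException on read.
  Seq(Seq(Array.empty[Float]).toDF(), Seq(Array.empty[Double]).toDF()).foreach { df =>
    withTempPath { path =>
      // Round-trip through ORC and verify the empty array survives.
      df.write.orc(path.getCanonicalPath)
      checkAnswer(spark.read.orc(path.getCanonicalPath), df)
    }
  }
}
```

Because `OrcSuite` is abstract, the same round-trip is exercised against both the Hive `OrcFileFormat` and the Apache ORC wrapper, depending on which concrete suite runs it.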
---