[GitHub] [spark] dongjoon-hyun commented on a change in pull request #31002: [SPARK-33978][SQL] Support ZSTD compression in ORC data source

GitBox Sun, 03 Jan 2021 23:08:09 -0800


dongjoon-hyun commented on a change in pull request #31002:
URL: https://github.com/apache/spark/pull/31002#discussion_r551146496




##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcSourceSuite.scala
##########
@@ -594,4 +594,12 @@ class OrcSourceSuite extends OrcSuite with SharedSparkSession {
     val df = readResourceOrcFile("test-data/TestStringDictionary.testRowIndex.orc")
     assert(df.where("str < 'row 001000'").count() === 1000)
   }
+
+  test("SPARK-33978: Write and read a file with ZSTD compression") {
+    withTempPath { dir =>
+      val path = dir.getAbsolutePath
+      spark.range(3).write.option("compression", "zstd").orc(path)
+      checkAnswer(spark.read.orc(path), Seq(Row(0), Row(1), Row(2)))

Review comment:
   I avoided it for two reasons.
   1. `OrcFileOperator` is in the `hive` module, while this test is in the `sql` module.
   2. Files written by the native ORC data source do not end with a `.zstd.orc` suffix.
   ```
   $ ls -al /tmp/zstd/*.orc | head -n1
   -rw-r--r--  1 dongjoon  wheel  109 Jan  3 22:41 /tmp/zstd/part-00000-a63d9a17-456f-42d3-87a1-d922112ed28c-c000.orc
   ```
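
Since the file name carries no codec hint, one alternative is to read the codec from the ORC file footer. The following is only a sketch, not what this PR does: `OrcCompressionCheck` and `compressionKinds` are hypothetical names, and it assumes ORC 1.6+ is on the classpath (where `CompressionKind.ZSTD` exists and `Reader` can be closed). It uses only the `org.apache.orc` reader API, so it does not depend on `OrcFileOperator` or the `hive` module.

```scala
import java.io.File

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.orc.{CompressionKind, OrcFile}

// Hypothetical helper (not part of Spark): reports the compression kind
// recorded in the footer of every *.orc file directly under `dir`.
object OrcCompressionCheck {
  def compressionKinds(
      dir: String,
      conf: Configuration = new Configuration()): Seq[CompressionKind] = {
    // listFiles() returns null for a missing directory, so wrap it in Option.
    val orcFiles = Option(new File(dir).listFiles())
      .getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.endsWith(".orc"))

    orcFiles.toSeq.map { f =>
      val reader = OrcFile.createReader(new Path(f.toURI), OrcFile.readerOptions(conf))
      // Assumption: ORC 1.6+, where Reader is Closeable.
      try reader.getCompressionKind finally reader.close()
    }
  }
}
```

In a test like the one above, this could be used after the write as `assert(OrcCompressionCheck.compressionKinds(path).forall(_ == CompressionKind.ZSTD))`, possibly passing `spark.sessionState.newHadoopConf()` as `conf`.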




