LuciferYang commented on code in PR #37214:
URL: https://github.com/apache/spark/pull/37214#discussion_r923241837


##########
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/FileMetadataStructSuite.scala:
##########
@@ -564,4 +566,28 @@ class FileMetadataStructSuite extends QueryTest with SharedSparkSession {
       )
     }
   }
+
+  Seq(true, false).foreach { useVectorizedReader =>
+    val label = if (useVectorizedReader) "reading batches" else "reading rows"
+    test(s"SPARK-39806: metadata for a partitioned table ($label)") {
+      withSQLConf(SQLConf.PARQUET_VECTORIZED_READER_ENABLED.key -> useVectorizedReader.toString) {
+        withTempPath { dir =>
+          // Store dynamically partitioned data.
+          Seq(1 -> 1).toDF("a", "b").write.format("parquet").partitionBy("b")
+            .save(dir.getAbsolutePath)
+
+          // Identify the data file and its metadata.
+          // We expect there to be exactly one subdirectory containing exactly one parquet file.
+          val subdirectory = dir.listFiles().filter(_.isDirectory).head

Review Comment:
   If the number of files is not too large, maybe we can simplify this to
   ```
   val file = TestUtils.listDirectory(dir).head
   ```
   
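For context, a recursive directory-listing helper like the suggested `TestUtils.listDirectory` collects the leaf files under a directory, so the test would not need to descend into the partition subdirectory by hand. A minimal self-contained sketch of that behavior (plain Scala, not the actual Spark utility):

```scala
import java.io.File

object ListFilesSketch {
  // Recursively collect all regular files under `dir`. This mirrors what a
  // recursive listing helper is assumed to do; it is not Spark's implementation.
  def listFilesRecursively(dir: File): Seq[File] = {
    // listFiles() returns null for non-directories, so guard with Option.
    val entries = Option(dir.listFiles()).map(_.toSeq).getOrElse(Seq.empty)
    entries.flatMap { f =>
      if (f.isDirectory) listFilesRecursively(f) else Seq(f)
    }
  }
}
```

With such a helper, `listFilesRecursively(dir).head` would reach the lone parquet file directly, regardless of how the partition directories are nested.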



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

