[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #38397: [SPARK-40918][SQL] Mismatch between FileSourceScanExec and Orc and ParquetFileFormat on producing columnar output

GitBox Thu, 27 Oct 2022 22:26:30 -0700


dongjoon-hyun commented on code in PR #38397:
URL: https://github.com/apache/spark/pull/38397#discussion_r1007641451



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala:
##########
@@ -126,9 +136,24 @@ class OrcFileFormat
 
     val resultSchema = StructType(requiredSchema.fields ++ partitionSchema.fields)
     val sqlConf = sparkSession.sessionState.conf
-    val enableVectorizedReader = supportBatch(sparkSession, resultSchema)
     val capacity = sqlConf.orcVectorizedReaderBatchSize
 
+    // Should always be set by FileSourceScanExec creating this.
+    // Check conf before checking option, to allow working around an issue by changing conf.
+    val enableVectorizedReader = sqlConf.orcVectorizedReaderEnabled &&
+      options.get(FileFormat.OPTION_RETURNING_BATCH)
+        .getOrElse {
+          throw new IllegalArgumentException(
+            "OPTION_RETURNING_BATCH should always be set for OrcFileFormat." +
+              "To workaround this issue, set spark.sql.orc.enableVectorizedReader=false.")

Review Comment:
   Is this the right recommendation? Why not recommend setting `OPTION_RETURNING_BATCH` instead?
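
For context on the question above, here is a minimal, self-contained sketch of the short-circuit behavior in the diff. It is not Spark's code: `ReturningBatchSketch`, the key string `"returning_batch"`, and the trailing `.toBoolean` are assumptions made for illustration, since the excerpt ends before showing how the option value is interpreted. It only shows that a disabled conf flag skips the option lookup, which is presumably why the error message points at the conf.

```scala
// Simplified stand-in for the check in the diff; not Spark's actual code.
object ReturningBatchSketch {
  // Hypothetical key standing in for FileFormat.OPTION_RETURNING_BATCH.
  val OptionReturningBatch = "returning_batch"

  // Same shape as the diff: the conf flag is tested first, so `&&`
  // short-circuits and the option lookup is skipped when the flag is false.
  def enableVectorizedReader(
      vectorizedReaderEnabled: Boolean,
      options: Map[String, String]): Boolean =
    vectorizedReaderEnabled &&
      options.get(OptionReturningBatch)
        .getOrElse {
          throw new IllegalArgumentException(
            s"$OptionReturningBatch should always be set.")
        }
        .toBoolean // assumed parsing; the excerpt does not show this step

  def main(args: Array[String]): Unit = {
    // Conf disabled: the missing option is never inspected, so nothing throws.
    println(enableVectorizedReader(vectorizedReaderEnabled = false, Map.empty))

    // Conf enabled and option present: the option value decides the result.
    println(enableVectorizedReader(true, Map(OptionReturningBatch -> "true")))

    // Conf enabled and option missing: the lookup throws.
    try {
      enableVectorizedReader(true, Map.empty)
      ()
    } catch {
      case e: IllegalArgumentException => println(s"threw: ${e.getMessage}")
    }
  }
}
```

Under these assumptions, either workaround avoids the exception: disabling the conf bypasses the lookup entirely, while setting the option satisfies it.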



