yaooqinn commented on a change in pull request #33888:
URL: https://github.com/apache/spark/pull/33888#discussion_r702049276
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
##########
@@ -200,19 +200,28 @@ private[parquet] class ParquetRowConverter(
// Converters for each field.
private[this] val fieldConverters: Array[Converter with HasParentContainerUpdater] = {
-    // (SPARK-31116) Use case insensitive map if spark.sql.caseSensitive is false
-    // to prevent throwing IllegalArgumentException when searching catalyst type's field index
-    val catalystFieldNameToIndex = if (SQLConf.get.caseSensitiveAnalysis) {
-      catalystType.fieldNames.zipWithIndex.toMap
+    if (SQLConf.get.parquetAccessByIndex) {
+      // SPARK-36634: When access parquet file by the idx of columns, we can not ensure 2 types
+      // matched
+      parquetType.getFields.asScala.zip(catalystType).zipWithIndex.map {
Review comment:
I have added test cases for this. For both name-based and index-based mapping, the output rows for the exceeding part of the requested schema result in `null`s.
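A minimal, self-contained sketch (not the actual Spark converter code; all names and data here are hypothetical) of the two field-mapping strategies under discussion: resolving file columns against the requested schema by (optionally case-insensitive) name versus by position, where the exceeding part of the requested schema resolves to `None` (i.e. `null`):

```scala
// Hypothetical illustration of name-based vs. index-based column mapping.
object FieldMappingSketch {
  // Columns present in a (hypothetical) parquet file footer.
  val fileFields: Seq[String] = Seq("id", "Name")
  // Columns requested by the reader; "extra" is the exceeding part.
  val requestedFields: Seq[String] = Seq("ID", "name", "extra")

  // Name-based mapping: case-insensitive lookup when caseSensitive is false
  // (analogous to the SPARK-31116 behavior); unmatched fields map to None.
  def byName(caseSensitive: Boolean): Seq[Option[String]] = {
    val index =
      if (caseSensitive) fileFields.map(f => f -> f).toMap
      else fileFields.map(f => f.toLowerCase -> f).toMap
    requestedFields.map { r =>
      index.get(if (caseSensitive) r else r.toLowerCase)
    }
  }

  // Index-based mapping: pair columns by position; positions beyond the
  // file schema (the exceeding part) map to None.
  def byIndex: Seq[Option[String]] =
    requestedFields.zipWithIndex.map { case (_, i) => fileFields.lift(i) }

  def main(args: Array[String]): Unit = {
    println(byName(caseSensitive = false)) // List(Some(id), Some(Name), None)
    println(byIndex)                       // List(Some(id), Some(Name), None)
  }
}
```

In both strategies the third requested column has no counterpart in the file, so it surfaces as `None`, matching the comment above that the exceeding part results in nulls rather than an error.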
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]