HyukjinKwon commented on a change in pull request #27888: [SPARK-31116][SQL]
Consider case sensitivity in ParquetRowConverter
URL: https://github.com/apache/spark/pull/27888#discussion_r391987192
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala
##########
@@ -122,12 +122,18 @@ class ParquetReadSupport(val convertTz: Option[ZoneId],
keyValueMetaData: JMap[String, String],
fileSchema: MessageType,
readContext: ReadContext): RecordMaterializer[InternalRow] = {
+ val caseSensitive = conf.getBoolean(SQLConf.CASE_SENSITIVE.key,
+ SQLConf.CASE_SENSITIVE.defaultValue.get)
+ val schemaPruningEnabled = conf.getBoolean(SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key,
+ SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.defaultValue.get)
val parquetRequestedSchema = readContext.getRequestedSchema
new ParquetRecordMaterializer(
parquetRequestedSchema,
ParquetReadSupport.expandUDT(catalystRequestedSchema),
new ParquetToSparkSchemaConverter(conf),
- convertTz)
+ convertTz,
+ caseSensitive,
+ schemaPruningEnabled)
Review comment:
It should probably pass `schemaPruningEnabled && !enableVectorizedReader`,
since the vectorized reader path doesn't support reading nested complex types.
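
A minimal sketch of the suggested flag, kept independent of Spark so it runs on its own. Assumptions: the `Map`-based `getBoolean` helper is a hypothetical stand-in for the Hadoop `Configuration` lookup used in the diff, the key string is what `SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key` is assumed to resolve to, the default of `true` is assumed, and `enableVectorizedReader` is treated as a plain Boolean already known to the caller.

```scala
object PruningFlagSketch {
  // Assumed value of SQLConf.NESTED_SCHEMA_PRUNING_ENABLED.key.
  val NestedPruningKey = "spark.sql.optimizer.nestedSchemaPruning.enabled"

  // Hypothetical stand-in for Configuration.getBoolean(key, default):
  // returns the fallback when the key is unset or not a valid boolean.
  def getBoolean(conf: Map[String, String], key: String, fallback: Boolean): Boolean =
    conf.get(key).map(_.trim.toLowerCase) match {
      case Some("true")  => true
      case Some("false") => false
      case _             => fallback
    }

  // The value the review suggests passing to the materializer:
  // pruning only matters on the non-vectorized (row-based) read path.
  def effectiveSchemaPruning(
      conf: Map[String, String],
      enableVectorizedReader: Boolean): Boolean = {
    val schemaPruningEnabled = getBoolean(conf, NestedPruningKey, fallback = true)
    schemaPruningEnabled && !enableVectorizedReader
  }
}
```

With this shape, the converter would see the pruning flag as true only when pruning is enabled in the configuration and the row-based reader is in use.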