Github user liancheng commented on a diff in the pull request:
https://github.com/apache/spark/pull/12002#discussion_r57651085
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala ---
@@ -145,15 +146,15 @@ private[sql] class DefaultSource
(file: PartitionedFile) => {
val conf = broadcastedConf.value.value
- // SPARK-8501: Empty ORC files always have an empty schema stored in their footer. In this
- // case, `OrcFileOperator.readSchema` returns `None`, and we can simply return an empty
- // iterator.
+ // SPARK-8501: Empty ORC files always have an empty schema stored in their footer. In this
+ // case, `OrcFileOperator.readSchema` returns `None`, and we can't read the underlying file
+ // using the given physical schema. Instead, we simply return an empty iterator.
val maybePhysicalSchema = OrcFileOperator.readSchema(Seq(file.filePath), Some(conf))
if (maybePhysicalSchema.isEmpty) {
Iterator.empty
} else {
val physicalSchema = maybePhysicalSchema.get
- OrcRelation.setRequiredColumns(conf, physicalSchema, dataSchema)
+ OrcRelation.setRequiredColumns(conf, physicalSchema, requiredSchema)
--- End diff --
Note that this `physicalSchema` is NOT the one passed to `buildReader`, but
the one just read from the physical data file to be scanned. Using the one
passed to `buildReader` here would break existing test cases related to case
sensitivity.
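
To make the control flow concrete, here is a minimal, Spark-free sketch of the pattern in the diff: read the schema from the file itself, return an empty iterator when there is none, and otherwise resolve the required columns against that file-level schema. All names below (`SchemaSketch`, `footerColumns`, `requiredColumnIds`, `scan`) are hypothetical stand-ins, schemas are reduced to plain column-name lists, and the exact-name lookup is an assumption made for illustration. It does not reproduce the real `OrcFileOperator` or `OrcRelation` code, nor the case-sensitivity failure itself.

```scala
// Illustrative only: none of these names are Spark APIs.
object SchemaSketch {
  // Stand-in for reading the schema from a file footer.
  // An empty file has no usable schema, mirroring the `None` case in the diff.
  def readSchema(footerColumns: Seq[String]): Option[Seq[String]] =
    if (footerColumns.isEmpty) None else Some(footerColumns)

  // Resolve each required column to its position in the schema read from the file.
  // Assumption: lookup is by exact name; columns missing from the file are skipped.
  def requiredColumnIds(physicalSchema: Seq[String], requiredSchema: Seq[String]): Seq[Int] =
    requiredSchema.map(name => physicalSchema.indexOf(name)).filter(_ >= 0)

  // Empty file: nothing to scan. Otherwise, work from the file's own schema.
  def scan(footerColumns: Seq[String], requiredSchema: Seq[String]): Iterator[Int] =
    readSchema(footerColumns) match {
      case None                 => Iterator.empty
      case Some(physicalSchema) => requiredColumnIds(physicalSchema, requiredSchema).iterator
    }
}
```

The point of the sketch is only that the column positions come from whatever schema the individual file reports, so a schema obtained elsewhere could resolve differently.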