[GitHub] spark pull request: [SPARK-14206][SQL] buildReader() implementatio...

liancheng Mon, 28 Mar 2016 16:27:07 -0700

Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12002#discussion_r57651085
  
    --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/orc/OrcRelation.scala ---
    @@ -145,15 +146,15 @@ private[sql] class DefaultSource
         (file: PartitionedFile) => {
           val conf = broadcastedConf.value.value
     
    -      // SPARK-8501: Empty ORC files always have an empty schema stored in their footer.  In this
    -      // case, `OrcFileOperator.readSchema` returns `None`, and we can simply return an empty
    -      // iterator.
    +      // SPARK-8501: Empty ORC files always have an empty schema stored in their footer. In this
    +      // case, `OrcFileOperator.readSchema` returns `None`, and we can't read the underlying file
    +      // using the given physical schema. Instead, we simply return an empty iterator.
           val maybePhysicalSchema = OrcFileOperator.readSchema(Seq(file.filePath), Some(conf))
           if (maybePhysicalSchema.isEmpty) {
             Iterator.empty
           } else {
             val physicalSchema = maybePhysicalSchema.get
    -        OrcRelation.setRequiredColumns(conf, physicalSchema, dataSchema)
    +        OrcRelation.setRequiredColumns(conf, physicalSchema, requiredSchema)
    --- End diff --
    
    Note that the `physicalSchema` passed to `setRequiredColumns` here is NOT the one passed to `buildReader`, but the one just read from the physical data file being scanned. Using the schema passed to `buildReader` instead breaks existing test cases related to case sensitivity.
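The point above is that column positions are resolved against the schema stored in the file being scanned. Below is a minimal sketch of that lookup. It assumes hypothetical stand-ins `Field` and `Schema`, which are not Spark's `StructField`/`StructType`. It also assumes a hypothetical `requiredColumns` helper that only approximates what `OrcRelation.setRequiredColumns` computes: it maps each required column name to its index in the physical schema by exact, case-sensitive match, then sorts the pairs by index. It does not model the case-sensitivity tests mentioned above. It only illustrates that the file's own column order, not a table-level schema, determines the ids.

```scala
// Hypothetical, simplified stand-ins; NOT Spark's StructField/StructType.
final case class Field(name: String)
final case class Schema(fields: Seq[Field]) {
  // Exact, case-sensitive lookup of a column's position, in the spirit of
  // StructType.fieldIndex; throws if the column does not exist.
  def fieldIndex(name: String): Int = {
    val i = fields.indexWhere(_.name == name)
    if (i < 0) throw new IllegalArgumentException(s"Field '$name' does not exist.")
    i
  }
}

// Hypothetical helper: resolves each required column against the schema read
// from the file being scanned and returns (index, name) pairs sorted by index.
def requiredColumns(physicalSchema: Schema, requiredSchema: Schema): Seq[(Int, String)] =
  requiredSchema.fields
    .map(f => physicalSchema.fieldIndex(f.name) -> f.name)
    .sortBy(_._1)

// Usage: the same required columns get different ids depending on which
// schema they are resolved against.
val tableSchema = Schema(Seq(Field("a"), Field("b"), Field("c")))
val fileSchema  = Schema(Seq(Field("c"), Field("a"), Field("b")))
val required    = Schema(Seq(Field("b"), Field("c")))
requiredColumns(fileSchema, required)   // List((0,c), (2,b))
requiredColumns(tableSchema, required)  // List((1,b), (2,c))
```

Resolving against the wrong schema would not fail here. It would silently select different column positions, which is why the source of `physicalSchema` matters.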

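The empty-file guard in the diff can be sketched in a self-contained form under several assumptions. `Field`, `Schema`, `PartitionedFile`, the in-memory `footers` map and the `decode` callback are hypothetical stand-ins, not Spark's classes or APIs. `readSchema` here only imitates the behavior the diff comment describes for `OrcFileOperator.readSchema`: it returns `None` when the footer schema is empty. The sketch uses a `match` in place of the diff's `isEmpty`/`get`, which is equivalent.

```scala
// Hypothetical, simplified stand-ins; these are NOT Spark's StructField,
// StructType, PartitionedFile or InternalRow.
final case class Field(name: String)
final case class Schema(fields: Seq[Field])
final case class PartitionedFile(filePath: String)
type Row = Seq[Any]

// Hypothetical footer lookup: an empty ORC file stores an empty schema in its
// footer (SPARK-8501), which is reported as None. For simplicity, a path that
// is absent from the map is also reported as None.
def readSchema(footers: Map[String, Schema], path: String): Option[Schema] =
  footers.get(path).filter(_.fields.nonEmpty)

// Builds a per-file reader function. When no physical schema is available the
// file cannot be decoded with it, so an empty iterator is returned instead.
def buildReader(
    footers: Map[String, Schema],
    decode: (String, Schema) => Iterator[Row]): PartitionedFile => Iterator[Row] =
  (file: PartitionedFile) =>
    readSchema(footers, file.filePath) match {
      case None                 => Iterator.empty
      case Some(physicalSchema) => decode(file.filePath, physicalSchema)
    }

// Usage: "empty.orc" has an empty footer schema, so it yields no rows.
val footers = Map(
  "part-0.orc" -> Schema(Seq(Field("a"), Field("b"))),
  "empty.orc"  -> Schema(Seq.empty))
val reader = buildReader(footers, (_, schema) => Iterator(Seq.fill(schema.fields.size)(0)))
```

Returning `Iterator.empty` keeps the caller's contract (one iterator per file) intact, so empty files need no special handling further up the scan.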

