[GitHub] cloud-fan commented on a change in pull request #23387: [SPARK-26447][SQL]Allow OrcColumnarBatchReader to return less partition columns

GitBox Wed, 26 Dec 2018 11:07:41 -0800

cloud-fan commented on a change in pull request #23387: [SPARK-26447][SQL]Allow OrcColumnarBatchReader to return less partition columns
URL: https://github.com/apache/spark/pull/23387#discussion_r244036202


 ##########
 File path: sql/core/src/main/java/org/apache/spark/sql/execution/datasources/orc/OrcColumnarBatchReader.java
 ##########
 @@ -58,10 +58,16 @@
 
   /**
    * The column IDs of the physical ORC file schema which are required by this reader.
-   * -1 means this required column doesn't exist in the ORC file.
+   * -1 means this required column is partition column, or it doesn't exist in the ORC file.
 
 Review comment:
   I think we need more comments here.
   
  Ideally, a partition column should never appear in the physical file; it
  should only appear in the directory name. However, Spark tolerates partition
  columns inside the physical file: it discards the values stored in the file
  and uses the partition value parsed from the directory name instead.
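A minimal sketch of the behavior described in the comment, under stated assumptions: columns are matched by plain string name, partition directories follow the Hive-style `name=value` layout, and `PartitionColumnSketch`, `resolveColumnIds`, and `parsePartitionValues` are hypothetical names, not Spark's actual API. It shows why a required partition column is assigned ID -1 even when the ORC file happens to store it, so that its value is later taken from the directory name rather than from the file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Spark's implementation): map required columns to
 * physical file column IDs, using -1 for "do not read from the file".
 */
public final class PartitionColumnSketch {

  /** Marker meaning the value must not be read from the physical file. */
  static final int NOT_FROM_FILE = -1;

  /**
   * For each required column, return its index in the physical file schema,
   * or -1 if it is a partition column (even if the file also contains it)
   * or if the file does not contain it at all.
   */
  static int[] resolveColumnIds(
      List<String> requiredColumns,
      List<String> fileColumns,
      List<String> partitionColumns) {
    int[] ids = new int[requiredColumns.size()];
    for (int i = 0; i < requiredColumns.size(); i++) {
      String name = requiredColumns.get(i);
      if (partitionColumns.contains(name)) {
        // Any copy of a partition column stored in the file is ignored.
        ids[i] = NOT_FROM_FILE;
      } else {
        // List.indexOf already returns -1 when the file lacks the column.
        ids[i] = fileColumns.indexOf(name);
      }
    }
    return ids;
  }

  /**
   * Extract partition values from a Hive-style path such as
   * "/warehouse/t/p=1/part-00000.orc". Assumption: values are kept as raw
   * strings; no unescaping or type conversion is done here.
   */
  static Map<String, String> parsePartitionValues(String path) {
    Map<String, String> values = new LinkedHashMap<>();
    for (String segment : path.split("/")) {
      int eq = segment.indexOf('=');
      if (eq > 0) {
        values.put(segment.substring(0, eq), segment.substring(eq + 1));
      }
    }
    return values;
  }

  public static void main(String[] args) {
    List<String> required = Arrays.asList("a", "b", "p");
    List<String> inFile = Arrays.asList("a", "p"); // the file also stores "p"
    List<String> partitions = Arrays.asList("p");
    System.out.println(Arrays.toString(resolveColumnIds(required, inFile, partitions)));
    System.out.println(parsePartitionValues("/warehouse/t/p=1/part-00000.orc"));
  }
}
```

Running `main` prints `[0, -1, -1]` and then `{p=1}`: column `a` maps to file column 0, `b` is absent from the file, and `p` gets -1 despite being present in the file, because its value comes from the `p=1` directory.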
