SaurabhChawla100 edited a comment on pull request #29045: URL: https://github.com/apache/spark/pull/29045#issuecomment-657978027
> Can you be more specific about the problem? Are you saying that the actual file schema doesn't match the table schema specified by the user?

For ORC data created by Hive, there are no field names in the physical schema. Please see the code below for reference:
https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcUtils.scala#L133

That code returns the index of each column in the data schema. In the code below, however, we pass the result schema, and the result schema does not have the index that comes back from OrcUtils.scala:
https://github.com/apache/spark/blob/24be81689cee76e03cd5136dfd089123bbff4595/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/orc/OrcFileFormat.scala#L211

For example:
```
val u = """select date_dim.d_year from date_dim limit 5"""
spark.sql(u).collect
```
Here the index of `d_year` returned by OrcUtils.scala#L133 is 6, whereas the resultSchema passed at OrcFileFormat.scala#L211 has only one field, struct<`d_year`:int>. Using index 6 against a resultSchema of size 1 then throws the exception:
```
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 2, 192.168.0.103, executor driver): java.lang.ArrayIndexOutOfBoundsException: 6
	at org.apache.spark.sql.execution.datasources.orc.OrcColumnarBatchReader.initBatch(OrcColumnarBatchReader.java:156)
	at org.apache.spark.sql.execution.datasources.orc.OrcFileFormat.$anonfun$buildReaderWithPartitionValues$7(OrcFileFormat.scala:258)
```
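To make the index mismatch concrete, here is a minimal sketch in plain Scala collections. It does not use Spark's or ORC's classes. The object name, the helper names `requestedColIds` and `readColumns`, and the placeholder column names `c0`..`c7` are all hypothetical; only `d_year` and its position 6 are taken from the example above.

```scala
import scala.util.Try

object OrcIndexMismatchSketch {
  // Hypothetical table schema. Only "d_year" and its position (6) come from
  // the example above; the other column names are placeholders.
  val dataSchema: Vector[String] =
    Vector("c0", "c1", "c2", "c3", "c4", "c5", "d_year", "c7")

  // After column pruning, the result schema holds just the selected column.
  val resultSchema: Vector[String] = Vector("d_year")

  // Resolve each requested column to its position in the table schema,
  // which is what happens when the file itself has no usable field names.
  def requestedColIds(requested: Seq[String]): Seq[Int] =
    requested.map(name => dataSchema.indexOf(name))

  // Stand-in for the batch reader: one column per result-schema field,
  // addressed by the ids computed above.
  def readColumns(ids: Seq[Int], schema: Seq[String]): Seq[String] = {
    val batchCols: Array[String] = schema.toArray
    ids.map(id => batchCols(id))
  }

  def main(args: Array[String]): Unit = {
    val ids = requestedColIds(resultSchema)
    println(s"requested ids: $ids, batch size: ${resultSchema.length}")
    // Index 6 against an array of length 1 fails, as in the stack trace.
    println(Try(readColumns(ids, resultSchema)))
  }
}
```

In this sketch, `readColumns(Seq(0), resultSchema)` succeeds, because index 0 is the position of `d_year` inside the pruned schema. This is only meant to illustrate the mismatch, not to describe the change made in the PR.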