ala commented on PR #38777:
URL: https://github.com/apache/spark/pull/38777#issuecomment-1333637318

   The issue seems to be that the vectorized reader treats the row 
index column as a "missing column" (i.e., a column that is not read from the 
file but is instead populated by a higher layer in the reader). Since missing 
columns are normally populated with nulls, the reader throws if the column's 
data type is non-nullable. 
https://github.com/apache/spark/blob/0f1c515179e5ed34aca27c51f500c26ca19cc748/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java#L372-L376
   We could tweak this `if` condition so that it does not throw for generated 
columns such as the row index, or keep the workaround you already put in place.
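
   To make the first option concrete, here is a rough, self-contained sketch of 
the idea. It is not the actual Spark code: `RequestedColumn`, `needsNullFill`, 
the `generated` flag, and the column names are all hypothetical stand-ins, and 
the real check in `VectorizedParquetRecordReader` works on Parquet column 
descriptors rather than a class like this. The point is only that a generated 
column would be excluded before the "required but missing" check runs.

```java
import java.io.IOException;

public class MissingColumnCheck {

    /** Hypothetical stand-in for a requested column; not a Spark class. */
    static final class RequestedColumn {
        final String name;
        final boolean required;      // non-nullable in the requested schema
        final boolean presentInFile; // physically stored in the Parquet file
        final boolean generated;     // filled in by the reader, e.g. a row index

        RequestedColumn(String name, boolean required,
                        boolean presentInFile, boolean generated) {
            this.name = name;
            this.required = required;
            this.presentInFile = presentInFile;
            this.generated = generated;
        }
    }

    /**
     * Returns true if the column has to be filled with nulls.
     * Throws if the column is absent, non-nullable, and not generated.
     */
    static boolean needsNullFill(RequestedColumn col) throws IOException {
        // Assumed tweak: generated columns are never treated as "missing",
        // because the reader produces their values itself.
        if (col.presentInFile || col.generated) {
            return false;
        }
        if (col.required) {
            // Absent from the file and non-nullable: nulls are not allowed here.
            throw new IOException(
                "Required column is missing in data file. Col: " + col.name);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        RequestedColumn rowIndex =
            new RequestedColumn("row_index", true, false, true);
        RequestedColumn optionalExtra =
            new RequestedColumn("extra", false, false, false);

        System.out.println(needsNullFill(rowIndex));      // generated: no null fill
        System.out.println(needsNullFill(optionalExtra)); // nullable and absent
    }
}
```

   With a check shaped like this, a non-nullable row index column would pass 
through without the exception, while a genuinely absent required column would 
still be rejected.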
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

