[GitHub] [spark] ala commented on pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

GitBox Thu, 01 Dec 2022 03:41:24 -0800


ala commented on PR #38777:
URL: https://github.com/apache/spark/pull/38777#issuecomment-1333637318


   Well, the issue seems to be that the vectorized reader treats the row 
index column as a "missing column" (i.e. a column that is not read from the 
file, but is instead populated by a higher layer in the reader). Since missing 
columns are normally populated with nulls, it's a problem if the data type is 
non-nullable. 
https://github.com/apache/spark/blob/0f1c515179e5ed34aca27c51f500c26ca19cc748/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedParquetRecordReader.java#L372-L376
   We could tweak this `if` condition to not throw on a generated column/row 
index, or use the workaround you already put in place.
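
   To make the first option concrete, here is a rough, self-contained sketch 
of the kind of check I mean. It is not the actual Spark code: the class 
`MissingColumnSketch`, the `Column` holder, the `generated` flag, the 
`allowGenerated` switch and the exception type are all made up for 
illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MissingColumnSketch {

    /** Hypothetical stand-in for a requested column; not a Spark class. */
    public static final class Column {
        final String name;
        final boolean required;   // true when the data type is non-nullable
        final boolean generated;  // true for reader-generated columns such as the row index

        public Column(String name, boolean required, boolean generated) {
            this.name = name;
            this.required = required;
            this.generated = generated;
        }
    }

    /**
     * Returns true if the column is absent from the file and has to be
     * populated by the reader. Throws if a required column is absent, unless
     * allowGenerated is set and the column is a generated one.
     */
    public static boolean isMissing(Set<String> fileColumns, Column col, boolean allowGenerated) {
        if (fileColumns.contains(col.name)) {
            return false; // present in the file, read as usual
        }
        boolean exempt = allowGenerated && col.generated;
        if (col.required && !exempt) {
            // Non-nullable column that would otherwise be filled with nulls.
            throw new IllegalStateException(
                "Required column is missing in data file. Col: " + col.name);
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> fileColumns = new HashSet<>(Arrays.asList("id", "value"));
        Column rowIndex = new Column("row_index", true, true);

        // With the tweak, the generated column is accepted as "missing".
        System.out.println(isMissing(fileColumns, rowIndex, true));

        // Without the tweak, the same column is rejected.
        try {
            isMissing(fileColumns, rowIndex, false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

   If we went this way, the exemption would presumably only be safe as long 
as the higher layer really fills the generated column with values instead of 
nulls.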
   


