[GitHub] [spark] ala commented on pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

2022-12-01 Thread GitBox

ala commented on PR #38777: URL: https://github.com/apache/spark/pull/38777#issuecomment-1333637318 Well, the issue seems to be that the vectorized reader recognizes the row index column as a "missing column" (a.k.a. columns that are not read from the file, but instead populated by a higher

[GitHub] [spark] ala commented on pull request #38777: [SPARK-41151][FOLLOW-UP][SQL] Keep built-in file _metadata fields nullable value consistent

2022-11-29 Thread GitBox

ala commented on PR #38777: URL: https://github.com/apache/spark/pull/38777#issuecomment-1330730907 Sorry, I was on PTO/sick for a couple of days. My idea was not to include the row index in `_metadata` for formats that cannot generate it. While we have _many_ Parquet readers, I