ala commented on PR #38777:
URL: https://github.com/apache/spark/pull/38777#issuecomment-1330730907

   Sorry, I was on PTO/sick for a couple of days. 
   
   My idea was not to include the row index in `_metadata` for formats that 
cannot generate it. While we have _many_ Parquet readers, I believe all of them 
support row index generation right now, so that's not a problem.
   
   My main concern is that we may want to keep growing the 
`_metadata` struct (most recently: row id, cc @tomvanbussel, 
@juliuszsompolski). These new fields might not be immediately supported in all 
the readers (or, if they are Databricks-internal, we might not want to support 
them in OSS readers at all). So while I think it's OK to make row index 
non-nullable, it would be an issue if we wanted to require that all `_metadata` 
fields be non-nullable in the future.
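
   To make the distinction concrete, here is a rough, Spark-free sketch of the 
idea. Everything in it is an assumption for illustration: the `Field` class, 
the `metadata_fields` helper, the set of formats, and the `row_id` entry are 
hypothetical and are not the actual Spark code. It only shows a format-dependent 
`_metadata` layout in which row index is non-nullable where it exists, while a 
newer field stays nullable.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for one field of the `_metadata` struct."""
    name: str
    nullable: bool


# Illustrative base fields only; not an authoritative list.
BASE_FIELDS = [Field("file_path", nullable=False),
               Field("file_name", nullable=False)]

# Assumption: only these formats can generate a row index.
FORMATS_WITH_ROW_INDEX = {"parquet"}


def metadata_fields(file_format: str) -> List[Field]:
    """Return the `_metadata` fields exposed for the given format."""
    fields = list(BASE_FIELDS)
    if file_format.lower() in FORMATS_WITH_ROW_INDEX:
        # Omitted entirely for other formats, so it can be non-nullable here.
        fields.append(Field("row_index", nullable=False))
    # Hypothetical newer field: some readers may never populate it,
    # so it is kept nullable rather than required everywhere.
    fields.append(Field("row_id", nullable=True))
    return fields


if __name__ == "__main__":
    for fmt in ("parquet", "csv"):
        print(fmt, [(f.name, f.nullable) for f in metadata_fields(fmt)])
```

   With this shape, a blanket "all `_metadata` fields are non-nullable" rule 
would break as soon as a field like the hypothetical `row_id` is added for 
only some readers.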


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

