ala commented on PR #38777: URL: https://github.com/apache/spark/pull/38777#issuecomment-1330730907
Sorry, I was on PTO/sick for a couple of days. My idea was not to include the row index in `_metadata` for formats that cannot generate it. While we have _many_ Parquet readers, I believe all of them support row index generation right now, so that's not a problem. My main concern is that we may want to keep growing the `_metadata` struct (most recently: row id, cc @tomvanbussel, @juliuszsompolski). These new fields might not be immediately supported in all the readers (or, if they are Databricks-internal, we might not want to support them in OSS readers at all). So while I think it's OK to make row index non-nullable, it would be an issue if we wanted to require that all `_metadata` fields be non-nullable in the future.
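To make the nullability distinction above concrete, here is a rough sketch in plain Python. It is not Spark code: `MetadataField`, `GENERATED_BY_ALL_READERS`, `GENERATED_BY_SOME_READERS`, and `metadata_schema` are hypothetical names, and the capability tables are illustrative assumptions rather than a description of what the readers actually support. The idea is that a field is exposed only for formats that can generate it, and is non-nullable only when every reader of that format produces it.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class MetadataField:
    """One field of a hypothetical `_metadata` struct."""
    name: str
    nullable: bool


# ASSUMPTION (illustrative only): fields that every reader of a format generates.
GENERATED_BY_ALL_READERS: Dict[str, Set[str]] = {
    "parquet": {"file_path", "row_index"},
    "csv": {"file_path"},
}

# ASSUMPTION (illustrative only): fields that only some readers of a format generate,
# e.g. a newly added field that has not reached every reader yet.
GENERATED_BY_SOME_READERS: Dict[str, Set[str]] = {
    "parquet": {"row_id"},
}


def metadata_schema(file_format: str) -> List[MetadataField]:
    """Build the `_metadata` fields exposed for one file format.

    Fields no reader of the format can generate are omitted entirely.
    Fields generated by every reader are non-nullable; fields generated
    by only some readers must stay nullable.
    """
    always = GENERATED_BY_ALL_READERS.get(file_format, set())
    sometimes = GENERATED_BY_SOME_READERS.get(file_format, set()) - always
    fields = [MetadataField(name, nullable=False) for name in always]
    fields += [MetadataField(name, nullable=True) for name in sometimes]
    return sorted(fields, key=lambda f: f.name)


if __name__ == "__main__":
    for fmt in ("parquet", "csv"):
        print(fmt, metadata_schema(fmt))
```

Under these assumptions, `row_index` can be non-nullable for Parquet and simply absent for CSV, while a field like `row_id` has to remain nullable until every reader supports it.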
