Yaohua628 commented on PR #38777: URL: https://github.com/apache/spark/pull/38777#issuecomment-1325752236
@cloud-fan @dongjoon-hyun @HeartSaVioR Sorry for the back and forth.

In [the previous PR](https://github.com/apache/spark/pull/38683), we changed the `_metadata` column to not null. I just realized we should probably make all fields inside `_metadata` (`file_path`, `file_name`, `file_modification_time`, `file_size`, `row_index`) not null as well, for consistency. Please let me know what you think.

As @cloud-fan mentioned, it should be fine to write not-null data into a nullable column. My only concern is that this change might break the existing schema compatibility check for stateful streaming.

Also, cc @ala to confirm that `row_index` will always be not null for the supported file formats (e.g. Parquet).

Thanks for all your help!
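To make the concern concrete, here is a minimal Python sketch of the two situations described above. It is a toy model, not Spark code: the `Field` class, `can_write`, and `strict_match` are hypothetical names, the field types are assumptions, and the real stateful streaming check may well treat nullability differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Field:
    """Toy stand-in for a struct field (hypothetical, not a Spark class)."""
    name: str
    dtype: str
    nullable: bool


def metadata_fields(nullable: bool) -> List[Field]:
    # Field names come from the comment above; the types are assumptions.
    return [
        Field("file_path", "string", nullable),
        Field("file_name", "string", nullable),
        Field("file_size", "long", nullable),
        Field("file_modification_time", "timestamp", nullable),
        Field("row_index", "long", nullable),
    ]


def can_write(source: Field, target: Field) -> bool:
    """Not-null data fits a nullable column, but nullable data does not fit a not-null one."""
    return source.dtype == target.dtype and (target.nullable or not source.nullable)


def strict_match(old: List[Field], new: List[Field]) -> bool:
    """A deliberately strict comparison that also compares nullability."""
    return old == new


old_schema = metadata_fields(nullable=True)   # fields before the proposed change
new_schema = metadata_fields(nullable=False)  # fields after the proposed change

# Writing the new not-null fields into the old nullable columns is fine.
writes_ok = all(can_write(n, o) for n, o in zip(new_schema, old_schema))

# A check that compares nullability exactly would flag the change.
strict_ok = strict_match(old_schema, new_schema)

print(writes_ok, strict_ok)
```

If the streaming check behaves like `strict_match`, existing checkpoints would be rejected after the change. If it only requires write compatibility, as in `can_write`, they would not.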
