ZiyaZa commented on PR #52557: URL: https://github.com/apache/spark/pull/52557#issuecomment-3416301451
@gengliangwang

> I am not a big fan of such behavior changes.

This is a behavior change that is required to fix a correctness issue. The issue is described in more detail in the linked [JIRA ticket](https://issues.apache.org/jira/browse/SPARK-53535). According to the comment https://github.com/apache/spark/pull/52557#issuecomment-3387926488 above, we could also get a NullPointerException if the struct is marked as nullable, because we would previously wrongly assume all struct values to be null.

> why just picking one arbitrary field, instead of setting all the fields null?

We need to understand **for each row** whether the struct value is null or is a struct with all of its fields null (to explain in JSON notation, `null` and `{ 'a': null }` are not the same thing). The only way we can determine this is by looking at a child field of the struct that is present in the file, because Parquet stores nullability information in the definition levels of leaf columns. Based on those definition levels, we can identify in which rows the struct is null and in which it is non-null.

> Could you provide more details on this one?

Updated the description.

> For breaking changes, usually we should introduce a SQL configuration to control the new/legacy behaviors

Added a flag to control this behavior.

> also we should update https://spark.apache.org/docs/latest/sql-migration-guide.html

Can you please explain how we update that? It looks like that page is built from the `sql-migration-guide.md` file, but, if I read it correctly, the file currently does not contain anything for the next release.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
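To illustrate the definition-level argument in the comment above, here is a minimal pure-Python sketch. It is not Spark's or Parquet's actual reader code. The schema (`optional group s { optional int32 a; }`) and the helpers `encode` and `struct_is_null` are hypothetical and exist only for this illustration. With both `s` and `a` optional, the leaf `s.a` has a maximum definition level of 2, and a level below 1 means the struct itself is null.

```python
# Hypothetical schema, assumed for illustration only:
#   optional group s { optional int32 a; }
# Definition levels recorded for the leaf column s.a:
#   0 -> s is null
#   1 -> s is present, a is null
#   2 -> s is present, a has a value
STRUCT_DEF_LEVEL = 1  # level at which the struct `s` itself is defined


def encode(rows):
    """Compute the definition level of s.a for rows given as None or {'a': value}."""
    levels = []
    for row in rows:
        if row is None:
            levels.append(0)
        elif row.get("a") is None:
            levels.append(1)
        else:
            levels.append(2)
    return levels


def struct_is_null(leaf_def_levels, struct_def_level=STRUCT_DEF_LEVEL):
    """Return one boolean per row: True where the struct is null."""
    return [level < struct_def_level for level in leaf_def_levels]


rows = [None, {"a": None}, {"a": 7}]
levels = encode(rows)
print(levels)                  # [0, 1, 2]
print(struct_is_null(levels))  # [True, False, False]
```

The first two rows both have a null `a`, yet their definition levels differ, which is why a leaf column that is actually present in the file is needed to tell `null` apart from `{ 'a': null }`.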
