ZiyaZa commented on PR #52557:
URL: https://github.com/apache/spark/pull/52557#issuecomment-3416301451

   @gengliangwang 
   > I am not a big fan of such behavior changes. 
   
   This behavior change is required to fix a correctness issue, which is 
described in more detail in the linked [JIRA 
ticket](https://issues.apache.org/jira/browse/SPARK-53535). According to the 
comment https://github.com/apache/spark/pull/52557#issuecomment-3387926488 
above, we could also get a NullPointerException if the struct is marked as 
nullable, because we previously assumed, wrongly, that all struct values were 
null.
   
   > why just picking one arbitrary field, instead of setting all the fields 
null?
   
   We need to determine **for each row** whether the struct value is null or 
is a struct whose fields are all null (in JSON notation, `null` and 
`{ 'a': null }` are not the same thing). The only way to determine this is by 
looking at a child field of the struct that is present in the file, because 
Parquet stores nullability information in the definition levels of leaf 
columns. Based on those definition levels, we can identify the rows in which 
the struct is null and the rows in which it is non-null.
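
   To illustrate the point about definition levels, here is a minimal Python 
sketch. It is not Spark's reader code; the schema 
`optional group s { optional int32 a; }` and the helper name `assemble_struct` 
are assumptions made up for this example. For that schema, the leaf column 
`s.a` has a maximum definition level of 2, and the level stored for each row 
tells us how much of the path `s.a` is defined.

```python
# Assumed schema (illustration only): optional group s { optional int32 a; }
# Definition levels of the leaf column s.a:
#   0 -> s itself is null
#   1 -> s is present, but a is null
#   2 -> a has a value
STRUCT_DEFINED_LEVEL = 1
MAX_DEFINITION_LEVEL = 2


def assemble_struct(def_levels, values):
    """Rebuild the per-row value of s from the leaf column s.a.

    Hypothetical helper: `values` holds only the non-null leaf values,
    in row order, which is how Parquet stores them.
    """
    value_iter = iter(values)
    rows = []
    for level in def_levels:
        if level < STRUCT_DEFINED_LEVEL:
            rows.append(None)                        # struct is null
        elif level < MAX_DEFINITION_LEVEL:
            rows.append({"a": None})                 # struct present, field null
        else:
            rows.append({"a": next(value_iter)})     # field has a value
    return rows


rows = assemble_struct([0, 1, 2], [42])
print(rows)
```

   The first two rows both have a null `a`, yet they are distinguishable only 
through the definition levels of a leaf column that is actually present in the 
file.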
   
   > Could you provide more details on this one?
   
   Updated the description.
   
   > For breaking changes, usually we should introduce a SQL configuration to 
control the new/legacy behaviors
   
   Added a flag to control this behavior.
   
   > also we should update 
https://spark.apache.org/docs/latest/sql-migration-guide.html
   
   Can you please explain how we update that? It looks like that page is built 
from the `sql-migration-guide.md` file, but, if I read it correctly, the file 
does not yet contain anything for the next release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

