viirya commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-720716145


   > I think it's ok to change the original param to a SQL config for end users.
   > 
   > ```
   > This change may break some query which may work if end users are super careful and know in details and go ahead.
   > ```
   > 
   > +1 for this concern. So how about changing the default value to `false`?
   > 
   
   I believe this is not the first change that may break some queries; we have 
made similar changes before. For those, we provided configs so that users can 
keep the legacy behavior if they want. This change follows the same approach.
   
   This is a correctness issue that users may not be aware of, and they would 
need to be very careful to avoid it. I think we should provide a baseline that 
is definitely correct, and an option (the config) for users who choose to run 
with the correctness risk.
   
   > @viirya qq: Do we have real cases of enabling this config without 
   > correctness issues? It would be great to keep updating the document by 
   > providing demo cases and specific usage of this config.
   
   For outer joins or aggregations, I think the correctness risk is pretty 
high. For `FlatMapGroupsWithState` I am not sure, but I think it is possible to 
avoid emitting late rows in the state function. Maybe @HeartSaVioR has some 
real cases?
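
   To illustrate the kind of risk meant here, below is a toy sketch in plain 
Python. It is not Spark code: the function names, the 10-unit tumbling window, 
and the exact comparison used for "late" are assumptions made for the example. 
It models two chained stateful operators that share one global watermark, where 
the first one emits a window result only after the watermark has passed the 
window end, so the second one treats that result as late.

```python
# Toy model, not Spark code: names, window size, and lateness rule are assumptions.

def first_aggregation(events, watermark, window=10):
    """Sum values per tumbling window and emit only windows closed by the watermark."""
    sums = {}
    for event_time, value in events:
        start = (event_time // window) * window
        sums[start] = sums.get(start, 0) + value
    # Each emitted row carries its window end as its event time.
    return [
        (start + window, total)
        for start, total in sorted(sums.items())
        if start + window <= watermark
    ]


def downstream_operator(rows, watermark):
    """Split rows into kept and dropped, dropping rows not newer than the watermark."""
    kept = [row for row in rows if row[0] > watermark]
    dropped = [row for row in rows if row[0] <= watermark]
    return kept, dropped


events = [(1, 5), (4, 7), (12, 3)]  # (event_time, value)
watermark = 10

emitted = first_aggregation(events, watermark)
kept, dropped = downstream_operator(emitted, watermark)

print("emitted by first operator:", emitted)
print("kept downstream:", kept)
print("dropped downstream:", dropped)
```

   In this sketch the only row the first operator emits is already behind the 
shared watermark, so the downstream operator silently drops it. A state 
function that never emits rows older than the watermark would not hit this.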


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


