viirya commented on pull request #30210:
URL: https://github.com/apache/spark/pull/30210#issuecomment-720716145
> I think it's ok to change the original param to a SQL config for end users.
>
> ```
> This change may break some query which may work if end users are super careful and know in details and go ahead.
> ```
>
> +1 for this concern. So how about change the default value to `false`?

I believe this is not the first change that may break some queries. We have made similar changes before, and for them we provided configs so users can still keep the legacy behavior if they want. This change basically follows the same approach.

This involves correctness, and users may not be aware of the issue. They need to be very careful to avoid it. I think we should provide a baseline that is definitely correct, and provide an option (the config) for users who choose to run with the correctness risk.

> @viirya qq: Do we have the real cases on enabling this config without correctness issues? It would be great to keep updating the document by providing demo cases and specific usage of this config.

For outer joins or aggregations, I think the correctness risk is pretty high. For `FlatMapGroupsWithState` I am not sure, but I think it is possible to avoid emitting late rows in the state function. Maybe @HeartSaVioR has some real cases?
