HeartSaVioR commented on a change in pull request #35552:
URL: https://github.com/apache/spark/pull/35552#discussion_r809738879
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
##########
@@ -287,6 +287,9 @@ abstract class StreamExecution(
 // Disable cost-based join optimization as we do not want stateful operations
 // to be rearranged
 sparkSessionForStream.conf.set(SQLConf.CBO_ENABLED.key, "false")
+ // Disable any config affecting the required child distribution of stateful operators.
+ // Please read through the NOTE on the classdoc of HashClusteredDistribution for details.
+ sparkSessionForStream.conf.set(SQLConf.REQUIRE_ALL_CLUSTER_KEYS_FOR_AGGREGATE.key, "false")
Review comment:
This is super important. The new config must never be set to true for
streaming queries until we fix the fundamental problem in a backward-compatible
way, since stateful operators would follow the changed output partitioning as well.
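
To make the concern concrete, here is a minimal sketch of why a changed
partitioning matters for state. It is a simplified model, not Spark's actual
code: `PartitioningSketch` and `partitionId` are hypothetical names, and plain
`hashCode` stands in for the hash Spark really uses. The only point is that the
same row can map to different partition ids depending on which grouping keys
are hashed, while state written under one partition id is not visible under
another.

```scala
object PartitioningSketch {
  // Hypothetical stand-in for hash partitioning on a set of clustering keys.
  // Plain hashCode is used here purely for illustration.
  def partitionId(keys: Seq[Any], numPartitions: Int): Int =
    Math.floorMod(keys.hashCode(), numPartitions)

  def main(args: Array[String]): Unit = {
    val numPartitions = 200
    // Rows grouped by (user, action).
    val rows = Seq(("user1", "click"), ("user1", "view"), ("user2", "click"))

    rows.foreach { case (user, action) =>
      // Partitioning on every grouping key.
      val byAllKeys = partitionId(Seq(user, action), numPartitions)
      // Partitioning on a subset of the grouping keys.
      val bySubset = partitionId(Seq(user), numPartitions)
      // If the two ids differ, a query restarted under the other partitioning
      // would look for this row's state in a partition that never held it.
      println(s"($user, $action): all keys -> $byAllKeys, subset -> $bySubset")
    }
  }
}
```

Under this model, any config that changes which keys feed the partitioning of
a stateful operator between runs would silently separate rows from their
existing state, which is why the session used for the stream pins it to "false".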
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]