[GitHub] [spark] HeartSaVioR commented on a change in pull request #35552: [SPARK-38237][SQL][SS] Introduce a new config to require all cluster keys on Aggregate

GitBox Thu, 17 Feb 2022 23:34:33 -0800


HeartSaVioR commented on a change in pull request #35552:
URL: https://github.com/apache/spark/pull/35552#discussion_r809738879




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
##########
@@ -287,6 +287,9 @@ abstract class StreamExecution(
         // Disable cost-based join optimization as we do not want stateful operations
         // to be rearranged
         sparkSessionForStream.conf.set(SQLConf.CBO_ENABLED.key, "false")
+        // Disable any config affecting the required child distribution of stateful operators.
+        // Please read through the NOTE on the classdoc of HashClusteredDistribution for details.
+        sparkSessionForStream.conf.set(SQLConf.REQUIRE_ALL_CLUSTER_KEYS_FOR_AGGREGATE.key, "false")

Review comment:
       This is super important. The new config must never be set to true
until we fix the fundamental problem in a backward-compatible way, since
stateful operators would follow the changed output partitioning as well.
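
As a rough illustration of the concern (plain Scala, not Spark's actual partitioning code; `partitionId`, the sample keys, and the partition count are hypothetical), the sketch below shows that the partition a row lands in depends on which keys are hashed. If the set of keys used for partitioning changes between runs, rows for the same grouping key could presumably be routed to a partition other than the one holding their existing state.

```scala
// Illustrative only: a stand-in for hash partitioning, not Spark's implementation.
object PartitioningSketch {
  // Hypothetical helper: hash the chosen key columns and map the hash
  // to a non-negative partition index in [0, numPartitions).
  def partitionId(keys: Seq[Any], numPartitions: Int): Int =
    Math.floorMod(keys.hashCode(), numPartitions)

  def main(args: Array[String]): Unit = {
    val numPartitions = 200 // assumed value for the example
    val rows = Seq(("user1", "pageA"), ("user1", "pageB"), ("user2", "pageA"))

    rows.foreach { case (user, page) =>
      // Partitioning on all grouping keys vs. only a subset of them.
      val byAllKeys = partitionId(Seq(user, page), numPartitions)
      val bySubset = partitionId(Seq(user), numPartitions)
      println(s"($user, $page): all keys -> $byAllKeys, subset -> $bySubset")
    }
  }
}
```

Running `PartitioningSketch.main(Array.empty)` prints both partition indices per row, which makes it easy to see whether the two schemes agree for a given key.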



