[GitHub] [spark] sigmod edited a comment on pull request #35552: [SPARK-38237][SQL][SS] Introduce a new config to require all cluster keys on Aggregate

GitBox Fri, 18 Feb 2022 12:42:54 -0800


sigmod edited a comment on pull request #35552:
URL: https://github.com/apache/spark/pull/35552#issuecomment-1045149556



   > StatefulOpClusteredDistribution back for state store correctness only,
   > and I don't think it's in the right direction to make it more general
   
   @c21 The same problem also exists in `join(t1.x = t2.x)` followed by `window(t1.x, t1.y)` or `join(t1.x = t3.x and t1.y = t3.y)`.
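
A minimal Python sketch of why no shuffle appears between the join and the window. It assumes simplified, hypothetical stand-ins for Spark's `HashPartitioning`, `ClusteredDistribution`, and `HashClusteredDistribution`; real Spark compares expressions with `semanticEquals` and handles details omitted here. A clustered requirement on `(t1.x, t1.y)` accepts a child hash-partitioned on any subset of those keys, so the join's `t1.x` partitioning is reused as-is. An exact-match requirement would instead force a reshuffle on both keys.

```python
from dataclasses import dataclass
from typing import Tuple


# Hypothetical, simplified stand-in for Spark's HashPartitioning:
# just the ordered list of partitioning key names.
@dataclass(frozen=True)
class HashPartitioning:
    keys: Tuple[str, ...]


def satisfies_clustered(p: HashPartitioning, required: Tuple[str, ...]) -> bool:
    """ClusteredDistribution-style check (simplified): every partitioning key
    must be one of the required clustering keys, so a subset is accepted."""
    return all(k in required for k in p.keys)


def satisfies_hash_clustered(p: HashPartitioning, required: Tuple[str, ...]) -> bool:
    """HashClusteredDistribution-style check (simplified): the partitioning
    keys must match the required keys exactly, in the same order."""
    return p.keys == required


# Output of join(t1.x = t2.x) is hash-partitioned on t1.x only.
join_output = HashPartitioning(("t1.x",))
# window(t1.x, t1.y) asks for its input to be clustered on both keys.
window_keys = ("t1.x", "t1.y")

print(satisfies_clustered(join_output, window_keys))       # True: no shuffle inserted
print(satisfies_hash_clustered(join_output, window_keys))  # False: shuffle on (t1.x, t1.y)
```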
   
   Note that AQE doesn't get a chance to kick in, because there's no shuffle between those operators.
   Thus, I suspect configs that use `HashClusteredDistribution` can at least rescue such queries from timeouts, full disks, etc. It's not a correctness issue, but it can still be severe if a user has no way to work around it.
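
One plausible reading of the timeout/disk-space-full risk is key skew: if the join output stays partitioned on `t1.x` alone, every row sharing a hot `x` value lands in a single task, however many distinct `y` values it carries. The sketch below compares the largest partition when rows are placed by `x` versus by `(x, y)`. It assumes made-up data and uses Python's built-in `hash` as a stand-in for Spark's Murmur3-based hashing; `max_partition_size` is a hypothetical helper.

```python
from collections import Counter


def max_partition_size(rows, key_fn, num_partitions):
    """Assign each row to hash(key) % num_partitions (a stand-in for Spark's
    hash partitioner) and return the size of the largest partition."""
    counts = Counter(hash(key_fn(r)) % num_partitions for r in rows)
    return max(counts.values())


# Made-up skewed input: one hot x value (0) paired with many distinct y
# values, plus a long tail of other x values.
rows = [(0, y) for y in range(10_000)] + [(x, 0) for x in range(1, 1_000)]

by_x = max_partition_size(rows, lambda r: r[0], 200)            # clustered on x only
by_x_y = max_partition_size(rows, lambda r: (r[0], r[1]), 200)  # clustered on (x, y)

print(by_x)  # 10004
print(by_x_y)
```

With one hot `x` value, the largest `x`-only partition holds over 10,000 rows, while hashing on `(x, y)` spreads those rows across many partitions.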
   
   

