sigmod edited a comment on pull request #35552: URL: https://github.com/apache/spark/pull/35552#issuecomment-1045149556
> StatefulOpClusteredDistribution back for state store correctness only,
> and I don't think it's in the right direction to make it more general

@c21 the same problem also exists in join(t1.x = t2.x) followed by window(t1.x, t1.y) or join(t1.x = t3.x and t1.y = t3.y).

Note that AQE doesn't get a chance to kick in because there is no shuffle between those operators. Thus, I suspect configs with `HashClusteredDistribution` can at least rescue such queries from timeouts, running out of disk space, etc. It's not a correctness issue, but it can still be severe if a user has no way to work around it.
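
To make the skew concern in the comment above concrete, here is a minimal toy simulation in plain Python. It is not Spark code and does not use Spark's partitioner: the CRC32-based `partition_of` helper, the partition count, and the synthetic rows are all assumptions made for illustration. It compares the largest partition when rows are placed by `x` alone (the layout left behind by a join on `t1.x`) with the largest partition when rows are placed by the full key `(x, y)`.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 8  # hypothetical partition count, chosen for illustration


def partition_of(key, num_partitions=NUM_PARTITIONS):
    """Toy stand-in for a hash partitioner: CRC32 of the key's repr, modulo the partition count."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def max_partition_size(rows, key_fn):
    """Return the row count of the largest partition when rows are placed by key_fn(row)."""
    sizes = Counter(partition_of(key_fn(row)) for row in rows)
    return max(sizes.values())


# Synthetic (x, y) rows: 900 rows share the hot key x = 0, 100 rows have distinct x values.
# y cycles through 100 values, so (x, y) has far more distinct values than x for the hot key.
rows = [(0, i % 100) for i in range(900)] + [(i, i % 100) for i in range(1, 101)]

# Placement by x only: every hot-key row lands in the same partition.
max_by_x = max_partition_size(rows, lambda r: r[0])

# Placement by the full key (x, y): the hot key's rows are spread by y.
max_by_xy = max_partition_size(rows, lambda r: (r[0], r[1]))

print("largest partition when placed by x only:", max_by_x)
print("largest partition when placed by (x, y):", max_by_xy)
```

In this sketch the partition holding `x = 0` receives at least the 900 hot rows when placement uses `x` only, while placement on `(x, y)` spreads them out. That is the kind of relief the comment hopes a shuffle on the full key set could give a downstream window(t1.x, t1.y).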
