cloud-fan commented on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1035975940
I don't think AQE makes a big difference here. The normal broadcast can also
hit this problem if there are many concurrent queries. The broadcast timeout is
not a proper
cloud-fan commented on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1035950763
> it keeps the disable timeout for broadcast stages that is converted from
> shuffle in AQE.
This is not sufficient. The timeout can never be accurate because the
cloud-fan commented on pull request #35425:
URL: https://github.com/apache/spark/pull/35425#issuecomment-1034427438
This reverts
[SPARK-36414](https://issues.apache.org/jira/browse/SPARK-36414), right? If a
query has many broadcast stages (not converted from shuffle), and the broadcast