SauronShepherd opened a new pull request, #49724:
URL: https://github.com/apache/spark/pull/49724
### What changes were proposed in this pull request?

This PR introduces a new explain mode, `off`, which disables the generation of physical plan strings. It also changes how the internal attribute `cachedName` of `CachedRDDBuilder` objects is computed.

### Why are the changes needed?

Whenever a plan changes (which happens frequently when Adaptive Query Execution (AQE) kicks in), the explain output of the physical plan is generated as a plain string. This is very expensive for large plans. Moreover, these strings are stored in the `ListenerBus` of `SparkContext`, where they consume heap memory and can lead to OutOfMemory errors. Given this potential negative impact on Spark applications, this information should be available only on demand, for debugging purposes.

This PR introduces a new explain mode, `off`, and makes it the default to prevent unnecessary string generation. Explicitly explaining a DataFrame (for example, with `explain()`) still works while this mode is active.

Additionally, when a `CachedRDDBuilder` object is created without a defined `tableName`, the full string representation of the plan is computed, only for its first 1024 characters to be kept. This is an expensive operation, and it has been replaced with a much cheaper call to `simpleStringWithNodeId`, which avoids the unnecessary computation.

**IMPORTANT NOTE:** This issue is causing an OutOfMemory (OOM) error in certain unit tests within GraphFrames, as reported in [Connected Components gives wrong results](https://github.com/graphframes/graphframes/issues/453). It may also contribute to the frequent overuse of checkpoints, not only in GraphFrames but also among many Spark users.

### Does this PR introduce _any_ user-facing change?

Yes. By default, plan descriptions will no longer be available in the Spark UI. Users who need this information must explicitly enable it by setting the `spark.sql.ui.explainMode` Spark configuration.

### How was this patch tested?
Unit tests from **sql/core** and **sql/catalyst**, along with the test attached to the [SPARK-50992](https://issues.apache.org/jira/browse/SPARK-50992) ticket.

### Was this patch authored or co-authored using generative AI tooling?

No.
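### Example: why the old `cachedName` was expensive

The snippet below is a self-contained toy model, not Spark code: `PlanNode`, `tree_string`, `simple_string_with_node_id`, `abbreviate`, `cached_name_before` and `cached_name_after` are all hypothetical names, and the `Name (id)` format is chosen purely for illustration. It mirrors the shape of the `cachedName` change in this PR: rendering the whole plan and then keeping only 1024 characters costs work proportional to the plan size, while describing only the root node, in the spirit of `simpleStringWithNodeId`, does not.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PlanNode:
    """Hypothetical stand-in for a physical plan node (not Spark's SparkPlan)."""
    name: str
    node_id: int
    children: List["PlanNode"] = field(default_factory=list)

    def simple_string_with_node_id(self) -> str:
        # Describes only this node, so the cost does not depend on plan size.
        return f"{self.name} ({self.node_id})"

    def tree_string(self, depth: int = 0) -> str:
        # Renders the entire subtree, so the cost grows with plan size.
        lines = ["  " * depth + self.simple_string_with_node_id()]
        lines.extend(child.tree_string(depth + 1) for child in self.children)
        return "\n".join(lines)


def abbreviate(text: str, max_len: int) -> str:
    # Toy truncation helper: keep at most max_len characters, ending in "...".
    return text if len(text) <= max_len else text[: max_len - 3] + "..."


def cached_name_before(table_name: Optional[str], plan: PlanNode, max_len: int = 1024) -> str:
    # Old shape: render the full plan, then discard everything past max_len.
    return table_name if table_name is not None else abbreviate(plan.tree_string(), max_len)


def cached_name_after(table_name: Optional[str], plan: PlanNode) -> str:
    # Proposed shape: describe only the root node.
    return table_name if table_name is not None else plan.simple_string_with_node_id()


# A wide toy plan: one root with 5,000 children.
plan = PlanNode("Union", 0, [PlanNode(f"Scan t{i}", i + 1) for i in range(5000)])

print(len(plan.tree_string()))               # size of the string built, then mostly thrown away
print(len(cached_name_before(None, plan)))   # 1024
print(cached_name_after(None, plan))         # Union (0)
```

Both functions return `table_name` unchanged when one is provided, so the difference only matters for unnamed cached plans, which is exactly the case the PR targets.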
