j-esse commented on issue #23556: [SPARK-26626][SQL] Maximum size for repeatedly substituted aliases in SQL expressions
URL: https://github.com/apache/spark/pull/23556#issuecomment-475453821

@HyukjinKwon I think there are a few more things:

The issue doesn't just manifest in `CollapseProject`; it also happens in `collectProjectsAndFilters` in `PhysicalOperation` (https://github.com/apache/spark/pull/23556/files#diff-820e654df2a5133c0f86c17e2fc5512e), even when `CollapseProject` is excluded. We're also investigating another instance of this issue, which we think lies in `PushDownPredicate`, so we might have to make another fix there.

In response to your concerns:

1. As I said above, we don't want to disable the rule. For most queries, the rule will be applied unchanged. For some queries, the rule will be partially applied (aliases that get overly large will stop being substituted). But `CollapseProject` will never be fully disabled (except in the unlikely case that the original aliases already exceed `spark.sql.maxRepeatedAliasSize`).
2. `spark.sql.maxRepeatedAliasSize` really just needs to be a threshold high enough to catch exponential alias expansion; we'd generally never expect anyone to need to change the value from the default. If you're concerned about having a fixed value, we can think about other heuristics to detect exponential alias expansion.
3. Sorry, I'm not quite sure what you mean here. The issue isn't specifically about driver memory OOMs; that's just one effect of the explosive alias expansion. Other effects include slowness, hangs, etc.
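
To make the "partially applied" behaviour in point 1 concrete, here is a rough toy model in Python. It is not Spark code and does not use any Catalyst API: `Ref`, `Add`, `substitute`, `collapse`, and the `max_alias_size` parameter are all hypothetical names chosen for illustration, and the cap is only loosely modeled on `spark.sql.maxRepeatedAliasSize` (for example, it ignores how often an alias is referenced). It assumes a stack of projections where each layer defines `a_i = a_{i-1} + a_{i-1}`, which is the kind of shape that doubles the expression on every substitution.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Ref:
    """Reference to a column or alias by name (hypothetical toy node)."""
    name: str


@dataclass(frozen=True)
class Add:
    """Binary expression node (hypothetical toy node)."""
    left: "Expr"
    right: "Expr"


Expr = Union[Ref, Add]


def size(e: Expr) -> int:
    """Number of nodes in the expression tree."""
    if isinstance(e, Ref):
        return 1
    return 1 + size(e.left) + size(e.right)


def substitute(e: Expr, aliases: Dict[str, Expr], max_alias_size: int) -> Expr:
    """Inline alias definitions into e, skipping definitions above the cap."""
    if isinstance(e, Ref):
        definition = aliases.get(e.name)
        if definition is not None and size(definition) <= max_alias_size:
            return definition
        return e  # too large (or a base column): keep the reference
    return Add(
        substitute(e.left, aliases, max_alias_size),
        substitute(e.right, aliases, max_alias_size),
    )


def collapse(depth: int, max_alias_size: int) -> Expr:
    """Collapse `depth` stacked projections of the form a_i = a_{i-1} + a_{i-1}."""
    aliases: Dict[str, Expr] = {}
    expr: Expr = Ref("a0")
    for i in range(1, depth + 1):
        prev = Ref(f"a{i - 1}")
        expr = substitute(Add(prev, prev), aliases, max_alias_size)
        aliases[f"a{i}"] = expr
    return expr


if __name__ == "__main__":
    print(size(collapse(10, 10**9)))  # effectively uncapped
    print(size(collapse(10, 100)))    # capped
```

In this toy model, ten layers with an effectively unlimited cap produce an expression of 2047 nodes (the size roughly doubles per layer), while a cap of 100 yields 31 nodes: substitution proceeds normally until a definition grows past the cap, and from then on that alias is left as a reference instead of being inlined again.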
