j-esse commented on issue #23556: [SPARK-26626][SQL] Maximum size for 
repeatedly substituted aliases in SQL expressions
URL: https://github.com/apache/spark/pull/23556#issuecomment-475453821
 
 
   @HyukjinKwon  I think there are a few more things:
   The issue doesn't just manifest in `CollapseProject`; it also happens in 
`collectProjectsAndFilters` in `PhysicalOperation` 
(https://github.com/apache/spark/pull/23556/files#diff-820e654df2a5133c0f86c17e2fc5512e), 
even when `CollapseProject` is excluded. We're also investigating another 
instance of this issue, which we think lies in `PushDownPredicate`, so we might 
have to make another fix there.
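
To illustrate the kind of blow-up being discussed, here is a minimal sketch in plain Python. It is a toy model, not Spark code: expressions are nested tuples, column references are strings, and the helper names (`substitute`, `size`, `collapsed_size`) are made up for this example. It assumes each stacked projection references the alias below it twice, which is enough to double the expression on every inlining step.

```python
# Toy model, not Spark code: an expression is either a column name (str)
# or a tuple (op, left, right). All names here are hypothetical.

def substitute(expr, aliases):
    """Replace every column reference that names an alias with its definition."""
    if isinstance(expr, str):
        return aliases.get(expr, expr)
    op, left, right = expr
    return (op, substitute(left, aliases), substitute(right, aliases))


def size(expr):
    """Count the nodes in an expression tree."""
    if isinstance(expr, str):
        return 1
    _, left, right = expr
    return 1 + size(left) + size(right)


def collapsed_size(depth):
    """Stack `depth` projections of the form x = x + x and inline them all."""
    expr = "x"
    for _ in range(depth):
        # The upper projection references the lower alias twice,
        # so each inlining step roughly doubles the tree.
        expr = substitute(("+", "x", "x"), {"x": expr})
    return size(expr)


sizes = [collapsed_size(d) for d in range(1, 6)]
print(sizes)  # [3, 7, 15, 31, 63]
```

In this model the inlined expression has 2^(depth+1) - 1 nodes, so a few dozen stacked projections are already too large to traverse in reasonable time.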
   
   In response to your concerns:
   1. As I said above, we don't want to disable the rule. For most queries, 
the rule will be applied unchanged. For some queries, the rule will be 
partially applied (aliases that would grow overly large will stop being 
substituted). But `CollapseProject` will never be fully disabled (except in the 
unlikely case that the original aliases already exceed 
`spark.sql.maxRepeatedAliasSize`).
   2. `spark.sql.maxRepeatedAliasSize` really just needs to be a threshold high 
enough to catch exponential alias expansion; we'd generally never expect anyone 
to need to change the value from the default. If you're concerned about having 
a fixed value, we could think about other heuristics to detect exponential 
alias expansion.
   3. Sorry, I'm not quite sure what you mean here. The issue isn't 
specifically about driver memory OOMs; that's just one effect of the 
explosive alias expansion. Other effects include slowness, hangs, etc.
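
The "partially applied" behaviour from point 1 can be sketched in the same toy style. This is plain Python, not the pull request's implementation: `MAX_ALIAS_SIZE` is a hypothetical stand-in for `spark.sql.maxRepeatedAliasSize` with an arbitrary value, and the size check shown (node count of the alias definition) is an assumption about what "size" means, made only for illustration.

```python
# Toy model, not Spark code. MAX_ALIAS_SIZE is a hypothetical stand-in for
# spark.sql.maxRepeatedAliasSize; the value 10 is arbitrary.
MAX_ALIAS_SIZE = 10


def size(expr):
    """Count the nodes in an expression tree of strings and (op, left, right) tuples."""
    if isinstance(expr, str):
        return 1
    _, left, right = expr
    return 1 + size(left) + size(right)


def substitute_capped(expr, aliases, max_size):
    """Inline an alias only if its definition is small enough; else keep the reference."""
    if isinstance(expr, str):
        definition = aliases.get(expr)
        if definition is not None and size(definition) <= max_size:
            return definition
        return expr  # not an alias, or too large to inline
    op, left, right = expr
    return (
        op,
        substitute_capped(left, aliases, max_size),
        substitute_capped(right, aliases, max_size),
    )


xy = ("+", "x", "y")                        # 3 nodes
big = ("+", ("+", xy, xy), ("+", xy, xy))   # 15 nodes
upper = ("*", "a", "a")                     # upper projection uses alias "a" twice

inlined = substitute_capped(upper, {"a": xy}, MAX_ALIAS_SIZE)
kept = substitute_capped(upper, {"a": big}, MAX_ALIAS_SIZE)
print(inlined)  # ('*', ('+', 'x', 'y'), ('+', 'x', 'y'))
print(kept)     # ('*', 'a', 'a')
```

Small aliases are still collapsed as before, and only the oversized one is left as a reference, so the rule keeps applying to ordinary queries.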
   

