GitHub user dvogelbacher commented on the issue:
https://github.com/apache/spark/pull/21993
@gatorsmile yes, I found that workaround. Very useful :)
I think it would still be good to handle this better by default. I can see
that introducing an arbitrary configuration param like this isn't ideal, and
I'm open to better suggestions.
I'm not sure that blacklisting case-when statements outright is the right way
to go. That could hurt perf as well, and it wouldn't handle the case in the
unit test, where we get exponential growth when adding/subtracting columns
(though that example might be somewhat contrived).
Maybe we should just not collapse if the number of leaf expressions in the
collapsed project is higher than the sum of the number of leaf expressions in
the original statements?
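
To make the idea concrete, here is a rough, self-contained sketch of that
check on a toy expression tree. It is not Catalyst code: `Col`, `Add`,
`substitute`, `should_collapse` and `leaves_after_collapsing` are hypothetical
names made up for illustration, and a project list is modelled as a plain
dict from output column name to expression.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Col:
    """Leaf expression: a reference to a column by name."""
    name: str


@dataclass(frozen=True)
class Add:
    """Binary expression standing in for add/subtract."""
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Add]


def leaf_count(e: Expr) -> int:
    """Number of leaf (column reference) nodes in the expression."""
    if isinstance(e, Col):
        return 1
    return leaf_count(e.left) + leaf_count(e.right)


def substitute(e: Expr, aliases: Dict[str, Expr]) -> Expr:
    """Inline the lower project's definitions into an upper expression."""
    if isinstance(e, Col):
        return aliases.get(e.name, e)
    return Add(substitute(e.left, aliases), substitute(e.right, aliases))


def should_collapse(upper: Dict[str, Expr], lower: Dict[str, Expr]) -> bool:
    """Collapse only if the merged project has no more leaves than the
    two original project lists combined (the heuristic proposed above)."""
    collapsed = {name: substitute(e, lower) for name, e in upper.items()}
    before = sum(leaf_count(e) for e in upper.values()) + sum(
        leaf_count(e) for e in lower.values()
    )
    after = sum(leaf_count(e) for e in collapsed.values())
    return after <= before


def leaves_after_collapsing(depth: int) -> int:
    """Leaf count after unconditionally collapsing `depth` stacked
    projects that each compute prev + prev: doubles at every level."""
    expr: Expr = Col("x")
    for _ in range(depth):
        expr = substitute(Add(Col("prev"), Col("prev")), {"prev": expr})
    return leaf_count(expr)
```

In this toy model a pure rename is still collapsed, and so is the first
`a + a` over `a = x + y` (4 leaves either way), but the next `b + b` on top
of that is rejected because it would produce 8 leaves from 6. Collapsing ten
such levels unconditionally gives `leaves_after_collapsing(10) == 1024`.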