Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/19720
No, a query with a single `coalesce` that has many/complex parameters will hit this
problem; a query with many small `coalesce` calls will not. For
`AtLeastNNonNulls` the fix would be safe to backport, because no
class-level variables are defined, but for `coalesce` it is safer to fix it only
together with SPARK-18016. In particular, the ongoing PR will solve the issue.
The same is true for all the other similar PRs.
Maybe what we can do to backport this to branch-2.2 is to perform the splitting
and define class-level variables only once a threshold on the number of parameters
is met; otherwise we keep the previous code generation (without splitting). This
way we don't introduce any regression.
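To make the idea concrete, here is a minimal standalone sketch of the threshold-based dispatch (not the actual Spark codegen API; `splitThreshold`, `genCoalesce`, and the emitted strings are all hypothetical and only model the decision, with the real cutoff needing to be tuned against JVM limits such as the 64KB method size):

```scala
object SplitSketch {
  // Hypothetical threshold, illustrative only.
  val splitThreshold = 4

  // Models codegen for coalesce over the given child expressions.
  // Below the threshold we keep the old single-block code path (only
  // local variables, so no regression for branch-2.2); at or above it
  // we split into helper methods that share a class-level field, which
  // is the behavior that only SPARK-18016 makes safe.
  def genCoalesce(children: Seq[String]): String =
    if (children.size < splitThreshold) {
      // Old path: one inline expression, no class-level state.
      s"Object result = firstNonNull(${children.mkString(", ")});"
    } else {
      // New path: chunk the children into separate methods that all
      // write to the class-level field `result`.
      val chunks = children.grouped(splitThreshold).zipWithIndex.map {
        case (group, i) =>
          s"private void coalesce_$i() { " +
            s"if (result == null) result = firstNonNull(${group.mkString(", ")}); }"
      }
      ("private Object result = null;" +: chunks.toSeq).mkString("\n")
    }
}
```

A small call such as `SplitSketch.genCoalesce(Seq("a", "b"))` stays on the inline path, while a call with many children produces the split methods plus the class-level field.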
Or maybe we can backport to 2.2 only those fixes which do not introduce
class-level variables, like the one for `AtLeastNNonNulls`.
Actually I think the most important of all these fixes is indeed the one for
`AtLeastNNonNulls`, because it is used to drop rows containing all nulls,
and before this PR that fails on datasets with a lot of columns. All the other
functions are less likely to receive a huge number of parameters, though it may
happen and we should support it.
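For readers unfamiliar with the predicate, here is a minimal standalone model of the `AtLeastNNonNulls` semantics (the helper name and the sample data are hypothetical; in Spark the predicate is generated code over `InternalRow`, and with thousands of columns that generated code is what used to exceed the codegen limits):

```scala
object NullFilterSketch {
  // Keep a row only if at least `n` of its values are non-null.
  // With n = 1 this drops exactly the rows that are entirely null,
  // the use case described above.
  def atLeastNNonNulls(n: Int, row: Seq[Any]): Boolean =
    row.count(_ != null) >= n

  val rows: Seq[Seq[String]] = Seq(
    Seq("a", null, "c"),   // 2 non-null values: kept
    Seq(null, null, null), // 0 non-null values: dropped
    Seq("x", "y", "z")     // 3 non-null values: kept
  )

  val kept: Seq[Seq[String]] = rows.filter(atLeastNNonNulls(1, _))
}
```

The filter keeps the first and third rows and drops the all-null one; in a real job the equivalent entry point is the DataFrame `na.drop` functionality.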