Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19229
Yeah, I think that fix should work for the `Imputer.mean` strategy, because
`Imputer.mean` now aggregates many columns at once, and that can generate code
for the aggregation that is too large.
For the `Imputer.median` strategy, I think codegen doesn't affect this part,
because it uses `approxQuantile`, which calls the RDD aggregate API.