GitHub user superbobry commented on the issue:
https://github.com/apache/spark/pull/19992
Minor update: I've simulated #18162 on one of our 80G event logs and
(unless there is a bug in the filtering code) the log shrank to 157M. The
effect of this patch was almost negligible: it brought the size down to 155M.
It is unclear for now whether this generalizes to other workloads.
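
To put the numbers above in proportion, here is a quick back-of-the-envelope sketch. It assumes that "G" and "M" are binary units and that the 155M figure was measured with this patch applied on top of the simulated #18162; neither is stated explicitly above, and the helper name is only illustrative.

```python
# Sizes quoted in the comment. Treating "G"/"M" as binary units is an assumption.
ORIGINAL_MB = 80 * 1024   # the 80G event log, expressed in M
AFTER_18162_MB = 157      # after simulating #18162
AFTER_PATCH_MB = 155      # assumed: #18162 plus this patch


def relative_saving(before, after):
    """Fraction of `before` that is removed when the size drops to `after`."""
    return (before - after) / before


saving_18162 = relative_saving(ORIGINAL_MB, AFTER_18162_MB)
saving_patch = relative_saving(AFTER_18162_MB, AFTER_PATCH_MB)

print(f"#18162 alone removes {saving_18162:.2%} of the original log")
print(f"this patch removes a further {saving_patch:.2%} of what remains")
```

Under those assumptions, nearly all of the reduction comes from #18162, and this patch trims only about one percent of the already-filtered log. This is a single data point from one workload.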