GitHub user vanzin commented on the issue:
https://github.com/apache/spark/pull/19170
> Maybe we are using SHS too aggressively, but the GC issue is one of the
> major issues we met.
Can you describe what this issue is? That is not what the bug is showing.
The bug shows a heap dump with a lot of `BlockStatus` objects. I'm saying that
with the new code, you should not get into that situation, because the SHS does
not hold on to those objects. Is that not what you see?
If you see `BlockStatus` objects still being referenced then there is
probably a bug somewhere.
Barring the issue above, to the best of my knowledge this patch would not
help much with GC. The code still loads these events from disk (= creates
garbage) and still creates json4s objects for them (= more garbage); the
filtering only avoids a trivial amount of garbage after that point.
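To illustrate the point, here is a minimal Python sketch (hypothetical event names and function, not Spark's actual replay code) of why filtering after deserialization saves little: each line is read and parsed before the filter runs, so the parser's allocations happen either way.

```python
import json

# Hypothetical event-log lines, mimicking a JSON-per-line event log.
lines = [
    '{"Event": "SparkListenerBlockUpdated", "payload": 1}',
    '{"Event": "SparkListenerJobStart", "payload": 2}',
]

def replay_with_filter(lines, skip_events):
    """Parse every line, then drop events named in skip_events."""
    kept = []
    for line in lines:
        # The line is read and deserialized BEFORE filtering, so the
        # intermediate parse objects are allocated regardless of the filter.
        event = json.loads(line)
        if event["Event"] in skip_events:
            continue  # filtering only saves work from this point on
        kept.append(event)
    return kept

events = replay_with_filter(lines, {"SparkListenerBlockUpdated"})
```

The sketch makes the trade-off visible: skipping an event type avoids keeping the final object, but not the disk read or the per-line parse that precedes the check.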