Github user countmdm commented on the issue:
https://github.com/apache/spark/pull/21811
@kiszk the situation "before" is well understood. In the corresponding
SPARK-24801 ticket I present a fragment from the analysis of the heap dump by
jxray (www.jxray.com). It shows that ~2.5GB of memory, or 64% of the used heap
size, is wasted by the ~40.5 thousand empty byte[] arrays in question:
2,597,946K (64.1%): byte[]: 40583 / 100% of empty 2,597,946K (64.1%)
↖org.apache.spark.network.util.ByteArrayWritableChannel.data
↖org.apache.spark.network.sasl.SaslEncryption$EncryptedMessage.byteChannel
↖io.netty.channel.ChannelOutboundBuffer$Entry.msg
...
However, we don't have, and probably cannot get, real "after" evidence.
That's because, as I said, I don't know how to reproduce the situation
in-house. And I think it's very unlikely that the customer can easily reproduce
it either, let alone accept our patched code and collect the necessary data
before and after the fix. However, I believe this fix is simple and obvious
enough that we can be pretty sure that with it, in the above situation, there
would simply be no problematic byte[] arrays anymore, and memory consumption
would be about 64% smaller.
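
As a rough sanity check on these numbers, here is a back-of-the-envelope calculation using only the figures from the jxray line quoted above. It is a sketch, not a measurement: the variable names are mine, and it assumes that the fix removes all of these arrays and that the rest of the heap stays the same.

```python
# Figures copied from the jxray line quoted above.
wasted_kb = 2_597_946     # total size of the empty byte[] arrays, in K
array_count = 40_583      # number of empty byte[] arrays
wasted_fraction = 0.641   # share of the used heap, as reported by jxray

# Average size of one empty array.
per_array_kb = wasted_kb / array_count

# Used heap implied by the reported percentage, and what would be left
# if every one of these arrays went away (assumption: nothing else changes).
used_heap_kb = wasted_kb / wasted_fraction
remaining_kb = used_heap_kb - wasted_kb

print(f"average size per empty array: {per_array_kb:.1f}K")
print(f"implied used heap: {used_heap_kb:,.0f}K")
print(f"used heap without the arrays: {remaining_kb:,.0f}K")
```

The average comes out at about 64K per array, so each of these empty arrays is a sizable buffer rather than a small object, which is why ~40.5 thousand of them add up to most of the heap.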