GitHub user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
@arunmahadevan
I didn't add the metric to StateOperatorProgress cause this behavior is
specific to HDFSBackedStateStoreProvider (though this is only one
implementation available in Apache Spark) so not sure this metric can be
treated as a general one. (@tdas what do you think about this?)
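For reference, StateOperatorProgress today only carries provider-agnostic
numbers, which is why a provider-specific metric feels out of place there.
A minimal sketch of reading them, assuming `query` is a handle to a running
streaming query (the function name is hypothetical):

```scala
import org.apache.spark.sql.streaming.StreamingQuery

// Sketch: read the provider-agnostic state metrics off a running query.
// lastProgress is null before the first batch completes, hence the Option.
def printStateMetrics(query: StreamingQuery): Unit = {
  Option(query.lastProgress).foreach { progress =>
    progress.stateOperators.foreach { op =>
      println(s"numRowsTotal=${op.numRowsTotal}, memoryUsedBytes=${op.memoryUsedBytes}")
    }
  }
}
```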
Btw, the cache is only cleaned up while the maintenance operation is in
progress, so there could be more than 100 versions in the map. I'm not sure
why it shows 150x, though, and I couldn't find a spot the patch misses. Maybe
the issue comes from SizeEstimator.estimate()?
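For context, these are the two settings behind "100 versions" and the cleanup
timing, shown at their defaults; a sketch assuming a SparkSession named `spark`:

```scala
// State versions are retained per minBatchesToRetain, but old entries are
// only evicted when the background maintenance task fires, so between runs
// the in-memory map can temporarily hold more than 100 versions.
spark.conf.set("spark.sql.streaming.minBatchesToRetain", "100")
spark.conf.set("spark.sql.streaming.stateStore.maintenanceInterval", "60s")
```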
One thing we need to check is how SizeEstimator.estimate() calculates memory
usage when UnsafeRow objects are shared across versions. If SizeEstimator
counts an object's size every time it is referenced, rather than once per
unique instance, it will report much higher memory usage than the actual
footprint.
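To check that hypothesis, here is a minimal sketch using Spark's
org.apache.spark.util.SizeEstimator; the two HashMaps are hypothetical
stand-ins for per-version maps. If the summed per-map estimates come out
roughly double the combined one, per-call estimation is where the shared rows
get double-counted:

```scala
import org.apache.spark.util.SizeEstimator

// Two "version" maps sharing the same value objects, mimicking versions of
// state that reference the same rows.
object SharedReferenceEstimate {
  def main(args: Array[String]): Unit = {
    val shared = (0 until 1000).map(_ => new Array[Byte](128))

    val v1 = new java.util.HashMap[Int, Array[Byte]]()
    val v2 = new java.util.HashMap[Int, Array[Byte]]()
    shared.zipWithIndex.foreach { case (bytes, k) => v1.put(k, bytes); v2.put(k, bytes) }

    // Summing per-version estimates counts each shared array once per map...
    val summed = SizeEstimator.estimate(v1) + SizeEstimator.estimate(v2)
    // ...while a single estimate over both maps visits each object only once
    // within the call.
    val combined = SizeEstimator.estimate(Array(v1, v2))

    println(s"sum of per-map estimates: $summed")
    println(s"single combined estimate: $combined")
  }
}
```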