Github user mridulm commented on the issue:
https://github.com/apache/spark/pull/17596
My comments were based on a fix in 1.6; actually lot of values were
actually observed to be 0 for a lot of cases - just a few were not (even here
it is relevant - resultSize, gctime, various bytes spilled, etc). The bitmask
actually ends up being a single long for the cardinality of metrics we have -
which typically gets encoded in a byte or two in reality.
In addition, things like input/output/shuffle metrics, accumulator updates,
block updates, etc (present in 1.6 in TaskMetrics) - can all be avoided in the
serialized stream when not present.
When present, I used read/write External on those classes to directly
encode the values.
IIRC this is relevant not just for the final result, but for heartbeats
also - so serde saving helped a lot more than initially expected.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]