Github user squito commented on the pull request:
https://github.com/apache/spark/pull/7770#issuecomment-127365342
Accumulators handle task failures, but they have really confusing semantics around recomputation from shared lineage, speculative execution, and stage retries (e.g., I don't understand the current logic for clearing these new values).
Making this an internal accumulator is certainly "safe" in that it's not technically exposed at all, so we could change it without breaking any contract. I just meant that we seem to be headed to a state where any user SparkListener won't have access to this, other than by hardcoding a check for `"peakExecutionMemory"` into their code, which could break at any time. A user SparkListener "shouldn't" do that, but we do expose other UI info that way, so it's weird for this one tiny bit to be hidden. More generally, we seem to be headed to a future where a user SparkListener will get some set of metrics from the strongly typed, developer-API TaskMetrics, and then snoop our internal accumulators for everything else.
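For concreteness, here's a minimal sketch of the kind of hardcoded check described above, written against the 1.x-era listener API (the `PeakMemoryListener` name is made up, and the exact `AccumulableInfo` fields vary across Spark versions):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical user listener that snoops the internal accumulator by name.
// Assumes the 1.x-era API, where TaskInfo.accumulables holds AccumulableInfo
// entries with plain String `name` and `value` fields.
class PeakMemoryListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    taskEnd.taskInfo.accumulables
      .find(_.name == "peakExecutionMemory") // hardcoded internal name -- could break at any time
      .foreach { acc =>
        println(s"task ${taskEnd.taskInfo.taskId} peak execution memory: ${acc.value}")
      }
  }
}
```

Registered via `sc.addSparkListener(new PeakMemoryListener)`, this works, but only by reaching into an implementation detail rather than a stable TaskMetrics field.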
In any case, I guess I've said my piece and I'm not convincing anybody.