Github user squito commented on the pull request:
https://github.com/apache/spark/pull/7770#issuecomment-127365342
Accumulators handle task failures, but they have really confusing semantics around recomputation from shared lineage, speculative execution, and stage retries (e.g., I don't understand the current logic for clearing these new values).
Making this an internal accumulator is certainly "safe" in that it's not technically exposed at all, so we could change it without breaking any contract. I just meant that we seem to be headed to a state where any user SparkListener won't have access to this, other than by hardcoding a check for `"peakExecutionMemory"` into their code, which could break at any time. A user SparkListener "shouldn't" do that, but we do expose other UI info that way, so it's weird for this one tiny bit to be hidden. More generally, we seem to be headed to a future where a user SparkListener will get some set of metrics from the strongly typed, developer-API TaskMetrics, and then snoop our internal accumulators for everything else.
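For concreteness, here's a minimal sketch of the kind of hardcoded check described above, written against the 1.x-era listener API (the `PeakMemoryListener` name is made up, and the exact `AccumulableInfo` fields vary across Spark versions):

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

// Hypothetical user listener that snoops the internal accumulator by name.
// Assumes the 1.x-era API, where TaskInfo.accumulables holds AccumulableInfo
// entries with plain String `name` and `value` fields.
class PeakMemoryListener extends SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    taskEnd.taskInfo.accumulables
      .find(_.name == "peakExecutionMemory") // hardcoded internal name -- could break at any time
      .foreach { acc =>
        println(s"task ${taskEnd.taskInfo.taskId} peak execution memory: ${acc.value}")
      }
  }
}
```

Registered via `sc.addSparkListener(new PeakMemoryListener)`, this works, but only by reaching into an implementation detail rather than a stable TaskMetrics field.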
In any case, I guess I've said my piece and I'm not convincing anybody.