Github user sryza commented on the pull request:
https://github.com/apache/spark/pull/1056#issuecomment-45823306
> Will this add additional overhead to Spark run-time, especially for Spark
Streaming jobs in which batchDuration is quite short?
It will add overhead, as it's another RPC, but it should be tiny. The
overhead isn't affected by the streaming batch duration or the number of
tasks that run - we just take a snapshot of the metrics for any running tasks
on the node every 2 seconds. If tasks are started frequently, the traffic from
launching them will far exceed the heartbeat traffic.
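
For anyone skimming the thread, here is a rough, hypothetical sketch of that
pattern - one scheduled task per executor that snapshots the metrics of
currently running tasks and sends them in a single RPC. The names
(`TaskMetricsSnapshot`, `HeartbeatEndpoint`, `ExecutorHeartbeater`) and the
2-second default are illustrative placeholders, not the actual classes in the
patch:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.collection.concurrent.TrieMap

// Illustrative snapshot of one running task's metrics (hypothetical fields).
case class TaskMetricsSnapshot(taskId: Long, bytesRead: Long, recordsRead: Long)

// One RPC per heartbeat, carrying snapshots for all running tasks on the node.
trait HeartbeatEndpoint {
  def send(executorId: String, snapshots: Seq[TaskMetricsSnapshot]): Unit
}

class ExecutorHeartbeater(executorId: String,
                          endpoint: HeartbeatEndpoint,
                          intervalMs: Long = 2000L) {

  // Metrics for tasks currently running on this executor.
  private val runningTasks = TrieMap.empty[Long, TaskMetricsSnapshot]

  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  def registerTask(snapshot: TaskMetricsSnapshot): Unit =
    runningTasks.put(snapshot.taskId, snapshot)

  def unregisterTask(taskId: Long): Unit =
    runningTasks.remove(taskId)

  def start(): Unit = {
    val beat = new Runnable {
      def run(): Unit = {
        // Snapshot whatever is running right now; the cost is independent of
        // the batch duration or how many tasks have started and finished.
        val snapshots = runningTasks.values.toSeq
        if (snapshots.nonEmpty) endpoint.send(executorId, snapshots)
      }
    }
    scheduler.scheduleAtFixedRate(beat, intervalMs, intervalMs, TimeUnit.MILLISECONDS)
  }

  def stop(): Unit = scheduler.shutdown()
}
```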
> Some metrics, like shuffle write metrics, are only updated when the
task finishes, so fetching these metrics every 2 seconds will always get 0.
This patch doesn't rip out the existing metrics reports that accompany task
completions, so metrics will still be collected even for tasks that start
and finish between heartbeats.