Github user pwendell commented on the pull request:
https://github.com/apache/spark/pull/4481#issuecomment-74588769
Hey Matt, sorry, I'm still a bit confused. Basically, my concern is that
we're papering over an underlying bug by adding a configuration option,
which is something we really try to avoid.
Do you regularly have unreliable network connectivity inside your
cluster? Spark overall assumes that nodes can establish reliable TCP connections
to one another. Do you actually see TCP flows terminated from within the
network as a regular occurrence? It's very hard for me to imagine a modern
hardware cluster where that's the case.
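For reference, a minimal sketch of how you could probe for this directly. The
host name is a placeholder for a peer node in your cluster (7077 is just the
standalone master's default port); this is my illustration, not anything in
Spark itself:

```scala
import java.io.IOException
import java.net.{InetSocketAddress, Socket}

object TcpResetProbe extends App {
  // Placeholder host: substitute a real peer node in your cluster.
  val socket = new Socket()
  socket.connect(new InetSocketAddress("worker-1.example.com", 7077), 5000)
  socket.setKeepAlive(true)
  try {
    // Block on a read; if something inside the network tears the flow
    // down, it surfaces here as an IOException (e.g. connection reset).
    socket.getInputStream.read()
  } catch {
    case e: IOException => println(s"Connection dropped: ${e.getMessage}")
  } finally {
    socket.close()
  }
}
```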
The second explanation you gave was the Akka message queue. Akka in general
should be able to process thousands of messages per second, which is _way_ more
than anyone would reasonably submit to the standalone cluster manager. It's
possible that we are blocking inside our actors in a way that severely limits
throughput. If that is the case, then we should identify and fix the bug.
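To make the throughput point concrete, here's a minimal sketch of the kind of
micro-benchmark that would show whether a blocking call inside an actor is the
bottleneck. The names are mine, not Spark's; toggling the commented sleep is
the interesting part:

```scala
import java.util.concurrent.CountDownLatch
import akka.actor.{Actor, ActorSystem, Props}

// A trivial actor that just acknowledges each message. Uncomment the
// sleep to see how a blocking call inside receive() collapses throughput.
class CountingActor(latch: CountDownLatch) extends Actor {
  def receive = {
    case _ =>
      // Thread.sleep(1)  // simulate a blocking call inside the actor
      latch.countDown()
  }
}

object ThroughputBench extends App {
  val n = 100000
  val system = ActorSystem("bench")
  val latch = new CountDownLatch(n)
  val actor = system.actorOf(Props(new CountingActor(latch)))

  val start = System.nanoTime()
  (1 to n).foreach(_ => actor ! "msg")
  latch.await()
  val secs = (System.nanoTime() - start) / 1e9
  println(f"$n messages in $secs%.2f s (${n / secs}%.0f msgs/sec)")
  system.shutdown()
}
```

Even with a 1 ms blocking call per message, throughput caps at roughly 1,000
messages per second per actor, which is exactly the kind of self-inflicted
limit we'd want to find and fix rather than hide behind a config option.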
Are you seeing specific Akka timeouts or some type of error message that
could help pin down what is happening? My guess is that there is just something
buggy about job submission, and ideally we should fix that instead of adding
more knobs to work around it.
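If it is the message queue, the usual signature in the logs is an ask timeout.
A hedged sketch of what that looks like; the actor reference and request here
are placeholders, not Spark's actual names:

```scala
import akka.actor.ActorRef
import akka.pattern.ask
import akka.util.Timeout
import scala.concurrent.Await
import scala.concurrent.duration._

object SubmissionSketch {
  // Hypothetical submission RPC using the ask pattern; `masterActor`
  // and `req` stand in for whatever your client actually sends.
  def submit(masterActor: ActorRef, req: AnyRef): Any = {
    implicit val timeout = Timeout(30.seconds)
    val future = masterActor ? req
    // If the receiving actor is backed up and never replies in time,
    // this throws akka.pattern.AskTimeoutException. That stack trace
    // in your logs would strongly suggest the queue is the problem.
    Await.result(future, timeout.duration)
  }
}
```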
If you have a reproduction of this behavior, that would actually be best,
i.e. a stress test or something similar that could identify what is going on.