Github user ash211 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1777#discussion_r15796591
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -1331,4 +1331,49 @@ private[spark] object Utils extends Logging {
           .map { case (k, v) => s"-D$k=$v" }
       }
     
    +  /**
    +   * Attempt to start a service on the given port, or fail after a number of attempts.
    +   * Each subsequent attempt uses 1 + the port used in the previous attempt.
    +   *
    +   * @param startPort The initial port to start the service on.
    +   * @param maxRetries Maximum number of retries to attempt.
    +   *                   A value of 3 means attempting ports n, n+1, n+2, and n+3, for example.
    +   * @param startService Function to start service on a given port.
    +   *                     This is expected to throw java.net.BindException on port collision.
    +   * @throws SparkException When unable to start the service after a given number of attempts
    +   */
    +  def startServiceOnPort[T](
    +      startPort: Int,
    +      startService: Int => (T, Int),
    +      serviceName: String = "",
    +      maxRetries: Int = 3): (T, Int) = {
    --- End diff --
    
    Part of me is worried that the maxRetries parameter here is effectively a cap on the number of shells/executors/drivers that can run concurrently on one machine when in restricted firewall mode. Because in Standalone mode each app gets its own set of executors across the cluster, this also caps the number of applications that can run concurrently on the cluster.
    
    What do you think of creating a config option, say spark.ports.maxRetries, that can be set to change this? I might also raise the default a bit, to 16 or so.
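    A sketch of how the proposed option might be read, assuming a plain Map[String, String] stands in for SparkConf so the snippet runs on its own. The key spark.ports.maxRetries and the default of 16 are both proposals from this comment, and PortRetryConfigSketch and portMaxRetries are hypothetical names.

```scala
// Hypothetical helper; the key and default are the values proposed in this review.
object PortRetryConfigSketch {
  val MaxRetriesKey = "spark.ports.maxRetries"
  val DefaultMaxRetries = 16

  // Read the retry count, falling back to the default when the key is unset.
  def portMaxRetries(conf: Map[String, String]): Int = {
    val value = conf.get(MaxRetriesKey).map(_.trim.toInt).getOrElse(DefaultMaxRetries)
    require(value >= 0, s"$MaxRetriesKey must be non-negative, got $value")
    value
  }
}
```

    The value would then be passed as maxRetries to startServiceOnPort instead of relying on its hard-coded default of 3.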
    
    The way I'd expect network teams to run this, then, is to set spark.ports.maxRetries to, say, 20 and then open up a range of size 20 starting from each of the relevant ports in the config list you pasted in the summary.
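    The arithmetic behind this firewall setup, as a hedged sketch. Under the doc comment's semantics a service tries basePort through basePort + maxRetries, so each opened range needs maxRetries + 1 ports, and that count is also the most instances that can bind from one base port on a single machine. With maxRetries = 20, for example, the range is n to n+20, which is 21 ports. The names FirewallRangeSketch, portRange and firewallRanges are hypothetical, and any base ports passed in are placeholders, not the list from the PR summary.

```scala
// Hypothetical helpers for planning firewall rules around port retries.
object FirewallRangeSketch {
  // Ports a retrying service may bind to: basePort through basePort + maxRetries, inclusive.
  def portRange(basePort: Int, maxRetries: Int): Range.Inclusive =
    basePort to (basePort + maxRetries)

  // For each named base port, the inclusive (first, last) port range to open.
  def firewallRanges(basePorts: Map[String, Int], maxRetries: Int): Map[String, (Int, Int)] =
    basePorts.map { case (name, port) =>
      val r = portRange(port, maxRetries)
      (name, (r.start, r.end))
    }
}
```

    For example, firewallRanges(Map("driver" -> 7000), 20) yields driver -> (7000, 7020).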

