Hi Jacob,

This post might give you a brief idea about the ports being used:
https://groups.google.com/forum/#!topic/spark-users/PN0WoJiB0TA

On Fri, Apr 25, 2014 at 8:53 PM, Jacob Eisinger <jeis...@us.ibm.com> wrote:
> Howdy,
>
> We tried running Spark 0.9.1 stand-alone inside Docker containers
> distributed over multiple hosts. This is complicated because Spark opens
> ephemeral / dynamic ports for the workers and the CLI. To ensure our
> Docker solution doesn't break Spark in unexpected ways and maintains a
> secure cluster, I am interested in understanding more about Spark's
> network architecture. I'd appreciate it if you could point us to any
> documentation!
>
> A couple of specific questions:
>
> 1. What are these ports being used for?
> From checking out the code and experimenting, it looks like asynchronous
> communication for shuffling results around. Anything else?
>
> 2. How do you secure the network?
> Network administrators tend to secure and monitor the network at the
> port level. If these ports are dynamic and open randomly, firewalls are
> not easily configured and security alarms are raised. Is there a way to
> limit the range easily? (We did investigate setting the kernel parameter
> ip_local_reserved_ports, but this is broken [1] on some versions of
> Linux's cgroups.)
>
> Thanks,
> Jacob
>
> [1] https://github.com/lxc/lxc/issues/97
>
> Jacob D. Eisinger
> IBM Emerging Technologies
> jeis...@us.ibm.com - (512) 286-6075
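One note on the reserved-ports approach mentioned above: on hosts where the cgroups bug doesn't bite, it is just a sysctl. A minimal sketch, assuming root access; the 50000-50100 range is only an example, pick one that matches your firewall rules:

```shell
# Reserve an example range so the kernel will not hand these ports out
# as ephemeral source ports (requires root; range is illustrative only).
sysctl -w net.ipv4.ip_local_reserved_ports=50000-50100

# Verify the current value.
sysctl net.ipv4.ip_local_reserved_ports
```

Note that this only keeps the kernel from *assigning* these ports as ephemeral ports; a service can still bind them explicitly, which is the point — you reserve the range, then point the listeners at it.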