Found the issue in JIRA: https://issues.apache.org/jira/browse/SPARK-4389?jql=project%20%3D%20SPARK%20AND%20text%20~%20NAT
On Tue, Jan 6, 2015 at 10:45 AM, Aaron <aarongm...@gmail.com> wrote:

> From what I can tell, this isn't a "firewall" issue per se; it's how the
> Remoting Service "binds" to an IP given the command-line parameters. So, if
> I have a VM (or OpenStack or EC2 instance) running on a private network,
> say, where the IP address is 192.168.X.Y, I can't tell the workers to
> "reach me on this IP," because the Remoting Service binds to the interface
> passed in those parameters.
>
> So, if my "public" IP is a routable IP address, but the one the VM sees is
> the 192.168.X.Y address, it appears I can't do some kind of port forwarding
> from the external to the internal. Is this correct?
>
> If I set the spark.driver.host and spark.driver.port properties at the
> command line, it tries to actually bind to that IP rather than just telling
> the workers to reach back to this IP. Is there a way around this? Is there
> a way to tell the workers which IP address to use WITHOUT binding to it?
> Maybe allow the Remoting Service to bind to the internal IP, but advertise
> it differently?
>
> On Mon, Jan 5, 2015 at 9:02 AM, Aaron <aarongm...@gmail.com> wrote:
>
>> Thanks for the link! However, from reviewing the thread, it appears you
>> cannot have a NAT/firewall between the cluster and the
>> spark-driver/shell. Is this correct?
>>
>> When the shell starts up, it binds to the internal IP (e.g.
>> 192.168.x.y), not the external floating IP, which is routable from the
>> cluster.
>>
>> When I set a static port for spark.driver.port and set spark.driver.host
>> to the floating IP address, I get the same exception (Caused by:
>> java.net.BindException: Cannot assign requested address: bind), because of
>> the use of the InetAddress.getHostAddress method call.
>>
>> Cheers,
>> Aaron
>>
>> On Mon, Jan 5, 2015 at 8:28 AM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>
>>> You can have a look at this discussion:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-job-on-Unix-cluster-from-dev-environment-Windows-td16989.html
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Jan 5, 2015 at 6:11 PM, Aaron <aarongm...@gmail.com> wrote:
>>>
>>>> Hello there, I was wondering if there is a way to have the spark-shell
>>>> (or pyspark) sit behind a NAT when talking to the cluster?
>>>>
>>>> Basically, we have OpenStack instances that run with internal IPs, and
>>>> we assign floating IPs as needed. Since the workers make direct TCP
>>>> connections back, the spark-shell is binding to the internal IP, not the
>>>> "floating" one. Our other use case is running Vagrant VMs on our local
>>>> machines, but we don't have those VMs' NICs set up in "bridged" mode, so
>>>> they too have "internal" IPs.
>>>>
>>>> I tried using SPARK_LOCAL_IP and the various --conf spark.driver.host
>>>> parameters, but it still gets "angry."
>>>>
>>>> Any thoughts/suggestions?
>>>>
>>>> Currently our workaround is a VPNC connection from inside the Vagrant
>>>> VMs or OpenStack instances, but that doesn't seem like a long-term plan.
>>>>
>>>> Thanks in advance!
>>>>
>>>> Cheers,
>>>> Aaron
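
For reference, a minimal sketch of the kind of invocation being discussed above. The master URL, addresses, and port are placeholders chosen purely for illustration (an externally routable floating IP of 203.0.113.10, an internal IP of 192.168.1.20, a driver port of 51000, and a master at cluster-master:7077); none of them come from the thread. As reported above, the driver tries to bind to spark.driver.host rather than merely advertising it to the workers, so pointing it at the floating IP behind NAT produces the BindException quoted earlier:

    # Attempt 1 (as described above): advertise the floating IP via spark.driver.host.
    # The driver tries to *bind* to this address, which the VM does not own, so it
    # fails with "java.net.BindException: Cannot assign requested address: bind".
    ./bin/spark-shell \
      --master spark://cluster-master:7077 \
      --conf spark.driver.host=203.0.113.10 \
      --conf spark.driver.port=51000

    # Attempt 2 (the other knob mentioned above): force the bind address with
    # SPARK_LOCAL_IP. The bind succeeds, but the driver then advertises the
    # internal 192.168.x.y address, which the workers on the far side of the
    # NAT cannot reach.
    SPARK_LOCAL_IP=192.168.1.20 ./bin/spark-shell --master spark://cluster-master:7077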