RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-06 Thread Jacob Eisinger

Howdy Andrew,

Agreed - if that subnet is configured to only allow THOSE docker images
onto it, then, yeah, I figure it would be secure.  Great setup, in my
opinion!

(And, I think we both agree - a better setup would be to have Spark only
listen on well-known ports to allow for a secured firewall/network.)

Also, you might check out Pipework [1] to add those containers directly to
the subnet.
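For example (an illustrative, untested sketch - the host interface name,
container ID variable, and address are assumptions on my part), Pipework can
attach a running container to that subnet with something like:

  $ sudo pipework eth1 $CONTAINER_ID 10.10.10.5/24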

Jacob

[1]
https://github.com/jpetazzo/pipework#let-the-docker-host-communicate-over-macvlan-interfaces

Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075




RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-06 Thread Andrew Lee
Hi Jacob,

I agree, we need to address both the driver and the Workers bidirectionally.

If the subnet is isolated and self-contained, and only a limited set of ports
is configured to access the driver via a dedicated gateway for the user, could
you explain your concern? What doesn't satisfy the security criteria?

Are you referring to a security certification or regulatory requirement that a
separate subnet with a configurable policy couldn't satisfy?

What I mean by a subnet is one that includes both the driver and the Workers.
See the following example setup, e.g. (up to 254 nodes):

Hadoop / HDFS => 10.5.5.0/24 (GW 10.5.5.1) on eth0
Spark Driver and Workers bind to => 10.10.10.0/24 on eth1, with routing to
10.5.5.0/24 on specific ports for the NameNode and DataNodes.

So the driver and Workers are bound to the same subnet, which is separated
from the others. iptables for 10.10.10.0/24 can allow SSH (port 22) login, or
port forwarding, onto the Spark driver machine to launch the shell or submit
Spark jobs.
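A rough iptables sketch of that policy might look like the following (the
interface names, addresses, and rules are illustrative assumptions on my part,
not a tested rule set):

  # on each Spark host in 10.10.10.0/24 -- illustrative only
  iptables -A INPUT -i eth1 -s 10.10.10.0/24 -j ACCEPT               # driver <-> Worker traffic within the subnet
  iptables -A INPUT -i eth0 -p tcp --dport 22 -j ACCEPT              # SSH (or port forwarding) onto the driver machine
  iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT   # replies to outbound NameNode/DataNode connections
  iptables -P INPUT DROP                                             # drop everything else inbound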



RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-05 Thread Jacob Eisinger

Howdy Andrew,

I agree; the subnet idea is a good one...  unfortunately, it doesn't really
help to secure the network.

You mentioned that the drivers need to talk to the workers.  I think it is
slightly broader - all of the workers and the driver/shell need to be
addressable from/to each other on any dynamic port.

I would check out setting the environment variable SPARK_LOCAL_IP [1].
This seems to enable Spark to bind correctly to a private subnet.
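For example (untested; the address is just an illustration of a host's IP on
the private subnet), you could export it in conf/spark-env.sh on the driver
and on each Worker node:

  # bind Spark's services to this host's address on the private subnet
  export SPARK_LOCAL_IP=10.10.10.5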

Jacob

[1]  http://spark.apache.org/docs/latest/configuration.html

Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075





RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-04 Thread Andrew Lee
Hi Jacob,

Taking both concerns into account, I'm actually thinking about using a separate
subnet to isolate the Spark Workers, but I need to look into how to bind the
process onto the correct interface first. This may require some code change.
A separate subnet doesn't limit itself to a port range, so port exhaustion
should rarely happen, and it won't impact performance.

Opening up all ports between 32768 and 61000 is effectively the same as having
no firewall. This exposes some security concerns, but more information is
needed on whether that is critical.

The bottom line is that the driver needs to talk to the Workers. How the user
accesses the driver should be easier to solve, for example by launching the
Spark (shell) driver on a specific interface.

Likewise, if you find any interesting solutions, please let me know. I'll
share my solution once I have something up and running. Currently, it is
running OK with iptables off, but I still need to figure out how to
productionize the security part.
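As a quick way to check which interfaces and ports the driver and Worker JVMs
actually bind to on each node (just a diagnostic sketch using standard Linux
tools, nothing Spark-specific) - a socket bound to 0.0.0.0 listens on every
interface, while one bound to a 10.10.10.x address is restricted to the
isolated subnet:

  $ sudo netstat -tlnp | grep java
  # or, on newer distributions:
  $ sudo ss -tlnp | grep java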

  

RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Jacob Eisinger

Howdy Andrew,

I think I am running into the same issue [1] as you.  It appears that Spark
opens up dynamic / ephemeral [2] ports for each job on the shell and the
workers.  As you are finding out, this makes securing and managing the
network for Spark very difficult.

> Any idea how to restrict the 'Workers' port range?
The port range can be found by running:
   $ sysctl net.ipv4.ip_local_port_range
   net.ipv4.ip_local_port_range = 32768 61000

With that being said, a couple of avenues you may try:
  - Limit the dynamic ports [3] to a more reasonable range and open all of
    these ports on your firewall (see the sketch below); obviously, this
    might have unintended consequences like port exhaustion.
  - Secure the network another way, such as through a private VPN; this may
    reduce Spark's performance.
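
As a rough sketch of the first option (the narrowed range, subnet, and rule
below are purely illustrative, not a recommendation):

  # narrow the dynamic/ephemeral port range (example values only)
  $ sudo sysctl -w net.ipv4.ip_local_port_range="32768 33000"
  $ echo "net.ipv4.ip_local_port_range = 32768 33000" | sudo tee -a /etc/sysctl.conf
  # then allow just that range between the Spark hosts, e.g. with iptables
  $ sudo iptables -A INPUT -p tcp -s 192.168.1.0/24 --dport 32768:33000 -j ACCEPT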

If you have other workarounds, I am all ears --- please let me know!
Jacob

[1]
http://apache-spark-user-list.1001560.n3.nabble.com/Securing-Spark-s-Network-tp4832p4984.html
[2] http://en.wikipedia.org/wiki/Ephemeral_port
[3]
http://www.cyberciti.biz/tips/linux-increase-outgoing-network-sockets-range.html

Jacob D. Eisinger
IBM Emerging Technologies
jeis...@us.ibm.com - (512) 286-6075





RE: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Andrew Lee
Hi Yana,

I did. I configured the port in spark-env.sh; the problem is not the driver
port, which is fixed. It's the Workers' ports that are dynamic every time they
are launched in the YARN container. :-(

Any idea how to restrict the Workers' port range?


  

Re: spark-shell driver interacting with Workers in YARN mode - firewall blocking communication

2014-05-02 Thread Yana Kadiyska
I think what you want to do is set spark.driver.port to a fixed port.
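
For example (an untested sketch - the port number is arbitrary, and which
mechanism applies depends on your Spark release):

  # via a Java system property when launching the shell/driver
  SPARK_JAVA_OPTS="-Dspark.driver.port=51000" ./bin/spark-shell

  # or, on releases that ship conf/spark-defaults.conf:
  spark.driver.port   51000

Note that this only pins the driver's port; as discussed elsewhere in this
thread, the Worker/executor ports are still chosen dynamically.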


On Fri, May 2, 2014 at 1:52 PM, Andrew Lee  wrote:

> Hi All,
>
> I encountered this problem when the firewall is enabled between the
> spark-shell and the Workers.
>
> When I launch spark-shell in yarn-client mode, I notice that the Workers on
> the YARN containers are trying to talk to the driver (spark-shell);
> however, the firewall is not open, which causes a timeout.
>
> The Workers try to open listening ports on 54xxx for each Worker - is the
> port random in that case?
> What would be the best way to predict the ports so I can configure the
> firewall correctly between the driver (spark-shell) and the Workers? Is
> there a range of ports we can specify in the firewall/iptables?
>
> Any ideas?
>