Submitting Spark Applications - Do I need to leave ports open?

2015-10-26 Thread markluk
I want to submit interactive applications to a remote Spark cluster running
in standalone mode. 

I understand that I need to connect to the master's port 7077. It also seems like the
master node needs to open connections back to my local machine, and the ports it
needs are different every time.

If I have the firewall enabled on my local machine, the ports spark-submit needs on
my local machine are unreachable, and the submission fails to connect to the master.

I was able to get it to work if I disable the firewall on my local machine, but
that's not a real solution.

Is there some config that I'm not aware of that solves this problem?
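
For reference, a minimal sketch of the kind of configuration I'm asking about,
assuming the Spark 1.x port properties apply: pin the otherwise-random driver-side
ports to fixed values so the firewall only has to allow those (the host name and
port numbers below are placeholders):

    from pyspark import SparkConf, SparkContext

    # Sketch only: pin the driver-side ports that the master/executors connect
    # back to. Property names are the Spark 1.x ones; "master-host" and the
    # port numbers are placeholders.
    conf = (SparkConf()
            .setAppName("my-app")
            .setMaster("spark://master-host:7077")
            .set("spark.driver.port", "51000")         # driver RPC endpoint
            .set("spark.fileserver.port", "51100")     # driver HTTP file server
            .set("spark.broadcast.port", "51200")      # HTTP broadcast server
            .set("spark.blockManager.port", "51300"))  # driver block manager
    sc = SparkContext(conf=conf)

With the ports fixed like this, the local firewall would presumably only need to
allow inbound connections from the cluster on those ports, rather than being
disabled entirely.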




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Submitting-Spark-Applications-Do-I-need-to-leave-ports-open-tp25207.html



Spark cluster - use machine name in WorkerID, not IP address

2015-10-01 Thread markluk
I'm running a standalone Spark cluster of 1 master and 2 slaves.

My slaves file under conf/ lists the fully qualified domain names of the 2
slave machines.

When I look at the Spark web UI (on port 8080), I see my 2 workers, but the
worker ID uses the IP address, like
worker-20151001153012-172.31.51.158-44699

That worker ID is not very human-friendly. Is there a way to use the machine
name in the ID instead? Something like
worker-20151001153012-node1-44699
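
A sketch of the kind of setting I'm hoping exists (assumption: Spark uses the
SPARK_LOCAL_HOSTNAME environment variable, when set, as the hostname a worker
advertises, which is what ends up in the worker ID; node1 is a placeholder):

    # conf/spark-env.sh on each worker machine (assumption: SPARK_LOCAL_HOSTNAME,
    # when set, becomes the hostname the worker advertises and embeds in its ID)
    export SPARK_LOCAL_HOSTNAME=node1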



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-cluster-use-machine-name-in-WorkerID-not-IP-address-tp24905.html



Worker node timeout exception

2015-09-30 Thread markluk
I set up a new Spark cluster. My worker node is dying with the following
exception.

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [120 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcEnv.scala:241)
    ... 11 more


Any ideas what's wrong? This is happening both for a Spark program and the
Spark shell.
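
For context, the 120 seconds in the message appears to match Spark's default
network/RPC timeout; a sketch of the settings that govern it is below
(assumption: raising them is only a stopgap if the real problem is connectivity
between the worker, master, and driver):

    # conf/spark-defaults.conf -- sketch only; spark.network.timeout is the
    # umbrella default (120s) behind the RPC ask timeout seen in the trace,
    # and spark.rpc.askTimeout overrides it for RPC calls specifically
    spark.network.timeout   300s
    spark.rpc.askTimeout    300s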



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Worker-node-timeout-exception-tp24893.html



Get variable into Spark's foreachRDD function

2015-09-28 Thread markluk
I have a streaming Spark process and I need to do some logging in the
`foreachRDD` function, but I'm having trouble accessing the logger as a
variable inside that function.

I would like to do the following

import logging

myLogger = logging.getLogger(LOGGER_NAME)
...
...
someData = ...

someData.foreachRDD(lambda now, rdds: myLogger.info(...))

Inside the lambda, it cannot access `myLogger`; I get a giant stack trace.
Here is a snippet:


  File "/juicero/press-mgmt/spark-1.5.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 537, in save_reduce
    save(state)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 548, in save_tuple
    save(element)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "/usr/lib/python2.7/pickle.py", line 681, in _batch_setitems
    save(v)
  File "/usr/lib/python2.7/pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "/juicero/press-mgmt/spark-1.5.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 315, in save_builtin_function
    return self.save_function(obj)
  File "/juicero/press-mgmt/spark-1.5.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 191, in save_function
    if islambda(obj) or obj.__code__.co_filename == '' or themodule is None:
AttributeError: 'builtin_function_or_method' object has no attribute '__code__'



I don't understand why I can't access `myLogger`. Does it have something to
do with Spark not being able to serialize the logger object?
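
A minimal sketch of a possible workaround, assuming the failure is that
cloudpickle cannot serialize the Logger object captured in the closure (loggers
hold thread locks and handlers): fetch the logger by name inside the function,
so only the LOGGER_NAME string is captured. LOGGER_NAME and someData are the
names from the snippet above.

    import logging

    def log_batch(time, rdd):
        # foreachRDD calls this with the batch time and the batch's RDD.
        # Looking the logger up by name here means the closure captures only
        # the LOGGER_NAME string, not the unpicklable Logger object.
        logger = logging.getLogger(LOGGER_NAME)
        logger.info("batch at %s had %d records", time, rdd.count())

    someData.foreachRDD(log_batch)

In general, anything the lambda references gets pickled by cloudpickle, so
handles like loggers or connections seem better created inside the function
than captured from the enclosing scope.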



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Get-variable-into-Spark-s-foreachRDD-function-tp24852.html