Spark streaming on standalone cluster

Borja Garrido Bear Tue, 30 Jun 2015 08:00:46 -0700

Hi all,

I'm running a spark standalone cluster with one master and one slave
(different machines and both in version 1.4.0), the thing is I have a spark
streaming job that gets data from Kafka, and the just prints it.


To configure the cluster I just started the master and then the slaves
pointing to it, as everything appears in the web interface I assumed
everything was fine, but maybe I missed some configuration.

When I run it locally there is no problem, it works.
When I run it in the cluster the worker state appears as "loading"
 - If the job is a Scala one, when I stop it I receive all the output
 - If the job is Python, when I stop it I receive a bunch of these
exceptions

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

ERROR JobScheduler: Error running job streaming job 1435675420000 ms.0
py4j.Py4JException: An exception was raised by the Python Proxy. Return
Message: null
at py4j.Protocol.getReturnValue(Protocol.java:417)
at py4j.reflection.PythonProxyHandler.invoke(PythonProxyHandler.java:113)
at com.sun.proxy.$Proxy14.call(Unknown Source)
at
org.apache.spark.streaming.api.python.TransformFunction.apply(PythonDStream.scala:63)
at
org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156)
at
org.apache.spark.streaming.api.python.PythonDStream$$anonfun$callForeachRDD$1.apply(PythonDStream.scala:156)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:42)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:40)
at
org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:399)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:40)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at
org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:40)
at scala.util.Try$.apply(Try.scala:161)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:34)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:193)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:193)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:57)
at
org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:192)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\

Is there any known issue with spark streaming and the standalone mode? or
with Python?

Spark streaming on standalone cluster

Reply via email to