Re: Spark Shell strange worker Exception

2014-10-28 Thread Saket Kumar
Hi Paolo,

Custom classes and jars are distributed across the Spark cluster via an 
HTTP server on the master when the absolute path of the application fat jar is 
passed to spark-submit. The Advanced Dependency Management section at 
https://spark.apache.org/docs/latest/submitting-applications.html 
explains this.
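
As a rough sketch (the paths and names below are illustrative, not from your 
setup), the two usual ways to make extra jars visible to the executors from 
Scala are:

    import org.apache.spark.{SparkConf, SparkContext}

    // Jars listed on the conf are served to the workers over the driver's
    // HTTP file server once the context starts.
    val conf = new SparkConf()
      .setAppName("cassandra-job")
      .setJars(Seq("/path/to/my-app-assembly.jar"))
    val sc = new SparkContext(conf)

    // Or add a jar to an already-running context, e.g. from the spark shell:
    sc.addJar("/path/to/spark-cassandra-connector.jar")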

Could that be the reason the worker is trying to reach the master? I don’t 
know the cause of the error itself, though.
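
As a quick sanity check (just a hypothetical way to narrow it down; the class 
name is the one from your trace): if the line below also throws in the spark 
shell itself, the DSE jar is missing from the driver classpath entirely; if it 
succeeds there but the task still fails, the jar simply isn’t reaching the 
workers.

    // Run in the spark shell on the driver:
    Class.forName("com.datastax.bdp.spark.CassandraRDD")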

Thanks,
Saket 


On 27 Oct 2014, at 19:39, Paolo Platter wrote:

> Hi all,
> 
> I’m submitting a simple task using the spark shell against a cassandraRDD 
> (DataStax environment).
> I’m getting the following exception from one of the workers:
> 
> ERROR 2014-10-27 14:08:28 org.apache.spark.executor.Executor: Exception in task ID 0
> java.lang.ClassNotFoundException: com.datastax.bdp.spark.CassandraRDD
> [rest of the worker log snipped; the full trace is in the original message below]

Spark Shell strange worker Exception

2014-10-27 Thread Paolo Platter
Hi all,

I’m submitting a simple task using the spark shell against a cassandraRDD 
(DataStax environment).
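
Something along these lines (a from-memory sketch; the keyspace and table 
names are placeholders, and the cassandraTable call is the open-source 
connector style, so the exact DSE helper may differ):

    import com.datastax.spark.connector._  // assumption: pre-imported by the DSE shell
    val rdd = sc.cassandraTable("my_keyspace", "my_table")  // placeholder names
    rdd.count()  // this is the task that blows up on the worker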
I’m getting the following exception from one of the workers:


INFO 2014-10-27 14:08:03 akka.event.slf4j.Slf4jLogger: Slf4jLogger started
INFO 2014-10-27 14:08:03 Remoting: Starting remoting
INFO 2014-10-27 14:08:03 Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@10.105.111.130:50234]
INFO 2014-10-27 14:08:03 Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@10.105.111.130:50234]
INFO 2014-10-27 14:08:03 org.apache.spark.executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://sp...@srv02.pocbgsia.ats-online.it:39797/user/CoarseGrainedScheduler
INFO 2014-10-27 14:08:03 org.apache.spark.deploy.worker.WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@10.105.111.130:34467/user/Worker
INFO 2014-10-27 14:08:04 org.apache.spark.deploy.worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@10.105.111.130:34467/user/Worker
INFO 2014-10-27 14:08:04 org.apache.spark.executor.CoarseGrainedExecutorBackend: Successfully registered with driver
INFO 2014-10-27 14:08:04 org.apache.spark.executor.Executor: Using REPL class URI: http://159.8.18.11:51705
INFO 2014-10-27 14:08:04 akka.event.slf4j.Slf4jLogger: Slf4jLogger started
INFO 2014-10-27 14:08:04 Remoting: Starting remoting
INFO 2014-10-27 14:08:04 Remoting: Remoting started; listening on addresses :[akka.tcp://spark@10.105.111.130:49243]
INFO 2014-10-27 14:08:04 Remoting: Remoting now listens on addresses: [akka.tcp://spark@10.105.111.130:49243]
INFO 2014-10-27 14:08:04 org.apache.spark.SparkEnv: Connecting to BlockManagerMaster: akka.tcp://sp...@srv02.pocbgsia.ats-online.it:39797/user/BlockManagerMaster
INFO 2014-10-27 14:08:04 org.apache.spark.storage.DiskBlockManager: Created local directory at /usr/share/dse/spark/tmp/executor/spark-local-20141027140804-4d84
INFO 2014-10-27 14:08:04 org.apache.spark.storage.MemoryStore: MemoryStore started with capacity 23.0 GB.
INFO 2014-10-27 14:08:04 org.apache.spark.network.ConnectionManager: Bound socket to port 50542 with id = ConnectionManagerId(10.105.111.130,50542)
INFO 2014-10-27 14:08:04 org.apache.spark.storage.BlockManagerMaster: Trying to register BlockManager
INFO 2014-10-27 14:08:04 org.apache.spark.storage.BlockManagerMaster: Registered BlockManager
INFO 2014-10-27 14:08:04 org.apache.spark.SparkEnv: Connecting to MapOutputTracker: akka.tcp://sp...@srv02.pocbgsia.ats-online.it:39797/user/MapOutputTracker
INFO 2014-10-27 14:08:04 org.apache.spark.HttpFileServer: HTTP File server directory is /usr/share/dse/spark/tmp/executor/spark-a23656dc-efce-494b-875a-a1cf092c3230
INFO 2014-10-27 14:08:04 org.apache.spark.HttpServer: Starting HTTP Server
INFO 2014-10-27 14:08:27 org.apache.spark.executor.CoarseGrainedExecutorBackend: Got assigned task 0
INFO 2014-10-27 14:08:28 org.apache.spark.executor.Executor: Running task ID 0
ERROR 2014-10-27 14:08:28 org.apache.spark.executor.Executor: Exception in task ID 0
java.lang.ClassNotFoundException: com.datastax.bdp.spark.CassandraRDD
    at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:49)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
    at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
    at java.io.ObjectInputStream.readClassDesc(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
    at java.io.ObjectInputStream.readExternalData(Unknown Source)
    at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
    at java.io.ObjectInputStream.readObject0(Unknown Source)
    at java.io.ObjectInputStream.readObject(Unknown Source)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Sour