Hi Paolo,

When the application fat jar is passed to spark-submit by its absolute path, 
the custom classes and jars are distributed across the Spark cluster via an 
HTTP server running on the master. The Advanced Dependency Management section 
of https://spark.apache.org/docs/latest/submitting-applications.html explains 
this mechanism.

Could that be the reason the worker is accessing the master? That said, I 
don’t know the cause of the error.
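
For what it’s worth, the usual fix for a ClassNotFoundException like the one 
below is to ship the jar containing the missing class to the executors 
explicitly with the --jars flag. A rough sketch (the paths and jar name here 
are placeholders, not taken from your environment):

    spark-submit \
      --master spark://<master-host>:7077 \
      --jars /path/to/cassandra-connector.jar \
      /path/to/your-app-fat.jar

The same --jars flag also works on the spark-shell command line. Jars listed 
there are served by the driver’s HTTP file server and added to each executor’s 
classpath, so the executors no longer need to fetch individual .class files.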

Thanks,
Saket 


On 27 Oct 2014, at 19:39, Paolo Platter <paolo.plat...@agilelab.it> wrote:

> Hi all,
> 
> I’m submitting a simple task using the Spark shell against a CassandraRDD 
> (DataStax environment).
> I’m getting the following exception from one of the workers:
> 
> INFO 2014-10-27 14:08:03 akka.event.slf4j.Slf4jLogger: Slf4jLogger started
> INFO 2014-10-27 14:08:03 Remoting: Starting remoting
> INFO 2014-10-27 14:08:03 Remoting: Remoting started; listening on addresses 
> :[akka.tcp://sparkExecutor@10.105.111.130:50234]
> INFO 2014-10-27 14:08:03 Remoting: Remoting now listens on addresses: 
> [akka.tcp://sparkExecutor@10.105.111.130:50234]
> INFO 2014-10-27 14:08:03 
> org.apache.spark.executor.CoarseGrainedExecutorBackend: Connecting to driver: 
> akka.tcp://sp...@srv02.pocbgsia.ats-online.it:39797/user/CoarseGrainedScheduler
> INFO 2014-10-27 14:08:03 org.apache.spark.deploy.worker.WorkerWatcher: 
> Connecting to worker akka.tcp://sparkWorker@10.105.111.130:34467/user/Worker
> INFO 2014-10-27 14:08:04 org.apache.spark.deploy.worker.WorkerWatcher: 
> Successfully connected to 
> akka.tcp://sparkWorker@10.105.111.130:34467/user/Worker
> INFO 2014-10-27 14:08:04 
> org.apache.spark.executor.CoarseGrainedExecutorBackend: Successfully 
> registered with driver
> INFO 2014-10-27 14:08:04 org.apache.spark.executor.Executor: Using REPL class 
> URI: http://159.8.18.11:51705
> INFO 2014-10-27 14:08:04 akka.event.slf4j.Slf4jLogger: Slf4jLogger started
> INFO 2014-10-27 14:08:04 Remoting: Starting remoting
> INFO 2014-10-27 14:08:04 Remoting: Remoting started; listening on addresses 
> :[akka.tcp://spark@10.105.111.130:49243]
> INFO 2014-10-27 14:08:04 Remoting: Remoting now listens on addresses: 
> [akka.tcp://spark@10.105.111.130:49243]
> INFO 2014-10-27 14:08:04 org.apache.spark.SparkEnv: Connecting to 
> BlockManagerMaster: 
> akka.tcp://sp...@srv02.pocbgsia.ats-online.it:39797/user/BlockManagerMaster
> INFO 2014-10-27 14:08:04 org.apache.spark.storage.DiskBlockManager: Created 
> local directory at 
> /usr/share/dse/spark/tmp/executor/spark-local-20141027140804-4d84
> INFO 2014-10-27 14:08:04 org.apache.spark.storage.MemoryStore: MemoryStore 
> started with capacity 23.0 GB.
> INFO 2014-10-27 14:08:04 org.apache.spark.network.ConnectionManager: Bound 
> socket to port 50542 with id = ConnectionManagerId(10.105.111.130,50542)
> INFO 2014-10-27 14:08:04 org.apache.spark.storage.BlockManagerMaster: Trying 
> to register BlockManager
> INFO 2014-10-27 14:08:04 org.apache.spark.storage.BlockManagerMaster: 
> Registered BlockManager
> INFO 2014-10-27 14:08:04 org.apache.spark.SparkEnv: Connecting to 
> MapOutputTracker: 
> akka.tcp://sp...@srv02.pocbgsia.ats-online.it:39797/user/MapOutputTracker
> INFO 2014-10-27 14:08:04 org.apache.spark.HttpFileServer: HTTP File server 
> directory is 
> /usr/share/dse/spark/tmp/executor/spark-a23656dc-efce-494b-875a-a1cf092c3230
> INFO 2014-10-27 14:08:04 org.apache.spark.HttpServer: Starting HTTP Server
> INFO 2014-10-27 14:08:27 
> org.apache.spark.executor.CoarseGrainedExecutorBackend: Got assigned task 0
> INFO 2014-10-27 14:08:28 org.apache.spark.executor.Executor: Running task ID 0
> ERROR 2014-10-27 14:08:28 org.apache.spark.executor.Executor: Exception in 
> task ID 0
> java.lang.ClassNotFoundException: com.datastax.bdp.spark.CassandraRDD
>       at 
> org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:49)
>       at java.lang.ClassLoader.loadClass(Unknown Source)
>       at java.lang.ClassLoader.loadClass(Unknown Source)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Unknown Source)
>       at 
> org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
>       at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
>       at java.io.ObjectInputStream.readClassDesc(Unknown Source)
>       at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>       at java.io.ObjectInputStream.readObject0(Unknown Source)
>       at java.io.ObjectInputStream.readObject(Unknown Source)
>       at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>       at 
> org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
>       at 
> org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
>       at java.io.ObjectInputStream.readExternalData(Unknown Source)
>       at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>       at java.io.ObjectInputStream.readObject0(Unknown Source)
>       at java.io.ObjectInputStream.readObject(Unknown Source)
>       at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
>       at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
>       at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
>       at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>       at java.lang.Thread.run(Unknown Source)
> Caused by: java.io.FileNotFoundException: 
> http://159.8.18.11:51705/com/datastax/bdp/spark/CassandraRDD.class
>       at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown 
> Source)
>       at java.net.URL.openStream(Unknown Source)
>       at 
> org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:55)
>       ... 25 more
> 
> I don’t understand why a worker (private address: 10.105.111.130, 
> srv02.pocbgsia.ats-online.it) searches for a .class file at a public URL on 
> the master node 
> (http://159.8.18.11:51705/com/datastax/bdp/spark/CassandraRDD.class).
> 
> What am I missing?
> 
> Thanks in advance
> 
> Paolo​