Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-09 Thread Jeroen Vlek
Hi,

I posted a question regarding Phoenix and Spark Streaming on 
StackOverflow [1]; a copy of the question is included in this email below the 
first stack trace. I also already contacted the Phoenix mailing list and tried 
the suggestion of setting spark.driver.userClassPathFirst. Unfortunately, that 
only pushed me further into dependency hell, which I tried to resolve 
until I hit a wall with an UnsatisfiedLinkError on Snappy.

What I am trying to achieve: to save a stream from Kafka into Phoenix/HBase 
via Spark Streaming. I'm using MapR as a platform, and the original exception 
happens both on a 3-node cluster and on the MapR Sandbox (a VM for 
experimentation), in both YARN and standalone mode. Further experimentation (like 
the saveAsNewAPIHadoopFile below) was done only on the sandbox in standalone 
mode.

Phoenix only supports Spark from 4.4.0 onwards, but I thought I could 
use a naive implementation that creates a new connection for 
every RDD from the DStream in 4.3.1.  This resulted in the 
ClassNotFoundException described in [1], so I switched to 4.4.0.
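
(For concreteness, the naive 4.3.1 approach amounted to roughly the following 
sketch: open a Phoenix JDBC connection inside each foreachRDD/foreachPartition 
call and upsert the records. The table, columns, and ZooKeeper quorum here are 
purely illustrative.)

import java.sql.DriverManager
import org.apache.spark.streaming.dstream.DStream

def saveNaively(events: DStream[(String, Long)]): Unit = {
  events.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // one Phoenix JDBC connection per partition, committed at the end
      val conn = DriverManager.getConnection("jdbc:phoenix:zookeeper-host:2181")
      conn.setAutoCommit(false)
      val stmt = conn.prepareStatement(
        "UPSERT INTO EVENT_COUNTS (ID, CNT) VALUES (?, ?)")
      partition.foreach { case (id, count) =>
        stmt.setString(1, id)
        stmt.setLong(2, count)
        stmt.executeUpdate()
      }
      conn.commit()
      stmt.close()
      conn.close()
    }
  }
}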

Unfortunately the saveToPhoenix method is only available in Scala, so I 
found the suggestion to try it via the saveAsNewAPIHadoopFile method [2] and an 
example implementation [3], which I adapted to my own needs.

However, 4.4.0 + saveAsNewAPIHadoopFile raises the same 
ClassNotFoundException, just with a slightly different stack trace:

  java.lang.RuntimeException: java.sql.SQLException: ERROR 103 
(08004): Unable to establish connection.
at 
org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:58)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:995)
at 
org.apache.spark.rdd.PairRDDFunctions$$anonfun$12.apply(PairRDDFunctions.scala:979)
at 
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
at org.apache.spark.scheduler.Task.run(Task.scala:64)
at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLException: ERROR 103 (08004): Unable to 
establish connection.
at 
org.apache.phoenix.exception.SQLExceptionCode$Factory$1.newException(SQLExceptionCode.java:386)
at 
org.apache.phoenix.exception.SQLExceptionInfo.buildException(SQLExceptionInfo.java:145)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:288)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl.access$300(ConnectionQueryServicesImpl.java:171)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1881)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl$12.call(ConnectionQueryServicesImpl.java:1860)
at 
org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:77)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:1860)
at 
org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:162)
at 
org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.connect(PhoenixEmbeddedDriver.java:131)
at 
org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:133)
at 
java.sql.DriverManager.getConnection(DriverManager.java:571)
at 
java.sql.DriverManager.getConnection(DriverManager.java:187)
at 
org.apache.phoenix.mapreduce.util.ConnectionUtil.getConnection(ConnectionUtil.java:92)
at 
org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:80)
at 
org.apache.phoenix.mapreduce.util.ConnectionUtil.getOutputConnection(ConnectionUtil.java:68)
at 
org.apache.phoenix.mapreduce.PhoenixRecordWriter.init(PhoenixRecordWriter.java:49)
at 
org.apache.phoenix.mapreduce.PhoenixOutputFormat.getRecordWriter(PhoenixOutputFormat.java:55)
... 8 more
Caused by: java.io.IOException: 
java.lang.reflect.InvocationTargetException
at 
org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:457)
at 
org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:350)
at 
org.apache.phoenix.query.HConnectionFactory$HConnectionFactoryImpl.createConnection(HConnectionFactory.java:47)
at 
org.apache.phoenix.query.ConnectionQueryServicesImpl.openConnection(ConnectionQueryServicesImpl.java:286)
... 23 more
Caused by: java.lang.reflect.InvocationTargetException
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java

Re: Apache Phoenix (4.3.1 and 4.4.0-HBase-0.98) on Spark 1.3.1 ClassNotFoundException

2015-06-09 Thread Josh Mahonin
This may or may not be helpful for your classpath issues, but I wanted to
verify that basic functionality worked, so I made a sample app here:

https://github.com/jmahonin/spark-streaming-phoenix

This consumes events off a Kafka topic using Spark Streaming, and writes
out event counts to Phoenix using the new phoenix-spark functionality:
http://phoenix.apache.org/phoenix_spark.html

It's definitely overkill, and it would probably be more efficient to use the
JDBC driver directly, but it serves as a proof of concept.

I've only tested this in local mode. To convert it to a full job JAR, I
suspect that keeping all of the Spark and Phoenix dependencies marked as
'provided' and including the Phoenix client JAR in the Spark classpath
would work as well.
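
For reference, the core of that approach is roughly the following sketch,
assuming the phoenix-spark 4.4.0 saveToPhoenix API described on the page above
and a DStream of (word, count) tuples named wordCounts; the table, column
names, and ZooKeeper URL are illustrative:

import org.apache.phoenix.spark._   // adds saveToPhoenix to RDDs of tuples

wordCounts.foreachRDD { rdd =>
  // upsert each micro-batch of (word, count) pairs into a Phoenix table
  rdd.saveToPhoenix("EVENT_COUNTS",
                    Seq("WORD", "CNT"),
                    zkUrl = Some("zookeeper-host:2181"))
}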

Good luck,

Josh

On Tue, Jun 9, 2015 at 4:40 AM, Jeroen Vlek j.v...@anchormen.nl wrote:

 [snip: original message quoted in full; see above]

Re: ClassNotFoundException for Kryo serialization

2015-05-02 Thread Akshat Aranya
Now I am running up against some other problem while trying to schedule tasks:

15/05/01 22:32:03 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalStateException: unread block data
at 
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2419)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1380)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:180)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)


I verified that the same configuration works without using Kryo serialization.


On Fri, May 1, 2015 at 9:44 AM, Akshat Aranya aara...@gmail.com wrote:
 I cherry-picked the fix for SPARK-5470 and the problem has gone away.

 On Fri, May 1, 2015 at 9:15 AM, Akshat Aranya aara...@gmail.com wrote:
 Yes, this class is present in the jar that was loaded in the classpath
 of the executor Java process -- it wasn't even lazily added as a part
 of the task execution.  Schema$MyRow is a protobuf-generated class.

 After doing some digging around, I think I might be hitting up against
 SPARK-5470, the fix for which hasn't been merged into 1.2, as far as I
 can tell.

 On Fri, May 1, 2015 at 9:05 AM, Ted Yu yuzhih...@gmail.com wrote:
 bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow

 So the above class is in the jar which was in the classpath ?
 Can you tell us a bit more about Schema$MyRow ?

 On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:

 Hi,

 I'm getting a ClassNotFoundException at the executor when trying to
 register a class for Kryo serialization:

 [snip: stack trace from the original message; see below]

ClassNotFoundException for Kryo serialization

2015-05-01 Thread Akshat Aranya
Hi,

I'm getting a ClassNotFoundException at the executor when trying to
register a class for Kryo serialization:

java.lang.reflect.InvocationTargetException
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
  at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:243)
  at
org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:254)
  at org.apache.spark.SparkEnv$.create(SparkEnv.scala:257)
  at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:182)
  at org.apache.spark.executor.Executor.init(Executor.scala:87)
  at
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receiveWithLogging$1.applyOrElse(CoarseGrainedExecutorBackend.scala:61)
  at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
  at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
  at
scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
  at
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:53)
  at
org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42)
  at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
  at
org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42)
  at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
  at
org.apache.spark.executor.CoarseGrainedExecutorBackend.aroundReceive(CoarseGrainedExecutorBackend.scala:36)
  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
  at akka.actor.ActorCell.invoke(ActorCell.scala:487)
  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
  at akka.dispatch.Mailbox.run(Mailbox.scala:220)
  at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
  at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
  at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
  at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
  at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.spark.SparkException: Failed to load class to
register with Kryo
  at
org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:66)
  at
org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:61)
  at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
  at
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
  at
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
  at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
  at
org.apache.spark.serializer.KryoSerializer.init(KryoSerializer.scala:61)
  ... 28 more
Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow
  at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
  at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  at java.lang.Class.forName0(Native Method)
  at java.lang.Class.forName(Class.java:190)
  at
org.apache.spark.serializer.KryoSerializer$$anonfun$2.apply(KryoSerializer.scala:63)

I have verified that when the executor process is launched, my jar is on
the executor's command-line classpath. I expect the class to be found by the
default classloader in use at KryoSerializer.scala:63.

Any ideas?
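
(For context, the registration that triggers the trace above is typically set
up roughly like the sketch below; Schema.MyRow is the protobuf-generated class
named in the stack trace, everything else is illustrative.)

import org.apache.spark.SparkConf
import com.example.Schema   // protobuf-generated class from the stack trace

val conf = new SparkConf()
  .setAppName("kryo-registration-example")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  // puts com.example.Schema$MyRow into spark.kryo.classesToRegister,
  // which KryoSerializer later loads by name on each executor
  .registerKryoClasses(Array(classOf[Schema.MyRow]))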


Re: ClassNotFoundException for Kryo serialization

2015-05-01 Thread Ted Yu
bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow

So the above class is in the jar which was in the classpath ?
Can you tell us a bit more about Schema$MyRow ?

On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:

 [snip: original message quoted in full; see above]



Re: ClassNotFoundException for Kryo serialization

2015-05-01 Thread Akshat Aranya
Yes, this class is present in the jar that was loaded in the classpath
of the executor Java process -- it wasn't even lazily added as a part
of the task execution.  Schema$MyRow is a protobuf-generated class.

After doing some digging around, I think I might be hitting up against
SPARK-5470, the fix for which hasn't been merged into 1.2, as far as I
can tell.

On Fri, May 1, 2015 at 9:05 AM, Ted Yu yuzhih...@gmail.com wrote:
 bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow

 So the above class is in the jar which was in the classpath ?
 Can you tell us a bit more about Schema$MyRow ?

 On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:

 [snip: original message quoted in full; see above]

Re: ClassNotFoundException for Kryo serialization

2015-05-01 Thread Akshat Aranya
I cherry-picked the fix for SPARK-5470 and the problem has gone away.

On Fri, May 1, 2015 at 9:15 AM, Akshat Aranya aara...@gmail.com wrote:
 Yes, this class is present in the jar that was loaded in the classpath
 of the executor Java process -- it wasn't even lazily added as a part
 of the task execution.  Schema$MyRow is a protobuf-generated class.

 After doing some digging around, I think I might be hitting up against
 SPARK-5470, the fix for which hasn't been merged into 1.2, as far as I
 can tell.

 On Fri, May 1, 2015 at 9:05 AM, Ted Yu yuzhih...@gmail.com wrote:
 bq. Caused by: java.lang.ClassNotFoundException: com.example.Schema$MyRow

 So the above class is in the jar which was in the classpath ?
 Can you tell us a bit more about Schema$MyRow ?

 On Fri, May 1, 2015 at 8:05 AM, Akshat Aranya aara...@gmail.com wrote:

 [snip: original message quoted in full; see above]

Re: Spark 1.3 UDF ClassNotFoundException

2015-04-03 Thread Markus Ganter
My apologies. I was running this locally, and the JAR I was building
with IntelliJ had some issues.
This was not related to UDFs. All works fine now.

On Thu, Apr 2, 2015 at 2:58 PM, Ted Yu yuzhih...@gmail.com wrote:
 Can you show more code in CreateMasterData ?

 How do you run your code ?

 Thanks

 On Thu, Apr 2, 2015 at 11:06 AM, ganterm gant...@gmail.com wrote:

 [snip: original message quoted in full; see below]






Spark 1.3 UDF ClassNotFoundException

2015-04-02 Thread ganterm
Hello,

I started to use the dataframe API in Spark 1.3 with Scala.  
I am trying to implement a UDF and am following the sample here: 
https://spark.apache.org/docs/1.3.0/api/scala/index.html#org.apache.spark.sql.UserDefinedFunction
meaning 
val predict = udf((score: Double) => if (score > 0.5) true else false)
df.select( predict(df("score")) )
All compiles just fine but when I run it, I get a ClassNotFoundException
(see more details below)
I am sure that I load the data correctly and that I have a field called
score with the correct data type. 
Do I need to do anything else like registering the function?

Thanks!
Markus 

Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 0 in stage 6.0 failed 4 times, most recent failure:
Lost task 0.3 in stage 6.0 (TID 11, BillSmithPC):
java.lang.ClassNotFoundException: test.CreateMasterData$$anonfun$1
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:65)
...
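
To the registration question above: a UDF created with udf(...) and applied
through the DataFrame API does not need to be registered; sqlContext.udf.register
is only needed when calling the function by name from a SQL string. A
self-contained sketch of the same pattern (object name and data are
illustrative only):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.udf

object UdfExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("udf-example").setMaster("local[2]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // a one-column DataFrame of scores
    val df = sc.parallelize(Seq(Tuple1(0.2), Tuple1(0.7))).toDF("score")
    val predict = udf((score: Double) => score > 0.5)
    df.select(predict(df("score"))).show()
  }
}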




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-1-3-UDF-ClassNotFoundException-tp22361.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: ClassNotFoundException

2015-03-17 Thread Ralph Bergmann
Hi Kevin,


Yes, I can test it. Does this mean I have to build Spark from the git repository?


Ralph

On 17.03.15 at 02:59, Kevin (Sangwoo) Kim wrote:
 Hi Ralph,
 
 It looks like the https://issues.apache.org/jira/browse/SPARK-6299 issue,
 which I'm working on.
 I submitted a PR for it; would you test it?
 
 Regards,
 Kevin


-- 

Ralph Bergmann


www  http://www.dasralph.de | http://www.the4thFloor.eu
mail ra...@dasralph.de
skype    dasralph

facebook https://www.facebook.com/dasralph
google+  https://plus.google.com/+RalphBergmann
xing https://www.xing.com/profile/Ralph_Bergmann3
linkedin https://www.linkedin.com/in/ralphbergmann
gulp https://www.gulp.de/Profil/RalphBergmann.html
github   https://github.com/the4thfloor


pgp key id   0x421F9B78
pgp fingerprint  CEE3 7AE9 07BE 98DF CD5A E69C F131 4A8E 421F 9B78




Re: ClassNotFoundException

2015-03-16 Thread Kevin (Sangwoo) Kim
Hi Ralph,

It looks like the https://issues.apache.org/jira/browse/SPARK-6299 issue, which
I'm working on.
I submitted a PR for it; would you test it?

Regards,
Kevin

On Tue, Mar 17, 2015 at 1:11 AM Ralph Bergmann ra...@dasralph.de wrote:

 Hi,


 I want to try the JavaSparkPi example[1] on a remote Spark server but I
 get a ClassNotFoundException.

 When I run it locally it works, but not remotely.

 I added the spark-core lib as a dependency. Do I need more?

 Any ideas?

 Thanks Ralph


 [1] ...
 https://github.com/apache/spark/blob/master/examples/
 src/main/java/org/apache/spark/examples/JavaSparkPi.java



ClassNotFoundException

2015-03-16 Thread Ralph Bergmann
Hi,


I want to try the JavaSparkPi example[1] on a remote Spark server but I
get a ClassNotFoundException.

When I run it locally it works, but not remotely.

I added the spark-core lib as a dependency. Do I need more?

Any ideas?

Thanks Ralph


[1] ...
https://github.com/apache/spark/blob/master/examples/src/main/java/org/apache/spark/examples/JavaSparkPi.java
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/16 17:02:45 INFO CoarseGrainedExecutorBackend: Registered signal handlers 
for [TERM, HUP, INT]
2015-03-16 17:02:45.624 java[5730:1133038] Unable to load realm info from 
SCDynamicStore
15/03/16 17:02:45 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
15/03/16 17:02:45 INFO SecurityManager: Changing view acls to: dasralph
15/03/16 17:02:45 INFO SecurityManager: Changing modify acls to: dasralph
15/03/16 17:02:45 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(dasralph); users 
with modify permissions: Set(dasralph)
15/03/16 17:02:46 INFO Slf4jLogger: Slf4jLogger started
15/03/16 17:02:46 INFO Remoting: Starting remoting
15/03/16 17:02:46 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://driverPropsFetcher@10.0.0.10:54973]
15/03/16 17:02:46 INFO Utils: Successfully started service 'driverPropsFetcher' 
on port 54973.
15/03/16 17:02:46 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down 
remote daemon.
15/03/16 17:02:46 INFO SecurityManager: Changing view acls to: dasralph
15/03/16 17:02:46 INFO SecurityManager: Changing modify acls to: dasralph
15/03/16 17:02:46 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon 
shut down; proceeding with flushing remote transports.
15/03/16 17:02:46 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(dasralph); users 
with modify permissions: Set(dasralph)
15/03/16 17:02:46 INFO Slf4jLogger: Slf4jLogger started
15/03/16 17:02:46 INFO Remoting: Starting remoting
15/03/16 17:02:46 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut 
down.
15/03/16 17:02:46 INFO Remoting: Remoting started; listening on addresses 
:[akka.tcp://sparkExecutor@10.0.0.10:54977]
15/03/16 17:02:46 INFO Utils: Successfully started service 'sparkExecutor' on 
port 54977.
15/03/16 17:02:46 INFO AkkaUtils: Connecting to MapOutputTracker: 
akka.tcp://sparkDriver@10.0.0.10:54945/user/MapOutputTracker
15/03/16 17:02:46 INFO AkkaUtils: Connecting to BlockManagerMaster: 
akka.tcp://sparkDriver@10.0.0.10:54945/user/BlockManagerMaster
15/03/16 17:02:46 INFO DiskBlockManager: Created local directory at 
/var/folders/5p/s1k2jrqx38ncxkm4wlflgfvwgn/T/spark-185c8652-0244-42ff-90b4-fd9c7dbde7b3/spark-8a9ba955-de6f-4ab1-8995-1474ac1ba3a9/spark-2533211f-c399-467f-851d-6b9ec89defdc/blockmgr-82e98f66-d592-42a7-a4fa-1ab78780814b
15/03/16 17:02:46 INFO MemoryStore: MemoryStore started with capacity 265.4 MB
15/03/16 17:02:46 INFO AkkaUtils: Connecting to OutputCommitCoordinator: 
akka.tcp://sparkDriver@10.0.0.10:54945/user/OutputCommitCoordinator
15/03/16 17:02:46 INFO CoarseGrainedExecutorBackend: Connecting to driver: 
akka.tcp://sparkDriver@10.0.0.10:54945/user/CoarseGrainedScheduler
15/03/16 17:02:46 INFO WorkerWatcher: Connecting to worker 
akka.tcp://sparkWorker@10.0.0.10:58715/user/Worker
15/03/16 17:02:46 INFO WorkerWatcher: Successfully connected to 
akka.tcp://sparkWorker@10.0.0.10:58715/user/Worker
15/03/16 17:02:46 INFO CoarseGrainedExecutorBackend: Successfully registered 
with driver
15/03/16 17:02:46 INFO Executor: Starting executor ID 0 on host 10.0.0.10
15/03/16 17:02:47 INFO NettyBlockTransferService: Server created on 54983
15/03/16 17:02:47 INFO BlockManagerMaster: Trying to register BlockManager
15/03/16 17:02:47 INFO BlockManagerMaster: Registered BlockManager
15/03/16 17:02:47 INFO AkkaUtils: Connecting to HeartbeatReceiver: 
akka.tcp://sparkDriver@10.0.0.10:54945/user/HeartbeatReceiver
15/03/16 17:02:47 INFO CoarseGrainedExecutorBackend: Got assigned task 0
15/03/16 17:02:47 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/03/16 17:02:47 INFO CoarseGrainedExecutorBackend: Got assigned task 1
15/03/16 17:02:47 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
15/03/16 17:02:47 INFO TorrentBroadcast: Started reading broadcast variable 0
15/03/16 17:02:47 INFO MemoryStore: ensureFreeSpace(1679) called with curMem=0, 
maxMem=278302556
15/03/16 17:02:47 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 1679.0 B, free 265.4 MB)
15/03/16 17:02:47 INFO BlockManagerMaster: Updated info of block 
broadcast_0_piece0
15/03/16 17:02:47 INFO TorrentBroadcast: Reading broadcast variable 0 took 188 
ms
15/03/16 17:02:47 INFO MemoryStore: ensureFreeSpace(2312) called with 
curMem=1679, maxMem=278302556
15/03/16 17:02:47 INFO MemoryStore: Block broadcast_0 stored as values
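
A common reason a job works locally but fails remotely with a
ClassNotFoundException is that the jar containing the job classes never reaches
the executors. This is not necessarily the fix in this thread (the reply above
points at SPARK-6299), but a minimal sketch of shipping the application jar
explicitly, with an illustrative master URL and jar path:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("JavaSparkPi")
  .setMaster("spark://master-host:7077")   // remote standalone master (illustrative)
  .setJars(Seq("/path/to/my-job.jar"))     // jar with the job classes, shipped to executors
val sc = new SparkContext(conf)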

Spark 1.2.1: ClassNotFoundException when running hello world example in scala 2.11

2015-02-19 Thread Luis Solano
I'm having an issue with spark 1.2.1 and scala 2.11. I detailed the
symptoms in this stackoverflow question.

http://stackoverflow.com/questions/28612837/spark-classnotfoundexception-when-running-hello-world-example-in-scala-2-11

Has anyone experienced anything similar?

Thank you!


Re: Spark 1.2.1: ClassNotFoundException when running hello world example in scala 2.11

2015-02-19 Thread Akhil Das
Can you downgrade your scala dependency to 2.10 and give it a try?

Thanks
Best Regards

On Fri, Feb 20, 2015 at 12:40 AM, Luis Solano l...@pixable.com wrote:

 I'm having an issue with spark 1.2.1 and scala 2.11. I detailed the
 symptoms in this stackoverflow question.


 http://stackoverflow.com/questions/28612837/spark-classnotfoundexception-when-running-hello-world-example-in-scala-2-11

 Has anyone experienced anything similar?

 Thank you!



Re: Spark Master Build Failing to run on cluster in standalone ClassNotFoundException: javax.servlet.FilterRegistration

2015-02-03 Thread Sean Owen
Already come up several times today:
https://issues.apache.org/jira/browse/SPARK-5557

On Tue, Feb 3, 2015 at 8:04 AM, Night Wolf nightwolf...@gmail.com wrote:
 [snip: original message quoted in full; see below]




Spark Master Build Failing to run on cluster in standalone ClassNotFoundException: javax.servlet.FilterRegistration

2015-02-03 Thread Night Wolf
Hi,

I just built Spark 1.3 master using maven via make-distribution.sh;

./make-distribution.sh --name mapr3 --skip-java-test --tgz -Pmapr3 -Phive
-Phive-thriftserver -Phive-0.12.0

When trying to start the standalone spark master on a cluster I get the
following stack trace;


15/02/04 08:53:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/02/04 08:53:56 INFO Remoting: Starting remoting
15/02/04 08:53:56 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkMaster@hadoop-009:7077]
15/02/04 08:53:56 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://sparkMaster@hadoop-009:7077]
...skipping...
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at akka.util.Reflect$.instantiate(Reflect.scala:66)
at akka.actor.ArgsReflectConstructor.produce(Props.scala:352)
at akka.actor.Props.newActor(Props.scala:252)
at akka.actor.ActorCell.newActor(ActorCell.scala:552)
at akka.actor.ActorCell.create(ActorCell.scala:578)
... 9 more
Caused by: java.lang.NoClassDefFoundError: javax/servlet/FilterRegistration
at
org.spark-project.jetty.servlet.ServletContextHandler.init(ServletContextHandler.java:136)
at
org.spark-project.jetty.servlet.ServletContextHandler.init(ServletContextHandler.java:129)
at
org.spark-project.jetty.servlet.ServletContextHandler.init(ServletContextHandler.java:98)
at
org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:96)
at
org.apache.spark.ui.JettyUtils$.createServletHandler(JettyUtils.scala:87)
at org.apache.spark.ui.WebUI.attachPage(WebUI.scala:67)
at
org.apache.spark.deploy.master.ui.MasterWebUI.initialize(MasterWebUI.scala:40)
at
org.apache.spark.deploy.master.ui.MasterWebUI.init(MasterWebUI.scala:36)
at org.apache.spark.deploy.master.Master.init(Master.scala:95)
... 18 more
Caused by: java.lang.ClassNotFoundException:
javax.servlet.FilterRegistration
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 27 more

The distro seems about the right size (260MB), so I don't imagine any of the
libraries are missing. The above command worked on 1.2...

Any ideas what's going wrong?

Cheers,
N


ClassNotFoundException when registering classes with Kryo

2015-02-01 Thread Arun Lists
Here is the relevant snippet of code in my main program:

===

sparkConf.set("spark.serializer",
  "org.apache.spark.serializer.KryoSerializer")
sparkConf.set("spark.kryo.registrationRequired", "true")
val summaryDataClass = classOf[SummaryData]
val summaryViewClass = classOf[SummaryView]
sparkConf.registerKryoClasses(Array(
  summaryDataClass, summaryViewClass))

===

I get the following error:

Exception in thread "main" java.lang.reflect.InvocationTargetException
...

Caused by: org.apache.spark.SparkException: Failed to load class to
register with Kryo
...

Caused by: java.lang.ClassNotFoundException:
com.dtex.analysis.transform.SummaryData


Note that the class in question SummaryData is in the same package as the
main program and hence in the same jar.

What do I need to do to make this work?

Thanks,
arun


Re: ClassNotFoundException when registering classes with Kryo

2015-02-01 Thread Arun Lists
Thanks for the notification!

For now, I'll use the Kryo serializer without registering classes until the
bug fix has been merged into the next version of Spark (I guess that will
be 1.3, right?).

arun


On Sun, Feb 1, 2015 at 10:58 PM, Shixiong Zhu zsxw...@gmail.com wrote:

 It's a bug that has been fixed in
 https://github.com/apache/spark/pull/4258 but not yet been merged.

 Best Regards,
 Shixiong Zhu

 2015-02-02 10:08 GMT+08:00 Arun Lists lists.a...@gmail.com:

 [snip: original message quoted in full; see above]






Re: ClassNotFoundException when registering classes with Kryo

2015-02-01 Thread Shixiong Zhu
It's a bug that has been fixed in https://github.com/apache/spark/pull/4258
but not yet been merged.

Best Regards,
Shixiong Zhu

2015-02-02 10:08 GMT+08:00 Arun Lists lists.a...@gmail.com:

 [snip: original message quoted in full; see above]





Spark SQL - Unable to use Hive UDF because of ClassNotFoundException

2015-01-30 Thread Capitão
)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at
scala.collection.immutable.$colon$colon.readObject(List.scala:362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
at
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.hive.ql.exec.UDF
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)


First I investigated all the jar and classpath options with the spark-submit
command: no luck.
Then I tried to instantiate the UDF and UDFFromUnixTime classes using the
environment that is set up for the executor. To do that, after letting the
Spark app fail, I went to one of the container directories, e.g.
/mnt/sda/yarn/nm/usercache/altaia/appcache/application_1422464005963_0047/container_1422464005963_0047_01_04/
and from there I can see these files:

container_tokens
GeneralTest20140929-1.0.0-SNAPSHOT.jar   === this is my uber-jar
launch_container.sh
__spark__.jar
tmp

Opening launch_container.sh, I checked all the classpath setup. To test
whether that classpath was correct when launching the JVM, I replaced the
org.apache.spark.executor.CoarseGrainedExecutorBackend class with a class
of mine whose job is to print the classpath and instantiate, by reflection,
the UDF and UDFFromUnixTime classes, and all went well.
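
To make that diagnostic concrete, the probe can be as small as the sketch below (this is not the
poster's actual class, just an illustration; the package of UDFFromUnixTime is assumed):

import scala.util.{Failure, Success, Try}

// Prints the JVM classpath, then tries to load the two Hive classes by reflection,
// mimicking the check described above.
object ClasspathProbe {
  def main(args: Array[String]): Unit = {
    println("classpath: " + sys.props.getOrElse("java.class.path", "<empty>"))
    Seq("org.apache.hadoop.hive.ql.exec.UDF",
        "org.apache.hadoop.hive.ql.udf.UDFFromUnixTime").foreach { name =>
      Try(Class.forName(name)) match {
        case Success(cls) => println(s"loaded $name via ${cls.getClassLoader}")
        case Failure(e)   => println(s"FAILED to load $name: $e")
      }
    }
  }
}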

I already tested having all the dependencies' jars in one directory on all
hosts and adding that to the spark.executor.extraClassPath and
spark.driver.extraClassPath: no luck either.
At this stage I just think that the uber-jar and classpath are OK. I have no
more clues of what can be happening. Maybe some classloader issue with Spark
SQL?
The ClassNotFoundException occurs when returning data back to the driver
(because of the ResultTask seen in the stacktrace).
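
For reference, the shape of a submission that sets both properties looks roughly like this
(the main class, master, and dependency path are placeholders, not the values from this job,
and the paths must exist on every node):

spark-submit \
  --class com.example.MyHiveUdfJob \
  --master yarn-cluster \
  --conf spark.driver.extraClassPath=/opt/deps/hive-exec.jar \
  --conf spark.executor.extraClassPath=/opt/deps/hive-exec.jar \
  GeneralTest20140929-1.0.0-SNAPSHOT.jar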

Has anyone had a similar issue?

Regards

Re: Spark SQL - Unable to use Hive UDF because of ClassNotFoundException

2015-01-30 Thread Marcelo Vanzin
 the classpath setup and to
 test if that classpath was correct when launching the JVM I replaced the
 org.apache.spark.executor.CoarseGrainedExecutorBackend class with a class
 of mine whose job is to print the classpath and instantiate, by reflection,
 the UDF and UDFFromUnixTime and all went well.

 I already tested having all the dependencies' jars in one directory on all
 hosts and adding that to the spark.executor.extraClassPath and
 spark.driver.extraClassPath: no luck either.
 At this stage I just think that the uber-jar and classpath are OK. I have no
 more clues of what can be happening. Maybe some classloader issue with Spark
 SQL?
 The ClassNotFoundException occurs when returning data back to the driver
 (because of the ResultTask seen in the stacktrace).

 Has anyone had a similar issue?

 Regards.






 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Spark-SQL-Unable-to-use-Hive-UDF-because-of-ClassNotFoundException-tp21443.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org




-- 
Marcelo

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



RE: ClassNotFoundException in standalone mode

2014-11-24 Thread Benoit Pasquereau
I finally managed to get the example working, here are the details that may 
help other users.

I have 2 windows nodes for the test system, PN01 and PN02. Both have the same 
shared drive S: (it is mapped to C:\source on PN02).

If I run the worker and master from S:\spark-1.1.0-bin-hadoop2.4, then running 
the simple test fails with the ClassNotFoundException (even if there is only one 
node, which hosts both the master and the worker).

If I run the workers and masters from the local drive 
(c:\source\spark-1.1.0-bin-hadoop2.4), then the simple test runs OK (with one 
or two nodes).

I haven’t found why the class fails to load with the shared drive (I checked 
the permissions and they look ok) but at least the cluster is working now.

If anyone has experience running Spark from a Windows shared drive, any advice 
is welcome!

Thanks,
Benoit.


PS: Yes thanks Angel, I did check that:
s:\spark\simple> %JAVA_HOME%\bin\jar tvf s:\spark\simple\target\scala-2.10\simple-project_2.10-1.0.jar
   299 Thu Nov 20 17:29:40 GMT 2014 META-INF/MANIFEST.MF
  1070 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$2.class
  1350 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$main$1.class
  2581 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$.class
  1070 Thu Nov 20 17:29:40 GMT 2014 SimpleApp$$anonfun$1.class
   710 Thu Nov 20 17:29:40 GMT 2014 SimpleApp.class


From: angel2014 [mailto:angel.alvarez.pas...@gmail.com]
Sent: Friday, November 21, 2014 3:16 AM
To: u...@spark.incubator.apache.org
Subject: Re: ClassNotFoundException in standalone mode

Can you make sure the class SimpleApp$$anonfun$1 is included in your app jar?

2014-11-20 18:19 GMT+01:00 Benoit Pasquereau [via Apache Spark User List]:
Hi Guys,

I’m having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows Server 
2008).

A very simple program runs fine in local mode but fails in standalone mode.

Here is the error:

14/11/20 17:01:53 INFO DAGScheduler: Failed to run count at SimpleApp.scala:22
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task
0.3 in stage 0.0 (TID 6, UK-RND-PN02.actixhost.eu): 
java.lang.ClassNotFoundException: SimpleApp$$anonfun$1
java.net.URLClassLoader$1.run(URLClassLoader.java:202)

I have added the jar to the SparkConf() to be on the safe side and it appears 
in standard output (copied after the code):

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import java.net.URLClassLoader

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "S:\\spark-1.1.0-bin-hadoop2.4\\README.md"
    val conf = new SparkConf()
      //.setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))
      .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")
      //.setMaster("local[4]")
      .setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is: " + url.getFile))

    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}

Simple-project is in the executor classpath list:
14/11/20 17:01:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Executor classpath is:/S:/spark/simple/
Executor classpath 
is:/S:/spark/simple/target/scala-2.10/simple-project_2.10-1.0.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/conf/
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar
Executor classpath is:/S:/spark/simple/
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.1.jar
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-core-3.2.2.jar
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.1.jar
Executor classpath is:/S:/spark/simple/

Would you have any idea how I could investigate further ?

Thanks !
Benoit.


PS: I could attach a debugger to the Worker where the ClassNotFoundException 
happens but it is a bit painful

ClassNotFoundException in standalone mode

2014-11-20 Thread Benoit Pasquereau
Hi Guys,

I'm having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows Server 
2008).

A very simple program runs fine in local mode but fails in standalone mode.

Here is the error:

14/11/20 17:01:53 INFO DAGScheduler: Failed to run count at SimpleApp.scala:22
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to 
stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost 
task
0.3 in stage 0.0 (TID 6, UK-RND-PN02.actixhost.eu): 
java.lang.ClassNotFoundException: SimpleApp$$anonfun$1
java.net.URLClassLoader$1.run(URLClassLoader.java:202)

I have added the jar to the SparkConf() to be on the safe side and it appears 
in standard output (copied after the code):

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import java.net.URLClassLoader

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "S:\\spark-1.1.0-bin-hadoop2.4\\README.md"
    val conf = new SparkConf()
      //.setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))
      .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")
      //.setMaster("local[4]")
      .setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val cl = ClassLoader.getSystemClassLoader
    val urls = cl.asInstanceOf[URLClassLoader].getURLs
    urls.foreach(url => println("Executor classpath is: " + url.getFile))

    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}

Simple-project is in the executor classpath list:
14/11/20 17:01:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
Executor classpath is:/S:/spark/simple/
Executor classpath 
is:/S:/spark/simple/target/scala-2.10/simple-project_2.10-1.0.jar
Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/conf/
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar
Executor classpath is:/S:/spark/simple/
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.1.jar
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-core-3.2.2.jar
Executor classpath 
is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.1.jar
Executor classpath is:/S:/spark/simple/

Would you have any idea how I could investigate further ?

Thanks !
Benoit.


PS: I could attach a debugger to the Worker where the ClassNotFoundException 
happens but it is a bit painful



Re: ClassNotFoundException in standalone mode

2014-11-20 Thread angel2014
Can you make sure the class SimpleApp$$anonfun$1 is included in your app
jar?

2014-11-20 18:19 GMT+01:00 Benoit Pasquereau [via Apache Spark User List] 
ml-node+s1001560n19391...@n3.nabble.com:

  Hi Guys,



 I’m having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows
 Server 2008).



 A very simple program runs fine in local mode but fails in standalone
 mode.



 Here is the error:



 14/11/20 17:01:53 INFO DAGScheduler: Failed to run count at
 SimpleApp.scala:22

 Exception in thread "main" org.apache.spark.SparkException: Job aborted
 due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
 failure: Lost task

 0.3 in stage 0.0 (TID 6, UK-RND-PN02.actixhost.eu):
 java.lang.ClassNotFoundException: SimpleApp$$anonfun$1

 java.net.URLClassLoader$1.run(URLClassLoader.java:202)



 I have added the jar to the SparkConf() to be on the safe side and it
 appears in standard output (copied after the code):



 /* SimpleApp.scala */

 import org.apache.spark.SparkContext

 import org.apache.spark.SparkContext._

 import org.apache.spark.SparkConf



 import java.net.URLClassLoader



 object SimpleApp {

   def main(args: Array[String]) {

 val logFile = "S:\\spark-1.1.0-bin-hadoop2.4\\README.md"

 val conf = new SparkConf()
   //.setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))

   .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")

   //.setMaster("local[4]")

   .setAppName("Simple Application")

 val sc = new SparkContext(conf)



 val cl = ClassLoader.getSystemClassLoader

 val urls = cl.asInstanceOf[URLClassLoader].getURLs

 urls.foreach(url => println("Executor classpath is: " + url.getFile))



 val logData = sc.textFile(logFile, 2).cache()

 val numAs = logData.filter(line => line.contains("a")).count()

 val numBs = logData.filter(line => line.contains("b")).count()

 println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

 sc.stop()

   }

 }



 Simple-project is in the executor classpath list:

 14/11/20 17:01:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is
 ready for scheduling beginning after reached minRegisteredResourcesRatio:
 0.0

 Executor classpath is:/S:/spark/simple/

 Executor classpath is:
 /S:/spark/simple/target/scala-2.10/simple-project_2.10-1.0.jar

 Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/conf/

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar

 Executor classpath is:/S:/spark/simple/

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.1.jar

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-core-3.2.2.jar

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.1.jar

 Executor classpath is:/S:/spark/simple/



 Would you have any idea how I could investigate further ?



 Thanks !

 Benoit.





 PS: I could attach a debugger to the Worker where the
 ClassNotFoundException happens but it is a bit painful






--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-in-standalone-mode-tp19391p19443.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: ClassNotFoundException in standalone mode

2014-11-20 Thread Yanbo Liang
Looks like it cannot find the class or jar on your driver machine.
Are you sure that the corresponding jar file exists on the driver machine rather
than only on your development machine?

2014-11-21 11:16 GMT+08:00 angel2014 angel.alvarez.pas...@gmail.com:

 Can you make sure the class SimpleApp$$anonfun$1 is included in your
 app jar?

 2014-11-20 18:19 GMT+01:00 Benoit Pasquereau [via Apache Spark User List]:

  Hi Guys,



 I’m having an issue in standalone mode (Spark 1.1, Hadoop 2.4, Windows
 Server 2008).



 A very simple program runs fine in local mode but fails in standalone
 mode.



 Here is the error:



 14/11/20 17:01:53 INFO DAGScheduler: Failed to run count at
 SimpleApp.scala:22

 Exception in thread "main" org.apache.spark.SparkException: Job aborted
 due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent
 failure: Lost task

 0.3 in stage 0.0 (TID 6, UK-RND-PN02.actixhost.eu):
 java.lang.ClassNotFoundException: SimpleApp$$anonfun$1

 java.net.URLClassLoader$1.run(URLClassLoader.java:202)



 I have added the jar to the SparkConf() to be on the safe side and it
 appears in standard output (copied after the code):



 /* SimpleApp.scala */

 import org.apache.spark.SparkContext

 import org.apache.spark.SparkContext._

 import org.apache.spark.SparkConf



 import java.net.URLClassLoader



 object SimpleApp {

   def main(args: Array[String]) {

 val logFile = "S:\\spark-1.1.0-bin-hadoop2.4\\README.md"

 val conf = new SparkConf()
   //.setJars(Seq("s:\\spark\\simple\\target\\scala-2.10\\simple-project_2.10-1.0.jar"))

   .setMaster("spark://UK-RND-PN02.actixhost.eu:7077")

   //.setMaster("local[4]")

   .setAppName("Simple Application")

 val sc = new SparkContext(conf)



 val cl = ClassLoader.getSystemClassLoader

 val urls = cl.asInstanceOf[URLClassLoader].getURLs

 urls.foreach(url => println("Executor classpath is: " + url.getFile))



 val logData = sc.textFile(logFile, 2).cache()

 val numAs = logData.filter(line => line.contains("a")).count()

 val numBs = logData.filter(line => line.contains("b")).count()

 println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))

 sc.stop()

   }

 }



 Simple-project is in the executor classpath list:

 14/11/20 17:01:48 INFO SparkDeploySchedulerBackend: SchedulerBackend is
 ready for scheduling beginning after reached minRegisteredResourcesRatio:
 0.0

 Executor classpath is:/S:/spark/simple/

 Executor classpath is:
 /S:/spark/simple/target/scala-2.10/simple-project_2.10-1.0.jar

 Executor classpath is:/S:/spark-1.1.0-bin-hadoop2.4/conf/

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/spark-assembly-1.1.0-hadoop2.4.0.jar

 Executor classpath is:/S:/spark/simple/

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.1.jar

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-core-3.2.2.jar

 Executor classpath
 is:/S:/spark-1.1.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.1.jar

 Executor classpath is:/S:/spark/simple/



 Would you have any idea how I could investigate further ?



 Thanks !

 Benoit.





 PS: I could attach a debugger to the Worker where the
 ClassNotFoundException happens but it is a bit painful




 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-in-standalone-mode-tp19391p19443.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.



Spark 1.1.0 ClassNotFoundException issue when submit with multi jars using CLUSTER MODE

2014-10-26 Thread xing_bing
Hi,

I am using Spark 1.1.0 configured with the STANDALONE cluster manager and
CLUSTER deploy mode. I want to submit multiple jars with spark-submit using
the --jars option, but I get a ClassNotFoundException. By the way, in my code
I also use the thread context class loader to load a custom class.

The strange thing is that when I use CLIENT deploy mode, the exception is not
thrown.

Can anyone explain the class loader logic of Spark, or the issue when using
cluster mode?
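
As a hedged illustration of the class loader difference in question: on executors, Spark
typically installs a loader that includes the jars passed with --jars as the thread context
class loader, so loading a class with plain Class.forName(name) can fail in cluster mode while
the context class loader succeeds. A minimal sketch (names are illustrative, not the actual
ReflectUtil):

// Illustrative only: prefer the thread context class loader, which on Spark
// executors usually includes the user jars shipped with --jars.
object ReflectUtilSketch {
  def findClass(name: String): Class[_] = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    // Class.forName(name) alone would use the defining loader of this class,
    // which in cluster mode may not see the extra application jars.
    Class.forName(name, true, loader)
  }

  def newInstance[T](name: String): T =
    findClass(name).newInstance().asInstanceOf[T]
}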

 

 

 

 

/10/24 14:18:30 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)

java.lang.RuntimeException: Cannot load class:
cn.cekasp.al.demo.SimpleInputFormat3

at
cn.cekasp.algorithm.util.ReflectUtil.findClass(ReflectUtil.java:12)

at
cn.cekasp.algorithm.util.ReflectUtil.newInstance(ReflectUtil.java:18)

at
cn.cekasp.algorithm.reader.JdbcSourceReader$1.call(JdbcSourceReader.java:95)

at
cn.cekasp.algorithm.reader.JdbcSourceReader$1.call(JdbcSourceReader.java:90)

at
org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:923)

at
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)

at
org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1167)

at
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)

at
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)

at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)

at
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)

at
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)

at org.apache.spark.scheduler.Task.run(Task.scala:54)

at
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)

at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)

at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)

at java.lang.Thread.run(Thread.java:745)

   Caused by: java.lang.ClassNotFoundException:
cn.cekasp.al.demo.SimpleInputFormat3
at
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native
Method)
at
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at
cn.cekasp.algorithm.util.ReflectUtil.findClass(ReflectUtil.java:10)
... 16 more

 

 

 



Spark-submit ClassNotFoundException with JAR!

2014-09-08 Thread Peter Aberline
Hi,

I'm having problems with a ClassNotFoundException using this simple example:


import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

import java.net.URLClassLoader

import scala.util.Marshal

class ClassToRoundTrip(val id: Int) extends scala.Serializable {
}

object RoundTripTester {

  def test(id : Int) : ClassToRoundTrip = {

// Get the current classpath and output. Can we see simpleapp jar?
val cl = ClassLoader.getSystemClassLoader
val urls = cl.asInstanceOf[URLClassLoader].getURLs
urls.foreach(url => println("Executor classpath is: " + url.getFile))

// Simply instantiating an instance of object and using it works fine.
val testObj = new ClassToRoundTrip(id)
println("testObj.id: " + testObj.id)

val testObjBytes = Marshal.dump(testObj)
val testObjRoundTrip =
  Marshal.load[ClassToRoundTrip](testObjBytes)  // <-- ClassNotFoundException here
testObjRoundTrip
  }
}

object SimpleApp {
  def main(args: Array[String]) {

val conf = new SparkConf().setAppName("Simple Application")
val sc = new SparkContext(conf)

val cl = ClassLoader.getSystemClassLoader
val urls = cl.asInstanceOf[URLClassLoader].getURLs
urls.foreach(url => println("Driver classpath is: " + url.getFile))

val data = Array(1, 2, 3, 4, 5)
val distData = sc.parallelize(data)
distData.foreach(x => RoundTripTester.test(x))
  }
}

In local mode, submitting as per the docs generates a ClassNotFound
exception on line 31, where the ClassToRoundTrip object is
deserialized. Strangely, the earlier use on line 28 is okay:
spark-submit --class SimpleApp \
 --master local[4] \
 target/scala-2.10/simpleapp_2.10-1.0.jar


However, if I add extra parameters for driver-class-path, and
-jars, it works fine, on local.
spark-submit --class SimpleApp \
 --master local[4] \
 --driver-class-path
/home/xxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar
\
 --jars
/home/xxx/workspace/SimpleApp/target/scala-2.10/SimpleApp.jar \
 target/scala-2.10/simpleapp_2.10-1.0.jar

However, submitting to a local dev master, still generates the same issue:
spark-submit --class SimpleApp \
 --master spark://localhost.localdomain:7077 \
 --driver-class-path
/home/xxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar
\
 --jars
/home/xxx/workspace/SimpleApp/target/scala-2.10/simpleapp_2.10-1.0.jar
\
 target/scala-2.10/simpleapp_2.10-1.0.jar

I can see from the output that the JAR file is being fetched by the executor.

Logs for one of the executor's are here:

stdout: http://pastebin.com/raw.php?i=DQvvGhKm

stderr: http://pastebin.com/raw.php?i=MPZZVa0Q

I'm using Spark 1.0.2. The ClassToRoundTrip is included in the JAR.
I have a workaround of copying the JAR to each of the machines and
setting the spark.executor.extraClassPath parameter but I would
rather not have to do that.

This is such a simple case, I must be doing something obviously wrong.
Can anyone help?


Thanks
Peter

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-18 Thread Svend
Hi, 

Yes, the error still occurs when we replace the lambdas with named
functions: 



(same error traces as in previous posts)




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-line11-read-when-loading-an-HDFS-text-file-with-SparkQL-in-spark-shell-tp9954p10154.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-16 Thread Svend
Hi all, 


I just installed a mesos 0.19 cluster. I am failing to execute basic SparkQL
operations on text files with Spark 1.0.1 with the spark-shell.  


I have one Mesos master without zookeeper and 4 mesos slaves. 
All nodes are running JDK 1.7.51 and Scala 2.10.4. 
The spark package is uploaded to hdfs and the user running the mesos slave
has permission to access it. 
I am running HDFS from the latest CDH5. 
I tried both with the pre-built CDH5 spark package available from
http://spark.apache.org/downloads.html and by packaging spark with sbt
0.13.2, JDK 1.7.51 and scala 2.10.4 as explained here
http://mesosphere.io/learn/run-spark-on-mesos/


No matter what I try, when I execute the following code on the spark-shell : 



The job fails with the following error reported by the mesos slave nodes: 






Note that running a simple map+reduce job on the same hdfs files with the
same installation works fine:




The hdfs files contain just plain csv files: 




spark-env.sh look like this: 






Any help, comment or pointer would be greatly appreciated!

Thanks in advance


Svend







--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-line11-read-when-loading-an-HDFS-text-file-with-SparkQL-in-spark-shell-tp9954.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-16 Thread Michael Armbrust
 Note that running a simple map+reduce job on the same hdfs files with the
 same installation works fine:


Did you call collect() on the totalLength?  Otherwise nothing has actually
executed.
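
For readers following along, the point is that RDD transformations are lazy: nothing touches the
cluster until an action runs. A minimal spark-shell illustration (it assumes the usual sc from the
shell; the path is a placeholder, not the original poster's data):

val lines   = sc.textFile("hdfs:///tmp/example.csv")   // transformation: lazy
val lengths = lines.map(_.length)                       // transformation: lazy
val total   = lengths.reduce(_ + _)                     // action: triggers the job
println("totalLength: " + total)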


Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-16 Thread Michael Armbrust
Oh, I'm sorry... reduce is also an operation


On Wed, Jul 16, 2014 at 3:37 PM, Michael Armbrust mich...@databricks.com
wrote:


 Note that running a simple map+reduce job on the same hdfs files with the
 same installation works fine:


 Did you call collect() on the totalLength?  Otherwise nothing has
 actually executed.



Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-16 Thread Svend
Hi Michael, 

Thanks for your reply. Yes, the reduce triggered the actual execution, I got
a total length (totalLength: 95068762, for the record). 





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-line11-read-when-loading-an-HDFS-text-file-with-SparkQL-in-spark-shell-tp9954p9984.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.


Re: ClassNotFoundException: $line11.$read$ when loading an HDFS text file with SparkQL in spark-shell

2014-07-16 Thread Michael Armbrust
Hmm, it could be some weirdness with classloaders / Mesos / Spark SQL?

I'm curious if you would hit an error if there were no lambda functions
involved.  Perhaps if you load the data using jsonFile or parquetFile.

Either way, I'd file a JIRA.  Thanks!
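
To make that concrete, here is a rough sketch of a lambda-free Spark SQL job against the
1.0/1.1-era SchemaRDD API (parquetFile and registerAsTable are assumed from that API generation;
the path and table name are placeholders):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object NoLambdaSql {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("NoLambdaSql"))
    val sqlContext = new SQLContext(sc)

    // No user-defined closures: the file is loaded directly and queried with SQL,
    // which isolates whether the CNFE comes from shipping anonymous function
    // classes to the Mesos executors.
    val people = sqlContext.parquetFile("hdfs:///data/people.parquet")
    people.registerAsTable("people")
    sqlContext.sql("SELECT COUNT(*) FROM people").collect().foreach(println)
    sc.stop()
  }
}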
On Jul 16, 2014 6:48 PM, Svend svend.vanderve...@gmail.com wrote:

 Hi Michael,

 Thanks for your reply. Yes, the reduce triggered the actual execution, I
 got
 a total length (totalLength: 95068762, for the record).





 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/ClassNotFoundException-line11-read-when-loading-an-HDFS-text-file-with-SparkQL-in-spark-shell-tp9954p9984.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.



Failing to run standalone streaming app: IOException; classNotFoundException; and more

2014-06-14 Thread pns
Hi,

I'm attempting to run the following simple standalone app on mac os and spark 1.0 using sbt:

val sparkConf = new SparkConf()
  .setAppName("ProcessEvents")
  .setMaster("local[*]")
  .setSparkHome("/Users/me/Downloads/spark")
val ssc = new StreamingContext(sparkConf, Seconds(10))
val lines = ssc.textFileStream("/Users/me/Downloads/test/")
lines.foreachRDD(rdd => rdd.foreach(println(_)))
ssc.start()
ssc.awaitTermination()

However, when running it with sbt run, I get quite a few errors:

23:27:42.182 [run-main] DEBUG org.apache.hadoop.conf.Configuration - java.io.IOException: config()
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:227)
    at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:214)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0:0 failed 1 times, most recent
failure: Exception failure in TID 0 on host localhost: java.lang.ClassNotFoundException: scala.None$
    java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    java.net.URLClassLoader$1.run(URLClassLoader.java:355)

Any ideas? Let me know what other info you need to figure this out.

Thanks!



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Failing-to-run-standalone-streaming-app-IOException-classNotFoundException-and-more-tp7632.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Kafka streaming - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2014-06-11 Thread gaurav.dasgupta
)
  at
 
 org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:72)

  at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

  at java.lang.reflect.Method.invoke(Method.java:597)
  at
  java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
  at
 java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
  at
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
  at
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
  at
  org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145)
  at
  java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1791)
  at
 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
  at
 java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
  at
 
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)

  at
 
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)

  at
 
 org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)

  at
 
 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)

  at
 
 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)

  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:396)
  at
 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)

  at
 
 org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
  at
 org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

  at java.lang.Thread.run(Thread.java:662)
 
  What might be the problem? Can someone help me solving this issue?
 
  Regards,
  Gaurav







--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Kafka-streaming-ClassNotFoundException-org-apache-spark-streaming-kafka-KafkaReceiver-tp7045p7387.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Spark Kafka streaming - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2014-06-08 Thread Tobias Pfeiffer
Gaurav,

I am not sure that the * expands to what you expect it to do.
Normally bash expands * to a space-separated list, not a
colon-separated one. Try specifying all the jars manually, maybe?

Tobias
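
As an illustration of that point (treat this as a sketch, not a verified fix): if the shell does
expand an unquoted * into space-separated file names, java -cp cannot use them as a classpath, so
the jars can be joined with colons explicitly instead. The spark assembly jar and the SPARK_YARN_*
environment variables from the original command are omitted here for brevity:

# Join the Kafka and HBase jars with ':' instead of relying on shell globbing.
KAFKA_LIBS=$(echo /usr/local/kafka/kafka_2.10-0.8.1.1/libs/*.jar | tr ' ' ':')
HBASE_LIBS=$(echo /usr/lib/hbase/lib/*.jar | tr ' ' ':')

java -cp /usr/local/SparkStreamExample.jar:$KAFKA_LIBS:$HBASE_LIBS:/etc/hadoop/conf/:/etc/hbase/conf/ \
  SparkStreamExample yarn-client 10.10.5.32 myFirstGroup testTopic NewTestTable 1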

On Thu, Jun 5, 2014 at 6:45 PM, Gaurav Dasgupta gaurav.d...@gmail.com wrote:
 Hi,

 I have written my own custom Spark streaming code which connects to Kafka
 server and fetch data. I have tested the code on local mode and it is
 working fine. But when I am executing the same code on YARN mode, I am
 getting KafkaReceiver class not found exception. I am providing the Spark
 Kafka jar in the classpath and ensured that the path is correct for all the
 nodes in my cluster.

 I am using Spark 0.9.1 hadoop pre-built and is deployed on all the nodes (10
 node cluster) in the YARN cluster.
 I am using the following command to run my code on YARN mode:

 SPARK_YARN_MODE=true
 SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar
 SPARK_YARN_APP_JAR=/usr/local/SparkStreamExample.jar java -cp
 /usr/local/SparkStreamExample.jar:assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar:external/kafka/target/spark-streaming-kafka_2.10-0.9.1.jar:/usr/local/kafka/kafka_2.10-0.8.1.1/libs/*:/usr/lib/hbase/lib/*:/etc/hadoop/conf/:/etc/hbase/conf/
 SparkStreamExample yarn-client 10.10.5.32 myFirstGroup testTopic
 NewTestTable 1

 Below is the error message I am getting:

 14/06/05 04:29:12 INFO cluster.YarnClientClusterScheduler: Adding task set
 2.0 with 1 tasks
 14/06/05 04:29:12 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID
 70 on executor 2: manny6.musigma.com (PROCESS_LOCAL)
 14/06/05 04:29:12 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as
 2971 bytes in 2 ms
 14/06/05 04:29:12 WARN scheduler.TaskSetManager: Lost TID 70 (task 2.0:0)
 14/06/05 04:29:12 WARN scheduler.TaskSetManager: Loss was due to
 java.lang.ClassNotFoundException
 java.lang.ClassNotFoundException:
 org.apache.spark.streaming.kafka.KafkaReceiver
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:247)
 at
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
 at
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1574)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1666)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1322)
 at
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
 at
 java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:479)
 at
 org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:72)
 at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at
 java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
 at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
 at
 org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145)
 at
 java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1791)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
 at
 org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
 at
 org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
 at
 org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
 at
 org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
 at
 

Spark Kafka streaming - ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver

2014-06-05 Thread Gaurav Dasgupta
Hi,

I have written my own custom Spark streaming code which connects to Kafka
server and fetch data. I have tested the code on local mode and it is
working fine. But when I am executing the same code on YARN mode, I am
getting KafkaReceiver class not found exception. I am providing the Spark
Kafka jar in the classpath and ensured that the path is correct for all the
nodes in my cluster.

I am using the Spark 0.9.1 Hadoop pre-built package, and it is deployed on all the nodes
(10-node cluster) in the YARN cluster.
I am using the following command to run my code on YARN mode:

SPARK_YARN_MODE=true
SPARK_JAR=assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar
SPARK_YARN_APP_JAR=/usr/local/SparkStreamExample.jar java -cp
/usr/local/SparkStreamExample.jar:assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop2.2.0.jar:external/kafka/target/spark-streaming-kafka_2.10-0.9.1.jar:/usr/local/kafka/kafka_2.10-0.8.1.1/libs/*:/usr/lib/hbase/lib/*:/etc/hadoop/conf/:/etc/hbase/conf/
SparkStreamExample yarn-client 10.10.5.32 myFirstGroup testTopic
NewTestTable 1

Below is the error message I am getting:

14/06/05 04:29:12 INFO cluster.YarnClientClusterScheduler: Adding task set 2.0 with 1 tasks
14/06/05 04:29:12 INFO scheduler.TaskSetManager: Starting task 2.0:0 as TID 70 on executor 2: manny6.musigma.com (PROCESS_LOCAL)
14/06/05 04:29:12 INFO scheduler.TaskSetManager: Serialized task 2.0:0 as 2971 bytes in 2 ms
14/06/05 04:29:12 WARN scheduler.TaskSetManager: Lost TID 70 (task 2.0:0)
14/06/05 04:29:12 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: org.apache.spark.streaming.kafka.KafkaReceiver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
    at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1574)
    at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1495)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1731)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.readArray(ObjectInputStream.java:1666)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1322)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1870)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1946)
    at java.io.ObjectInputStream.defaultReadObject(ObjectInputStream.java:479)
    at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:72)
    at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1848)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1752)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
    at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145)
    at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1791)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1750)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1328)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:350)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:42)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:41)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:41)
    at 

ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Tobias Pfeiffer
Hi,

I have set up a cluster with Mesos (backed by Zookeeper) with three
master and three slave instances. I set up Spark (git HEAD) for use
with Mesos according to this manual:
http://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html

Using the spark-shell, I can connect to this cluster and do simple RDD
operations, but the same code in a Scala class and executed via sbt
run-main works only partially. (That is, count() works, count() after
flatMap() does not.)

Here is my code: https://gist.github.com/tgpfeiffer/7d20a4d59ee6e0088f91
The file SparkExamplesScript.scala, when pasted into spark-shell,
outputs the correct count() for the parallelized list comprehension,
as well as for the flatMapped RDD.

The file SparkExamplesMinimal.scala contains exactly the same code,
and also the MASTER configuration and the Spark Executor are the same.
However, while the count() for the parallelized list is displayed
correctly, I receive the following error when asking for the count()
of the flatMapped RDD:

-

14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting Stage 1
(FlatMappedRDD[1] at flatMap at SparkExamplesMinimal.scala:34), which
has no missing parents
14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting 8 missing
tasks from Stage 1 (FlatMappedRDD[1] at flatMap at
SparkExamplesMinimal.scala:34)
14/05/21 09:47:49 INFO scheduler.TaskSchedulerImpl: Adding task set
1.0 with 8 tasks
14/05/21 09:47:49 INFO scheduler.TaskSetManager: Starting task 1.0:0
as TID 8 on executor 20140520-102159-2154735808-5050-1108-1: mesos9-1
(PROCESS_LOCAL)
14/05/21 09:47:49 INFO scheduler.TaskSetManager: Serialized task 1.0:0
as 1779147 bytes in 37 ms
14/05/21 09:47:49 WARN scheduler.TaskSetManager: Lost TID 8 (task 1.0:0)
14/05/21 09:47:49 WARN scheduler.TaskSetManager: Loss was due to
java.lang.ClassNotFoundException
java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at 
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:61)
at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:141)
at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1837)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at 
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:63)
at 
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:85)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:169)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

-

Can anyone explain to me where this comes from or how I might further
track the problem down?

Thanks,
Tobias


Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Gerard Maas
Hi Tobias,

Regarding my comment on closure serialization:

I was discussing it with my fellow Sparkers here and I totally overlooked
the fact that you need the class files to de-serialize the closures (or
whatever) on the workers, so you always need the jar file delivered to the
workers in order for it to work.

The Spark REPL works differently. It uses some dark magic to send the
working session to the workers.

-kr, Gerard.





On Wed, May 21, 2014 at 2:47 PM, Gerard Maas gerard.m...@gmail.com wrote:

 Hi Tobias,

 I was curious about this issue and tried to run your example on my local
 Mesos. I was able to reproduce your issue using your current config:

 [error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
 1.0:4 failed 4 times (most recent failure: Exception failure:
 java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2)
 org.apache.spark.SparkException: Job aborted: Task 1.0:4 failed 4 times
 (most recent failure: Exception failure: java.lang.ClassNotFoundException:
 spark.SparkExamplesMinimal$$anonfun$2)
  at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)

 Creating a simple jar from the job and providing it through the
 configuration seems to solve it:

 val conf = new SparkConf()
   .setMaster("mesos://my_ip:5050/")
   .setJars(Seq("/sparkexample/target/scala-2.10/sparkexample_2.10-0.1.jar"))
   .setAppName("SparkExamplesMinimal")

 Resulting in:
  14/05/21 12:03:45 INFO scheduler.DAGScheduler: Completed ResultTask(1, 1)
 14/05/21 12:03:45 INFO scheduler.DAGScheduler: Stage 1 (count at
 SparkExamplesMinimal.scala:50) finished in 1.120 s
 14/05/21 12:03:45 INFO spark.SparkContext: Job finished: count at
 SparkExamplesMinimal.scala:50, took 1.177091435 s
 count: 100

 Why the closure serialization does not work with Mesos is beyond my
 current knowledge.
 Would be great to hear from the experts (cross-posting to dev for that)

 -kr, Gerard.













 On Wed, May 21, 2014 at 11:51 AM, Tobias Pfeiffer t...@preferred.jpwrote:

 Hi,

 I have set up a cluster with Mesos (backed by Zookeeper) with three
 master and three slave instances. I set up Spark (git HEAD) for use
 with Mesos according to this manual:
 http://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html

 Using the spark-shell, I can connect to this cluster and do simple RDD
 operations, but the same code in a Scala class and executed via sbt
 run-main works only partially. (That is, count() works, count() after
 flatMap() does not.)

 Here is my code: https://gist.github.com/tgpfeiffer/7d20a4d59ee6e0088f91
 The file SparkExamplesScript.scala, when pasted into spark-shell,
 outputs the correct count() for the parallelized list comprehension,
 as well as for the flatMapped RDD.

 The file SparkExamplesMinimal.scala contains exactly the same code,
 and also the MASTER configuration and the Spark Executor are the same.
 However, while the count() for the parallelized list is displayed
 correctly, I receive the following error when asking for the count()
 of the flatMapped RDD:

 -

 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting Stage 1
 (FlatMappedRDD[1] at flatMap at SparkExamplesMinimal.scala:34), which
 has no missing parents
 14/05/21 09:47:49 INFO scheduler.DAGScheduler: Submitting 8 missing
 tasks from Stage 1 (FlatMappedRDD[1] at flatMap at
 SparkExamplesMinimal.scala:34)
 14/05/21 09:47:49 INFO scheduler.TaskSchedulerImpl: Adding task set
 1.0 with 8 tasks
 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Starting task 1.0:0
 as TID 8 on executor 20140520-102159-2154735808-5050-1108-1: mesos9-1
 (PROCESS_LOCAL)
 14/05/21 09:47:49 INFO scheduler.TaskSetManager: Serialized task 1.0:0
 as 1779147 bytes in 37 ms
 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Lost TID 8 (task 1.0:0)
 14/05/21 09:47:49 WARN scheduler.TaskSetManager: Loss was due to
 java.lang.ClassNotFoundException
 java.lang.ClassNotFoundException: spark.SparkExamplesMinimal$$anonfun$2
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at
 org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:60)
 at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
 at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
 at
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
 at
 

Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Gerard Maas
Hi Tobias,

On Wed, May 21, 2014 at 5:45 PM, Tobias Pfeiffer t...@preferred.jp wrote:
first, thanks for your explanations regarding the jar files!
No prob :-)


 On Thu, May 22, 2014 at 12:32 AM, Gerard Maas gerard.m...@gmail.com
 wrote:
  I was discussing it with my fellow Sparkers here and I totally overlooked
  the fact that you need the class files to de-serialize the closures (or
  whatever) on the workers, so you always need the jar file delivered to
 the
  workers in order for it to work.

 So the closure as a function is serialized, sent across the wire,
 deserialized there, and *still* you need the class files? (I am not
 sure I understand what is actually sent over the network then. Does
 that serialization only contain the values that I close over?)


I also had that mental lapse. Serialization converts object (not class) state
(the current field values) into a byte stream, and de-serialization restores
the bytes from the wire into a seemingly identical object at the receiving
side (except for transient variables). For that, the receiver needs the class
definition of that object to know what to instantiate, so yes, the compiled
classes need to be given to the Spark driver, and it will take care of
dispatching them to the workers (much better than in the old RMI days ;-)
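
A tiny, self-contained illustration of that point (plain Java serialization, nothing
Spark-specific): the stream carries the field values plus the class name, and the receiving JVM
still needs the class bytes on its classpath to rebuild the object.

import java.io._

// Only the state ("spark") and the class name travel in the byte stream;
// deserializing on a JVM that lacks Greeting.class throws ClassNotFoundException.
case class Greeting(who: String)

object SerializationDemo {
  def main(args: Array[String]): Unit = {
    val buffer = new ByteArrayOutputStream()
    val out = new ObjectOutputStream(buffer)
    out.writeObject(Greeting("spark"))
    out.close()

    val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
    val restored = in.readObject().asInstanceOf[Greeting]  // needs Greeting on the classpath
    println(restored)
  }
}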


 If I understand correctly what you are saying, then the documentation
 at https://people.apache.org/~pwendell/catalyst-docs/running-on-mesos.html
 (list item 8) needs to be extended quite a bit, right?


The Mesos docs have recently been updated here:
https://github.com/apache/spark/pull/756/files
I don't know where the latest version from master is built/available.

-kr, Gerard.


Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Andrew Ash
Here's the 1.0.0rc9 version of the docs:
https://people.apache.org/~pwendell/spark-1.0.0-rc9-docs/running-on-mesos.html
I refreshed them with the goal of steering users towards prebuilt packages
rather than compiling from source, plus improving overall formatting and
clarity, but not otherwise modifying the content. I don't expect any changes
for rc10.

It does seem like a problem, though, that classpath issues are preventing
that from running.  Just to check, have you given the exact same jar a shot
when running against a standalone cluster?  If it works in standalone, I
think that's good evidence that there's an issue with the Mesos classloaders
in master.

I'm running into a similar issue with classpaths failing on Mesos but working
in standalone, but I haven't coherently written up my observations yet, so I
haven't gotten them to this list.

I'd almost gotten to the point where I thought that my custom code needed
to be included in the SPARK_EXECUTOR_URI, but that can't possibly be
correct.  The Spark workers that are launched on Mesos slaves should start
with the Spark core jars and then transparently get classes from custom code
over the network, or at least that's how I thought it should work.  For those
who have been using Mesos in previous releases, you've never had to do that
before, have you?
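For what it's worth, in a Mesos setup these are two separate things: spark.executor.uri (or the SPARK_EXECUTOR_URI environment variable) points at the Spark distribution that the slaves download, while application classes are shipped as jars registered on the SparkContext. A sketch, with placeholder URIs:

import org.apache.spark.{SparkConf, SparkContext}

// spark.executor.uri only tells the Mesos slaves where to fetch the Spark
// distribution itself; it is not the place for user code.
val conf = new SparkConf()
  .setMaster("mesos://master:5050")
  .setAppName("ClasspathCheck")
  .set("spark.executor.uri", "hdfs://namenode/dist/spark-1.0.0.tgz")
val sc = new SparkContext(conf)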






Re: ClassNotFoundException with Spark/Mesos (spark-shell works fine)

2014-05-21 Thread Gerard Maas
Hi Andrew,

Thanks for the current doc.

I'd almost gotten to the point where I thought that my custom code needed
 to be included in the SPARK_EXECUTOR_URI, but that can't possibly be
 correct.  The Spark workers that are launched on Mesos slaves should start
 with the Spark core jars and then transparently get classes from custom
 code over the network, or at least that's how I thought it should work.
 For those who have been using Mesos in previous releases, you've never had
 to do that before, have you?


Regarding the delivery of the custom job code to Mesos, we have been using
'ADD_JARS' (on the command line) or 'SparkConf.setJars(Seq[String])' with a
fat jar packing all dependencies.
That works on the Spark 'standalone' cluster as well, but we deploy mostly on
Mesos, so I couldn't say much about classloading differences between the two.
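For the archive, a minimal sketch of the SparkConf.setJars route mentioned above (the master URL and jar path are placeholders):

import org.apache.spark.{SparkConf, SparkContext}

// Register the assembled "fat" jar on the SparkConf so Spark ships it to the
// workers. From the shell, ADD_JARS=<jar> with spark-shell has a similar effect.
val conf = new SparkConf()
  .setMaster("mesos://master:5050")
  .setAppName("MyJob")
  .setJars(Seq("target/scala-2.10/myjob-assembly-0.1.jar"))
val sc = new SparkContext(conf)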

-greetz, Gerard.


Re: ClassNotFoundException

2014-05-04 Thread pedro
I just ran into the same problem. I will respond if I find how to fix.





Re: spark 0.9.1: ClassNotFoundException

2014-05-04 Thread phoenix bai
Check if the jar file that includes your example code is under
examples/target/scala-2.10/.
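Another route (a sketch, assuming sbt and the Spark 0.9.1 artifacts from Maven Central; project name and versions are placeholders) is to build the standalone code as its own project instead of placing it under the examples tree:

// build.sbt -- minimal standalone project
name := "mycode"

version := "0.1"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"

The project can then be run with 'sbt run', or packaged with 'sbt package' and the resulting jar passed to the SparkContext jars argument, avoiding the run-example classpath entirely.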



On Sat, May 3, 2014 at 5:58 AM, SK skrishna...@gmail.com wrote:

 I am using Spark 0.9.1 in standalone mode. In the
 SPARK_HOME/examples/src/main/scala/org/apache/spark/ folder, I created a
 directory called mycode, in which I have placed some standalone Scala
 code.
 I was able to compile. I ran the code using:

 ./bin/run-example org.apache.spark.mycode.MyClass local

 However, I get a ClassNotFound exception, although I do see the compiled
 classes in
 examples/target/scala-2.10/classes/org/apache/spark/mycode

 When I place the same code in the same folder structure in the Spark 0.9.0
 version, I am able to run it. Where should I place my standalone code with
 respect to SPARK_HOME in Spark 0.9.1 so that the classes can be found?

 thanks






spark 0.9.1: ClassNotFoundException

2014-05-02 Thread SK
I am using Spark 0.9.1 in standalone mode. In the
SPARK_HOME/examples/src/main/scala/org/apache/spark/ folder, I created a
directory called mycode, in which I have placed some standalone Scala code.
I was able to compile. I ran the code using:

./bin/run-example org.apache.spark.mycode.MyClass local

However, I get a ClassNotFound exception, although I do see the compiled
classes in 
examples/target/scala-2.10/classes/org/apache/spark/mycode

When I place the same code in the same folder structure in the Spark 0.9.0
version, I am able to run it. Where should I place my standalone code with
respect to SPARK_HOME in Spark 0.9.1 so that the classes can be found?

thanks





ClassNotFoundException

2014-05-01 Thread Joe L
Hi, I am getting the following error. How could I fix this problem?

Joe

14/05/02 03:51:48 WARN TaskSetManager: Lost TID 12 (task 2.0:1)
14/05/02 03:51:48 INFO TaskSetManager: Loss was due to
java.lang.ClassNotFoundException:
org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$4 [duplicate 6]
14/05/02 03:51:48 ERROR TaskSetManager: Task 2.0:1 failed 4 times; aborting
job
14/05/02 03:51:48 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
have all completed, from pool 
14/05/02 03:51:48 INFO TaskSetManager: Loss was due to
java.lang.ClassNotFoundException:
org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$4 [duplicate 7]
14/05/02 03:51:48 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks
have all completed, from pool 
14/05/02 03:51:48 INFO DAGScheduler: Failed to run count at
reasoner.scala:70
[error] (run-main-0) org.apache.spark.SparkException: Job aborted: Task
2.0:1 failed 4 times (most recent failure: Exception failure:
java.lang.ClassNotFoundException:
org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$4)
org.apache.spark.SparkException: Job aborted: Task 2.0:1 failed 4 times
(most recent failure: Exception failure: java.lang.ClassNotFoundException:
org.apache.spark.rdd.PairRDDFunctions$$anonfun$combineByKey$4)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)
at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1018)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:604)
at scala.Option.foreach(Option.scala:236)
at
org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:604)
at
org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
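One thing worth checking for this kind of $$anonfun ClassNotFoundException (a sketch; the master URL, Spark home, application name, and jar path are all placeholders) is that the application jar is registered on the SparkContext, so the workers can load the generated closure classes:

import org.apache.spark.SparkContext

// Four-argument constructor available in Spark 0.9.x; all values are placeholders.
val sc = new SparkContext(
  "spark://master:7077",                          // master URL
  "Reasoner",                                     // application name
  "/opt/spark",                                   // SPARK_HOME on the workers
  Seq("target/scala-2.10/reasoner-assembly.jar")) // jars shipped to the workers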





Re: Shark: ClassNotFoundException org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat

2014-04-17 Thread Arpit Tak
Just out of curiosity, as you are using Cloudera Manager Hadoop and Spark:
how did you build Shark for it?

Are you able to read any file from HDFS? Did you try that out?


Regards,
Arpit Tak


On Thu, Apr 17, 2014 at 7:07 PM, ge ko koenig@gmail.com wrote:

 Hi,

 the error java.lang.ClassNotFoundException:
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat has been
 resolved by adding parquet-hive-bundle-1.4.1.jar to Shark's lib folder.
 Now the Hive metastore can be read successfully (including the parquet-based
 table).

 But if I want to select from that table I receive:

 org.apache.spark.SparkException: Job aborted: Task 0.0:0 failed 4 times
 (most recent failure: Exception failure: java.lang.ClassNotFoundException:
 org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1020)
 at
 org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1018)

 This is really strange, since the class
 org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe is included in
 parquet-hive-bundle-1.4.1.jar?!
 I'm getting more and more confused ;)

 any help ?

 regards, Gerd
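One way to narrow this down (a sketch, assuming a live SparkContext named sc, e.g. inside the Spark shell backing Shark) is to probe whether the SerDe class is visible on the executors rather than only on the driver:

val className = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
// Run Class.forName inside tasks to check the workers' classpath.
val results = sc.parallelize(1 to 8, 8).map { _ =>
  try { Class.forName(className); true }
  catch { case _: ClassNotFoundException => false }
}.collect()
println("class visible in " + results.count(identity) + " of " + results.length + " tasks")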


 On 17 April 2014 11:55, ge ko koenig@gmail.com wrote:

 Hi,

 I want to select from a parquet-based table in Shark, but receive the
 error:

 shark> select * from wl_parquet;
 14/04/17 11:33:49 INFO shark.SharkCliDriver: Execution Mode: shark
 14/04/17 11:33:49 INFO ql.Driver: PERFLOG method=Driver.run
 14/04/17 11:33:49 INFO ql.Driver: PERFLOG method=TimeToSubmit
 14/04/17 11:33:49 INFO ql.Driver: PERFLOG method=compile
 14/04/17 11:33:49 INFO parse.ParseDriver: Parsing command: select * from
 wl_parquet
 14/04/17 11:33:49 INFO parse.ParseDriver: Parse Completed
 14/04/17 11:33:49 INFO parse.SharkSemanticAnalyzer: Get metadata for
 source tables
 FAILED: Hive Internal Error:
 java.lang.RuntimeException(java.lang.ClassNotFoundException:
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
 14/04/17 11:33:50 ERROR shark.SharkDriver: FAILED: Hive Internal Error:
 java.lang.RuntimeException(java.lang.ClassNotFoundException:
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat)
 java.lang.RuntimeException: java.lang.ClassNotFoundException:
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
 at
 org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:306)
 at org.apache.hadoop.hive.ql.metadata.Table.init(Table.java:99)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:988)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:891)
 at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1083)
 at
 org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:1059)
 at
 shark.parse.SharkSemanticAnalyzer.analyzeInternal(SharkSemanticAnalyzer.scala:137)
 at
 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:279)
 at shark.SharkDriver.compile(SharkDriver.scala:215)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
 at shark.SharkCliDriver.processCmd(SharkCliDriver.scala:338)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
 at shark.SharkCliDriver$.main(SharkCliDriver.scala:235)
 at shark.SharkCliDriver.main(SharkCliDriver.scala)
 Caused by: java.lang.ClassNotFoundException:
 org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:270)
 at
 org.apache.hadoop.hive.ql.metadata.Table.getInputFormatClass(Table.java:302)
 ... 14 more

 I can successfully select from that table with Hive and Impala, but Shark
 doesn't work. I am using CDH5 including the Spark parcel, and Shark 0.9.1.

 In which jar is this class hidden, and how can I get rid of this exception?

 The lib folder of shark contains:
 [root@hadoop-pg-9 shark-0.9.1]# ll lib
 total 180
 lrwxrwxrwx 1 root root    67 16. Apr 14:17 hive-serdes-1.0-SNAPSHOT.jar
 -> /opt/cloudera/parcels/CDH/lib/hive/lib/hive-serdes-1.0-SNAPSHOT.jar
 -rwxrwxr-x 1 root root 23086  9. Apr 10:57 JavaEWAH-0.4.2.jar
 lrwxrwxrwx 1 root root    53 14. Apr 21:46 parquet-avro.jar ->
 
