Re: spark-shell giving me error of unread block data
Didn't really edit the configs much, but here's what the spark-env.sh is:

#!/usr/bin/env bash
##
# Generated by Cloudera Manager and should not be modified directly
##
export SPARK_HOME=/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/spark
export STANDALONE_SPARK_MASTER_HOST=cloudera-1.testdomain.net
export SPARK_MASTER_PORT=7077
export DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/lib/hadoop

### Path of Spark assembly jar in HDFS
export SPARK_JAR_HDFS_PATH=${SPARK_JAR_HDFS_PATH:-/user/spark/share/lib/spark-assembly.jar}

### Let's run everything with JVM runtime, instead of Scala
export SPARK_LAUNCH_WITH_SCALA=0
export SPARK_LIBRARY_PATH=${SPARK_HOME}/lib
export SCALA_LIBRARY_PATH=${SPARK_HOME}/lib
export SPARK_MASTER_IP=$STANDALONE_SPARK_MASTER_HOST

export HADOOP_HOME=${HADOOP_HOME:-$DEFAULT_HADOOP_HOME}
if [ -n "$HADOOP_HOME" ]; then
  export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:${HADOOP_HOME}/lib/native
fi

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/etc/hadoop/conf}

And here's the spark-defaults.conf:

spark.eventLog.dir=hdfs://cloudera-2.testdomain.net:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.master=spark://cloudera-1.testdomain.net:7077

On Wed Nov 19 2014 at 8:06:40 PM Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: [snip]
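(For reference, a quick way to confirm which of those settings the driver actually picked up is to dump its SparkConf from inside spark-shell. A minimal sketch; the two keys queried are the ones from the spark-defaults.conf above, and get() throws if a key was never set:)

    // Inside spark-shell: print every setting the driver loaded,
    // which shows whether this spark-defaults.conf was read at all.
    println(sc.getConf.toDebugString)

    // Spot-check the two keys from the file above.
    println(sc.getConf.get("spark.master"))
    println(sc.getConf.get("spark.eventLog.dir"))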
Re: spark-shell giving me error of unread block data
Question ... when you say different versions, do you mean different versions of dependency files? What are the dependency files for Spark?

On Tue Nov 18 2014 at 5:27:18 PM Anson Abraham anson.abra...@gmail.com wrote:
when the cdh cluster was running, i did not set up the spark role. When I did for the first time, it was working, i.e., the same load of the test file gave me output. But in this case, how can there be different versions? This is all done through cloudera manager parcels. how does one find out the version installed? I did do an rsync from the master to the worker nodes, and that did not help much. And we're talking about the spark-assembly jar files, correct? or is there another set of jar files i should be checking for?

On Tue Nov 18 2014 at 5:16:57 PM Ritesh Kumar Singh riteshoneinamill...@gmail.com wrote: [snip]
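(On the "how does one find out the version installed?" question: the spark-assembly jar is the main thing that has to match between driver and workers, and plain Java reflection inside spark-shell shows which build the driver loaded. A rough sketch; getImplementationVersion can print null if the jar's manifest lacks that entry:)

    // Which Spark build, and from which jar, did this JVM load?
    val cls = classOf[org.apache.spark.SparkContext]
    println(cls.getPackage.getImplementationVersion)           // e.g. 1.1.0, or null
    println(cls.getProtectionDomain.getCodeSource.getLocation) // path to the assembly jar

Comparing a checksum of the jar that second line points at across the master and both workers is a more direct check than eyeballing parcel directories.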
Re: spark-shell giving me error of unread block data
Hi Anson,

We've seen this error when incompatible classes are used in the driver and executors (e.g., same class name, but the classes are different and thus the serialized data is different). This can happen for example if you're including some 3rd party libraries in your app's jar, or changing the driver/executor class paths to include these conflicting libraries.

Can you clarify whether any of the above apply to your case? (For example, one easy way to trigger this is to add the spark-examples jar shipped with CDH5.2 in the classpath of your driver. That's one of the reasons I filed SPARK-4048, but I digress.)

On Tue, Nov 18, 2014 at 1:59 PM, Anson Abraham anson.abra...@gmail.com wrote: [snip]
--
Marcelo
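(One rough way to test for exactly this driver/executor mismatch from the shell, assuming the cluster can run a trivial task at all: ask the executors which jar they loaded the Spark classes from and compare against the driver. A sketch using only standard Java reflection, nothing CDH-specific:)

    // Driver side: where did this JVM get SparkContext from?
    val driverJar = classOf[org.apache.spark.SparkContext].getProtectionDomain.getCodeSource.getLocation.toString
    println(driverJar)

    // Executor side: run the same lookup inside a task on each partition.
    val executorJars = sc.parallelize(1 to 2, 2).map { _ =>
      classOf[org.apache.spark.SparkContext].getProtectionDomain.getCodeSource.getLocation.toString
    }.distinct.collect()
    executorJars.foreach(println)
    // Paths that differ between the two sides point at the class
    // mismatch described above.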
Re: spark-shell giving me error of unread block data
yeah but in this case i'm not building any files. just deployed out config files in CDH5.2 and initiated a spark-shell to just read and output a file.

On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin van...@cloudera.com wrote: [snip]
Re: spark-shell giving me error of unread block data
On Wed, Nov 19, 2014 at 2:13 PM, Anson Abraham anson.abra...@gmail.com wrote:
yeah but in this case i'm not building any files. just deployed out config files in CDH5.2 and initiated a spark-shell to just read and output a file.

In that case it is a little bit weird. Just to be sure, you are using CDH's version of Spark, not trying to run an Apache Spark release on top of CDH, right? (If that's the case, then we could probably move this conversation to cdh-us...@cloudera.org, since it would be CDH-specific.)

On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin van...@cloudera.com wrote: [snip]
Re: spark-shell giving me error of unread block data
yeah CDH distribution (1.1).

On Wed Nov 19 2014 at 5:29:39 PM Marcelo Vanzin van...@cloudera.com wrote: [snip]
Re: spark-shell giving me error of unread block data
Sorry, meant cdh 5.2 w/ spark 1.1.

On Wed, Nov 19, 2014, 17:41 Anson Abraham anson.abra...@gmail.com wrote: [snip]
Re: spark-shell giving me error of unread block data
As Marcelo mentioned, the issue occurs mostly when incompatible classes are used by executors or drivers. Check whether the output comes through in spark-shell; if it does, then most probably there is some issue with your configuration files. It would be helpful if you could paste the contents of the config files you edited.

On Thu, Nov 20, 2014 at 5:45 AM, Anson Abraham anson.abra...@gmail.com wrote: [snip]
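(A concrete version of that spark-shell test, assuming the same input file: start a shell that bypasses the standalone workers entirely with spark-shell --master local[2], then rerun the job. The output path below is only an example; saveAsTextFile fails if the path already exists:)

    // With --master local[2] the tasks run inside the driver JVM, so a
    // driver/executor class mismatch on the cluster can't be the cause.
    val source = sc.textFile("/tmp/testfile.txt")
    source.saveAsTextFile("/tmp/testsparkoutput-local")

If this works locally but the same job fails against spark://cloudera-1.testdomain.net:7077, the problem is almost certainly on the worker side rather than in the job itself.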
spark-shell giving me error of unread block data
I'm essentially loading a file and saving the output to another location:

val source = sc.textFile("/tmp/testfile.txt")
source.saveAsTextFile("/tmp/testsparkoutput")

When I do so, I'm hitting this error:

14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread block data
java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:744)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Can't figure out what the issue is. I'm running CDH 5.2 with Spark 1.1. The file I'm loading is literally just 7 MB. I thought it was a jar-file mismatch, but I did a compare and see they're all identical. But seeing as they were all installed through CDH parcels, I'm not sure how there would be a version mismatch between the nodes and the master.
Oh yeah: 1 master node w/ 2 worker nodes, running in standalone mode, not through YARN. Just in case, I copied the jars from the master to the 2 worker nodes, and still the same issue. Weird thing is, the first time I installed and tested it out, it worked, but now it doesn't. Any help here would be greatly appreciated.
Re: spark-shell giving me error of unread block data
It can be a serialization issue. That happens when there are different versions installed on the same system. What do you mean by "the first time you installed and tested it out"?

On Wed, Nov 19, 2014 at 3:29 AM, Anson Abraham anson.abra...@gmail.com wrote: [snip]