Yeah, but in this case I'm not building anything. I just deployed the config files in CDH 5.2 and started a spark-shell to simply read a file and write the output back out.
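For reference, the entire spark-shell session is just the two lines quoted in the original mail below; nothing is compiled or packaged on my end (a minimal sketch, using the same paths as below):

    $ spark-shell
    scala> val source = sc.textFile("/tmp/testfile.txt")
    scala> source.saveAsTextFile("/tmp/testsparkoutput")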
On Wed Nov 19 2014 at 4:52:51 PM Marcelo Vanzin <van...@cloudera.com> wrote:

> Hi Anson,
>
> We've seen this error when incompatible classes are used in the driver
> and executors (e.g., same class name, but the classes are different and
> thus the serialized data is different). This can happen, for example, if
> you're including some 3rd-party libraries in your app's jar, or changing
> the driver/executor class paths to include these conflicting libraries.
>
> Can you clarify whether any of the above apply to your case?
>
> (For example, one easy way to trigger this is to add the spark-examples
> jar shipped with CDH 5.2 to the classpath of your driver. That's one of
> the reasons I filed SPARK-4048, but I digress.)
>
> On Tue, Nov 18, 2014 at 1:59 PM, Anson Abraham <anson.abra...@gmail.com> wrote:
> > I'm essentially loading a file and saving the output to another location:
> >
> >     val source = sc.textFile("/tmp/testfile.txt")
> >     source.saveAsTextFile("/tmp/testsparkoutput")
> >
> > When I do so, I'm hitting this error:
> >
> > 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
> > org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread block data
> >         java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
> >         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> >         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> >         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >         java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> >         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
> >         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
> >         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
> >         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >         java.lang.Thread.run(Thread.java:744)
> > Driver stacktrace:
> >     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
> >     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> >     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> >     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
> >     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
> >     at scala.Option.foreach(Option.scala:236)
> >     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
> >     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
> >     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> >     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> >     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> >     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> >     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> >     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> >     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> >     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> >     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> >
> > I can't figure out what the issue is. I'm running CDH 5.2 with Spark 1.1.
> > The file I'm loading is literally just 7 MB. I thought it was a jar
> > mismatch, but I compared them and they're all identical. And since they
> > were all installed through CDH parcels, I'm not sure how there could be a
> > version mismatch between the nodes and the master. The cluster is 1 master
> > node with 2 worker nodes, running standalone (not through YARN). Just in
> > case, I copied the jars from the master to the 2 worker nodes anyway, and
> > still hit the same issue.
> > The weird thing is that the first time I installed and tested it, it
> > worked, but now it doesn't.
> >
> > Any help here would be greatly appreciated.
>
> --
> Marcelo
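For what it's worth, to rule out the driver/executor classpath changes mentioned above, one thing that can be checked from inside the shell is the extra-classpath settings (a rough sketch; spark.driver.extraClassPath and spark.executor.extraClassPath are standard Spark 1.x properties, and "<not set>" is just a placeholder default):

    scala> println(sc.getConf.get("spark.driver.extraClassPath", "<not set>"))
    scala> println(sc.getConf.get("spark.executor.extraClassPath", "<not set>"))

If both come back empty and spark-env.sh doesn't set SPARK_CLASSPATH on any node, the conflicting-library scenario described above is probably not in play.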