Question ... when you mean different versions — different versions of the dependency files? What are the dependency files for Spark?
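For context on the question above: in a CDH parcel install, the dependency that usually matters here is the spark-assembly jar, whose filename encodes the Spark and Hadoop versions. A minimal sketch for listing the assembly jars an installation ships — the directory path in the usage comment is an assumption about a conventional parcel layout, not something confirmed in this thread:

```python
import glob
import os

def find_assembly_jars(spark_lib_dir):
    """Return spark-assembly jar filenames found under spark_lib_dir.

    The assembly jar's filename encodes the Spark and Hadoop versions,
    so comparing the returned names across hosts is a quick version check.
    """
    pattern = os.path.join(spark_lib_dir, "spark-assembly*.jar")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))

# Hypothetical usage -- CDH parcels conventionally place Spark under
# /opt/cloudera/parcels/CDH/lib/spark; adjust the path for your install:
# print(find_assembly_jars("/opt/cloudera/parcels/CDH/lib/spark/lib"))
```

Running this on the master and each worker and comparing the output is one cheap way to answer "how does one find out which version is installed?" below.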
On Tue, Nov 18, 2014 at 5:27 PM Anson Abraham <anson.abra...@gmail.com> wrote:

> When the CDH cluster was running, I did not set up the Spark role. When I
> did for the first time, it was working, i.e. the same load of the test file
> gave me output. But in this case, how can there be different versions? This
> is all done through Cloudera Manager parcels -- how does one find out which
> version is installed? I did do an rsync from the master to the worker
> nodes, and that did not help me much. And we're talking about the
> spark-assembly jar files, correct? Or is there another set of jar files I
> should be checking for?
>
> On Tue, Nov 18, 2014 at 5:16 PM Ritesh Kumar Singh <
> riteshoneinamill...@gmail.com> wrote:
>
>> It can be a serialization issue. Happens when there are different
>> versions installed on the same system. What do you mean by "the first
>> time you installed and tested it out"?
>>
>> On Wed, Nov 19, 2014 at 3:29 AM, Anson Abraham <anson.abra...@gmail.com>
>> wrote:
>>
>>> I'm essentially loading a file and saving the output to another
>>> location:
>>>
>>> val source = sc.textFile("/tmp/testfile.txt")
>>> source.saveAsTextFile("/tmp/testsparkoutput")
>>>
>>> When I do so, I'm hitting this error:
>>>
>>> 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread block data
>>>         java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>>         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>>         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
>>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         java.lang.Thread.run(Thread.java:744)
>>> Driver stacktrace:
>>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>>         at scala.Option.foreach(Option.scala:236)
>>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>
>>> Can't figure out what the issue is. I'm running CDH 5.2 with Spark
>>> version 1.1. The file I'm loading is literally just 7 MB. I thought it
>>> was a jar-file mismatch, but I did a compare and see they're all
>>> identical. But seeing as they were all installed through CDH parcels,
>>> I'm not sure how there would be a version mismatch between the nodes and
>>> the master. Oh yeah, it's 1 master node with 2 worker nodes, running in
>>> standalone mode, not through YARN. So, just in case, I copied the jars
>>> from the master to the 2 worker nodes, and still the same issue.
>>> Weird thing is, the first time I installed and tested it out it worked,
>>> but now it doesn't.
>>>
>>> Any help here would be greatly appreciated.
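For anyone hitting the same "unread block data" failure: it typically indicates the driver and executors deserializing task data with mismatched classes. The thread's "I did a compare" step can be made precise by checksumming each node's spark-assembly jar. A minimal sketch, assuming you first copy each node's jar to a local directory (e.g. with scp) — the helper names are illustrative, not from the thread:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def jars_match(jar_paths):
    """True if every local jar copy has an identical checksum.

    jar_paths: paths to copies of the same assembly jar, e.g. one
    pulled from the master and one from each worker.
    """
    return len({md5_of(p) for p in jar_paths}) <= 1
```

Identical checksums rule out jar drift but not classpath drift: an executor can still pick up a stale jar from a different directory, so a mismatch in the effective classpath (not just the file contents) is worth checking next.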