Question ... when you mean different versions — different versions of the dependency files? What are the dependency files for Spark?
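For context on the question above: in a CDH parcel install, the dependency that usually matters here is the spark-assembly jar, whose filename encodes the Spark and Hadoop versions. A minimal sketch for listing the assembly jars an installation ships — the directory path in the usage comment is an assumption about a conventional parcel layout, not something confirmed in this thread:

```python
import glob
import os

def find_assembly_jars(spark_lib_dir):
    """Return spark-assembly jar filenames found under spark_lib_dir.

    The assembly jar's filename encodes the Spark and Hadoop versions,
    so comparing the returned names across hosts is a quick version check.
    """
    pattern = os.path.join(spark_lib_dir, "spark-assembly*.jar")
    return sorted(os.path.basename(p) for p in glob.glob(pattern))

# Hypothetical usage -- CDH parcels conventionally place Spark under
# /opt/cloudera/parcels/CDH/lib/spark; adjust the path for your install:
# print(find_assembly_jars("/opt/cloudera/parcels/CDH/lib/spark/lib"))
```

Running this on the master and each worker and comparing the output is one cheap way to answer "how does one find out which version is installed?" below.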
On Tue, Nov 18, 2014 at 5:27 PM Anson Abraham <anson.abra...@gmail.com> wrote:

> When the CDH cluster was running, I did not set up the Spark role. When I
> did for the first time, it was working, i.e. the same load of the test file
> gave me output. But in this case, how can there be different versions? This
> is all done through Cloudera Manager parcels -- how does one find out which
> version is installed? I did do an rsync from the master to the worker
> nodes, and that did not help me much. And we're talking about the
> spark-assembly jar files, correct? Or is there another set of jar files I
> should be checking for?
>
> On Tue, Nov 18, 2014 at 5:16 PM Ritesh Kumar Singh <
> riteshoneinamill...@gmail.com> wrote:
>
>> It can be a serialization issue. Happens when there are different
>> versions installed on the same system. What do you mean by "the first
>> time you installed and tested it out"?
>>
>> On Wed, Nov 19, 2014 at 3:29 AM, Anson Abraham <anson.abra...@gmail.com>
>> wrote:
>>
>>> I'm essentially loading a file and saving the output to another
>>> location:
>>>
>>> val source = sc.textFile("/tmp/testfile.txt")
>>> source.saveAsTextFile("/tmp/testsparkoutput")
>>>
>>> When I do so, I'm hitting this error:
>>>
>>> 14/11/18 21:15:08 INFO DAGScheduler: Failed to run saveAsTextFile at <console>:15
>>> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, cloudera-1.testdomain.net): java.lang.IllegalStateException: unread block data
>>>         java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421)
>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382)
>>>         java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>>>         java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>>>         java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>>>         java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>>>         java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>>>         org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>>>         org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>>>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:162)
>>>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>>         java.lang.Thread.run(Thread.java:744)
>>> Driver stacktrace:
>>>         at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
>>>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>>>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>>>         at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>>         at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
>>>         at scala.Option.foreach(Option.scala:236)
>>>         at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
>>>         at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
>>>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>>>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>>>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>>>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>>>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>>>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>>
>>> Can't figure out what the issue is. I'm running CDH 5.2 with Spark
>>> version 1.1. The file I'm loading is literally just 7 MB. I thought it
>>> was a jar-file mismatch, but I did a compare and see they're all
>>> identical. But seeing as they were all installed through CDH parcels,
>>> I'm not sure how there would be a version mismatch between the nodes and
>>> the master. Oh yeah, it's 1 master node with 2 worker nodes, running in
>>> standalone mode, not through YARN. So, just in case, I copied the jars
>>> from the master to the 2 worker nodes, and still the same issue.
>>> Weird thing is, the first time I installed and tested it out it worked,
>>> but now it doesn't.
>>>
>>> Any help here would be greatly appreciated.
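For anyone hitting the same "unread block data" failure: it typically indicates the driver and executors deserializing task data with mismatched classes. The thread's "I did a compare" step can be made precise by checksumming each node's spark-assembly jar. A minimal sketch, assuming you first copy each node's jar to a local directory (e.g. with scp) — the helper names are illustrative, not from the thread:

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def jars_match(jar_paths):
    """True if every local jar copy has an identical checksum.

    jar_paths: paths to copies of the same assembly jar, e.g. one
    pulled from the master and one from each worker.
    """
    return len({md5_of(p) for p in jar_paths}) <= 1
```

Identical checksums rule out jar drift but not classpath drift: an executor can still pick up a stale jar from a different directory, so a mismatch in the effective classpath (not just the file contents) is worth checking next.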