Do you happen to know if this still occurs when using the Hadoop
bindings for other versions of CDH, or for vanilla Hadoop? The error
here appears to be inside Hadoop's own deserializer, so it could be
version-dependent.
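For what it's worth, this failure mode is easy to reproduce with plain JDK serialization whenever the reader expects more bytes than the writer emitted, which is what a wire-format mismatch between two Hadoop versions would look like. A hypothetical, self-contained sketch (not Spark or Hadoop code; the class and field names are made up for illustration):

```java
import java.io.*;

public class EofDemo {
    // Mimics two library versions disagreeing on a class's serialized
    // layout: the writer emits one field, the reader expects two.
    static class VersionedSplit implements Serializable {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.writeInt(42);      // "old" writer: one int field
        }

        private void readObject(ObjectInputStream in) throws IOException {
            in.readInt();          // matches the written int
            in.readLong();         // "new" reader: extra field -> EOFException
        }
    }

    public static void main(String[] args) throws Exception {
        // Serialize with the "old" layout.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new VersionedSplit());
        }
        // Deserialize with the "new" layout: the extra readLong() runs
        // past the end of the block data, just like the readFully() frame
        // at the top of the reported stack trace.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            in.readObject();
            System.out.println("deserialized ok");
        } catch (EOFException e) {
            System.out.println("EOFException during readObject, as in the stack trace");
        }
    }
}
```

The same structural mismatch inside FileSplit.readFields would explain why the error shows up only for the HadoopRDD path and not for parallelize().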

Does this happen deterministically?

- Patrick

On Sat, Feb 1, 2014 at 7:15 PM, Sandy Ryza <[email protected]> wrote:
> Hi all,
>
> I'm running into an EOFException when I try to run a simple Spark job that
> reads a text file and collects the results.  It looks like it's occurring
> when the executor tries to deserialize the task.  The setup is Spark 0.9
> against CDH5.
>
> The error occurs with both Python and Scala.  Maybe interestingly, it
> doesn't show up when I run sc.parallelize(Array(1,2,3,4)).collect(), which
> could mean it's running into trouble deserializing the HadoopRDD?
>
> Any idea what could be going on?
>
> thanks for any help,
> Sandy
>
> ---
>
> java.io.EOFException
> at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2742)
> at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1030)
> at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
> at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
> at org.apache.hadoop.io.UTF8.readChars(UTF8.java:260)
> at org.apache.hadoop.io.UTF8.readString(UTF8.java:252)
> at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
> at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
> at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:77)
> at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:145)
> at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1835)
> at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1794)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
> at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
> at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:195)
> at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:46)
> at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:45)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:724)
>
