Can you post the build.sbt for your program? It needs to include hadoop-client for CDH4.3, and that should *not* be listed as provided.
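[Editor's note: for anyone following along, a minimal build.sbt along the lines Matei describes might look like the sketch below. The exact hadoop-client coordinates and the resolvers are illustrative assumptions for a CDH4.3.0 MR1 cluster; check your cluster's actual version.]

```scala
name := "simple-project"

version := "1.0"

scalaVersion := "2.9.3"

libraryDependencies ++= Seq(
  // spark-core at the default (compile) scope -- NOT marked "provided"
  "org.apache.spark" %% "spark-core" % "0.8.0-incubating",
  // hadoop-client matching the cluster's CDH4.3.0 (MR1) install;
  // version string is an assumption, verify against your distribution
  "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.3.0"
)

resolvers ++= Seq(
  "Akka Repository" at "http://repo.akka.io/releases/",
  "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
)
```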
Matei

On Oct 18, 2013, at 8:23 AM, Koert Kuipers <[email protected]> wrote:

> ok this has nothing to do with hadoop access. even a simple program that uses
> sc.parallelize blows up in this way.
>
> so spark-shell works well on the same machine i launch this from.
>
> if i launch a simple program without using kryo for serializer and closure
> serializer i get a different error. see below. at this point it seems to me i
> have some issue with task serialization???
>
> 13/10/18 11:20:37 INFO StandaloneExecutorBackend: Got assigned task 0
> 13/10/18 11:20:37 INFO StandaloneExecutorBackend: Got assigned task 1
> 13/10/18 11:20:37 INFO Executor: Running task ID 1
> 13/10/18 11:20:37 INFO Executor: Running task ID 0
> 13/10/18 11:20:37 INFO Executor: Fetching
>   http://192.168.3.171:41629/jars/simple-project_2.9.3-1.0.jar with timestamp 1382109635095
> 13/10/18 11:20:37 INFO Utils: Fetching
>   http://192.168.3.171:41629/jars/simple-project_2.9.3-1.0.jar to /tmp/fetchFileTemp378181753997570700.tmp
> 13/10/18 11:20:37 INFO Executor: Adding
>   file:/var/lib/spark/app-20131018112035-0014/1/./simple-project_2.9.3-1.0.jar to class loader
> 13/10/18 11:20:37 INFO Executor: caught throwable with stacktrace
> java.io.StreamCorruptedException: invalid type code: 00
>     at java.io.ObjectInputStream$BlockDataInputStream.readBlockHeader(ObjectInputStream.java:2467)
>     at java.io.ObjectInputStream$BlockDataInputStream.refill(ObjectInputStream.java:2502)
>     at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2661)
>     at java.io.ObjectInputStream$BlockDataInputStream.read(ObjectInputStream.java:2583)
>     at java.io.DataInputStream.readFully(DataInputStream.java:178)
>     at java.io.DataInputStream.readLong(DataInputStream.java:399)
>     at java.io.ObjectInputStream$BlockDataInputStream.readLong(ObjectInputStream.java:2803)
>     at java.io.ObjectInputStream.readLong(ObjectInputStream.java:958)
>     at
>     org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:72)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>     at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:135)
>     at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795)
>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754)
>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:61)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:153)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:662)
>
> On Fri, Oct 18, 2013 at 10:59 AM, Koert Kuipers <[email protected]> wrote:
> i created a tiny sbt project as described here:
> apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
>
> it has the correct dependencies: spark-core and the correct hadoop-client for
> my platform.
> i tried both the generic spark-core dependency and the spark-core dependency
> compiled against my platform. it runs fine in local mode, but when i switch to
> the cluster i still always get the following exceptions on tasks:
>
> 13/10/18 10:25:53 ERROR Executor: Uncaught exception in thread Thread[pool-5-thread-1,5,main]
> java.lang.NullPointerException
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:662)
>
> after adding some additional debugging to Executor i see the cause is this:
>
> 13/10/18 10:54:47 INFO Executor: caught throwable with stacktrace
> java.lang.NullPointerException
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$2.apply(Executor.scala:155)
>     at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$2.apply(Executor.scala:155)
>     at org.apache.spark.Logging$class.logInfo(Logging.scala:48)
>     at org.apache.spark.executor.Executor.logInfo(Executor.scala:36)
>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:155)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>     at java.lang.Thread.run(Thread.java:662)
>
> so it seems the offending line is:
>
>     logInfo("Its epoch is " + task.epoch)
>
> i am guessing accessing epoch on the task is throwing the NPE. any ideas?
>
> On Thu, Oct 17, 2013 at 8:12 PM, Koert Kuipers <[email protected]> wrote:
> sorry, one more related question: i compile against a spark build for hadoop
> 1.0.4, but the actual installed version of spark is built against
> cdh4.3.0-mr1. this also used to work, and i prefer to do this so i compile
> against a generic spark build. could this be the issue?
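[Editor's note: a compile-time/runtime build mismatch like the one described above can leave the executor unable to deserialize the task bytes sent by the driver, which is consistent with both the NPE (the task reference is never populated) and the earlier "invalid type code: 00" error. As a sanity check on what that failure mode looks like at the JVM level, here is a plain-Java sketch (not Spark code; class and method names are made up for illustration) that corrupts the type-code byte of a serialized object stream and reproduces the same message:]

```java
import java.io.*;

public class InvalidTypeCodeDemo {
    // Serialize a string, clobber the type-code byte that follows the
    // 4-byte stream header, and return the resulting error message.
    static String corruptAndRead() throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject("some task data");
        oos.close();

        byte[] bytes = bos.toByteArray();
        bytes[4] = 0x00; // stand-in for bytes written by an incompatible build

        ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes));
        try {
            in.readObject();
            return "no error";
        } catch (StreamCorruptedException e) {
            return e.getMessage();
        } catch (ClassNotFoundException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(corruptAndRead()); // prints: invalid type code: 00
    }
}
```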
>
> On Thu, Oct 17, 2013 at 8:06 PM, Koert Kuipers <[email protected]> wrote:
> i have my spark and hadoop related dependencies as "provided" for my spark
> job. this used to work with previous versions. are these now supposed to be
> compile/runtime/default dependencies?
>
> On Thu, Oct 17, 2013 at 8:04 PM, Koert Kuipers <[email protected]> wrote:
> yes i did that and i can see the correct jars sitting in lib_managed
>
> On Thu, Oct 17, 2013 at 7:56 PM, Matei Zaharia <[email protected]> wrote:
> Koert, did you link your Spark job to the right version of HDFS as well? In
> Spark 0.8, you have to add a Maven dependency on "hadoop-client" for your
> version of Hadoop. See
> http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
> for example.
>
> Matei
>
> On Oct 17, 2013, at 4:38 PM, Koert Kuipers <[email protected]> wrote:
>
>> i got the job a little further along by also setting this:
>>
>>     System.setProperty("spark.closure.serializer",
>>       "org.apache.spark.serializer.KryoSerializer")
>>
>> not sure why i need to... but anyhow, now my workers start and then they
>> blow up on this:
>>
>> 13/10/17 19:22:57 ERROR Executor: Uncaught exception in thread Thread[pool-5-thread-1,5,main]
>> java.lang.NullPointerException
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> which is:
>>
>>     val metrics = attemptedTask.flatMap(t => t.metrics)
>>
>> On Thu, Oct 17, 2013 at 7:30 PM, dachuan <[email protected]> wrote:
>> thanks, Mark.
>>
>> On Thu, Oct 17, 2013 at 6:36 PM, Mark Hamstra <[email protected]> wrote:
>> SNAPSHOTs are not fixed versions, but are floating names associated with
>> whatever is the most recent code.
>> So, Spark 0.8.0 is the current released version of Spark, which is exactly
>> the same today as it was yesterday, and will be the same thing forever.
>> Spark 0.8.1-SNAPSHOT is whatever is currently in branch-0.8. It changes
>> every time new code is committed to that branch (which should be just bug
>> fixes and the few additional features that we wanted to get into 0.8.0 but
>> that didn't quite make it). Not too long from now there will be a release
>> of Spark 0.8.1, at which time the SNAPSHOT will go to 0.8.2 and 0.8.1 will
>> be forever frozen. Meanwhile, the wild new development is taking place on
>> the master branch, and whatever is currently in that branch becomes
>> 0.9.0-SNAPSHOT. This could be quite different from day to day, and there
>> are no guarantees that things won't be broken in 0.9.0-SNAPSHOT. Several
>> months from now there will be a release of Spark 0.9.0 (unless the
>> decision is made to bump the version to 1.0.0), at which point the
>> SNAPSHOT goes to 0.9.1 and the whole process advances to the next phase
>> of development.
>>
>> The short answer is that releases are stable, SNAPSHOTs are not, and
>> SNAPSHOTs that aren't on maintenance branches can break things. You make
>> your choice of which to use and pay the consequences.
>>
>> On Thu, Oct 17, 2013 at 3:18 PM, dachuan <[email protected]> wrote:
>> yeah, I mean 0.9.0-SNAPSHOT. I use git clone and that's what I got.. what's
>> the difference? I mean SNAPSHOT and non-SNAPSHOT.
>>
>> On Thu, Oct 17, 2013 at 6:15 PM, Mark Hamstra <[email protected]> wrote:
>> Of course, you mean 0.9.0-SNAPSHOT. There is no Spark 0.9.0, and won't be
>> for several months.
>>
>> On Thu, Oct 17, 2013 at 3:11 PM, dachuan <[email protected]> wrote:
>> I'm sorry if this doesn't answer your question directly, but I have tried
>> spark 0.9.0 and hdfs 1.0.4 just now, it works..
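[Editor's note: in sbt terms, Mark's distinction between releases and SNAPSHOTs looks like the fragment below. The coordinates are illustrative; SNAPSHOT artifacts additionally require a snapshot resolver and are re-resolved over time rather than cached permanently.]

```scala
// A release is immutable: this resolves to the same bytes forever.
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.0-incubating"

// A SNAPSHOT is a floating name: what it resolves to changes with every
// commit to the corresponding branch, so builds are not reproducible.
// libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.1-SNAPSHOT"
```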
>>
>> On Thu, Oct 17, 2013 at 6:05 PM, Koert Kuipers <[email protected]> wrote:
>> after upgrading from spark 0.7 to spark 0.8 i can no longer access any
>> files on HDFS. i see the error below. any ideas?
>>
>> i am running spark standalone on a cluster that also has CDH4.3.0, and i
>> rebuilt spark accordingly. the jars in lib_managed look good to me.
>>
>> i noticed similar errors on the mailing list but found no suggested
>> solutions.
>>
>> thanks! koert
>>
>> 13/10/17 17:43:23 ERROR Executor: Exception in task ID 0
>> java.io.EOFException
>>     at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2703)
>>     at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1008)
>>     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
>>     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
>>     at org.apache.hadoop.io.UTF8.readChars(UTF8.java:258)
>>     at org.apache.hadoop.io.UTF8.readString(UTF8.java:250)
>>     at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
>>     at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
>>     at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
>>     at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>     at
>>     java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>     at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:135)
>>     at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795)
>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754)
>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:61)
>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:153)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>     at java.lang.Thread.run(Thread.java:662)
>>
>> --
>> Dachuan Huang
>> Cellphone: 614-390-7234
>> 2015 Neil Avenue
>> Ohio State University
>> Columbus, Ohio
>> U.S.A.
>> 43210
