Re: spark 0.8

Koert Kuipers Thu, 17 Oct 2013 17:14:01 -0700

sorry one more related question:
i compile against a spark build for hadoop 1.0.4, but the actual installed
version of spark is built against cdh4.3.0-mr1. this also used to work, and
i prefer to do this so i compile against a generic spark build. could this
be the issue?



On Thu, Oct 17, 2013 at 8:06 PM, Koert Kuipers <[email protected]> wrote:

> i have my spark and hadoop related dependencies as "provided" for my spark
> job. this used to work with previous versions. are these now supposed to be
> compile/runtime/default dependencies?
>
>
> On Thu, Oct 17, 2013 at 8:04 PM, Koert Kuipers <[email protected]> wrote:
>
>> yes i did that and i can see the correct jars sitting in lib_managed
>>
>>
>> On Thu, Oct 17, 2013 at 7:56 PM, Matei Zaharia <[email protected]> wrote:
>>
>>> Koert, did you link your Spark job to the right version of HDFS as well?
>>> In Spark 0.8, you have to add a Maven dependency on "hadoop-client" for
>>> your version of Hadoop. See
>>> http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
>>> for an example.
>>>
>>> Matei
>>>
>>> On Oct 17, 2013, at 4:38 PM, Koert Kuipers <[email protected]> wrote:
>>>
>>> i got the job a little further along by also setting this:
>>> System.setProperty("spark.closure.serializer",
>>> "org.apache.spark.serializer.KryoSerializer")
>>>
>>> not sure why i need to... but anyhow, now my workers start and then they
>>> blow up on this:
>>>
>>> 13/10/17 19:22:57 ERROR Executor: Uncaught exception in thread Thread[pool-5-thread-1,5,main]
>>> java.lang.NullPointerException
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>     at java.lang.Thread.run(Thread.java:662)
>>>
>>>
>>> which is:
>>>  val metrics = attemptedTask.flatMap(t => t.metrics)
>>>
>>> On Thu, Oct 17, 2013 at 7:30 PM, dachuan <[email protected]> wrote:
>>>
>>>> thanks, Mark.
>>>>
>>>>
>>>> On Thu, Oct 17, 2013 at 6:36 PM, Mark Hamstra <[email protected]> wrote:
>>>>
>>>>> SNAPSHOTs are not fixed versions, but are floating names associated
>>>>> with whatever is the most recent code.  So, Spark 0.8.0 is the current
>>>>> released version of Spark, which is exactly the same today as it was
>>>>> yesterday, and will be the same thing forever.  Spark 0.8.1-SNAPSHOT is
>>>>> whatever is currently in branch-0.8.  It changes every time new code is
>>>>> committed to that branch (which should be just bug fixes and the few
>>>>> additional features that we wanted to get into 0.8.0, but that didn't quite
>>>>> make it).  Not too long from now there will be a release of Spark 0.8.1, at
>>>>> which time the SNAPSHOT will go to 0.8.2 and 0.8.1 will be forever frozen.
>>>>> Meanwhile, the wild new development is taking place on the master branch,
>>>>> and whatever is currently in that branch becomes 0.9.0-SNAPSHOT.  This
>>>>> could be quite different from day to day, and there are no guarantees that
>>>>> things won't be broken in 0.9.0-SNAPSHOT.  Several months from now there
>>>>> will be a release of Spark 0.9.0 (unless the decision is made to bump the
>>>>> version to 1.0.0), at which point the SNAPSHOT goes to 0.9.1 and the whole
>>>>> process advances to the next phase of development.
>>>>>
>>>>> The short answer is that releases are stable, SNAPSHOTs are not, and
>>>>> SNAPSHOTs that aren't on maintenance branches can break things.  You make
>>>>> your choice of which to use and pay the consequences.
>>>>>
>>>>>
>>>>> On Thu, Oct 17, 2013 at 3:18 PM, dachuan <[email protected]> wrote:
>>>>>
>>>>>> yeah, I mean 0.9.0-SNAPSHOT. I use git clone and that's what I got..
>>>>>> what's the difference? I mean SNAPSHOT and non-SNAPSHOT.
>>>>>>
>>>>>>
>>>>>> On Thu, Oct 17, 2013 at 6:15 PM, Mark Hamstra <[email protected]> wrote:
>>>>>>
>>>>>>> Of course, you mean 0.9.0-SNAPSHOT.  There is no Spark 0.9.0, and
>>>>>>> won't be for several months.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Oct 17, 2013 at 3:11 PM, dachuan <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm sorry if this doesn't answer your question directly, but I have
>>>>>>>> tried spark 0.9.0 and hdfs 1.0.4 just now, it works..
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Oct 17, 2013 at 6:05 PM, Koert Kuipers <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> after upgrading from spark 0.7 to spark 0.8 i can no longer access
>>>>>>>>> any files on HDFS.
>>>>>>>>>  i see the error below. any ideas?
>>>>>>>>>
>>>>>>>>> i am running spark standalone on a cluster that also has CDH4.3.0
>>>>>>>>> and rebuilt spark accordingly. the jars in lib_managed look good to me.
>>>>>>>>>
>>>>>>>>> i noticed similar errors in the mailing list but found no
>>>>>>>>> suggested solutions.
>>>>>>>>>
>>>>>>>>> thanks! koert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> 13/10/17 17:43:23 ERROR Executor: Exception in task ID 0
>>>>>>>>> java.io.EOFException
>>>>>>>>>       at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2703)
>>>>>>>>>       at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1008)
>>>>>>>>>       at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
>>>>>>>>>       at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
>>>>>>>>>       at org.apache.hadoop.io.UTF8.readChars(UTF8.java:258)
>>>>>>>>>       at org.apache.hadoop.io.UTF8.readString(UTF8.java:250)
>>>>>>>>>       at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
>>>>>>>>>       at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
>>>>>>>>>       at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
>>>>>>>>>       at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
>>>>>>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>       at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
>>>>>>>>>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852)
>>>>>>>>>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>>>>>>>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>>>>>       at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>>>>>>>>       at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>>>>>>>>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>>>>>>>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>>>>>       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>>>>>>>       at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:135)
>>>>>>>>>       at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795)
>>>>>>>>>       at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754)
>>>>>>>>>       at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>>>>>       at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>>>>>>>       at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
>>>>>>>>>       at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:61)
>>>>>>>>>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:153)
>>>>>>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>>>>>>>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>>>>>>>       at java.lang.Thread.run(Thread.java:662)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Dachuan Huang
>>>>>>>> Cellphone: 614-390-7234
>>>>>>>> 2015 Neil Avenue
>>>>>>>> Ohio State University
>>>>>>>> Columbus, Ohio
>>>>>>>> U.S.A.
>>>>>>>> 43210
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>
>
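
For reference, Matei's suggestion above (adding a "hadoop-client" dependency that matches the cluster's Hadoop) might look roughly like this in an sbt build. This is only a sketch: the project name is hypothetical, and the Scala version, the Spark and hadoop-client version strings, and the Cloudera repository URL are assumptions that should be checked against the actual cluster. The "provided" scope mirrors Koert's setup; the thread does not settle whether that scope is the right one.

```scala
// build.sbt (sketch only; every version string below is an assumption)
name := "my-spark-job" // hypothetical project name

scalaVersion := "2.9.3" // assumed Scala version for a Spark 0.8 build

// Assumed repository for CDH artifacts; not needed for stock Apache Hadoop
resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  // Spark itself, supplied by the cluster at runtime
  "org.apache.spark" %% "spark-core" % "0.8.0-incubating" % "provided",
  // hadoop-client pinned to the cluster's Hadoop (here assumed to be CDH4.3.0 MR1)
  "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.3.0" % "provided"
)
```

To compile against a generic Hadoop 1.0.4 build instead, the hadoop-client version would be "1.0.4" and the extra resolver could be dropped.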
