sorry, one more related question: i compile against a spark build for hadoop 1.0.4, but the spark actually installed on the cluster is built against cdh4.3.0-mr1. this also used to work, and i prefer it because it lets me compile against a generic spark build. could this be the issue?
On Thu, Oct 17, 2013 at 8:06 PM, Koert Kuipers wrote:

> i have my spark and hadoop related dependencies as "provided" for my spark
> job. this used to work with previous versions. are these now supposed to be
> compile/runtime/default dependencies?
>
> On Thu, Oct 17, 2013 at 8:04 PM, Koert Kuipers wrote:
>
>> yes i did that and i can see the correct jars sitting in lib_managed
>>
>> On Thu, Oct 17, 2013 at 7:56 PM, Matei Zaharia wrote:
>>
>>> Koert, did you link your Spark job to the right version of HDFS as well?
>>> In Spark 0.8, you have to add a Maven dependency on "hadoop-client" for
>>> your version of Hadoop. See
>>> http://spark.incubator.apache.org/docs/latest/quick-start.html#a-standalone-app-in-scala
>>> for example.
>>>
>>> Matei
>>>
>>> On Oct 17, 2013, at 4:38 PM, Koert Kuipers wrote:
>>>
>>> i got the job a little further along by also setting this:
>>> System.setProperty("spark.closure.serializer",
>>> "org.apache.spark.serializer.KryoSerializer")
>>>
>>> not sure why i need to... but anyhow, now my workers start and then they
>>> blow up on this:
>>>
>>> 13/10/17 19:22:57 ERROR Executor: Uncaught exception in thread Thread[pool-5-thread-1,5,main]
>>> java.lang.NullPointerException
>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:187)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>     at java.lang.Thread.run(Thread.java:662)
>>>
>>> which is:
>>> val metrics = attemptedTask.flatMap(t => t.metrics)
>>>
>>> On Thu, Oct 17, 2013 at 7:30 PM, dachuan wrote:
>>>
>>>> thanks, Mark.
>>>>
>>>> On Thu, Oct 17, 2013 at 6:36 PM, Mark Hamstra wrote:
>>>>
>>>>> SNAPSHOTs are not fixed versions, but are floating names associated
>>>>> with whatever is the most recent code. So, Spark 0.8.0 is the current
>>>>> released version of Spark, which is exactly the same today as it was
>>>>> yesterday, and will be the same thing forever. Spark 0.8.1-SNAPSHOT is
>>>>> whatever is currently in branch-0.8. It changes every time new code is
>>>>> committed to that branch (which should be just bug fixes and the few
>>>>> additional features that we wanted to get into 0.8.0, but that didn't
>>>>> quite make it.) Not too long from now there will be a release of Spark
>>>>> 0.8.1, at which time the SNAPSHOT will go to 0.8.2 and 0.8.1 will be
>>>>> forever frozen. Meanwhile, the wild new development is taking place on
>>>>> the master branch, and whatever is currently in that branch becomes
>>>>> 0.9.0-SNAPSHOT. This could be quite different from day to day, and there
>>>>> are no guarantees that things won't be broken in 0.9.0-SNAPSHOT. Several
>>>>> months from now there will be a release of Spark 0.9.0 (unless the
>>>>> decision is made to bump the version to 1.0.0), at which point the
>>>>> SNAPSHOT goes to 0.9.1 and the whole process advances to the next phase
>>>>> of development.
>>>>>
>>>>> The short answer is that releases are stable, SNAPSHOTs are not, and
>>>>> SNAPSHOTs that aren't on maintenance branches can break things. You make
>>>>> your choice of which to use and pay the consequences.
>>>>>
>>>>> On Thu, Oct 17, 2013 at 3:18 PM, dachuan wrote:
>>>>>
>>>>>> yeah, I mean 0.9.0-SNAPSHOT. I use git clone and that's what I got..
>>>>>> what's the difference? I mean SNAPSHOT and non-SNAPSHOT.
>>>>>>
>>>>>> On Thu, Oct 17, 2013 at 6:15 PM, Mark Hamstra wrote:
>>>>>>
>>>>>>> Of course, you mean 0.9.0-SNAPSHOT. There is no Spark 0.9.0, and
>>>>>>> won't be for several months.
>>>>>>>
>>>>>>> On Thu, Oct 17, 2013 at 3:11 PM, dachuan wrote:
>>>>>>>
>>>>>>>> I'm sorry if this doesn't answer your question directly, but I have
>>>>>>>> tried spark 0.9.0 and hdfs 1.0.4 just now, it works..
>>>>>>>>
>>>>>>>> On Thu, Oct 17, 2013 at 6:05 PM, Koert Kuipers wrote:
>>>>>>>>
>>>>>>>>> after upgrading from spark 0.7 to spark 0.8 i can no longer access
>>>>>>>>> any files on HDFS.
>>>>>>>>> i see the error below. any ideas?
>>>>>>>>>
>>>>>>>>> i am running spark standalone on a cluster that also has CDH4.3.0
>>>>>>>>> and rebuilt spark accordingly. the jars in lib_managed look good to
>>>>>>>>> me.
>>>>>>>>>
>>>>>>>>> i noticed similar errors in the mailing list but found no
>>>>>>>>> suggested solutions.
>>>>>>>>>
>>>>>>>>> thanks! koert
>>>>>>>>>
>>>>>>>>> 13/10/17 17:43:23 ERROR Executor: Exception in task ID 0
>>>>>>>>> java.io.EOFException
>>>>>>>>>     at java.io.ObjectInputStream$BlockDataInputStream.readFully(ObjectInputStream.java:2703)
>>>>>>>>>     at java.io.ObjectInputStream.readFully(ObjectInputStream.java:1008)
>>>>>>>>>     at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
>>>>>>>>>     at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
>>>>>>>>>     at org.apache.hadoop.io.UTF8.readChars(UTF8.java:258)
>>>>>>>>>     at org.apache.hadoop.io.UTF8.readString(UTF8.java:250)
>>>>>>>>>     at org.apache.hadoop.mapred.FileSplit.readFields(FileSplit.java:87)
>>>>>>>>>     at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:280)
>>>>>>>>>     at org.apache.hadoop.io.ObjectWritable.readFields(ObjectWritable.java:75)
>>>>>>>>>     at org.apache.spark.SerializableWritable.readObject(SerializableWritable.scala:39)
>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>>>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>     at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:969)
>>>>>>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1852)
>>>>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>>>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>>>>>     at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1950)
>>>>>>>>>     at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1874)
>>>>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1756)
>>>>>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>>>>>>>     at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:135)
>>>>>>>>>     at java.io.ObjectInputStream.readExternalData(ObjectInputStream.java:1795)
>>>>>>>>>     at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1754)
>>>>>>>>>     at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1326)
>>>>>>>>>     at java.io.ObjectInputStream.readObject(ObjectInputStream.java:348)
>>>>>>>>>     at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:39)
>>>>>>>>>     at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:61)
>>>>>>>>>     at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:153)
>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>>>>>>>>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>>>>>>>>>     at java.lang.Thread.run(Thread.java:662)
>>>>>>>>
>>>>>>>> --
>>>>>>>> Dachuan Huang
>>>>>>>> Cellphone: 614-390-7234
>>>>>>>> 2015 Neil Avenue
>>>>>>>> Ohio State University
>>>>>>>> Columbus, Ohio
>>>>>>>> U.S.A.
>>>>>>>> 43210
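
For reference, Matei's suggestion in the quoted thread (add a "hadoop-client" dependency that matches the cluster's Hadoop version) might look roughly like this in an sbt build. This is only a sketch under assumptions: the project name is hypothetical, and the Scala version, the artifact version strings and the Cloudera repository URL are not from the thread and should be checked against the actual cluster. The "provided" scope simply mirrors the setup described above; whether that scope is related to the errors is not established here.

```scala
// build.sbt: illustrative sketch only; every version string below is an assumption
name := "my-spark-job" // hypothetical project name

scalaVersion := "2.9.3" // assumed: the Scala version Spark 0.8.0 was built for

// Assumed repository hosting the CDH artifacts
resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

libraryDependencies ++= Seq(
  // Spark itself, expected to be supplied by the cluster installation at runtime
  "org.apache.spark" %% "spark-core" % "0.8.0-incubating" % "provided",
  // hadoop-client for the cluster's Hadoop; assumed version string for CDH4.3.0 MR1
  "org.apache.hadoop" % "hadoop-client" % "2.0.0-mr1-cdh4.3.0" % "provided"
)
```

Compiling against the same hadoop-client version that the installed Spark was built with keeps the Hadoop classes on the driver and on the workers in agreement, which is the point of the suggestion.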
