I can tell you what the environment and rough process are like:

- CDH5 on YARN, 15 executors (16GB for the driver, 8GB per executor)
- total cached data: about 10GB
- shuffled data size per iteration: ~1GB
- the job is a map, followed by a groupBy, followed by a map, followed by a collect (roughly the skeleton sketched below)
- I'd imagine that every time map/groupBy is called, the environment data that gets serialized to the map/groupBy tasks maxes out at about 250MB
- periodic checkpointing
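To make the shape of the job concrete, here is a minimal, self-contained sketch of that iteration pattern in the Spark 1.x Scala API. All names, sizes, and stub logic here are illustrative stand-ins, not our actual code:

    import org.apache.spark.{SparkConf, SparkContext}

    object IterationSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("iteration-sketch"))
        sc.setCheckpointDir("hdfs:///tmp/checkpoints")  // enables periodic checkpointing

        // Stand-in for the ~250MB "environment" shipped to every stage's tasks.
        val env = sc.broadcast(Array.fill(1000000)(1.0))

        // Stand-in for the ~10GB cached dataset.
        val data = sc.parallelize(1 to 10000000)
          .map(i => (i % 1000, i.toDouble))
          .cache()

        for (iter <- 1 to 20) {
          val grouped = data
            .map { case (k, v) => (k, v * env.value(k)) }       // map (uses the big env)
            .groupBy(_._1)                                      // groupBy: ~1GB shuffled
            .map { case (k, vs) => (k, vs.map(_._2).sum) }      // map

          if (iter % 5 == 0) grouped.checkpoint()               // periodic checkpointing
          val result = grouped.collect()                        // collect
          println(s"iteration $iter: ${result.length} keys")
        }
        sc.stop()
      }
    }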
On Fri, Oct 10, 2014 at 10:34 AM, Davies Liu <dav...@databricks.com> wrote:
> Maybe. TorrentBroadcast is more complicated than HttpBroadcast. Could
> you tell us how to reproduce this issue? That would help us a lot in
> improving TorrentBroadcast.
>
> Thanks!
>
> On Fri, Oct 10, 2014 at 8:46 AM, Sung Hwan Chung
> <coded...@cs.stanford.edu> wrote:
> > I haven't seen this at all since switching to HttpBroadcast. It seems
> > TorrentBroadcast might have some issues?
> >
> > On Thu, Oct 9, 2014 at 4:28 PM, Sung Hwan Chung
> > <coded...@cs.stanford.edu> wrote:
> >> I don't think that I saw any other error message. This is all I saw.
> >>
> >> I'm currently experimenting to see if this can be alleviated by using
> >> HttpBroadcastFactory instead of TorrentBroadcast. So far, with
> >> HttpBroadcast, I haven't seen this recur. I'll keep you posted.
> >>
> >> On Thu, Oct 9, 2014 at 4:21 PM, Davies Liu <dav...@databricks.com> wrote:
> >>> This exception should be caused by another one. Could you paste all of
> >>> them here?
> >>>
> >>> Also, it would be great if you could provide a script to reproduce this
> >>> problem.
> >>>
> >>> Thanks!
> >>>
> >>> On Fri, Sep 26, 2014 at 6:11 AM, Arun Ahuja <aahuj...@gmail.com> wrote:
> >>> > Has anyone else seen this error in task deserialization? The task is
> >>> > processing a small amount of data and doesn't seem to have much data
> >>> > attached to the closure. I've only seen this with Spark 1.1.
> >>> >
> >>> > Job aborted due to stage failure: Task 975 in stage 8.0 failed 4 times,
> >>> > most recent failure: Lost task 975.3 in stage 8.0 (TID 24777, host.com):
> >>> > java.io.IOException: unexpected exception type
> >>> >     java.io.ObjectStreamClass.throwMiscException(ObjectStreamClass.java:1538)
> >>> >     java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1025)
> >>> >     java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
> >>> >     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >>> >     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >>> >     java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
> >>> >     java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
> >>> >     java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
> >>> >     java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
> >>> >     java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> >>> >     org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
> >>> >     org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
> >>> >     org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:159)
> >>> >     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >>> >     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >>> >     java.lang.Thread.run(Thread.java:744)
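For anyone hitting the same thing: the HttpBroadcast workaround discussed above is just a configuration switch. A minimal sketch for Spark 1.x (TorrentBroadcastFactory became the default in 1.1; HttpBroadcastFactory was the earlier default), with an illustrative app name:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("http-broadcast-workaround")
      .set("spark.broadcast.factory",
           "org.apache.spark.broadcast.HttpBroadcastFactory")
    val sc = new SparkContext(conf)

The same property can also be set in conf/spark-defaults.conf instead, so no code change is required to try it.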