Hi Patrick,
   I think this might be data related and about edge-condition handling, as
only a single partition repeatedly throws an exception in
ExternalAppendOnlyMap's iterator. I will file a JIRA as soon as I can
isolate the problem. Btw, the test intentionally abuses the external sort
to see its performance impact on a real application, because I have trouble
configuring the right partition number for each dataset.
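For context, the knob I keep tuning is just the partition count handed to
the shuffle; roughly something like this (512 is only a placeholder value):

  import org.apache.spark.HashPartitioner

  // Too few partitions make each per-partition dict huge and force heavy
  // spilling; too many add scheduling overhead, so the right number really
  // depends on the dataset.
  val partitioner = new HashPartitioner(512)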

Best Regards,
Jiacheng Guo


On Mon, Jan 27, 2014 at 6:16 AM, Patrick Wendell <[email protected]> wrote:

> Hey There,
>
> So one thing you can do is disable the external sorting; this should
> preserve the behavior exactly as it was in previous releases.
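>
> For example, something like this should work (assuming spark.shuffle.spill
> is still the setting that gates the external code path on master):
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   val conf = new SparkConf()
>     .setAppName("external-sort-test")      // placeholder app name
>     .set("spark.shuffle.spill", "false")   // keep aggregation fully in memory
>   val sc = new SparkContext(conf)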
>
> It's quite possible that the problem you are having relates to the
> fact that you have individual records that are 1GB in size. This is a
> pretty extreme case that may violate assumptions in the implementation
> of the external aggregation code.
>
> Would you mind opening a Jira for this? Also, if you are able to find
> an isolated way to recreate the behavior it will make it easier to
> debug and fix.
>
> IIRC, even with external aggregation Spark still materializes the
> final combined output *for a given key* in memory. If you are
> outputting GB of data for a single key, then you might also look into
> a different parallelization strategy for your algorithm. Not sure if
> this is also an issue though...
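>
> For what it's worth, one common trick here (just a sketch; merge below is
> a stand-in for whatever your combine function is) is to salt the key so
> the combining work is spread across several partitions first:
>
>   import org.apache.spark.SparkContext._   // pair RDD functions (reduceByKey)
>
>   // rdd: RDD[(K, V)] with one or a few very hot keys
>   val salted = rdd.map { case (k, v) =>
>     ((k, scala.util.Random.nextInt(32)), v)  // 32 salt buckets, arbitrary
>   }
>   val partial = salted.reduceByKey(merge)    // combine within each bucket
>   val result = partial
>     .map { case ((k, _), v) => (k, v) }
>     .reduceByKey(merge)                      // final merge per original key
>
> The final merge still builds the full value for each key in one place, so
> this only spreads out the intermediate combining.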
>
> - Patrick
>
> On Sun, Jan 26, 2014 at 2:27 AM, guojc <[email protected]> wrote:
> > Hi Patrick,
> >     I still get the exception on latest master
> > 05be7047744c88e64e7e6bd973f9bcfacd00da5f. A bit more info on the subject:
> > I'm using Kryo serialization with a custom serialization function, and the
> > exception comes from the RDD operation
> > combineByKey(createDict, combineKey, mergeDict, partitioner, true,
> > "org.apache.spark.serializer.KryoSerializer").
> > All previous operations seem OK. The only difference is that this
> > operation can generate a large dict object, around 1 GB in size. I hope
> > this can give you some clue about what might go wrong. I'm still having
> > trouble figuring out the cause.
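> >
> > For clarity, the call shape is roughly this (Dict is my combiner type,
> > createDict/combineKey/mergeDict are my own functions, and pairRdd is a
> > placeholder for the actual RDD):
> >
> >   val combined = pairRdd.combineByKey(
> >     createDict,    // createCombiner: V => Dict
> >     combineKey,    // mergeValue: (Dict, V) => Dict
> >     mergeDict,     // mergeCombiners: (Dict, Dict) => Dict
> >     partitioner,   // custom Partitioner
> >     true,          // mapSideCombine
> >     "org.apache.spark.serializer.KryoSerializer")  // serializer class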
> >
> > Thanks,
> > Jiacheng Guo
> >
> >
> > On Wed, Jan 22, 2014 at 1:36 PM, Patrick Wendell <[email protected]> wrote:
> >>
> >> This code has been modified since you reported this so you may want to
> >> try the current master.
> >>
> >> - Patrick
> >>
> >> On Mon, Jan 20, 2014 at 4:22 AM, guojc <[email protected]> wrote:
> >> > Hi,
> >> >   I'm trying out the latest master branch of Spark for the exciting
> >> > external hashmap feature. I have code that runs correctly on Spark 0.8.1,
> >> > and I only made a change so that it spills to disk more easily. However,
> >> > I encounter a few task failures with
> >> > java.util.NoSuchElementException:
> >> >
> >> > org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:277)
> >> > org.apache.spark.util.collection.ExternalAppendOnlyMap$ExternalIterator.next(ExternalAppendOnlyMap.scala:212)
> >> > org.apache.spark.InterruptibleIterator.next(InterruptibleIterator.scala:29)
> >> > And the job doesn't seem to recover from this.
> >> > Can anyone give some suggestions on how to investigate the issue?
> >> > Thanks,
> >> > Jiacheng Guo
> >
> >
>
