This JIRA seems to be connected to our issue: https://issues.apache.org/jira/browse/SPARK-1018
On 2 June 2015 at 19:54, Josh Rosen <rosenvi...@gmail.com> wrote:

> Ah, interesting. While working on my new Tungsten shuffle manager, I came
> up with some nice testing interfaces for allowing me to manually trigger
> spills in order to deterministically test those code paths without
> requiring large amounts of data to be shuffled. Maybe I could make similar
> test interface changes to the existing shuffle code, which might make it
> easier to reproduce this in an isolated environment.
>
> On Mon, Jun 1, 2015 at 11:41 PM, Igor Berman <igor.ber...@gmail.com>
> wrote:
>
>> Hi,
>> small mock data doesn't reproduce the problem. IMHO the problem is
>> reproduced when we make the shuffle big enough to spill data to disk.
>> We will work on it to understand and reproduce the problem (not first
>> priority though...)
>>
>>
>> On 1 June 2015 at 23:02, Josh Rosen <rosenvi...@gmail.com> wrote:
>>
>>> How much work is it to produce a small standalone reproduction? Can you
>>> create an Avro file with some mock data, maybe 10 or so records, then
>>> reproduce this locally?
>>>
>>> On Mon, Jun 1, 2015 at 12:31 PM, Igor Berman <igor.ber...@gmail.com>
>>> wrote:
>>>
>>>> switching to use simple pojos instead of using avro for spark
>>>> serialization solved the problem (I mean reading avro from s3 and then
>>>> mapping each avro object to its serializable pojo counterpart with the
>>>> same fields; the pojo is registered with kryo)
>>>> Any thoughts on where to look for a problem/misconfiguration?
>>>>
>>>> On 31 May 2015 at 22:48, Igor Berman <igor.ber...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>> We are using spark 1.3.1
>>>>> Avro-chill (tomorrow will check if it's important) we register avro
>>>>> classes from java
>>>>> Avro 1.7.6
>>>>> On May 31, 2015 22:37, "Josh Rosen" <rosenvi...@gmail.com> wrote:
>>>>>
>>>>>> Which Spark version are you using? I'd like to understand whether
>>>>>> this change could be caused by recent Kryo serializer re-use changes in
>>>>>> master / Spark 1.4.
>>>>>>
>>>>>> On Sun, May 31, 2015 at 11:31 AM, igor.berman <igor.ber...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> after investigation the problem is somehow connected to avro
>>>>>>> serialization with kryo + chill-avro (mapping avro object to simple
>>>>>>> scala case class and running reduce on these case class objects
>>>>>>> solves the problem)
>>>>>>>
>>>>>>> --
>>>>>>> View this message in context:
>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/union-and-reduceByKey-wrong-shuffle-tp23092p23093.html
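
For anyone landing on this thread later, here is a minimal, Spark-free Java sketch of the workaround described in the thread (copying each Avro object into a plain POJO with the same fields before the reduce). It assumes the wrong results come from a reader handing back one reused, mutable record instance, which Hadoop RecordReaders are known to do; the thread itself does not confirm that this is the cause here. `MutableRecord`, `EventPojo` and `reusingReader` are hypothetical stand-ins for illustration, not Avro, Kryo or Spark APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

final class RecordReuseSketch {

    // Hypothetical mutable record, standing in for a reused Avro object.
    static final class MutableRecord {
        String key;
        long count;
    }

    // Hypothetical immutable POJO counterpart with the same fields.
    static final class EventPojo {
        final String key;
        final long count;

        EventPojo(String key, long count) {
            this.key = key;
            this.count = count;
        }
    }

    // Assumption: the reader returns the SAME instance for every record
    // and only overwrites its fields.
    static Iterator<MutableRecord> reusingReader(final String[] keys, final long[] counts) {
        final MutableRecord shared = new MutableRecord();
        return new Iterator<MutableRecord>() {
            private int i = 0;

            @Override
            public boolean hasNext() {
                return i < keys.length;
            }

            @Override
            public MutableRecord next() {
                shared.key = keys[i];
                shared.count = counts[i];
                i++;
                return shared;
            }
        };
    }

    // Buffers the raw references first (roughly what any buffering step does),
    // then sums per key. Every buffered entry points at the same object.
    static Map<String, Long> sumBufferedReferences(Iterator<MutableRecord> records) {
        List<MutableRecord> buffer = new ArrayList<>();
        while (records.hasNext()) {
            buffer.add(records.next());
        }
        Map<String, Long> sums = new HashMap<>();
        for (MutableRecord r : buffer) {
            sums.merge(r.key, r.count, Long::sum);
        }
        return sums;
    }

    // Copies each record into its own POJO before buffering, then sums per key.
    static Map<String, Long> sumAfterCopying(Iterator<MutableRecord> records) {
        List<EventPojo> buffer = new ArrayList<>();
        while (records.hasNext()) {
            MutableRecord r = records.next();
            buffer.add(new EventPojo(r.key, r.count));
        }
        Map<String, Long> sums = new HashMap<>();
        for (EventPojo p : buffer) {
            sums.merge(p.key, p.count, Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] keys = {"a", "b", "a"};
        long[] counts = {1L, 2L, 3L};

        Map<String, Long> wrong = sumBufferedReferences(reusingReader(keys, counts));
        Map<String, Long> right = sumAfterCopying(reusingReader(keys, counts));

        System.out.println("buffered references: a=" + wrong.get("a") + ", b=" + wrong.get("b"));
        System.out.println("copied to POJO:      a=" + right.get("a") + ", b=" + right.get("b"));
    }
}
```

With keys a, b, a and counts 1, 2, 3, the buffering version ends up holding three references to the last record, so key b disappears and a is over-counted, while the copying version sums each key correctly. In a real job the copy would be a map step placed ahead of reduceByKey, with the POJO class registered with Kryo as described in the thread.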