Re: union and reduceByKey wrong shuffle?
Ah, interesting. While working on my new Tungsten shuffle manager, I came up with some nice testing interfaces that let me manually trigger spills, so that I can deterministically test those code paths without shuffling large amounts of data. Maybe I could make similar test interface changes to the existing shuffle code, which might make it easier to reproduce this in an isolated environment.

On Mon, Jun 1, 2015 at 11:41 PM, Igor Berman igor.ber...@gmail.com wrote:
Hi, small mock data doesn't reproduce the problem. IMHO the problem is reproduced only when the shuffle is big enough to spill data to disk. We will work on understanding and reproducing the problem (not first priority, though).

On 1 June 2015 at 23:02, Josh Rosen rosenvi...@gmail.com wrote:
How much work would it be to produce a small standalone reproduction? Can you create an Avro file with some mock data, maybe 10 or so records, then reproduce this locally?

On Mon, Jun 1, 2015 at 12:31 PM, Igor Berman igor.ber...@gmail.com wrote:
Switching to simple POJOs instead of Avro objects for Spark serialization solved the problem. (I mean reading Avro from S3 and then mapping each Avro object to its serializable POJO counterpart with the same fields; the POJO is registered with Kryo.) Any thoughts on where to look for a problem or misconfiguration?

On 31 May 2015 at 22:48, Igor Berman igor.ber...@gmail.com wrote:
Hi,
We are using:
Spark 1.3.1
chill-avro (I will check tomorrow whether it's important); we register Avro classes from Java
Avro 1.7.6

On May 31, 2015 22:37, Josh Rosen rosenvi...@gmail.com wrote:
Which Spark version are you using? I'd like to understand whether this change could be caused by recent Kryo serializer re-use changes in master / Spark 1.4.

On Sun, May 31, 2015 at 11:31 AM, igor.berman igor.ber...@gmail.com wrote:
After investigation, the problem is somehow connected to Avro serialization with Kryo + chill-avro. (Mapping each Avro object to a simple Scala case class and running the reduce on these case class objects solves the problem.)

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/union-and-reduceByKey-wrong-shuffle-tp23092p23093.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
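
Editor's sketch: the idea of a test hook that triggers a spill on demand can be illustrated with a toy reduce-by-key buffer. This is not Spark's shuffle code and not the Tungsten test interface mentioned above. The class `SpillableSum`, its `force_spill` method, and the `max_keys_in_memory` threshold are hypothetical names chosen for illustration, and plain Python with `pickle` stands in for the JVM serializer.

```python
import os
import pickle
import tempfile
from collections import defaultdict


class SpillableSum:
    """Toy reduce-by-key buffer that spills partial sums to disk.

    Hypothetical illustration only; this is not Spark's shuffle code.
    """

    def __init__(self, max_keys_in_memory=1000):
        self.max_keys_in_memory = max_keys_in_memory
        self.buffer = defaultdict(int)
        self.spill_files = []

    def insert(self, key, value):
        self.buffer[key] += value
        # Normal path: spill only once the in-memory map grows too large.
        if len(self.buffer) > self.max_keys_in_memory:
            self.force_spill()

    def force_spill(self):
        """Test hook: write the current buffer to disk regardless of its size."""
        if not self.buffer:
            return
        fd, path = tempfile.mkstemp(suffix=".spill")
        with os.fdopen(fd, "wb") as f:
            pickle.dump(dict(self.buffer), f)
        self.spill_files.append(path)
        self.buffer.clear()

    def result(self):
        """Merge spilled partial sums with whatever is still in memory."""
        merged = defaultdict(int, self.buffer)
        for path in self.spill_files:
            with open(path, "rb") as f:
                for key, value in pickle.load(f).items():
                    merged[key] += value
            os.remove(path)
        self.spill_files = []
        return dict(merged)


agg = SpillableSum()
agg.insert("a", 1)
agg.insert("b", 2)
agg.force_spill()  # exercises the serialize-to-disk path with only two records
agg.insert("a", 10)
totals = agg.result()
```

With a hook like this, a test can push a handful of records through the serialize, write, read, and merge path, which is the path that small mock data would otherwise never reach.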
Re: union and reduceByKey wrong shuffle?
Hi, small mock data doesn't reproduce the problem. IMHO the problem is reproduced only when the shuffle is big enough to spill data to disk. We will work on understanding and reproducing the problem (not first priority, though).
Re: union and reduceByKey wrong shuffle?
Switching to simple POJOs instead of Avro objects for Spark serialization solved the problem. (I mean reading Avro from S3 and then mapping each Avro object to its serializable POJO counterpart with the same fields; the POJO is registered with Kryo.) Any thoughts on where to look for a problem or misconfiguration?
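
Editor's sketch: the order of operations in the workaround above (copy each Avro object into a plain counterpart first, then union and reduce by key) might look roughly like this. It is a local stand-in, not Spark code: dictionaries play the role of Avro records, lists play the role of RDDs, and the `Event` class, its fields, and the helper names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import chain


@dataclass(frozen=True)
class Event:
    """Plain counterpart of a hypothetical Avro record with the same fields."""
    key: str
    count: int


def to_plain(avro_like):
    # Copy the field values out so nothing downstream holds the original record.
    return Event(key=str(avro_like["key"]), count=int(avro_like["count"]))


def reduce_by_key(events):
    # Local equivalent of a reduce-by-key that adds up `count` per key.
    totals = {}
    for e in events:
        totals[e.key] = totals.get(e.key, 0) + e.count
    return totals


# Two inputs standing in for the two datasets that are unioned.
source_a = [{"key": "x", "count": 1}, {"key": "y", "count": 2}]
source_b = [{"key": "x", "count": 5}]

# Convert first, then union, then reduce: only plain objects reach the reduce step.
unioned = chain(map(to_plain, source_a), map(to_plain, source_b))
totals = reduce_by_key(unioned)
```

The point of the conversion step is that only the plain objects, not the Avro-generated ones, pass through serialization during the shuffle. The thread does not establish why this avoids the problem.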
Re: union and reduceByKey wrong shuffle?
How much work would it be to produce a small standalone reproduction? Can you create an Avro file with some mock data, maybe 10 or so records, then reproduce this locally?
Re: union and reduceByKey wrong shuffle?
Hi,
We are using:
Spark 1.3.1
chill-avro (I will check tomorrow whether it's important); we register Avro classes from Java
Avro 1.7.6