Re: union and reduceByKey wrong shuffle?

2015-06-02 Thread Josh Rosen
Ah, interesting.  While working on my new Tungsten shuffle manager, I came
up with some nice testing interfaces for allowing me to manually trigger
spills in order to deterministically test those code paths without
requiring large amounts of data to be shuffled.  Maybe I could make similar
test interface changes to the existing shuffle code, which might make it
easier to reproduce this in an isolated environment.
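
As a rough illustration of the kind of test hook described above, here is a minimal Java sketch. It is not Spark's shuffle code: SpillableBuffer, forceSpill and the in-memory "spill" lists are all hypothetical names invented for this example. The idea is only that a test can trigger the spill path after a single record instead of shuffling a large amount of data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical buffer that "spills" to a stand-in for disk. Not Spark code. */
final class SpillableBuffer {
    private final int spillThreshold; // records buffered before an automatic spill
    private final List<String> inMemory = new ArrayList<>();
    private final List<List<String>> spills = new ArrayList<>(); // stand-in for spill files

    SpillableBuffer(int spillThreshold) {
        this.spillThreshold = spillThreshold;
    }

    void insert(String record) {
        inMemory.add(record);
        if (inMemory.size() >= spillThreshold) {
            spill();
        }
    }

    /** Test hook: force a spill no matter how little data is buffered. */
    void forceSpill() {
        spill();
    }

    private void spill() {
        if (inMemory.isEmpty()) {
            return; // nothing to write, so no empty spill is recorded
        }
        spills.add(new ArrayList<>(inMemory));
        inMemory.clear();
    }

    int spillCount() {
        return spills.size();
    }

    /** Returns every record: spilled runs first, in spill order, then buffered ones. */
    List<String> readAll() {
        List<String> out = new ArrayList<>();
        for (List<String> run : spills) {
            out.addAll(run);
        }
        out.addAll(inMemory);
        return out;
    }
}
```

A test can then insert one record, call forceSpill(), insert another, and check that readAll() still returns both, which exercises the spill-and-merge path with only two records.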

On Mon, Jun 1, 2015 at 11:41 PM, Igor Berman igor.ber...@gmail.com wrote:

 Hi,
 Small mock data doesn't reproduce the problem. IMHO the problem is
 reproduced when the shuffle is big enough to spill data to disk.
 We will work on understanding and reproducing the problem (not first
 priority, though...)


 On 1 June 2015 at 23:02, Josh Rosen rosenvi...@gmail.com wrote:

 How much work is it to produce a small standalone reproduction?  Can you
 create an Avro file with some mock data, maybe 10 or so records, then
 reproduce this locally?

 On Mon, Jun 1, 2015 at 12:31 PM, Igor Berman igor.ber...@gmail.com
 wrote:

 Switching to simple POJOs instead of Avro for Spark serialization
 solved the problem (I mean reading Avro from S3 and then mapping each
 Avro object to its serializable POJO counterpart with the same fields;
 the POJO is registered with Kryo).
 Any thoughts on where to look for a problem/misconfiguration?

 On 31 May 2015 at 22:48, Igor Berman igor.ber...@gmail.com wrote:

 Hi
 We are using Spark 1.3.1
 chill-avro (will check tomorrow whether it's important); we register Avro
 classes from Java
 Avro 1.7.6
 On May 31, 2015 22:37, Josh Rosen rosenvi...@gmail.com wrote:

 Which Spark version are you using?  I'd like to understand whether
 this change could be caused by recent Kryo serializer re-use changes in
 master / Spark 1.4.

 On Sun, May 31, 2015 at 11:31 AM, igor.berman igor.ber...@gmail.com
 wrote:

 After investigation, the problem is somehow connected to Avro
 serialization with Kryo + chill-avro (mapping each Avro object to a
 simple Scala case class and running reduce on these case class objects
 solves the problem).




 --
 View this message in context:
 http://apache-spark-user-list.1001560.n3.nabble.com/union-and-reduceByKey-wrong-shuffle-tp23092p23093.html
 Sent from the Apache Spark User List mailing list archive at
 Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org








Re: union and reduceByKey wrong shuffle?

2015-06-02 Thread Igor Berman
Hi,
Small mock data doesn't reproduce the problem. IMHO the problem is
reproduced when the shuffle is big enough to spill data to disk.
We will work on understanding and reproducing the problem (not first
priority, though...)


On 1 June 2015 at 23:02, Josh Rosen rosenvi...@gmail.com wrote:

 How much work is it to produce a small standalone reproduction?  Can you
 create an Avro file with some mock data, maybe 10 or so records, then
 reproduce this locally?

 On Mon, Jun 1, 2015 at 12:31 PM, Igor Berman igor.ber...@gmail.com
 wrote:

 Switching to simple POJOs instead of Avro for Spark serialization
 solved the problem (I mean reading Avro from S3 and then mapping each
 Avro object to its serializable POJO counterpart with the same fields;
 the POJO is registered with Kryo).
 Any thoughts on where to look for a problem/misconfiguration?

 On 31 May 2015 at 22:48, Igor Berman igor.ber...@gmail.com wrote:

 Hi
 We are using Spark 1.3.1
 chill-avro (will check tomorrow whether it's important); we register Avro
 classes from Java
 Avro 1.7.6
 On May 31, 2015 22:37, Josh Rosen rosenvi...@gmail.com wrote:

 Which Spark version are you using?  I'd like to understand whether this
 change could be caused by recent Kryo serializer re-use changes in master /
 Spark 1.4.

 On Sun, May 31, 2015 at 11:31 AM, igor.berman igor.ber...@gmail.com
 wrote:

 After investigation, the problem is somehow connected to Avro
 serialization with Kryo + chill-avro (mapping each Avro object to a
 simple Scala case class and running reduce on these case class objects
 solves the problem).











Re: union and reduceByKey wrong shuffle?

2015-06-01 Thread Igor Berman
Switching to simple POJOs instead of Avro for Spark serialization solved
the problem (I mean reading Avro from S3 and then mapping each Avro object
to its serializable POJO counterpart with the same fields; the POJO is
registered with Kryo).
Any thoughts on where to look for a problem/misconfiguration?
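
To make the workaround above concrete, here is a minimal Java sketch of the mapping step. It is an illustration under stated assumptions, not the code from the job: AvroLikeEvent is a hypothetical stand-in for a generated Avro class, EventPojo and its two fields are invented, and a local java.util.stream pipeline stands in for Spark's map + reduceByKey so the example runs without Spark or Avro on the classpath. In a real job the POJO class would additionally be registered with Kryo, for example through SparkConf.registerKryoClasses.

```java
import java.io.Serializable;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a mutable, generated Avro record class. */
final class AvroLikeEvent {
    String key;
    long count;
}

/** Immutable POJO counterpart with the same fields as the record. */
final class EventPojo implements Serializable {
    private static final long serialVersionUID = 1L;
    final String key;
    final long count;

    EventPojo(String key, long count) {
        this.key = key;
        this.count = count;
    }

    /** Copies the field values out, so nothing refers back to the source record. */
    static EventPojo fromRecord(AvroLikeEvent r) {
        return new EventPojo(r.key, r.count);
    }

    /** Merge function of the kind one would pass to reduceByKey. */
    static EventPojo merge(EventPojo a, EventPojo b) {
        return new EventPojo(a.key, a.count + b.count);
    }
}

final class PojoMappingSketch {
    /** Local stand-in for map + reduceByKey: convert each record, then merge per key. */
    static Map<String, EventPojo> reduceByKey(List<AvroLikeEvent> records) {
        return records.stream()
                .map(EventPojo::fromRecord)
                .collect(Collectors.toMap(p -> p.key, p -> p, EventPojo::merge, TreeMap::new));
    }

    /** Small helper for building sample records. */
    static AvroLikeEvent event(String key, long count) {
        AvroLikeEvent e = new AvroLikeEvent();
        e.key = key;
        e.count = count;
        return e;
    }
}
```

The point of the conversion is that only plain, immutable objects reach the shuffle, so the Avro classes no longer pass through Kryo at all. Whether that is the actual root cause is not established in this thread.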

On 31 May 2015 at 22:48, Igor Berman igor.ber...@gmail.com wrote:

 Hi
 We are using Spark 1.3.1
 chill-avro (will check tomorrow whether it's important); we register Avro
 classes from Java
 Avro 1.7.6
 On May 31, 2015 22:37, Josh Rosen rosenvi...@gmail.com wrote:

 Which Spark version are you using?  I'd like to understand whether this
 change could be caused by recent Kryo serializer re-use changes in master /
 Spark 1.4.

 On Sun, May 31, 2015 at 11:31 AM, igor.berman igor.ber...@gmail.com
 wrote:

 After investigation, the problem is somehow connected to Avro
 serialization with Kryo + chill-avro (mapping each Avro object to a
 simple Scala case class and running reduce on these case class objects
 solves the problem).









Re: union and reduceByKey wrong shuffle?

2015-06-01 Thread Josh Rosen
How much work is it to produce a small standalone reproduction?  Can you
create an Avro file with some mock data, maybe 10 or so records, then
reproduce this locally?

On Mon, Jun 1, 2015 at 12:31 PM, Igor Berman igor.ber...@gmail.com wrote:

 Switching to simple POJOs instead of Avro for Spark serialization
 solved the problem (I mean reading Avro from S3 and then mapping each
 Avro object to its serializable POJO counterpart with the same fields;
 the POJO is registered with Kryo).
 Any thoughts on where to look for a problem/misconfiguration?

 On 31 May 2015 at 22:48, Igor Berman igor.ber...@gmail.com wrote:

 Hi
 We are using Spark 1.3.1
 chill-avro (will check tomorrow whether it's important); we register Avro
 classes from Java
 Avro 1.7.6
 On May 31, 2015 22:37, Josh Rosen rosenvi...@gmail.com wrote:

 Which Spark version are you using?  I'd like to understand whether this
 change could be caused by recent Kryo serializer re-use changes in master /
 Spark 1.4.

 On Sun, May 31, 2015 at 11:31 AM, igor.berman igor.ber...@gmail.com
 wrote:

 After investigation, the problem is somehow connected to Avro
 serialization with Kryo + chill-avro (mapping each Avro object to a
 simple Scala case class and running reduce on these case class objects
 solves the problem).










Re: union and reduceByKey wrong shuffle?

2015-05-31 Thread Igor Berman
Hi
We are using Spark 1.3.1
chill-avro (will check tomorrow whether it's important); we register Avro
classes from Java
Avro 1.7.6
On May 31, 2015 22:37, Josh Rosen rosenvi...@gmail.com wrote:

 Which Spark version are you using?  I'd like to understand whether this
 change could be caused by recent Kryo serializer re-use changes in master /
 Spark 1.4.

 On Sun, May 31, 2015 at 11:31 AM, igor.berman igor.ber...@gmail.com
 wrote:

 After investigation, the problem is somehow connected to Avro
 serialization with Kryo + chill-avro (mapping each Avro object to a
 simple Scala case class and running reduce on these case class objects
 solves the problem).



