Re: union and reduceByKey wrong shuffle?

Igor Berman Fri, 05 Jun 2015 06:24:13 -0700

This JIRA seems to be related to our issue:
https://issues.apache.org/jira/browse/SPARK-1018


On 2 June 2015 at 19:54, Josh Rosen <[email protected]> wrote:

> Ah, interesting.  While working on my new Tungsten shuffle manager, I came
> up with some nice testing interfaces for allowing me to manually trigger
> spills in order to deterministically test those code paths without
> requiring large amounts of data to be shuffled.  Maybe I could make similar
> test interface changes to the existing shuffle code, which might make it
> easier to reproduce this in an isolated environment.
>
> On Mon, Jun 1, 2015 at 11:41 PM, Igor Berman <[email protected]>
> wrote:
>
>> Hi,
>> Small mock data doesn't reproduce the problem. IMHO the problem only
>> shows up when the shuffle is big enough to spill data to disk.
>> We will work on understanding and reproducing the problem (not first
>> priority though...)
>>
>>
>> On 1 June 2015 at 23:02, Josh Rosen <[email protected]> wrote:
>>
>>> How much work would it be to produce a small standalone reproduction?
>>> Can you create an Avro file with some mock data, maybe 10 or so records,
>>> and then reproduce this locally?
>>>
>>> On Mon, Jun 1, 2015 at 12:31 PM, Igor Berman <[email protected]>
>>> wrote:
>>>
>>>> Switching to simple POJOs instead of Avro objects for Spark
>>>> serialization solved the problem (I mean reading Avro from S3 and then
>>>> mapping each Avro object to its serializable POJO counterpart with the
>>>> same fields; the POJO is registered with Kryo).
>>>> Any thoughts on where to look for the problem or misconfiguration?
>>>>
>>>> On 31 May 2015 at 22:48, Igor Berman <[email protected]> wrote:
>>>>
>>>>> Hi
>>>>> We are using Spark 1.3.1,
>>>>> chill-avro (tomorrow I will check whether it matters; we register the
>>>>> Avro classes from Java),
>>>>> and Avro 1.7.6.
>>>>> On May 31, 2015 22:37, "Josh Rosen" <[email protected]> wrote:
>>>>>
>>>>>> Which Spark version are you using?  I'd like to understand whether
>>>>>> this problem could be caused by recent Kryo serializer re-use changes in
>>>>>> master / Spark 1.4.
>>>>>>
>>>>>> On Sun, May 31, 2015 at 11:31 AM, igor.berman <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> After investigation, the problem seems to be somehow connected to
>>>>>>> Avro serialization with Kryo + chill-avro (mapping each Avro object
>>>>>>> to a simple Scala case class and running the reduce on those case
>>>>>>> class objects solves the problem).
>>>>>>
>>>>
>>>
>>
>
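The workaround described in this thread (copy each Avro record into a plain serializable class with the same fields, register that class with Kryo, and only then run the reduce) can be sketched roughly as follows. This is a hedged, stdlib-only illustration, not the code used in the thread: a Map<String, Object> stands in for an Avro GenericRecord, the names AvroToPojoSketch, EventPojo, key and count are hypothetical, and reduceByKey here is a local emulation rather than a Spark call.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in sketch: a Map<String, Object> plays the role of an
// Avro GenericRecord (field name -> value). No Avro or Spark classes are used.
class AvroToPojoSketch {

    // Hypothetical plain counterpart of an Avro schema with two fields.
    // In a real Spark job this is the class that would be registered with Kryo.
    public static final class EventPojo implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String key;
        public final long count;

        public EventPojo(String key, long count) {
            this.key = key;
            this.count = count;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof EventPojo)) return false;
            EventPojo other = (EventPojo) o;
            return count == other.count && key.equals(other.key);
        }

        @Override
        public int hashCode() {
            return Objects.hash(key, count);
        }
    }

    // Copy the fields out of the record into a new, separate POJO instance.
    public static EventPojo toPojo(Map<String, Object> record) {
        return new EventPojo(
            record.get("key").toString(),
            ((Number) record.get("count")).longValue());
    }

    // Local emulation of "map to POJO, then reduceByKey(sum)":
    // converts every record first, then sums counts per key.
    public static Map<String, Long> reduceByKey(List<Map<String, Object>> records) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Object> record : records) {
            EventPojo e = toPojo(record);
            totals.merge(e.key, e.count, Long::sum);
        }
        return totals;
    }
}
```

In an actual Spark job the toPojo step would run inside a map over the Avro RDD before reduceByKey, and EventPojo would be registered with Kryo, for example through SparkConf.registerKryoClasses. The point of the design is that the shuffle and the reduce only ever see the plain class, never the Avro objects.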
