Thanks, good to know.

On Wednesday, August 24, 2016 at 12:52:23 AM UTC-7, P. Oscar Boykin wrote:
>
> Yes, by default scalding attempts map-side aggregation of any commutative 
> operation (which we assume ToList to be since there is no ordering here).
>
> Your solution is a fine one here. Another solution is to turn down the 
> size of the map-side cache (see Config.scala for options on this). Another 
> approach is to use a different map-side cache that automatically tunes its 
> own size based on memory usage and cache hit-rate.
>
> We could disable map-side caching for toList, since it is usually very 
> unlikely to help (toList is not an information-reducing operation). 
> Perhaps that is a good solution to reduce the chance of this problem.
> On Tue, Aug 23, 2016 at 19:56 Kostya Salomatin <[email protected]> wrote:
>
>> Hey scalding pros,
>>
>> I've got a strange java heap space issue in my mapper. I've got a fix 
>> that helps, but I would like to understand better what is going on under 
>> the hood, why my fix helps and whether there is an alternative solution 
>> (e.g. changing job parameters). This is the code in question:
>>
>> pipe
>>   .map { candidateSet => (candidateSet.key, candidateSet.candidates) }
>>   .collect { case (Some(key), candidates) => (key, candidates) }
>>   .group
>>   //.forceToReducers - adding this line solves the problem
>>   .toList // this does not cause the issue, the rows have unique keys
>>
>>   .mapValues {_.flatten}
>>
>>
>> After this group the pipe is joined with another pipe using the same key, 
>> so I keep it as UnsortedGrouped[K, V].
>>
>> The data has unique keys, so there are no map-side reductions, and the 
>> .toList call is actually redundant. My guess is that the mapper tries to 
>> execute some map-side sorting / data optimization and this is what causes 
>> problems. The default amount of memory is sufficient for all job overheads 
>> (it works fine for lots of other jobs); just to be sure, I increased the 
>> heap size significantly and it did not help.
>>
>> .forceToReducers solves the problem, it was my semi-intelligent guess, I 
>> expected this call to turn off some mapper logic that was redundant in case 
>> of unique keys, but still I don't understand why exactly it helped. Could 
>> be the way the input data is buffered and sorted in memory.
>>
>> Any ideas?
>>
>> Thanks,
>> Kostya
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Scalding Development" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
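
For anyone reading this thread later: the point about map-side aggregation 
can be illustrated with a small plain-Scala sketch. This is not Scalding's 
actual cache. MapSideCache, MapSideCacheDemo, run and capacity are 
hypothetical names, and the evict-oldest policy is an assumption made only 
for illustration. It shows that a bounded cache shrinks the data when keys 
repeat and the operation reduces information (a sum), but shrinks nothing 
for a toList-style concatenation over unique keys, while still holding 
every value in memory until it is evicted or flushed.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a bounded map-side cache (not Scalding's code).
// Values for a cached key are merged with `combine`; when the cache is full,
// the oldest entry is evicted, i.e. emitted towards the reducers.
final class MapSideCache[K, V](capacity: Int)(combine: (V, V) => V) {
  require(capacity > 0, "capacity must be positive")
  private val cache = mutable.LinkedHashMap.empty[K, V]

  /** Adds one record and returns the entry evicted to make room, if any. */
  def put(key: K, value: V): Option[(K, V)] =
    cache.get(key) match {
      case Some(old) =>
        cache.update(key, combine(old, value))
        None
      case None =>
        val evicted =
          if (cache.size >= capacity) {
            val oldest = cache.head
            cache.remove(oldest._1)
            Some(oldest)
          } else None
        cache.update(key, value)
        evicted
    }

  /** Emits everything still held at the end of the map task. */
  def flush(): List[(K, V)] = {
    val rest = cache.toList
    cache.clear()
    rest
  }
}

object MapSideCacheDemo {
  /** Runs records through the cache; returns what would reach the reducers. */
  def run[K, V](records: Seq[(K, V)], capacity: Int)(combine: (V, V) => V): List[(K, V)] = {
    val cache = new MapSideCache[K, V](capacity)(combine)
    val evicted = records.flatMap { case (k, v) => cache.put(k, v) }.toList
    evicted ++ cache.flush()
  }

  def main(args: Array[String]): Unit = {
    // Repeated keys with a sum: the cache reduces the record count.
    val repeated = (0 until 1000).map(i => (i % 10, 1L))
    val summed = run(repeated, capacity = 100)(_ + _)
    println(s"sum, repeated keys: ${repeated.size} in, ${summed.size} out")

    // Unique keys with list concatenation: the record count is unchanged.
    val unique = (0 until 1000).map(i => (i, List(i)))
    val listed = run(unique, capacity = 100)(_ ++ _)
    println(s"toList, unique keys: ${unique.size} in, ${listed.size} out")
  }
}
```

In this sketch the toList case pays the memory cost of the cache and gets 
no reduction in return, which is consistent with the suggestions above to 
shrink the cache or to bypass it with .forceToReducers.
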
