Re: Problem with groupBy and OOM when just writing the group in a file

2015-03-30 Thread Sean Owen
The behavior is the same. I am not sure it's a problem so much as a design decision. It does not require everything to stay in memory, only the values for one key at a time. Have a look at how the preceding shuffle works. Consider repartitionAndSortWithinPartitions to *partition* by hour and then
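
To illustrate the idea in the reply above, here is a minimal sketch in plain Python, not Spark code. It assumes the records have already been brought into timestamp order (in Spark that ordering would come out of the shuffle, for example via repartitionAndSortWithinPartitions), and then streams over them, keeping only one open file and one record in memory at a time. The names `hour_key`, `write_hourly_files`, and `events`, the `(timestamp, payload)` record layout, and the file naming are all hypothetical and chosen only for this example.

```python
import os
import tempfile
from datetime import datetime, timezone


def hour_key(ts):
    """Truncate a Unix timestamp (seconds) to a UTC hour label."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d-%H")


def write_hourly_files(records, out_dir):
    """Write one file per hour while streaming over the records.

    `records` is an iterable of (timestamp, payload) pairs that must
    already be sorted by timestamp, so all records of one hour are
    adjacent. Only the current record and one open file are held at a
    time; no per-hour group is ever materialized in memory.
    """
    current_hour, out = None, None
    written = []
    try:
        for ts, payload in records:
            hour = hour_key(ts)
            if hour != current_hour:
                # The hour changed: close the previous file, open the next.
                if out is not None:
                    out.close()
                path = os.path.join(out_dir, hour + ".txt")
                out = open(path, "w")
                written.append(path)
                current_hour = hour
            out.write(f"{ts}\t{payload}\n")
    finally:
        if out is not None:
            out.close()
    return written


# Hypothetical sample data: three events spanning two different hours.
events = [
    (1427713200 + 3700, "c"),
    (1427713200 + 10, "a"),
    (1427713200 + 1800, "b"),
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # sorted() stands in for the sort that the shuffle would provide.
        for path in write_hourly_files(sorted(events), tmp):
            print(os.path.basename(path))
```

The design point is that sorting replaces grouping: once records for the same hour are adjacent, a single pass can write them out without ever building the whole group, which is what a groupBy followed by a write would require.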

Problem with groupBy and OOM when just writing the group in a file

2015-03-30 Thread Mario Pastorelli
We are experiencing some problems with the groupBy operation when it is used to group together data that will be written to the same file. The operation we want to perform is the following: given some data with a timestamp, we want to sort it by timestamp, group it by hour and write one file per

Re: Problem with groupBy and OOM when just writing the group in a file

2015-03-30 Thread Mario Pastorelli
It worked, thank you. On 30.03.2015 11:58, Sean Owen wrote: The behavior is the same. I am not sure it's a problem so much as a design decision. It does not require everything to stay in memory, only the values for one key at a time. Have a look at how the preceding shuffle works. Consider