On Mon, Sep 22, 2014 at 10:21 AM, innowireless TaeYun Kim
<taeyun....@innowireless.co.kr> wrote:
> I have to merge the byte[]s that have the same key.
> If the merging is done with reduceByKey(), many intermediate byte[]
> allocations and System.arraycopy() calls are executed, and it is too
> slow. So I had to resort to groupByKey(), and in the callback allocate
> a byte[] with the total size of the byte[]s and arraycopy() each piece
> into it.
> groupByKey() works for this, since the size of each group is
> manageable in my application.
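
For reference, the approach you describe might look roughly like this
in the Java API (just a sketch; I'm assuming a
JavaPairRDD<String, byte[]> named "rdd" and Java 8 lambdas):

    JavaPairRDD<String, byte[]> merged = rdd
        .groupByKey()
        .mapValues(values -> {
            // First pass: compute the total size so the result
            // array is allocated exactly once.
            int total = 0;
            for (byte[] b : values) {
                total += b.length;
            }
            // Second pass: copy each piece into the single result.
            byte[] result = new byte[total];
            int offset = 0;
            for (byte[] b : values) {
                System.arraycopy(b, 0, result, offset, b.length);
                offset += b.length;
            }
            return result;
        });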

The problem is that groupByKey() will first collect and allocate many
small byte[]s in memory, and only then merge them. If the total size of
the byte[]s is very large, you will run out of memory, as you observe.
If you want to do this, use more executor memory. You may find it is
not worth the tradeoff of having more, smaller executors merge pieces
of the overall byte[] array.
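
If you configure the job programmatically, raising executor memory
would look something like this (a sketch; the app name and "4g" are
arbitrary illustrative values):

    // Raise per-executor memory so all the small byte[]s of a group
    // fit in memory at once before they are merged.
    SparkConf conf = new SparkConf()
        .setAppName("merge-bytes")
        .set("spark.executor.memory", "4g");
    JavaSparkContext sc = new JavaSparkContext(conf);

Equivalently, pass --executor-memory 4g to spark-submit.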
