Hi Koert,

cogroup is a transformation on an RDD: it creates a CoGroupedRDD and then
applies some further transformations to it. Later, when an action is
called, the compute() method of the CoGroupedRDD is invoked. Roughly
speaking, the elements of the CoGroupedRDD are fetched one at a time, so
the contents of the two iterables do not need to fit in memory.
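
To make the "nothing runs until an action, then elements are fetched one
at a time" idea concrete, here is a toy sketch in plain Python. It is not
Spark code and not how CoGroupedRDD is implemented: cogroup_sorted and
traced are hypothetical names, and the sketch assumes both inputs are
already sorted by key. It only illustrates lazy fetching of the cogrouped
elements. Within a single key this toy simply collects the values into
lists, so it says nothing about how Spark buffers one key's values.

```python
from itertools import groupby
from operator import itemgetter


def cogroup_sorted(left, right):
    """Toy cogroup over two iterables of (key, value) pairs sorted by key.

    Lazily yields (key, (left_values, right_values)), one key at a time.
    """
    lgroups = groupby(left, key=itemgetter(0))
    rgroups = groupby(right, key=itemgetter(0))
    lcur = next(lgroups, None)
    rcur = next(rgroups, None)
    while lcur is not None or rcur is not None:
        # Pick the smallest key still pending on either side.
        if rcur is None or (lcur is not None and lcur[0] <= rcur[0]):
            key = lcur[0]
        else:
            key = rcur[0]
        lvals, rvals = [], []
        # Drain each side's group for this key before advancing groupby.
        if lcur is not None and lcur[0] == key:
            lvals = [v for _, v in lcur[1]]
            lcur = next(lgroups, None)
        if rcur is not None and rcur[0] == key:
            rvals = [v for _, v in rcur[1]]
            rcur = next(rgroups, None)
        yield key, (lvals, rvals)


pulled = []  # records every input pair at the moment it is actually read


def traced(pairs):
    """Wrap an iterable so that each read is logged in `pulled`."""
    for pair in pairs:
        pulled.append(pair)
        yield pair


left = traced([("a", 1), ("a", 2), ("b", 3)])
right = traced([("a", "x"), ("c", "y")])

grouped = cogroup_sorted(left, right)  # "transformation": only sets things up
nothing_read_yet = list(pulled)        # still empty, no input has been touched

first = next(grouped)                  # "action": fetch one cogrouped element
rest = list(grouped)                   # fetch the remaining elements
```

Here nothing_read_yet stays empty because building grouped reads no
input; the data is only pulled once next() or list() asks for elements.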

Hope this helps!
Liq

On Mon, Sep 29, 2014 at 4:02 PM, Koert Kuipers <ko...@tresata.com> wrote:

> apologies for asking yet again about spark memory assumptions, but i cant
> seem to keep it in my head.
>
> if i use PairRDDFunctions.cogroup, it returns for every key 2 iterables.
> do the contents of these iterables have to fit in memory? or is the data
> streamed?
>
>


-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst
