Yes for the RDD API: both sides are materialized. No for DataFrame/SQL: there one side is buffered and the other side streams through the join.
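
To make the difference concrete, here is a rough sketch in plain Python. It is not Spark code and does not reflect Spark's actual implementation; the function names `cogroup_join` and `streaming_hash_join` are made up for illustration, and the inputs are assumed to be simple iterables of (key, value) pairs. The first function collects both sides per key before emitting anything, the second buffers only one side and consumes the other lazily.

```python
from collections import defaultdict


def cogroup_join(left, right):
    """Inner join that materializes BOTH sides per key before emitting."""
    groups = defaultdict(lambda: ([], []))
    for k, v in left:
        groups[k][0].append(v)
    for k, w in right:
        groups[k][1].append(w)
    # Only after both inputs are fully consumed do we emit pairs.
    for k, (vs, ws) in groups.items():
        for v in vs:
            for w in ws:
                yield (k, (v, w))


def streaming_hash_join(buffered, streamed):
    """Inner join that buffers ONE side; the other side streams through."""
    table = defaultdict(list)
    for k, v in buffered:
        table[k].append(v)
    # The streamed side is read one record at a time and never stored.
    for k, w in streamed:
        for v in table.get(k, ()):
            yield (k, (v, w))
```

In the second variant the memory cost depends only on the buffered side, which is why it pays to buffer the smaller input. In the first variant the order of the two inputs makes no such difference.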

On Thu, Sep 17, 2015 at 11:21 AM, Koert Kuipers <ko...@tresata.com> wrote:

> in scalding we join with the smaller side on the left, since the smaller
> side will get buffered while the bigger side streams through the join.
>
> looking at CoGroupedRDD i do not get the impression such a distinction is
> made. it seems both sides are put into a map that can spill to disk. is
> this correct?
>
> thanks