> It wasn't so much the cogroup that was optimized here, but what is
> done to the result of cogroup.

Right.

> Yes, it was a matter of not materializing the entire result of a
> flatMap-like function after the cogroup, since this will accept just
> an Iterator (actually, TraversableOnce).

Yeah... I was trying to poke around to find out: are the Iterables that
Spark passes into cogroup already materialized (i.e. the bug was making
a copy of an already-in-memory list), or are the Iterables streaming?

I know the Seq->Iterable change was made so that they could one day be
streaming, but I don't know whether that has happened yet or not...

> I think this may also be a case where Scala's lazy collections (with
> .view) could be useful?

Probably? Do you mean within user code, or that Spark would pass in an
already-lazy collection?
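
For the user-code side, here is a minimal sketch of the difference I have
in mind. It is plain Scala with no Spark involved, and every name in it
(LazyCogroupSketch, left, right, pair, strictJoin, lazyJoin) is made up
for illustration. The two Iterables just stand in for whatever cogroup
hands to the function for one key; this is not what Spark itself does
internally.

```scala
object LazyCogroupSketch {
  // Hypothetical stand-ins for the per-key values cogroup passes to user code.
  val left: Iterable[Int] = Seq(1, 2, 3)
  val right: Iterable[String] = Seq("a", "b")

  // Counts how many output pairs have actually been built so far.
  var evaluated = 0

  // Hypothetical per-pair function with a visible side effect.
  def pair(l: Int, r: String): (Int, String) = {
    evaluated += 1
    (l, r)
  }

  // Strict: the whole cross product is built as a collection before returning.
  def strictJoin: Iterable[(Int, String)] =
    for (l <- left; r <- right) yield pair(l, r)

  // Lazy: going through .iterator means pairs are only built when pulled.
  def lazyJoin: Iterator[(Int, String)] =
    for (l <- left.iterator; r <- right.iterator) yield pair(l, r)
}
```

With strictJoin all six pairs exist in memory before the caller sees the
first one; with lazyJoin nothing is built until the caller calls next().
Something accepting an Iterator (or TraversableOnce) can consume the
second form without the full result ever being materialized. A .view
would presumably behave similarly, but I have not checked that here.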

I think the original PR had Seq->Iterator (?) which seems like it would
be safer in this regard, but I don't understand the nuances yet.

- Stephen

