> It wasn't so much the cogroup that was optimized here, but what is
> done to the result of cogroup.

Right.

> Yes, it was a matter of not materializing the entire result of a
> flatMap-like function after the cogroup, since this will accept just
> an Iterator (actually, TraversableOnce).

Yeah... I was trying to poke around: are the Iterables that Spark passes
into cogroup already materialized (e.g. the bug was making a copy of an
already-in-memory list), or are the Iterables streaming?

I know the Seq->Iterable change was made so that they could one day be
streaming, but I don't know whether that has happened yet or not...

> I think this may also be a case where Scala's lazy collections (with
> .view) could be useful?

Probably? Do you mean within user code, or that Spark would pass in an
already-lazy collection?

I think the original PR had Seq->Iterator (?) which seems like it would
be safer in this regard, but I don't understand the nuances yet.

- Stephen
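
P.S. To make the "materialized vs. streaming" distinction concrete, here is
a minimal sketch in plain Scala (no Spark involved). The object
LazyPairsSketch, its methods, and the `produced` counter are all
hypothetical names made up for illustration; the two Iterables only stand
in for the per-key values that cogroup hands to user code, and nothing here
says how Spark builds those values internally. It just contrasts returning
an Iterator from a flatMap-like function with forcing the whole result into
a List first.

```scala
object LazyPairsSketch {
  // Counts how many output pairs have actually been computed (illustration only).
  var produced = 0

  // Lazy variant: returns an Iterator, so each pair is built only when
  // the consumer asks for it.
  def lazyPairs(left: Iterable[Int], right: Iterable[String]): Iterator[(Int, String)] =
    for {
      l <- left.iterator
      r <- right.iterator
    } yield {
      produced += 1
      (l, r)
    }

  // Eager variant: forces the whole cross product into memory before returning.
  def eagerPairs(left: Iterable[Int], right: Iterable[String]): List[(Int, String)] =
    lazyPairs(left, right).toList

  def main(args: Array[String]): Unit = {
    val left = Seq(1, 2, 3)
    val right = Seq("a", "b")

    val it = lazyPairs(left, right)
    println(s"after building the iterator: produced = $produced")
    it.next()
    println(s"after one next(): produced = $produced")

    produced = 0
    val all = eagerPairs(left, right)
    println(s"after eagerPairs: produced = $produced, size = ${all.size}")
  }
}
```

With the lazy variant the counter only moves as elements are pulled, which
is the behaviour one would want from a function whose result is consumed as
a TraversableOnce. Whether that actually saves memory in a real job still
depends on whether the input Iterables are themselves already in memory,
which is the open question above.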