Hey, I saw this commit go by and found it fairly fascinating:

https://github.com/apache/spark/commit/c233ab3d8d75a33495298964fe73dbf7dd8fe305

For two reasons:

1) We have a report that is bogging down in exactly this kind of .join with lots of elements, so I'm glad to see the fix.

2) More interesting, I think: if such a subtle bug was lurking in spark-core, it leaves me worried that every time we use .map in our own cogroup code, we'll be committing the same perf error.

Has anyone thought more deeply about whether this is a big deal or not? Should ".iterator.map" be strongly preferred over ".map" as a best practice in cogroup code?

Thanks,
Stephen
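
P.S. To make the distinction concrete, here is a minimal, Spark-free sketch of the general Scala semantics I'm asking about (my own toy example, not the code from the commit): .map on a strict collection eagerly runs the function for every element and allocates a whole new collection of the same size, while .iterator.map is lazy and transforms elements only as they are pulled, with no intermediate collection.

    // Toy example (not Spark code): strict .map vs. lazy .iterator.map
    object MapVsIteratorMap {
      def main(args: Array[String]): Unit = {
        val values = Vector.fill(3)(1)

        // Strict: the function runs for every element immediately, and a
        // new Vector of the same size is allocated, even though the code
        // below only ever looks at the first element.
        val strict = values.map { v => println(s"strict: $v"); v + 1 }
        println(strict.head)

        // Lazy: nothing runs until the iterator is consumed, and only the
        // elements actually pulled are transformed -- no intermediate
        // collection is built.
        val lazyOnes = values.iterator.map { v => println(s"lazy: $v"); v + 1 }
        println(lazyOnes.next())
      }
    }

Running this prints "strict:" three times but "lazy:" only once, which is the allocation/work difference I'm worried about when the collections are large.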