Thanks a lot for the explanation, Matei.

As a matter of fact, I was just reading the part of the paper on narrow and
wide dependencies and saw that groupByKey does indeed create a wide
dependency, which, as you explained, is the problem.
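
A toy sketch in plain Python (not Spark code, and not how Spark is
implemented) of why this matters for a per-key sum. The function names
shuffle, sum_group_then_reduce and sum_with_map_side_combine are made up
for illustration. The sketch assumes the cost that matters is the number
of records crossing the shuffle. Both variants still shuffle, but the
second one pre-aggregates inside each partition first, which is roughly
what reduceByKey does and groupByKey does not.

```python
from collections import defaultdict


def shuffle(partitions, num_reducers):
    """Route every (key, value) record to a reducer chosen by the key's hash.

    Returns the reducer-side groups and the number of records moved.
    """
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    moved = 0
    for part in partitions:
        for key, value in part:
            reducers[hash(key) % num_reducers][key].append(value)
            moved += 1
    return reducers, moved


def sum_group_then_reduce(partitions, num_reducers=2):
    # groupByKey-style: every raw record crosses the shuffle,
    # and the values are only summed on the reducer side.
    reducers, moved = shuffle(partitions, num_reducers)
    totals = {k: sum(vs) for r in reducers for k, vs in r.items()}
    return totals, moved


def sum_with_map_side_combine(partitions, num_reducers=2):
    # reduceByKey-style: sum per key inside each input partition first,
    # so at most one record per key per partition is shuffled.
    combined = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        combined.append(list(local.items()))
    reducers, moved = shuffle(combined, num_reducers)
    totals = {k: sum(vs) for r in reducers for k, vs in r.items()}
    return totals, moved


# Two hypothetical input partitions of (key, 1) records.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("a", 1), ("c", 1)],
]

grouped_totals, grouped_moved = sum_group_then_reduce(partitions)
combined_totals, combined_moved = sum_with_map_side_combine(partitions)

print(grouped_totals == combined_totals)  # True: same result either way
print(grouped_moved, combined_moved)      # 8 5
```

With only three distinct keys the saving is small, but it grows with the
number of records per key, since the combined version shuffles at most
one record per key per partition.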

Maybe it wouldn't be a bad idea to have a section in the docs on wide/narrow
dependencies, and perhaps to list, for each transformation, the kind of
dependency it creates? Although it's mostly obvious, this would better stress
that you need to choose your transformations very carefully and that some are
strongly preferable to others.

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/extremely-slow-k-means-version-tp4489p4493.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
