GitHub user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/14940#discussion_r205839792
--- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/LabelPropagation.scala ---
@@ -58,7 +58,7 @@ object LabelPropagation {
}.toMap
}
def vertexProgram(vid: VertexId, attr: Long, message: Map[VertexId, Long]): VertexId = {
- if (message.isEmpty) attr else message.maxBy(_._2)._1
+ (Map(attr -> 1L) ++ message).maxBy(m => (m._2, m._1))._1
--- End diff --
Nit: would `message + (attr -> 1L)` be simpler for the first expression?
I don't know enough to evaluate the implications of this change. It sounds
like the current behavior is intentional or follows some paper, but I'm not
sure. Is there a reference showing that this is the more correct thing to do?
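
To make the implications easier to weigh, here is a minimal sketch of the two versions in the diff, plus a variant that appends the vertex's own label with `+`. Scala's immutable `Map` offers `+` and `++` but no `:+`, and operand order matters when the key already exists. This is an illustration under assumptions, not the project's code: `LabelPropagationSketch`, `oldProgram`, `newProgram`, and `plusVariant` are hypothetical names, and `VertexId` is redefined locally as `Long` so the snippet runs without Spark.

```scala
object LabelPropagationSketch {
  // Local stand-in for org.apache.spark.graphx.VertexId (an alias for Long).
  type VertexId = Long

  // Before the diff: keep the current label only when no messages arrive;
  // otherwise take the label with the highest count.
  def oldProgram(vid: VertexId, attr: Long, message: Map[VertexId, Long]): VertexId =
    if (message.isEmpty) attr else message.maxBy(_._2)._1

  // After the diff: the vertex's own label gets a count of 1 unless `message`
  // already carries a count for it (the right operand of ++ wins), and ties
  // on count are broken by the larger label id.
  def newProgram(vid: VertexId, attr: Long, message: Map[VertexId, Long]): VertexId =
    (Map(attr -> 1L) ++ message).maxBy(m => (m._2, m._1))._1

  // Hypothetical variant using `+`: this OVERWRITES any count that `message`
  // holds for `attr` with 1, so it is not equivalent to newProgram.
  def plusVariant(vid: VertexId, attr: Long, message: Map[VertexId, Long]): VertexId =
    (message + (attr -> 1L)).maxBy(m => (m._2, m._1))._1

  def main(args: Array[String]): Unit = {
    // The vertex currently holds label 7; neighbors voted 7 three times, 9 twice.
    val message = Map(7L -> 3L, 9L -> 2L)
    println(newProgram(1L, 7L, message))  // 7
    println(plusVariant(1L, 7L, message)) // 9
  }
}
```

In this sketch the two forms disagree whenever a neighbor's message already contains the vertex's current label, so the simpler-looking expression would change the result rather than just the style.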
---