xiepengjie created SPARK-36420:
----------------------------------

Summary: Use `isEmpty` to improve performance in Pregel's superstep
Key: SPARK-36420
URL: https://issues.apache.org/jira/browse/SPARK-36420
Project: Spark
Issue Type: Improvement
Components: GraphX
Affects Versions: 2.4.7
Reporter: xiepengjie
Fix For: 3.3.0
When I was running `Graphx.connectedComponents` on a graph with more than 20 billion vertices and edges, I found that the `count()` calls in Pregel's superstep loop are very slow:

{code:java}
object Pregel extends Logging {
  ...
  def apply[VD: ClassTag, ED: ClassTag, A: ClassTag]
      (...): Graph[VD, ED] = {
    ...
    // Maybe messages.isEmpty() is better than messages.count()
    var activeMessages = messages.count()
    // Loop
    var prevG: Graph[VD, ED] = null
    var i = 0
    while (activeMessages > 0 && i < maxIterations) {
      ...
      activeMessages = messages.count()
      ...
    }
    ...
    g
  } // end of apply

} // end of class Pregel
{code}

An action is still needed here, but only to check whether any active messages remain; the loop never uses the exact number. So there is no need for `count()`, which has to visit every message, and it is better to use `isEmpty()`. I verified this change and it worked very well.
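To illustrate why this helps, here is a minimal, Spark-free sketch in plain Scala. It assumes a toy superstep loop in which each iteration produces a lazily evaluated batch of messages, and it counts how many messages the "are there still active messages?" check forces into existence. `MessageBatch`, `activeByCount`, `activeByIsEmpty` and `runLoop` are hypothetical names for this illustration, not GraphX APIs. `activeByIsEmpty` mirrors the idea behind `RDD.isEmpty()`, which Spark implements as `partitions.length == 0 || take(1).length == 0`.

```scala
// Hypothetical toy model of Pregel's loop condition (no Spark involved).
object ActiveMessagesCheck {

  // A lazily produced batch of messages that records how many elements
  // were actually materialized by whoever consumed it.
  final class MessageBatch(total: Int) {
    var produced: Int = 0
    def iterator: Iterator[Int] = new Iterator[Int] {
      private var pos = 0
      def hasNext: Boolean = pos < total
      def next(): Int = {
        if (!hasNext) throw new NoSuchElementException("batch exhausted")
        pos += 1
        produced += 1
        pos
      }
    }
  }

  // Count-style check: walks the whole batch just to compare its size with 0.
  def activeByCount(batch: MessageBatch): Boolean =
    batch.iterator.size > 0

  // isEmpty-style check: fetches at most one element, like take(1).
  def activeByIsEmpty(batch: MessageBatch): Boolean =
    !batch.iterator.take(1).toList.isEmpty

  // Runs the toy loop, where batch i holds sizes(i) messages (0 once sizes
  // runs out), and returns (iterations run, messages materialized by checks).
  def runLoop(sizes: Seq[Int], maxIterations: Int,
              hasActive: MessageBatch => Boolean): (Int, Int) = {
    var i = 0
    var materialized = 0
    var batch = new MessageBatch(sizes.head)
    var active = hasActive(batch)
    materialized += batch.produced
    while (active && i < maxIterations) {
      i += 1
      batch = new MessageBatch(if (i < sizes.length) sizes(i) else 0)
      active = hasActive(batch)
      materialized += batch.produced
    }
    (i, materialized)
  }

  def main(args: Array[String]): Unit = {
    val sizes = Seq(1000000, 500000, 10, 0)
    println(runLoop(sizes, 10, activeByCount))   // prints (3,1500010)
    println(runLoop(sizes, 10, activeByIsEmpty)) // prints (3,3)
  }
}
```

Both checks stop the loop after the same number of iterations; only the work spent on the check differs. In real Spark the savings show up while messages remain, because `take(1)` can stop early; when the RDD is actually empty, it still has to look at every partition.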