Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11290#discussion_r53566037
  
    --- Diff: graphx/src/main/scala/org/apache/spark/graphx/lib/TriangleCount.scala ---
    @@ -27,25 +28,47 @@ import org.apache.spark.graphx._
      * The algorithm is relatively straightforward and can be computed in three steps:
      *
      * <ul>
    - * <li>Compute the set of neighbors for each vertex
    - * <li>For each edge compute the intersection of the sets and send the count to both vertices.
    + * <li> Compute the set of neighbors for each vertex
    + * <li> For each edge compute the intersection of the sets and send the count to both vertices.
      * <li> Compute the sum at each vertex and divide by two since each triangle is counted twice.
      * </ul>
      *
    - * Note that the input graph should have its edges in canonical direction
    - * (i.e. the `sourceId` less than `destId`). Also the graph must have been partitioned
    - * using [[org.apache.spark.graphx.Graph#partitionBy]].
    + * There are two implementations.  The default `TriangleCount.run` implementation first removes
    + * self cycles and canonicalizes the graph to ensure that the following conditions hold:
    + * <ul>
    + * <li> There are no self edges
    + * <li> All edges are oriented src > dst
    + * <li> There are no duplicate edges
    + * </ul>
    + * However, the canonicalization procedure is costly as it requires repartitioning the graph.
    + * If the input data is already in "canonical form" with self cycles removed then the
    + * `TriangleCount.runPreCanonicalized` should be used instead.
    + *
    + * {{{
    + * val canonicalGraph = graph.mapEdges(e => 1).removeSelfEdges().canonicalizeEdges()
    + * val counts = TriangleCount.runPreCanonicalized(canonicalGraph).vertices
    + * }}}
    + *
      */
     object TriangleCount {
     
       def run[VD: ClassTag, ED: ClassTag](graph: Graph[VD, ED]): Graph[Int, ED] = {
    -    // Remove redundant edges
    -    val g = graph.groupEdges((a, b) => a).cache()
    +    // Transform the edge data something cheap to shuffle and then canonicalize
    +    val canonicalGraph = graph.mapEdges(e => true).removeSelfEdges().convertToCanonicalEdges()
    +    // Get the triangle counts
    +    val counters = runPreCanonicalized(canonicalGraph).vertices
    +    // Join them bath with the original graph
    +    graph.outerJoinVertices(counters) { (vid, _, optCounter: Option[Int]) =>
    --- End diff --
    
    Nit: this can just be `{ (_, _, optCounter) =>`, right?
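
For illustration, here is a minimal sketch of why the type annotation and the unused `vid` name are probably unnecessary. It assumes the lambda is passed in a second parameter list, so its argument types can be inferred from the first. `InferredLambdaTypes` and `outerJoin` are hypothetical stand-ins over plain `Map`s, not the GraphX API.

```scala
object InferredLambdaTypes {
  // Hypothetical stand-in for a curried outer join: the first parameter list
  // fixes VD and U, so the lambda in the second list needs no annotations.
  def outerJoin[VD, U, VD2](vertices: Map[Long, VD], other: Map[Long, U])(
      mapFunc: (Long, VD, Option[U]) => VD2): Map[Long, VD2] =
    vertices.map { case (vid, attr) => vid -> mapFunc(vid, attr, other.get(vid)) }

  def main(args: Array[String]): Unit = {
    val vertices = Map(1L -> "a", 2L -> "b", 3L -> "c")
    val counters = Map(1L -> 2, 2L -> 1)

    // Form used in the diff: unused `vid` is named, Option[Int] is spelled out.
    val annotated = outerJoin(vertices, counters) { (vid, _, optCounter: Option[Int]) =>
      optCounter.getOrElse(0)
    }

    // Form suggested in the review: unused arguments become `_`, type is inferred.
    val inferred = outerJoin(vertices, counters) { (_, _, optCounter) =>
      optCounter.getOrElse(0)
    }

    // Both forms produce the same result; vertices without a counter get 0.
    assert(annotated == inferred)
    println(inferred)
  }
}
```

Whether the shorter form compiles in the actual PR depends on the real `outerJoinVertices` signature, so this only shows the general inference behaviour.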

