[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-gearpump/pull/223 ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user huafengw commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r139345072 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -318,12 +360,8 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * http://www.drdobbs.com/database/topological-sorting/184410262 */ def topologicalOrderWithCirclesIterator: Iterator[N] = { -if (hasCycle()) { - val topo = getAcyclicCopy().topologicalOrderIterator - topo.flatMap(_.sortBy(_indexs(_)).iterator) -} else { - topologicalOrderIterator -} +val topo = getAcyclicCopy().topologicalOrderIterator +topo.flatMap(_.sortBy(indexs(_)).iterator) --- End diff -- It's not. `topo` is an `Iterator[List[N]]` so the items of topo is sorted but the item it self is a list and it's not sorted. ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r139280373 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -318,12 +360,8 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * http://www.drdobbs.com/database/topological-sorting/184410262 */ def topologicalOrderWithCirclesIterator: Iterator[N] = { -if (hasCycle()) { - val topo = getAcyclicCopy().topologicalOrderIterator - topo.flatMap(_.sortBy(_indexs(_)).iterator) -} else { - topologicalOrderIterator -} +val topo = getAcyclicCopy().topologicalOrderIterator +topo.flatMap(_.sortBy(indexs(_)).iterator) --- End diff -- Is `topologicalOrderIterator` already sorted ? Why do we sort again ? ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r139061240 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -140,64 +155,64 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * Current Graph is not changed. */ def mapVertex[NewNode](fun: N => NewNode): Graph[NewNode, E] = { -val vertexes = vertices.map(node => (node, fun(node))) +val newVertices = getVertices.map(node => (node, fun(node))) -val vertexMap: Map[N, NewNode] = vertexes.toMap +val vertexMap: Map[N, NewNode] = newVertices.toMap -val newEdges = edges.map { edge => +val newEdges = getEdges.map { edge => (vertexMap(edge._1), edge._2, vertexMap(edge._3)) } -new Graph(vertexes.map(_._2), newEdges) +new Graph(newVertices.map(_._2), newEdges) } /** * Map a graph to a new graph, with edge converted to new type * Current graph is not changed. */ def mapEdge[NewEdge](fun: (N, E, N) => NewEdge): Graph[N, NewEdge] = { -val newEdges = edges.map { edge => +val newEdges = getEdges.map { edge => (edge._1, fun(edge._1, edge._2, edge._3), edge._3) } -new Graph(vertices, newEdges) +new Graph(getVertices, newEdges) } /** * edges connected to node */ def edgesOf(node: N): List[(N, E, N)] = { -(incomingEdgesOf(node) ++ outgoingEdgesOf(node)).toSet[(N, E, N)].toList.sortBy(_indexs(_)) +(incomingEdgesOf(node) ++ outgoingEdgesOf(node)).toList --- End diff -- no need to sort now ? ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r139061587 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -243,13 +258,36 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * The node returned by Iterator is stable sorted. */ def topologicalOrderIterator: Iterator[N] = { -val newGraph = copy -var output = List.empty[N] +tryTopologicalOrderIterator match { + case Success(intertor) => intertor --- End diff -- intertor ? ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r139061668 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -243,13 +258,36 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * The node returned by Iterator is stable sorted. */ def topologicalOrderIterator: Iterator[N] = { -val newGraph = copy -var output = List.empty[N] +tryTopologicalOrderIterator match { + case Success(intertor) => intertor + case Failure(_) => topologicalOrderWithCirclesIterator --- End diff -- should we add a warning here ? ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user huafengw commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r139049494 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -81,28 +86,35 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * out degree */ def outDegreeOf(node: N): Int = { -edges.count(_._1 == node) +outgoingEdgesOf(node).size } /** * in degree */ def inDegreeOf(node: N): Int = { -edges.count(_._3 == node) +incomingEdgesOf(node).size } /** * out going edges. */ def outgoingEdgesOf(node: N): List[(N, E, N)] = { -edges.filter(_._1 == node) +_outEdges.getOrElse(node, List.empty) } /** * incoming edges. */ def incomingEdgesOf(node: N): List[(N, E, N)] = { -edges.filter(_._3 == node) +_inEdges.getOrElse(node, List.empty) + } + + /** + * adjacent vertices. + */ + def adjacentVertices(node: N): List[N] = { --- End diff -- sure ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r139049560 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -156,64 +155,64 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * Current Graph is not changed. */ def mapVertex[NewNode](fun: N => NewNode): Graph[NewNode, E] = { -val vertexes = vertices.map(node => (node, fun(node))) +val newVertexes = getVertices.map(node => (node, fun(node))) -val vertexMap: Map[N, NewNode] = vertexes.toMap +val vertexMap: Map[N, NewNode] = newVertexes.toMap -val newEdges = edges.map { edge => +val newEdges = getEdges.map { edge => (vertexMap(edge._1), edge._2, vertexMap(edge._3)) } -new Graph(vertexes.map(_._2), newEdges) +new Graph(newVertexes.map(_._2), newEdges) --- End diff -- I believe the plural of Vertex is Vertices ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r139049164 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -81,28 +86,35 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * out degree */ def outDegreeOf(node: N): Int = { -edges.count(_._1 == node) +outgoingEdgesOf(node).size } /** * in degree */ def inDegreeOf(node: N): Int = { -edges.count(_._3 == node) +incomingEdgesOf(node).size } /** * out going edges. */ def outgoingEdgesOf(node: N): List[(N, E, N)] = { -edges.filter(_._1 == node) +_outEdges.getOrElse(node, List.empty) } /** * incoming edges. */ def incomingEdgesOf(node: N): List[(N, E, N)] = { -edges.filter(_._3 == node) +_inEdges.getOrElse(node, List.empty) + } + + /** + * adjacent vertices. + */ + def adjacentVertices(node: N): List[N] = { --- End diff -- can this be private ? We don't know this is a DAG from outside. ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138875669 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -165,7 +181,7 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * edges connected to node */ def edgesOf(node: N): List[(N, E, N)] = { -(incomingEdgesOf(node) ++ outgoingEdgesOf(node)).toSet[(N, E, N)].toList.sortBy(_indexs(_)) +(incomingEdgesOf(node) ++ outgoingEdgesOf(node)).distinct.sortBy(_indexs(_)) --- End diff -- why do we need to `distinct` and `sort` here ? ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138873545 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -81,28 +86,35 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * out degree */ def outDegreeOf(node: N): Int = { -edges.count(_._1 == node) +outgoingEdgesOf(node).size } /** * in degree */ def inDegreeOf(node: N): Int = { -edges.count(_._3 == node) +incomingEdgesOf(node).size } /** * out going edges. */ def outgoingEdgesOf(node: N): List[(N, E, N)] = { -edges.filter(_._1 == node) +_outEdges.getOrElse(node, List.empty) } /** * incoming edges. */ def incomingEdgesOf(node: N): List[(N, E, N)] = { -edges.filter(_._3 == node) +_inEdges.getOrElse(node, List.empty) + } + + /** + * adjacent vertices. + */ + def adjacentVertices(node: N): List[N] = { --- End diff -- how about vertices of incoming edges ? ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138875439 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -28,6 +29,8 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial private val _vertices = mutable.Set.empty[N] private val _edges = mutable.Set.empty[(N, E, N)] + private val _outEdges = mutable.Map.empty[N, List[(N, E, N)]] --- End diff -- Will `Set` perform better than `List` here ? ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138876553 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -318,11 +355,12 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * http://www.drdobbs.com/database/topological-sorting/184410262 */ def topologicalOrderWithCirclesIterator: Iterator[N] = { -if (hasCycle()) { +val result = tryTopologicalOrderIterator --- End diff -- `match case` looks more clear ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138876219 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -355,19 +393,7 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * check whether there is a loop */ def hasCycle(): Boolean = { -@tailrec -def detectCycle(graph: Graph[N, E]): Boolean = { - if (graph.edges.isEmpty) { -false - } else if (graph.vertices.nonEmpty && !graph.vertices.exists(graph.inDegreeOf(_) == 0)) { -true - } else { -graph.removeZeroInDegree -detectCycle(graph) - } -} - -detectCycle(copy) +tryTopologicalOrderIterator.isFailure --- End diff -- can we cache result here ? ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138875971 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -243,13 +259,34 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * The node returned by Iterator is stable sorted. */ def topologicalOrderIterator: Iterator[N] = { -val newGraph = copy -var output = List.empty[N] - -while (!newGraph.isEmpty) { - output ++= newGraph.removeZeroInDegree +tryTopologicalOrderIterator.get --- End diff -- This will throw exception on failure ? The callers don't know. ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138876832 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -243,13 +259,34 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * The node returned by Iterator is stable sorted. */ def topologicalOrderIterator: Iterator[N] = { -val newGraph = copy -var output = List.empty[N] - -while (!newGraph.isEmpty) { - output ++= newGraph.removeZeroInDegree +tryTopologicalOrderIterator.get + } + + private def tryTopologicalOrderIterator: Try[Iterator[N]] = { +Try { + val indegreeMap = mutable.Map.empty[N, Int] --- End diff -- why not generate `indegreeMap` with `vertices.map` ? ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138879455 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -28,6 +29,8 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial private val _vertices = mutable.Set.empty[N] private val _edges = mutable.Set.empty[(N, E, N)] + private val _outEdges = mutable.Map.empty[N, List[(N, E, N)]] + private val _inEdges = mutable.Map.empty[N, List[(N, E, N)]] --- End diff -- Why do we have "_abc" style variable names here ? ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user huafengw commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138880775 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -81,28 +86,35 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * out degree */ def outDegreeOf(node: N): Int = { -edges.count(_._1 == node) +outgoingEdgesOf(node).size } /** * in degree */ def inDegreeOf(node: N): Int = { -edges.count(_._3 == node) +incomingEdgesOf(node).size } /** * out going edges. */ def outgoingEdgesOf(node: N): List[(N, E, N)] = { -edges.filter(_._1 == node) +_outEdges.getOrElse(node, List.empty) } /** * incoming edges. */ def incomingEdgesOf(node: N): List[(N, E, N)] = { -edges.filter(_._3 == node) +_inEdges.getOrElse(node, List.empty) + } + + /** + * adjacent vertices. + */ + def adjacentVertices(node: N): List[N] = { --- End diff -- It's a directed graph ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user huafengw commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138881132 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -243,13 +259,34 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * The node returned by Iterator is stable sorted. */ def topologicalOrderIterator: Iterator[N] = { -val newGraph = copy -var output = List.empty[N] - -while (!newGraph.isEmpty) { - output ++= newGraph.removeZeroInDegree +tryTopologicalOrderIterator.get --- End diff -- I think exception is better than dead loop ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user huafengw commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138880939 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -165,7 +181,7 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * edges connected to node */ def edgesOf(node: N): List[(N, E, N)] = { -(incomingEdgesOf(node) ++ outgoingEdgesOf(node)).toSet[(N, E, N)].toList.sortBy(_indexs(_)) +(incomingEdgesOf(node) ++ outgoingEdgesOf(node)).distinct.sortBy(_indexs(_)) --- End diff -- ``` // This is used to ensure the output of this Graph is always stable // Like method vertices(), or edges() private var _indexs = Map.empty[Any, Int] ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138883853 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -243,13 +259,34 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * The node returned by Iterator is stable sorted. */ def topologicalOrderIterator: Iterator[N] = { -val newGraph = copy -var output = List.empty[N] - -while (!newGraph.isEmpty) { - output ++= newGraph.removeZeroInDegree +tryTopologicalOrderIterator.get --- End diff -- Then return a `Try` is better since the caller will be aware this method may fail then. ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user huafengw commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138882103 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -355,19 +393,7 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * check whether there is a loop */ def hasCycle(): Boolean = { -@tailrec -def detectCycle(graph: Graph[N, E]): Boolean = { - if (graph.edges.isEmpty) { -false - } else if (graph.vertices.nonEmpty && !graph.vertices.exists(graph.inDegreeOf(_) == 0)) { -true - } else { -graph.removeZeroInDegree -detectCycle(graph) - } -} - -detectCycle(copy) +tryTopologicalOrderIterator.isFailure --- End diff -- The graph is mutable so we have to check whether the Graph is changed if caching the result. ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
Github user manuzhang commented on a diff in the pull request: https://github.com/apache/incubator-gearpump/pull/223#discussion_r138884320 --- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala --- @@ -355,19 +393,7 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, E, N)]) extends Serial * check whether there is a loop */ def hasCycle(): Boolean = { -@tailrec -def detectCycle(graph: Graph[N, E]): Boolean = { - if (graph.edges.isEmpty) { -false - } else if (graph.vertices.nonEmpty && !graph.vertices.exists(graph.inDegreeOf(_) == 0)) { -true - } else { -graph.removeZeroInDegree -detectCycle(graph) - } -} - -detectCycle(copy) +tryTopologicalOrderIterator.isFailure --- End diff -- sure, this can be left as a future optimization and I'm not a big fan of "mutable" ---
[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...
GitHub user huafengw opened a pull request: https://github.com/apache/incubator-gearpump/pull/223 [GEARPUMP-349] Optimize Graph topologicalOrderIterator performance Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the commit message is formatted like: `[GEARPUMP-] Meaningful description of pull request` - [ ] Make sure tests pass via `sbt clean test`. - [ ] Make sure old documentation affected by the pull request has been updated and new documentation added for new functionality. You can merge this pull request into a Git repository by running: $ git pull https://github.com/huafengw/incubator-gearpump graph Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-gearpump/pull/223.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #223 commit 2673c99cb26a8a0733af772f01748f818ea08857 Author: huafengw Date: 2017-09-05T08:59:49Z [GEARPUMP-349] Optimize Graph topologicalOrderIterator performance ---