[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-18 Thread huafengw
Github user huafengw commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r139345072
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -318,12 +360,8 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* http://www.drdobbs.com/database/topological-sorting/184410262
*/
   def topologicalOrderWithCirclesIterator: Iterator[N] = {
-if (hasCycle()) {
-  val topo = getAcyclicCopy().topologicalOrderIterator
-  topo.flatMap(_.sortBy(_indexs(_)).iterator)
-} else {
-  topologicalOrderIterator
-}
+val topo = getAcyclicCopy().topologicalOrderIterator
+topo.flatMap(_.sortBy(indexs(_)).iterator)
--- End diff --

It's not. `topo` is an `Iterator[List[N]]` so the items of topo is sorted 
but the item it self is a list and it's not sorted.


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-16 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r139280373
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -318,12 +360,8 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* http://www.drdobbs.com/database/topological-sorting/184410262
*/
   def topologicalOrderWithCirclesIterator: Iterator[N] = {
-if (hasCycle()) {
-  val topo = getAcyclicCopy().topologicalOrderIterator
-  topo.flatMap(_.sortBy(_indexs(_)).iterator)
-} else {
-  topologicalOrderIterator
-}
+val topo = getAcyclicCopy().topologicalOrderIterator
+topo.flatMap(_.sortBy(indexs(_)).iterator)
--- End diff --

Is `topologicalOrderIterator` already sorted ? Why do we sort again ? 


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r139061587
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -243,13 +258,36 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* The node returned by Iterator is stable sorted.
*/
   def topologicalOrderIterator: Iterator[N] = {
-val newGraph = copy
-var output = List.empty[N]
+tryTopologicalOrderIterator match {
+  case Success(intertor) => intertor
--- End diff --

intertor ?


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r139049560
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -156,64 +155,64 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* Current Graph is not changed.
*/
   def mapVertex[NewNode](fun: N => NewNode): Graph[NewNode, E] = {
-val vertexes = vertices.map(node => (node, fun(node)))
+val newVertexes = getVertices.map(node => (node, fun(node)))
 
-val vertexMap: Map[N, NewNode] = vertexes.toMap
+val vertexMap: Map[N, NewNode] = newVertexes.toMap
 
-val newEdges = edges.map { edge =>
+val newEdges = getEdges.map { edge =>
   (vertexMap(edge._1), edge._2, vertexMap(edge._3))
 }
-new Graph(vertexes.map(_._2), newEdges)
+new Graph(newVertexes.map(_._2), newEdges)
--- End diff --

I believe the plural of Vertex is Vertices


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r139049164
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -81,28 +86,35 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* out degree
*/
   def outDegreeOf(node: N): Int = {
-edges.count(_._1 == node)
+outgoingEdgesOf(node).size
   }
 
   /**
* in degree
*/
   def inDegreeOf(node: N): Int = {
-edges.count(_._3 == node)
+incomingEdgesOf(node).size
   }
 
   /**
* out going edges.
*/
   def outgoingEdgesOf(node: N): List[(N, E, N)] = {
-edges.filter(_._1 == node)
+_outEdges.getOrElse(node, List.empty)
   }
 
   /**
* incoming edges.
*/
   def incomingEdgesOf(node: N): List[(N, E, N)] = {
-edges.filter(_._3 == node)
+_inEdges.getOrElse(node, List.empty)
+  }
+
+  /**
+   * adjacent vertices.
+   */
+  def adjacentVertices(node: N): List[N] = {
--- End diff --

can this be private ? We don't know this is a DAG from outside.


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138875669
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -165,7 +181,7 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* edges connected to node
*/
   def edgesOf(node: N): List[(N, E, N)] = {
-(incomingEdgesOf(node) ++ outgoingEdgesOf(node)).toSet[(N, E, 
N)].toList.sortBy(_indexs(_))
+(incomingEdgesOf(node) ++ 
outgoingEdgesOf(node)).distinct.sortBy(_indexs(_))
--- End diff --

why do we need to `distinct` and `sort` here ?


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138873545
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -81,28 +86,35 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* out degree
*/
   def outDegreeOf(node: N): Int = {
-edges.count(_._1 == node)
+outgoingEdgesOf(node).size
   }
 
   /**
* in degree
*/
   def inDegreeOf(node: N): Int = {
-edges.count(_._3 == node)
+incomingEdgesOf(node).size
   }
 
   /**
* out going edges.
*/
   def outgoingEdgesOf(node: N): List[(N, E, N)] = {
-edges.filter(_._1 == node)
+_outEdges.getOrElse(node, List.empty)
   }
 
   /**
* incoming edges.
*/
   def incomingEdgesOf(node: N): List[(N, E, N)] = {
-edges.filter(_._3 == node)
+_inEdges.getOrElse(node, List.empty)
+  }
+
+  /**
+   * adjacent vertices.
+   */
+  def adjacentVertices(node: N): List[N] = {
--- End diff --

how about vertices of incoming edges ?


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138875439
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -28,6 +29,8 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, 
E, N)]) extends Serial
 
   private val _vertices = mutable.Set.empty[N]
   private val _edges = mutable.Set.empty[(N, E, N)]
+  private val _outEdges = mutable.Map.empty[N, List[(N, E, N)]]
--- End diff --

Will `Set` perform better than `List` here ?


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138876553
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -318,11 +355,12 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* http://www.drdobbs.com/database/topological-sorting/184410262
*/
   def topologicalOrderWithCirclesIterator: Iterator[N] = {
-if (hasCycle()) {
+val result = tryTopologicalOrderIterator
--- End diff --

`match case` looks more clear


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138876219
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -355,19 +393,7 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* check whether there is a loop
*/
   def hasCycle(): Boolean = {
-@tailrec
-def detectCycle(graph: Graph[N, E]): Boolean = {
-  if (graph.edges.isEmpty) {
-false
-  } else if (graph.vertices.nonEmpty && 
!graph.vertices.exists(graph.inDegreeOf(_) == 0)) {
-true
-  } else {
-graph.removeZeroInDegree
-detectCycle(graph)
-  }
-}
-
-detectCycle(copy)
+tryTopologicalOrderIterator.isFailure
--- End diff --

can we cache result here ?


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138875971
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -243,13 +259,34 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* The node returned by Iterator is stable sorted.
*/
   def topologicalOrderIterator: Iterator[N] = {
-val newGraph = copy
-var output = List.empty[N]
-
-while (!newGraph.isEmpty) {
-  output ++= newGraph.removeZeroInDegree
+tryTopologicalOrderIterator.get
--- End diff --

This will throw exception on failure ? The callers don't know.


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138876832
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -243,13 +259,34 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* The node returned by Iterator is stable sorted.
*/
   def topologicalOrderIterator: Iterator[N] = {
-val newGraph = copy
-var output = List.empty[N]
-
-while (!newGraph.isEmpty) {
-  output ++= newGraph.removeZeroInDegree
+tryTopologicalOrderIterator.get
+  }
+
+  private def tryTopologicalOrderIterator: Try[Iterator[N]] = {
+Try {
+  val indegreeMap = mutable.Map.empty[N, Int]
--- End diff --

why not generate `indegreeMap` with `vertices.map` ?


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138879455
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -28,6 +29,8 @@ class Graph[N, E](vertexList: List[N], edgeList: List[(N, 
E, N)]) extends Serial
 
   private val _vertices = mutable.Set.empty[N]
   private val _edges = mutable.Set.empty[(N, E, N)]
+  private val _outEdges = mutable.Map.empty[N, List[(N, E, N)]]
+  private val _inEdges = mutable.Map.empty[N, List[(N, E, N)]]
--- End diff --

Why do we have "_abc" style variable names here ?


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread huafengw
Github user huafengw commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138880775
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -81,28 +86,35 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* out degree
*/
   def outDegreeOf(node: N): Int = {
-edges.count(_._1 == node)
+outgoingEdgesOf(node).size
   }
 
   /**
* in degree
*/
   def inDegreeOf(node: N): Int = {
-edges.count(_._3 == node)
+incomingEdgesOf(node).size
   }
 
   /**
* out going edges.
*/
   def outgoingEdgesOf(node: N): List[(N, E, N)] = {
-edges.filter(_._1 == node)
+_outEdges.getOrElse(node, List.empty)
   }
 
   /**
* incoming edges.
*/
   def incomingEdgesOf(node: N): List[(N, E, N)] = {
-edges.filter(_._3 == node)
+_inEdges.getOrElse(node, List.empty)
+  }
+
+  /**
+   * adjacent vertices.
+   */
+  def adjacentVertices(node: N): List[N] = {
--- End diff --

It's a directed graph


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread huafengw
Github user huafengw commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138881132
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -243,13 +259,34 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* The node returned by Iterator is stable sorted.
*/
   def topologicalOrderIterator: Iterator[N] = {
-val newGraph = copy
-var output = List.empty[N]
-
-while (!newGraph.isEmpty) {
-  output ++= newGraph.removeZeroInDegree
+tryTopologicalOrderIterator.get
--- End diff --

I think exception is better than dead loop


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread huafengw
Github user huafengw commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138880939
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -165,7 +181,7 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* edges connected to node
*/
   def edgesOf(node: N): List[(N, E, N)] = {
-(incomingEdgesOf(node) ++ outgoingEdgesOf(node)).toSet[(N, E, 
N)].toList.sortBy(_indexs(_))
+(incomingEdgesOf(node) ++ 
outgoingEdgesOf(node)).distinct.sortBy(_indexs(_))
--- End diff --

```
 // This is used to ensure the output of this Graph is always stable
 // Like method vertices(), or edges()
 private var _indexs = Map.empty[Any, Int]



---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138883853
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -243,13 +259,34 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* The node returned by Iterator is stable sorted.
*/
   def topologicalOrderIterator: Iterator[N] = {
-val newGraph = copy
-var output = List.empty[N]
-
-while (!newGraph.isEmpty) {
-  output ++= newGraph.removeZeroInDegree
+tryTopologicalOrderIterator.get
--- End diff --

Then return a `Try` is better since the caller will be aware this method 
may fail then.


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread huafengw
Github user huafengw commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138882103
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -355,19 +393,7 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* check whether there is a loop
*/
   def hasCycle(): Boolean = {
-@tailrec
-def detectCycle(graph: Graph[N, E]): Boolean = {
-  if (graph.edges.isEmpty) {
-false
-  } else if (graph.vertices.nonEmpty && 
!graph.vertices.exists(graph.inDegreeOf(_) == 0)) {
-true
-  } else {
-graph.removeZeroInDegree
-detectCycle(graph)
-  }
-}
-
-detectCycle(copy)
+tryTopologicalOrderIterator.isFailure
--- End diff --

The graph is mutable so we have to check whether the Graph is changed if 
caching the result.


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-14 Thread manuzhang
Github user manuzhang commented on a diff in the pull request:

https://github.com/apache/incubator-gearpump/pull/223#discussion_r138884320
  
--- Diff: core/src/main/scala/org/apache/gearpump/util/Graph.scala ---
@@ -355,19 +393,7 @@ class Graph[N, E](vertexList: List[N], edgeList: 
List[(N, E, N)]) extends Serial
* check whether there is a loop
*/
   def hasCycle(): Boolean = {
-@tailrec
-def detectCycle(graph: Graph[N, E]): Boolean = {
-  if (graph.edges.isEmpty) {
-false
-  } else if (graph.vertices.nonEmpty && 
!graph.vertices.exists(graph.inDegreeOf(_) == 0)) {
-true
-  } else {
-graph.removeZeroInDegree
-detectCycle(graph)
-  }
-}
-
-detectCycle(copy)
+tryTopologicalOrderIterator.isFailure
--- End diff --

sure, this can be left as a future optimization and I'm not a big fan of 
"mutable"


---


[GitHub] incubator-gearpump pull request #223: [GEARPUMP-349] Optimize Graph topologi...

2017-09-05 Thread huafengw
GitHub user huafengw opened a pull request:

https://github.com/apache/incubator-gearpump/pull/223

[GEARPUMP-349] Optimize Graph topologicalOrderIterator performance

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the commit message is formatted like:
   `[GEARPUMP-] Meaningful description of pull request` 
 - [ ] Make sure tests pass via `sbt clean test`.
 - [ ] Make sure old documentation affected by the pull request has been 
updated and new documentation added for new functionality. 



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huafengw/incubator-gearpump graph

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-gearpump/pull/223.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #223


commit 2673c99cb26a8a0733af772f01748f818ea08857
Author: huafengw 
Date:   2017-09-05T08:59:49Z

[GEARPUMP-349] Optimize Graph topologicalOrderIterator performance




---