Added connected components recipe

Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/d5404b81
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/d5404b81
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/d5404b81

Branch: refs/heads/TINKERPOP-1602
Commit: d5404b8166525f0dd62d0e0bde3cce9fc68cd887
Parents: 52f7f96
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Mon Jan 9 11:12:03 2017 -0500
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Mon Jan 9 11:12:03 2017 -0500

----------------------------------------------------------------------
 docs/src/recipes/connected-components.asciidoc |  82 ++++++++++++++++++++
 docs/src/recipes/index.asciidoc                |   2 +
 docs/static/images/connected-components.png    | Bin 0 -> 17946 bytes
 3 files changed, 84 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/d5404b81/docs/src/recipes/connected-components.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/connected-components.asciidoc 
b/docs/src/recipes/connected-components.asciidoc
new file mode 100644
index 0000000..cfb93fb
--- /dev/null
+++ b/docs/src/recipes/connected-components.asciidoc
@@ -0,0 +1,82 @@
+////
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+////
+[[connected-components]]
+Connected Components
+--------------------
+
+Gremlin can be used to find 
link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[connected 
components]
+in a graph. Consider the following graph which has three connected components:
+
+image:connected-components.png[width=600]
+
+[gremlin-groovy]
+----
+g.addV(id, "A").as("a").
+  addV(id, "B").as("b").
+  addV(id, "C").as("c").
+  addV(id, "D").as("d").
+  addV(id, "E").as("e").
+  addV(id, "F").
+  addE("link").from("a").to("b").
+  addE("link").from("b").to("c").
+  addE("link").from("d").to("e").iterate()
+----
+
+One way to detect the various subgraphs would be to do something like this:
+
+[gremlin-groovy,existing]
+----
+g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).  
<1>
+  aggregate("p").by(path()).cap("p").                                          
<2>
+  unfold().limit(local, 1).dedup().                                            
<3>
+  map(__.as("v").select("p").unfold().                                         
<4>
+         filter(unfold().where(eq("v"))).
+         unfold().dedup().order().by(id).fold()
+  ).dedup()                                                                    
<5>
+----
+
+<1> Iterate all vertices and repeatedly traverse over both incoming and 
outgoing edges (TinkerPop doesn't support
+unidirectional graphs directly so it must be simulated by ignoring the 
direction with `both`). Note the use of `emit`
+prior to `repeat` as this allows for return of a single length path.
+<2> Aggregate the emitted vertices to "p" and get their path information. 
Calling `cap` at the end will push the
+aggregate path list into the traversal. It is within these paths that the list 
of connected components will be
+identified. Obviously the paths list are duplicative in the sense that they 
contains different paths travelled over
+the same vertices.
+<3> Unroll the elements in the path list with `unfold` and grab the first 
vertex in each path and `dedup`.
+<4> Use the first vertex in each path to filter against the paths stored in 
"p". When a path is found that has the
+vertex in it, dedup the vertices in the path, order it by the identifier. Each 
path output from this `map` step
+represents a connected component.
+<5> The connected component list is duplicative given the nature of the paths 
in "p", but now that the vertices within
+the paths are ordered, a final `dedup` will make the list of connective 
components unique.
+
+NOTE: This is a nice example of where running smaller pieces of a large 
Gremlin statement make it easier to see what
+is happening at each step. Consider running this example one line at a time 
(or perhaps even in a step at a time) to
+see the output at each point.
+
+While the above approach returns results nicely, the traversal doesn't appear 
to work with OLAP. A less efficient
+approach, but one more suited for OLAP execution looks quite similar but does 
not use `dedup` as heavily (thus
+`GraphComputer` is forced to analyze far more paths):
+
+[gremlin-groovy,existing]
+----
+g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()).
+  aggregate("p").by(path()).cap("p").unfold().limit(local, 1).
+  map(__.as("v").select("p").unfold().
+         filter(unfold().where(eq("v"))).
+         unfold().dedup().order().by(id).fold()
+  ).toSet()
+----
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/d5404b81/docs/src/recipes/index.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/recipes/index.asciidoc b/docs/src/recipes/index.asciidoc
index 47c2b37..8ffc358 100644
--- a/docs/src/recipes/index.asciidoc
+++ b/docs/src/recipes/index.asciidoc
@@ -40,6 +40,8 @@ include::between-vertices.asciidoc[]
 
 include::centrality.asciidoc[]
 
+include::connected-components.asciidoc[]
+
 include::cycle-detection.asciidoc[]
 
 include::if-then-based-grouping.asciidoc[]

http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/d5404b81/docs/static/images/connected-components.png
----------------------------------------------------------------------
diff --git a/docs/static/images/connected-components.png 
b/docs/static/images/connected-components.png
new file mode 100755
index 0000000..ef3cc41
Binary files /dev/null and b/docs/static/images/connected-components.png differ

Reply via email to