Added connected components recipe
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/d5404b81 Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/d5404b81 Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/d5404b81 Branch: refs/heads/TINKERPOP-1602 Commit: d5404b8166525f0dd62d0e0bde3cce9fc68cd887 Parents: 52f7f96 Author: Stephen Mallette <sp...@genoprime.com> Authored: Mon Jan 9 11:12:03 2017 -0500 Committer: Stephen Mallette <sp...@genoprime.com> Committed: Mon Jan 9 11:12:03 2017 -0500 ---------------------------------------------------------------------- docs/src/recipes/connected-components.asciidoc | 82 ++++++++++++++++++++ docs/src/recipes/index.asciidoc | 2 + docs/static/images/connected-components.png | Bin 0 -> 17946 bytes 3 files changed, 84 insertions(+) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/d5404b81/docs/src/recipes/connected-components.asciidoc ---------------------------------------------------------------------- diff --git a/docs/src/recipes/connected-components.asciidoc b/docs/src/recipes/connected-components.asciidoc new file mode 100644 index 0000000..cfb93fb --- /dev/null +++ b/docs/src/recipes/connected-components.asciidoc @@ -0,0 +1,82 @@ +//// +Licensed to the Apache Software Foundation (ASF) under one or more +contributor license agreements. See the NOTICE file distributed with +this work for additional information regarding copyright ownership. +The ASF licenses this file to You under the Apache License, Version 2.0 +(the "License"); you may not use this file except in compliance with +the License. You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +//// +[[connected-components]] +Connected Components +-------------------- + +Gremlin can be used to find link:https://en.wikipedia.org/wiki/Connected_component_(graph_theory)[connected components] +in a graph. Consider the following graph which has three connected components: + +image:connected-components.png[width=600] + +[gremlin-groovy] +---- +g.addV(id, "A").as("a"). + addV(id, "B").as("b"). + addV(id, "C").as("c"). + addV(id, "D").as("d"). + addV(id, "E").as("e"). + addV(id, "F"). + addE("link").from("a").to("b"). + addE("link").from("b").to("c"). + addE("link").from("d").to("e").iterate() +---- + +One way to detect the various subgraphs would be to do something like this: + +[gremlin-groovy,existing] +---- +g.V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()). <1> + aggregate("p").by(path()).cap("p"). <2> + unfold().limit(local, 1).dedup(). <3> + map(__.as("v").select("p").unfold(). <4> + filter(unfold().where(eq("v"))). + unfold().dedup().order().by(id).fold() + ).dedup() <5> +---- + +<1> Iterate all vertices and repeatedly traverse over both incoming and outgoing edges (TinkerPop doesn't support +unidirectional graphs directly so it must be simulated by ignoring the direction with `both`). Note the use of `emit` +prior to `repeat` as this allows for return of a single length path. +<2> Aggregate the emitted vertices to "p" and get their path information. Calling `cap` at the end will push the +aggregate path list into the traversal. It is within these paths that the list of connected components will be +identified. Obviously the paths list are duplicative in the sense that they contains different paths travelled over +the same vertices. +<3> Unroll the elements in the path list with `unfold` and grab the first vertex in each path and `dedup`. +<4> Use the first vertex in each path to filter against the paths stored in "p". When a path is found that has the +vertex in it, dedup the vertices in the path, order it by the identifier. Each path output from this `map` step +represents a connected component. +<5> The connected component list is duplicative given the nature of the paths in "p", but now that the vertices within +the paths are ordered, a final `dedup` will make the list of connective components unique. + +NOTE: This is a nice example of where running smaller pieces of a large Gremlin statement make it easier to see what +is happening at each step. Consider running this example one line at a time (or perhaps even in a step at a time) to +see the output at each point. + +While the above approach returns results nicely, the traversal doesn't appear to work with OLAP. A less efficient +approach, but one more suited for OLAP execution looks quite similar but does not use `dedup` as heavily (thus +`GraphComputer` is forced to analyze far more paths): + +[gremlin-groovy,existing] +---- +g.withComputer().V().emit(cyclicPath().or().not(both())).repeat(both()).until(cyclicPath()). + aggregate("p").by(path()).cap("p").unfold().limit(local, 1). + map(__.as("v").select("p").unfold(). + filter(unfold().where(eq("v"))). + unfold().dedup().order().by(id).fold() + ).toSet() +---- \ No newline at end of file http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/d5404b81/docs/src/recipes/index.asciidoc ---------------------------------------------------------------------- diff --git a/docs/src/recipes/index.asciidoc b/docs/src/recipes/index.asciidoc index 47c2b37..8ffc358 100644 --- a/docs/src/recipes/index.asciidoc +++ b/docs/src/recipes/index.asciidoc @@ -40,6 +40,8 @@ include::between-vertices.asciidoc[] include::centrality.asciidoc[] +include::connected-components.asciidoc[] + include::cycle-detection.asciidoc[] include::if-then-based-grouping.asciidoc[] http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/d5404b81/docs/static/images/connected-components.png ---------------------------------------------------------------------- diff --git a/docs/static/images/connected-components.png b/docs/static/images/connected-components.png new file mode 100755 index 0000000..ef3cc41 Binary files /dev/null and b/docs/static/images/connected-components.png differ