Moved hadoop-gremlin provider documentation to the provider docs CTR
Project: http://git-wip-us.apache.org/repos/asf/tinkerpop/repo
Commit: http://git-wip-us.apache.org/repos/asf/tinkerpop/commit/e3c5d8ed
Tree: http://git-wip-us.apache.org/repos/asf/tinkerpop/tree/e3c5d8ed
Diff: http://git-wip-us.apache.org/repos/asf/tinkerpop/diff/e3c5d8ed

Branch: refs/heads/TINKERPOP-1063
Commit: e3c5d8ed1ca9028e35e78b9b1da6e73b8b066659
Parents: ccd2630
Author: Stephen Mallette <sp...@genoprime.com>
Authored: Wed Jun 15 11:25:20 2016 -0400
Committer: Stephen Mallette <sp...@genoprime.com>
Committed: Wed Jun 15 11:25:20 2016 -0400

----------------------------------------------------------------------
 docs/src/dev/provider/index.asciidoc            | 42 ++++++++++++++++++
 .../reference/implementations-hadoop.asciidoc   | 45 +-------------------
 2 files changed, 43 insertions(+), 44 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/e3c5d8ed/docs/src/dev/provider/index.asciidoc
----------------------------------------------------------------------
diff --git a/docs/src/dev/provider/index.asciidoc b/docs/src/dev/provider/index.asciidoc
index 2d51274..7b876a1 100644
--- a/docs/src/dev/provider/index.asciidoc
+++ b/docs/src/dev/provider/index.asciidoc
@@ -288,6 +288,48 @@ for (final MapReduce mapReduce : mapReducers) {
 <2> If there is no reduce stage, the map-stage results are inserted into Memory as specified by the application developer's `MapReduce.addResultToMemory()` implementation.
 
+Hadoop-Gremlin Usage
+^^^^^^^^^^^^^^^^^^^^
+
+Hadoop-Gremlin is centered around `InputFormats` and `OutputFormats`. If a 3rd-party graph system provider wishes to
+leverage Hadoop-Gremlin (and its respective `GraphComputer` engines), then they need to provide, at minimum, a
+Hadoop2 `InputFormat<NullWritable,VertexWritable>` for their graph system.
If the provider wishes to persist computed +results back to their graph system (and not just to HDFS via a `FileOutputFormat`), then a graph system specific +`OutputFormat<NullWritable,VertexWritable>` must be developed as well. + +Conceptually, `HadoopGraph` is a wrapper around a `Configuration` object. There is no "data" in the `HadoopGraph` as +the `InputFormat` specifies where and how to get the graph data at OLAP (and OLTP) runtime. Thus, `HadoopGraph` is a +small object with little overhead. Graph system providers should realize `HadoopGraph` as the gateway to the OLAP +features offered by Hadoop-Gremlin. For example, a graph system specific `Graph.compute(Class<? extends GraphComputer> +graphComputerClass)`-method may look as follows: + +[source,java] +---- +public <C extends GraphComputer> C compute(final Class<C> graphComputerClass) throws IllegalArgumentException { + try { + if (AbstractHadoopGraphComputer.class.isAssignableFrom(graphComputerClass)) + return graphComputerClass.getConstructor(HadoopGraph.class).newInstance(this); + else + throw Graph.Exceptions.graphDoesNotSupportProvidedGraphComputer(graphComputerClass); + } catch (final Exception e) { + throw new IllegalArgumentException(e.getMessage(),e); + } +} +---- + +Note that the configurations for Hadoop are assumed to be in the `Graph.configuration()` object. If this is not the +case, then the `Configuration` provided to `HadoopGraph.open()` should be dynamically created within the +`compute()`-method. It is in the provided configuration that `HadoopGraph` gets the various properties which +determine how to read and write data to and from Hadoop. For instance, `gremlin.hadoop.graphInputFormat` and +`gremlin.hadoop.graphOutputFormat`. + +IMPORTANT: A graph system provider's `OutputFormat` should implement the `PersistResultGraphAware` interface which +determines which persistence options are available to the user. For the standard file-based `OutputFormats` provided +by Hadoop-Gremlin (e.g. 
<<gryo-io-format,`GryoOutputFormat`>>, <<graphson-io-format,`GraphSONOutputFormat`>>, +and <<script-io-format,`ScriptInputOutputFormat`>>) `ResultGraph.ORIGINAL` is not supported as the original graph +data files are not random access and are, in essence, immutable. Thus, these file-based `OutputFormats` only support +`ResultGraph.NEW` which creates a copy of the data specified by the `Persist` enum. + [[io-implementations]] IO Implementations ^^^^^^^^^^^^^^^^^^ http://git-wip-us.apache.org/repos/asf/tinkerpop/blob/e3c5d8ed/docs/src/reference/implementations-hadoop.asciidoc ---------------------------------------------------------------------- diff --git a/docs/src/reference/implementations-hadoop.asciidoc b/docs/src/reference/implementations-hadoop.asciidoc index 8c591e2..b89c0a1 100644 --- a/docs/src/reference/implementations-hadoop.asciidoc +++ b/docs/src/reference/implementations-hadoop.asciidoc @@ -904,47 +904,4 @@ Vertex 4 ("josh") is isolated below: "age":[{"id":7,"value":32}]} } } ----- - -Hadoop-Gremlin for Graph System Providers -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -Hadoop-Gremlin is centered around `InputFormats` and `OutputFormats`. If a 3rd-party graph system provider wishes to -leverage Hadoop-Gremlin (and its respective `GraphComputer` engines), then they need to provide, at minimum, a -Hadoop2 `InputFormat<NullWritable,VertexWritable>` for their graph system. If the provider wishes to persist computed -results back to their graph system (and not just to HDFS via a `FileOutputFormat`), then a graph system specific -`OutputFormat<NullWritable,VertexWritable>` must be developed as well. - -Conceptually, `HadoopGraph` is a wrapper around a `Configuration` object. There is no "data" in the `HadoopGraph` as -the `InputFormat` specifies where and how to get the graph data at OLAP (and OLTP) runtime. Thus, `HadoopGraph` is a -small object with little overhead. 
Graph system providers should realize `HadoopGraph` as the gateway to the OLAP -features offered by Hadoop-Gremlin. For example, a graph system specific `Graph.compute(Class<? extends GraphComputer> -graphComputerClass)`-method may look as follows: - -[source,java] ----- -public <C extends GraphComputer> C compute(final Class<C> graphComputerClass) throws IllegalArgumentException { - try { - if (AbstractHadoopGraphComputer.class.isAssignableFrom(graphComputerClass)) - return graphComputerClass.getConstructor(HadoopGraph.class).newInstance(this); - else - throw Graph.Exceptions.graphDoesNotSupportProvidedGraphComputer(graphComputerClass); - } catch (final Exception e) { - throw new IllegalArgumentException(e.getMessage(),e); - } -} ----- - -Note that the configurations for Hadoop are assumed to be in the `Graph.configuration()` object. If this is not the -case, then the `Configuration` provided to `HadoopGraph.open()` should be dynamically created within the -`compute()`-method. It is in the provided configuration that `HadoopGraph` gets the various properties which -determine how to read and write data to and from Hadoop. For instance, `gremlin.hadoop.graphInputFormat` and -`gremlin.hadoop.graphOutputFormat`. - -IMPORTANT: A graph system provider's `OutputFormat` should implement the `PersistResultGraphAware` interface which -determines which persistence options are available to the user. For the standard file-based `OutputFormats` provided -by Hadoop-Gremlin (e.g. <<gryo-io-format,`GryoOutputFormat`>>, <<graphson-io-format,`GraphSONOutputFormat`>>, -and <<script-io-format,`ScriptInputOutputFormat`>>) `ResultGraph.ORIGINAL` is not supported as the original graph -data files are not random access and are, in essence, immutable. Thus, these file-based `OutputFormats` only support -`ResultGraph.NEW` which creates a copy of the data specified by the `Persist` enum. - +---- \ No newline at end of file
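[Editor's note] The moved section states that `HadoopGraph` reads properties such as `gremlin.hadoop.graphInputFormat` and `gremlin.hadoop.graphOutputFormat` from the `Graph.configuration()` object. As a rough sketch of the kind of configuration a provider would hand to `HadoopGraph.open()`: the class names below follow TinkerPop's stock file-based Gryo formats mentioned in the patch, and the input/output locations are placeholder paths; a graph system provider would substitute its own `InputFormat`/`OutputFormat` classes.

[source,properties]
----
# Sketch of a HadoopGraph properties file. A provider would swap in its own
# InputFormat/OutputFormat class names; locations below are placeholders.
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
gremlin.hadoop.graphInputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoInputFormat
gremlin.hadoop.graphOutputFormat=org.apache.tinkerpop.gremlin.hadoop.structure.io.gryo.GryoOutputFormat
gremlin.hadoop.inputLocation=data/graph.kryo
gremlin.hadoop.outputLocation=output
----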