[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-11-01 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/tinkerpop/pull/721


---


[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-10-15 Thread pluradj
Github user pluradj commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/721#discussion_r144719670
  
--- Diff: docs/src/recipes/olap-spark-yarn.asciidoc ---
@@ -0,0 +1,153 @@
+
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+[[olap-spark-yarn]]
+OLAP traversals with Spark on Yarn
+----------------------------------
+
+TinkerPop's combination of http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[SparkGraphComputer]
+and http://tinkerpop.apache.org/docs/current/reference/#_properties_files[HadoopGraph] allows for running
+distributed, analytical graph queries (OLAP) on a computer cluster. The
+http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[reference documentation] covers the cases
+where Spark runs locally or where the cluster is managed by a Spark server. However, many users can only run OLAP jobs
+via the http://hadoop.apache.org/[Hadoop 2.x] Resource Manager (Yarn), which requires `SparkGraphComputer` to be
+configured differently. This recipe describes this configuration.
+
+Approach
+~~~~~~~~
+
+Most configuration problems of TinkerPop with Spark on Yarn stem from three reasons:
+
+1. `SparkGraphComputer` creates its own `SparkContext` so it does not get any configs from the usual `spark-submit` command.
+2. The TinkerPop Spark plugin did not include Spark on Yarn runtime dependencies until version 3.2.7/3.3.1.
+3. Resolving reason 2 by adding the cluster's `spark-assembly` jar to the classpath creates a host of version
+conflicts, because Spark 1.x dependency versions have remained frozen since 2014.
+
+The current recipe follows a minimalist approach in which no dependencies are added to the dependencies
+included in the TinkerPop binary distribution. The Hadoop cluster's Spark installation is completely ignored. This
+approach minimizes the chance of dependency version conflicts.
+
+Prerequisites
+~~~~~~~~~~~~~
+This recipe is suitable for both a real external and a local pseudo Hadoop cluster. While the recipe is maintained
+for the vanilla Hadoop pseudo-cluster, it has been reported to work on real clusters with Hadoop distributions
+from various vendors.
+
+If you want to try the recipe on a local Hadoop pseudo-cluster, the easiest way to install
+it is to look at the install script at https://github.com/apache/tinkerpop/blob/x.y.z/docker/hadoop/install.sh
+and the `start hadoop` section of https://github.com/apache/tinkerpop/blob/x.y.z/docker/scripts/build.sh.
+
+This recipe assumes that you installed the gremlin console with the
+http://tinkerpop.apache.org/docs/x.y.z/reference/#spark-plugin[spark plugin] (the
+http://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-plugin[hadoop plugin] is optional). Your Hadoop cluster
+may have been configured to use file compression, e.g. lzo compression. If so, you need to copy the relevant
+jar (e.g. `hadoop-lzo-*.jar`) to gremlin console's `ext/spark-gremlin/lib` folder.
+
+For starting the gremlin console in the right environment, create a shell script (e.g. `bin/spark-yarn.sh`) with the
+contents below. Of course, actual values for `GREMLIN_HOME`, `HADOOP_HOME` and `HADOOP_CONF_DIR` need to be adapted to
+your particular environment.
+
+[source]
+----
+#!/bin/bash
+# Variables to be adapted to the actual environment
+GREMLIN_HOME=/home/yourdir/lib/apache-tinkerpop-gremlin-console-x.y.z-standalone
+export HADOOP_HOME=/usr/local/lib/hadoop-2.7.2
+export HADOOP_CONF_DIR=/usr/local/lib/hadoop-2.7.2/etc/hadoop
+
+# Have TinkerPop find the hadoop cluster configs and hadoop native libraries
+export CLASSPATH=$HADOOP_CONF_DIR
+export JAVA_OPTIONS="-Djava.library.path=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/native/Linux-amd64-64"
+
+# Start gremlin-console without getting the HADOOP_GREMLIN_LIBS warning
+cd $GREMLIN_HOME
+[ ! -e empty ] && mkdir empty
+export HADOOP_GREMLIN_LIBS=$GREMLIN_HOME/empty
+bin/gremlin.sh
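
Once the Gremlin Console is started through this script, the remainder of the recipe boils down to opening a `HadoopGraph` and submitting an OLAP traversal through `SparkGraphComputer`. A minimal sketch of such a session, assuming the stock `conf/hadoop/hadoop-gryo.properties` file shipped with the console (the file path and the example query are illustrative, not quoted from the recipe):

[source,groovy]
----
// Sketch of a Gremlin Console session after starting bin/spark-yarn.sh.
// The properties file path below is an assumption (the console's bundled example config).
graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')
g = graph.traversal().withComputer(SparkGraphComputer)
// Any OLAP traversal now runs as a Spark job, e.g. a simple vertex count:
g.V().count()
----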

---

[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-10-15 Thread pluradj
Github user pluradj commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/721#discussion_r144719827
  
--- Diff: docs/src/recipes/olap-spark-yarn.asciidoc ---
@@ -0,0 +1,153 @@
+
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+[[olap-spark-yarn]]
+OLAP traversals with Spark on Yarn
+----------------------------------
+
+TinkerPop's combination of http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[SparkGraphComputer]
+and http://tinkerpop.apache.org/docs/current/reference/#_properties_files[HadoopGraph] allows for running
+distributed, analytical graph queries (OLAP) on a computer cluster. The
+http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[reference documentation] covers the cases
+where Spark runs locally or where the cluster is managed by a Spark server. However, many users can only run OLAP jobs
+via the http://hadoop.apache.org/[Hadoop 2.x] Resource Manager (Yarn), which requires `SparkGraphComputer` to be
+configured differently. This recipe describes this configuration.
+
+Approach
+~~~~~~~~
+
+Most configuration problems of TinkerPop with Spark on Yarn stem from three reasons:
+
+1. `SparkGraphComputer` creates its own `SparkContext` so it does not get any configs from the usual `spark-submit` command.
+2. The TinkerPop Spark plugin did not include Spark on Yarn runtime dependencies until version 3.2.7/3.3.1.
+3. Resolving reason 2 by adding the cluster's `spark-assembly` jar to the classpath creates a host of version
+conflicts, because Spark 1.x dependency versions have remained frozen since 2014.
+
+The current recipe follows a minimalist approach in which no dependencies are added to the dependencies
+included in the TinkerPop binary distribution. The Hadoop cluster's Spark installation is completely ignored. This
+approach minimizes the chance of dependency version conflicts.
+
+Prerequisites
+~~~~~~~~~~~~~
+This recipe is suitable for both a real external and a local pseudo Hadoop cluster. While the recipe is maintained
+for the vanilla Hadoop pseudo-cluster, it has been reported to work on real clusters with Hadoop distributions
+from various vendors.
+
+If you want to try the recipe on a local Hadoop pseudo-cluster, the easiest way to install
+it is to look at the install script at https://github.com/apache/tinkerpop/blob/x.y.z/docker/hadoop/install.sh
+and the `start hadoop` section of https://github.com/apache/tinkerpop/blob/x.y.z/docker/scripts/build.sh.
+
+This recipe assumes that you installed the gremlin console with the
+http://tinkerpop.apache.org/docs/x.y.z/reference/#spark-plugin[spark plugin] (the
+http://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-plugin[hadoop plugin] is optional). Your Hadoop cluster
+may have been configured to use file compression, e.g. lzo compression. If so, you need to copy the relevant
+jar (e.g. `hadoop-lzo-*.jar`) to gremlin console's `ext/spark-gremlin/lib` folder.
+
+For starting the gremlin console in the right environment, create a shell script (e.g. `bin/spark-yarn.sh`) with the
+contents below. Of course, actual values for `GREMLIN_HOME`, `HADOOP_HOME` and `HADOOP_CONF_DIR` need to be adapted to
+your particular environment.
+
+[source]
+----
+#!/bin/bash
+# Variables to be adapted to the actual environment
+GREMLIN_HOME=/home/yourdir/lib/apache-tinkerpop-gremlin-console-x.y.z-standalone
+export HADOOP_HOME=/usr/local/lib/hadoop-2.7.2
+export HADOOP_CONF_DIR=/usr/local/lib/hadoop-2.7.2/etc/hadoop
+
+# Have TinkerPop find the hadoop cluster configs and hadoop native libraries
+export CLASSPATH=$HADOOP_CONF_DIR
+export JAVA_OPTIONS="-Djava.library.path=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/native/Linux-amd64-64"
+
+# Start gremlin-console without getting the HADOOP_GREMLIN_LIBS warning
+cd $GREMLIN_HOME
+[ ! -e empty ] && mkdir empty
+export HADOOP_GREMLIN_LIBS=$GREMLIN_HOME/empty
+bin/gremlin.sh

---

[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-10-15 Thread pluradj
Github user pluradj commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/721#discussion_r144653328
  
--- Diff: docs/src/recipes/olap-spark-yarn.asciidoc ---
@@ -0,0 +1,153 @@
+
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+[[olap-spark-yarn]]
+OLAP traversals with Spark on Yarn
+----------------------------------
+
+TinkerPop's combination of http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[SparkGraphComputer]
+and http://tinkerpop.apache.org/docs/current/reference/#_properties_files[HadoopGraph] allows for running
+distributed, analytical graph queries (OLAP) on a computer cluster. The
+http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[reference documentation] covers the cases
+where Spark runs locally or where the cluster is managed by a Spark server. However, many users can only run OLAP jobs
+via the http://hadoop.apache.org/[Hadoop 2.x] Resource Manager (Yarn), which requires `SparkGraphComputer` to be
--- End diff --

capitalize YARN throughout the doc


---


[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-10-15 Thread pluradj
Github user pluradj commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/721#discussion_r144719566
  
--- Diff: docs/src/recipes/olap-spark-yarn.asciidoc ---
@@ -0,0 +1,153 @@
+
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+[[olap-spark-yarn]]
+OLAP traversals with Spark on Yarn
+----------------------------------
+
+TinkerPop's combination of http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[SparkGraphComputer]
+and http://tinkerpop.apache.org/docs/current/reference/#_properties_files[HadoopGraph] allows for running
+distributed, analytical graph queries (OLAP) on a computer cluster. The
+http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[reference documentation] covers the cases
+where Spark runs locally or where the cluster is managed by a Spark server. However, many users can only run OLAP jobs
+via the http://hadoop.apache.org/[Hadoop 2.x] Resource Manager (Yarn), which requires `SparkGraphComputer` to be
+configured differently. This recipe describes this configuration.
+
+Approach
+~~~~~~~~
+
+Most configuration problems of TinkerPop with Spark on Yarn stem from three reasons:
+
+1. `SparkGraphComputer` creates its own `SparkContext` so it does not get any configs from the usual `spark-submit` command.
+2. The TinkerPop Spark plugin did not include Spark on Yarn runtime dependencies until version 3.2.7/3.3.1.
+3. Resolving reason 2 by adding the cluster's `spark-assembly` jar to the classpath creates a host of version
+conflicts, because Spark 1.x dependency versions have remained frozen since 2014.
+
+The current recipe follows a minimalist approach in which no dependencies are added to the dependencies
+included in the TinkerPop binary distribution. The Hadoop cluster's Spark installation is completely ignored. This
+approach minimizes the chance of dependency version conflicts.
+
+Prerequisites
+~~~~~~~~~~~~~
+This recipe is suitable for both a real external and a local pseudo Hadoop cluster. While the recipe is maintained
+for the vanilla Hadoop pseudo-cluster, it has been reported to work on real clusters with Hadoop distributions
+from various vendors.
+
+If you want to try the recipe on a local Hadoop pseudo-cluster, the easiest way to install
+it is to look at the install script at https://github.com/apache/tinkerpop/blob/x.y.z/docker/hadoop/install.sh
+and the `start hadoop` section of https://github.com/apache/tinkerpop/blob/x.y.z/docker/scripts/build.sh.
+
+This recipe assumes that you installed the gremlin console with the
+http://tinkerpop.apache.org/docs/x.y.z/reference/#spark-plugin[spark plugin] (the
+http://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-plugin[hadoop plugin] is optional). Your Hadoop cluster
+may have been configured to use file compression, e.g. lzo compression. If so, you need to copy the relevant
--- End diff --

capitalize LZO


---


[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-10-15 Thread pluradj
Github user pluradj commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/721#discussion_r144653511
  
--- Diff: docs/src/recipes/olap-spark-yarn.asciidoc ---
@@ -0,0 +1,153 @@
+
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+[[olap-spark-yarn]]
+OLAP traversals with Spark on Yarn
+----------------------------------
+
+TinkerPop's combination of http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[SparkGraphComputer]
+and http://tinkerpop.apache.org/docs/current/reference/#_properties_files[HadoopGraph] allows for running
+distributed, analytical graph queries (OLAP) on a computer cluster. The
+http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[reference documentation] covers the cases
+where Spark runs locally or where the cluster is managed by a Spark server. However, many users can only run OLAP jobs
+via the http://hadoop.apache.org/[Hadoop 2.x] Resource Manager (Yarn), which requires `SparkGraphComputer` to be
+configured differently. This recipe describes this configuration.
+
+Approach
+~~~~~~~~
+
+Most configuration problems of TinkerPop with Spark on Yarn stem from three reasons:
+
+1. `SparkGraphComputer` creates its own `SparkContext` so it does not get any configs from the usual `spark-submit` command.
+2. The TinkerPop Spark plugin did not include Spark on Yarn runtime dependencies until version 3.2.7/3.3.1.
+3. Resolving reason 2 by adding the cluster's `spark-assembly` jar to the classpath creates a host of version
+conflicts, because Spark 1.x dependency versions have remained frozen since 2014.
+
+The current recipe follows a minimalist approach in which no dependencies are added to the dependencies
+included in the TinkerPop binary distribution. The Hadoop cluster's Spark installation is completely ignored. This
+approach minimizes the chance of dependency version conflicts.
+
+Prerequisites
+~~~~~~~~~~~~~
+This recipe is suitable for both a real external and a local pseudo Hadoop cluster. While the recipe is maintained
+for the vanilla Hadoop pseudo-cluster, it has been reported to work on real clusters with Hadoop distributions
+from various vendors.
+
+If you want to try the recipe on a local Hadoop pseudo-cluster, the easiest way to install
+it is to look at the install script at https://github.com/apache/tinkerpop/blob/x.y.z/docker/hadoop/install.sh
+and the `start hadoop` section of https://github.com/apache/tinkerpop/blob/x.y.z/docker/scripts/build.sh.
+
+This recipe assumes that you installed the gremlin console with the
+http://tinkerpop.apache.org/docs/x.y.z/reference/#spark-plugin[spark plugin] (the
+http://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-plugin[hadoop plugin] is optional). Your Hadoop cluster
+may have been configured to use file compression, e.g. lzo compression. If so, you need to copy the relevant
+jar (e.g. `hadoop-lzo-*.jar`) to gremlin console's `ext/spark-gremlin/lib` folder.
+
+For starting the gremlin console in the right environment, create a shell script (e.g. `bin/spark-yarn.sh`) with the
--- End diff --

capitalize Gremlin Console throughout the doc


---


[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-10-15 Thread pluradj
Github user pluradj commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/721#discussion_r144655107
  
--- Diff: docs/src/recipes/olap-spark-yarn.asciidoc ---
@@ -0,0 +1,153 @@
+
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+
+[[olap-spark-yarn]]
+OLAP traversals with Spark on Yarn
+----------------------------------
+
+TinkerPop's combination of http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[SparkGraphComputer]
+and http://tinkerpop.apache.org/docs/current/reference/#_properties_files[HadoopGraph] allows for running
+distributed, analytical graph queries (OLAP) on a computer cluster. The
+http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[reference documentation] covers the cases
+where Spark runs locally or where the cluster is managed by a Spark server. However, many users can only run OLAP jobs
+via the http://hadoop.apache.org/[Hadoop 2.x] Resource Manager (Yarn), which requires `SparkGraphComputer` to be
+configured differently. This recipe describes this configuration.
+
+Approach
+~~~~~~~~
+
+Most configuration problems of TinkerPop with Spark on Yarn stem from three reasons:
+
+1. `SparkGraphComputer` creates its own `SparkContext` so it does not get any configs from the usual `spark-submit` command.
+2. The TinkerPop Spark plugin did not include Spark on Yarn runtime dependencies until version 3.2.7/3.3.1.
+3. Resolving reason 2 by adding the cluster's `spark-assembly` jar to the classpath creates a host of version
+conflicts, because Spark 1.x dependency versions have remained frozen since 2014.
+
+The current recipe follows a minimalist approach in which no dependencies are added to the dependencies
+included in the TinkerPop binary distribution. The Hadoop cluster's Spark installation is completely ignored. This
+approach minimizes the chance of dependency version conflicts.
+
+Prerequisites
+~~~~~~~~~~~~~
+This recipe is suitable for both a real external and a local pseudo Hadoop cluster. While the recipe is maintained
+for the vanilla Hadoop pseudo-cluster, it has been reported to work on real clusters with Hadoop distributions
+from various vendors.
+
+If you want to try the recipe on a local Hadoop pseudo-cluster, the easiest way to install
+it is to look at the install script at https://github.com/apache/tinkerpop/blob/x.y.z/docker/hadoop/install.sh
+and the `start hadoop` section of https://github.com/apache/tinkerpop/blob/x.y.z/docker/scripts/build.sh.
+
+This recipe assumes that you installed the gremlin console with the
+http://tinkerpop.apache.org/docs/x.y.z/reference/#spark-plugin[spark plugin] (the
+http://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-plugin[hadoop plugin] is optional). Your Hadoop cluster
+may have been configured to use file compression, e.g. lzo compression. If so, you need to copy the relevant
+jar (e.g. `hadoop-lzo-*.jar`) to gremlin console's `ext/spark-gremlin/lib` folder.
+
+For starting the gremlin console in the right environment, create a shell script (e.g. `bin/spark-yarn.sh`) with the
+contents below. Of course, actual values for `GREMLIN_HOME`, `HADOOP_HOME` and `HADOOP_CONF_DIR` need to be adapted to
+your particular environment.
+
+[source]
+----
+#!/bin/bash
+# Variables to be adapted to the actual environment
+GREMLIN_HOME=/home/yourdir/lib/apache-tinkerpop-gremlin-console-x.y.z-standalone
+export HADOOP_HOME=/usr/local/lib/hadoop-2.7.2
+export HADOOP_CONF_DIR=/usr/local/lib/hadoop-2.7.2/etc/hadoop
+
+# Have TinkerPop find the hadoop cluster configs and hadoop native libraries
+export CLASSPATH=$HADOOP_CONF_DIR
+export JAVA_OPTIONS="-Djava.library.path=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/native/Linux-amd64-64"
+
+# Start gremlin-console without getting the HADOOP_GREMLIN_LIBS warning
+cd $GREMLIN_HOME
+[ ! -e empty ] && mkdir empty
+export HADOOP_GREMLIN_LIBS=$GREMLIN_HOME/empty
+bin/gremlin.sh
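
As a side note on reason 1 of the Approach section above: because `SparkGraphComputer` builds its own `SparkContext`, Spark settings have to travel through the graph's configuration rather than through `spark-submit`. A hedged sketch of what that can look like from the Gremlin Console; the property values (in particular `spark.master=yarn`) are assumptions for illustration and depend on the Spark version in use:

[source,groovy]
----
// Sketch: SparkGraphComputer reads Spark settings from the graph configuration
// when the traversal is submitted, so they can also be set programmatically.
graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')      // assumed stock properties file
graph.configuration().setProperty('spark.master', 'yarn')            // assumption; Spark 1.x would use 'yarn-client'
graph.configuration().setProperty('spark.executor.memory', '1g')
g = graph.traversal().withComputer(SparkGraphComputer)
----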

---

[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-10-12 Thread spmallette
Github user spmallette commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/721#discussion_r144345569
  
--- Diff: CHANGELOG.asciidoc ---
@@ -43,7 +43,13 @@ image::https://raw.githubusercontent.com/apache/tinkerpop/master/docs/static/ima
 * Fixed a bug in `Neo4jGremlinPlugin` that prevented it from loading properly in the `GremlinPythonScriptEngine`.
 * Fixed a bug in `ComputerVerificationStrategy` where child traversals were being analyzed prior to compilation.
 * Fixed a bug that prevented Gremlin from ordering lists and streams made of mixed number types.
-* Fixed a bug where `keepLabels` were being corrupted because a defensive copy was not being made when they were being set by `PathRetractionStrategy`. 
+* Fixed a bug where `keepLabels` were being corrupted because a defensive copy was not being made when they were being set by `PathRetractionStrategy`.
+* Added a recipe for OLAP traversals with Spark on Yarn
+
+Improvements
--- End diff --

No need to add the "Improvements" section. That gets added on release and we generate that output from JIRA.


---


[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-10-03 Thread okram
Github user okram commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/721#discussion_r142426106
  
--- Diff: hadoop-gremlin/conf/hadoop-gryo.properties ---
@@ -29,8 +29,8 @@ gremlin.hadoop.outputLocation=output
 spark.master=local[4]
 spark.executor.memory=1g
 
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
+gremlin.spark.persistContext=true
--- End diff --

Yes, we know what it does, but by default we have it set to `false` and we don't want to create a backwards-breaking change.


---


[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-10-02 Thread vtslab
Github user vtslab commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/721#discussion_r142223415
  
--- Diff: hadoop-gremlin/conf/hadoop-gryo.properties ---
@@ -29,8 +29,8 @@ gremlin.hadoop.outputLocation=output
 spark.master=local[4]
 spark.executor.memory=1g
 
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
+gremlin.spark.persistContext=true
--- End diff --

Good question, I had not justified this yet. My original reason was that stopping both the SparkContext and the Gremlin Console, as is done during docs generation, can lead to race conditions in spark-yarn, with random connection exceptions showing up in the console output in the docs. But as a bonus, follow-up OLAP queries get answered much faster because you skip the overhead of getting resources from YARN. This is also what is done in Apache Zeppelin, the Spark shell and the like.

The alternative is to set the property in the console together with the other properties. This would require some more explanation in the recipe and some extra configuration work from its users, but it would leave the properties file untouched. I like the current proposal better, but I am fine with both.
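
For reference, that in-console alternative could look roughly like the sketch below; the `gremlin.spark.persistContext` name comes from the diff, while the rest of the session (properties file path, query) is assumed:

[source,groovy]
----
// Sketch: enable gremlin.spark.persistContext per session instead of in
// hadoop-gryo.properties, so the shipped default of false stays untouched.
graph = GraphFactory.open('conf/hadoop/hadoop-gryo.properties')           // assumed stock properties file
graph.configuration().setProperty('gremlin.spark.persistContext', true)
g = graph.traversal().withComputer(SparkGraphComputer)
g.V().count()   // the first run pays the YARN startup cost; later runs reuse the persisted SparkContext
----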


---


[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-10-01 Thread okram
Github user okram commented on a diff in the pull request:

https://github.com/apache/tinkerpop/pull/721#discussion_r142035586
  
--- Diff: hadoop-gremlin/conf/hadoop-gryo.properties ---
@@ -29,8 +29,8 @@ gremlin.hadoop.outputLocation=output
 spark.master=local[4]
 spark.executor.memory=1g
 
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
+gremlin.spark.persistContext=true
--- End diff --

Why is this defaulted now?


---


[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...

2017-09-24 Thread vtslab
GitHub user vtslab opened a pull request:

https://github.com/apache/tinkerpop/pull/721

TINKERPOP-1786 Recipe and missing manifest items for Spark on Yarn (TP32)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vtslab/incubator-tinkerpop spark-yarn-recipe-tp32

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/tinkerpop/pull/721.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #721


commit 250042b66b49d73619f7f25177c7ce755202e337
Author: HadoopMarc 
Date:   2017-09-10T12:45:45Z

Added spark-yarn recipe and missing manifest items in spark-gremlin




---