[GitHub] tinkerpop pull request #721: TINKERPOP-1786 Recipe and missing manifest item...
Github user asfgit closed the pull request at: https://github.com/apache/tinkerpop/pull/721 ---
Github user pluradj commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/721#discussion_r144719670

--- Diff: docs/src/recipes/olap-spark-yarn.asciidoc ---
@@ -0,0 +1,153 @@ (new file)

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

  http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

[[olap-spark-yarn]]
OLAP traversals with Spark on Yarn
----------------------------------

TinkerPop's combination of http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[SparkGraphComputer] and http://tinkerpop.apache.org/docs/current/reference/#_properties_files[HadoopGraph] allows for running distributed, analytical graph queries (OLAP) on a computer cluster. The http://tinkerpop.apache.org/docs/current/reference/#sparkgraphcomputer[reference documentation] covers the cases where Spark runs locally or where the cluster is managed by a Spark server. However, many users can only run OLAP jobs via the http://hadoop.apache.org/[Hadoop 2.x] Resource Manager (Yarn), which requires `SparkGraphComputer` to be configured differently. This recipe describes this configuration.

Approach
~~~~~~~~

Most configuration problems of TinkerPop with Spark on Yarn stem from three reasons:

1. `SparkGraphComputer` creates its own `SparkContext`, so it does not get any configs from the usual `spark-submit` command.
2. The TinkerPop Spark plugin did not include Spark on Yarn runtime dependencies until version 3.2.7/3.3.1.
3. Resolving reason 2 by adding the cluster's `spark-assembly` jar to the classpath creates a host of version conflicts, because Spark 1.x dependency versions have remained frozen since 2014.

The current recipe follows a minimalist approach in which no dependencies are added to the dependencies included in the TinkerPop binary distribution. The Hadoop cluster's Spark installation is completely ignored. This approach minimizes the chance of dependency version conflicts.

Prerequisites
~~~~~~~~~~~~~

This recipe is suitable for both a real external and a local pseudo Hadoop cluster. While the recipe is maintained for the vanilla Hadoop pseudo-cluster, it has been reported to work on real clusters with Hadoop distributions from various vendors.

If you want to try the recipe on a local Hadoop pseudo-cluster, the easiest way to install it is to look at the install script at https://github.com/apache/tinkerpop/blob/x.y.z/docker/hadoop/install.sh and the `start hadoop` section of https://github.com/apache/tinkerpop/blob/x.y.z/docker/scripts/build.sh.

This recipe assumes that you installed the gremlin console with the http://tinkerpop.apache.org/docs/x.y.z/reference/#spark-plugin[spark plugin] (the http://tinkerpop.apache.org/docs/x.y.z/reference/#hadoop-plugin[hadoop plugin] is optional). Your Hadoop cluster may have been configured to use file compression, e.g. lzo compression. If so, you need to copy the relevant jar (e.g. `hadoop-lzo-*.jar`) to gremlin console's `ext/spark-gremlin/lib` folder.

For starting the gremlin console in the right environment, create a shell script (e.g. `bin/spark-yarn.sh`) with the contents below. Of course, actual values for `GREMLIN_HOME`, `HADOOP_HOME` and `HADOOP_CONF_DIR` need to be adapted to your particular environment.

[source]
----
#!/bin/bash
# Variables to be adapted to the actual environment
GREMLIN_HOME=/home/yourdir/lib/apache-tinkerpop-gremlin-console-x.y.z-standalone
export HADOOP_HOME=/usr/local/lib/hadoop-2.7.2
export HADOOP_CONF_DIR=/usr/local/lib/hadoop-2.7.2/etc/hadoop

# Have TinkerPop find the hadoop cluster configs and hadoop native libraries
export CLASSPATH=$HADOOP_CONF_DIR
export JAVA_OPTIONS="-Djava.library.path=$HADOOP_HOME/lib/native:$HADOOP_HOME/lib/native/Linux-amd64-64"

# Start gremlin-console without getting the HADOOP_GREMLIN_LIBS warning
cd $GREMLIN_HOME
[ ! -e empty ] && mkdir empty
export HADOOP_GREMLIN_LIBS=$GREMLIN_HOME/empty
bin/gremlin.sh
----
---
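The prerequisite about compression jars can be folded into the launch script above. The sketch below (not part of the PR; the default paths are illustrative placeholders) appends any `hadoop-lzo-*.jar` found under the console's `ext/spark-gremlin/lib` folder to the `CLASSPATH` that the script exports:

```shell
#!/bin/sh
# Sketch only: build the CLASSPATH that bin/spark-yarn.sh would export,
# appending an optional hadoop-lzo jar when one has been copied in.
# The defaults below are illustrative, not real installation paths.
GREMLIN_HOME="${GREMLIN_HOME:-/tmp/gremlin-console}"
HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-/usr/local/lib/hadoop-2.7.2/etc/hadoop}"

CLASSPATH="$HADOOP_CONF_DIR"
for jar in "$GREMLIN_HOME"/ext/spark-gremlin/lib/hadoop-lzo-*.jar; do
  # when nothing matches, the glob stays literal, so test for existence
  [ -e "$jar" ] && CLASSPATH="$CLASSPATH:$jar"
done
export CLASSPATH
echo "$CLASSPATH"
```

If no LZO jar is present the exported `CLASSPATH` is just `$HADOOP_CONF_DIR`, which matches the behavior of the recipe's original script.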
Github user pluradj commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/721#discussion_r144719827 --- Diff: docs/src/recipes/olap-spark-yarn.asciidoc --- (quotes the same diff hunk as above) ---
Github user pluradj commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/721#discussion_r144653328 --- Diff: docs/src/recipes/olap-spark-yarn.asciidoc --- "...many users can only run OLAP jobs via the http://hadoop.apache.org/[Hadoop 2.x] Resource Manager (Yarn), which requires `SparkGraphComputer` to be..." --- End diff -- capitalize YARN throughout the doc ---
Github user pluradj commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/721#discussion_r144719566 --- Diff: docs/src/recipes/olap-spark-yarn.asciidoc --- "...Your Hadoop cluster may have been configured to use file compression, e.g. lzo compression. If so, you need to copy the relevant..." --- End diff -- capitalize LZO ---
Github user pluradj commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/721#discussion_r144653511 --- Diff: docs/src/recipes/olap-spark-yarn.asciidoc --- "...For starting the gremlin console in the right environment, create a shell script (e.g. `bin/spark-yarn.sh`) with the..." --- End diff -- capitalize Gremlin Console throughout the doc ---
Github user pluradj commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/721#discussion_r144655107 --- Diff: docs/src/recipes/olap-spark-yarn.asciidoc --- (quotes the same diff hunk as above) ---
Github user spmallette commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/721#discussion_r144345569 --- Diff: CHANGELOG.asciidoc --- @@ -43,7 +43,13 @@ image::https://raw.githubusercontent.com/apache/tinkerpop/master/docs/static/ima * Fixed a bug in `Neo4jGremlinPlugin` that prevented it from loading properly in the `GremlinPythonScriptEngine`. * Fixed a bug in `ComputerVerificationStrategy` where child traversals were being analyzed prior to compilation. * Fixed a bug that prevented Gremlin from ordering lists and streams made of mixed number types. -* Fixed a bug where `keepLabels` were being corrupted because a defensive copy was not being made when they were being set by `PathRetractionStrategy`. +* Fixed a bug where `keepLabels` were being corrupted because a defensive copy was not being made when they were being set by `PathRetractionStrategy`. +* Added a recipe for OLAP traversals with Spark on Yarn + +Improvements --- End diff -- No need to add the "Improvements" section. That gets added on release and we generate that output from JIRA. ---
Github user okram commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/721#discussion_r142426106 --- Diff: hadoop-gremlin/conf/hadoop-gryo.properties --- @@ -29,8 +29,8 @@ gremlin.hadoop.outputLocation=output spark.master=local[4] spark.executor.memory=1g spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer +gremlin.spark.persistContext=true --- End diff -- Yes, we know what it does, but by default we have it set to `false` and we don't want to create a backwards breaking usage. ---
Github user vtslab commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/721#discussion_r142223415 --- Diff: hadoop-gremlin/conf/hadoop-gryo.properties --- @@ -29,8 +29,8 @@ gremlin.hadoop.outputLocation=output spark.master=local[4] spark.executor.memory=1g spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer +gremlin.spark.persistContext=true --- End diff -- Good question; I had not justified this yet. My original reason was that stopping both the SparkContext and the gremlin console, as happens during docs generation, can lead to race conditions in spark-yarn, with random connection exceptions showing up in the console output in the docs. As a bonus, follow-up OLAP queries are answered much faster because you skip the overhead of acquiring resources from yarn. This is also what is done in Apache Zeppelin, the Spark shell and the like. The alternative is to set the property in the console together with the other properties. This would require some more explanation and configuration work for recipe users, but would leave the properties file untouched. I like the current proposal better, but I am fine with both. ---
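For reference, the relevant portion of `hadoop-gryo.properties` under discussion looks roughly like the excerpt below (keys taken from the quoted diff; the comment is an editorial gloss, not part of the file):

```properties
# excerpt of hadoop-gremlin/conf/hadoop-gryo.properties (sketch)
spark.master=local[4]
spark.executor.memory=1g
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
# keeps the SparkContext alive between OLAP queries, avoiding repeated
# resource negotiation with the cluster manager; the shipped default is false
gremlin.spark.persistContext=true
```

The alternative raised in the thread is to leave the shipped file at its default and have recipe users set `gremlin.spark.persistContext` themselves, alongside their other properties.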
Github user okram commented on a diff in the pull request: https://github.com/apache/tinkerpop/pull/721#discussion_r142035586 --- Diff: hadoop-gremlin/conf/hadoop-gryo.properties --- @@ -29,8 +29,8 @@ gremlin.hadoop.outputLocation=output spark.master=local[4] spark.executor.memory=1g spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer +gremlin.spark.persistContext=true --- End diff -- Why is this defaulted now? ---
GitHub user vtslab opened a pull request: https://github.com/apache/tinkerpop/pull/721 TINKERPOP-1786 Recipe and missing manifest items for Spark on Yarn (TP32) You can merge this pull request into a Git repository by running: $ git pull https://github.com/vtslab/incubator-tinkerpop spark-yarn-recipe-tp32 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/721.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #721 commit 250042b66b49d73619f7f25177c7ce755202e337 Author: HadoopMarc Date: 2017-09-10T12:45:45Z Added spark-yarn recipe and missing manifest items in spark-gremlin ---