[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26401382

--- Diff: bin/pyspark ---
@@ -95,26 +83,13 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"

 # Load the PySpark shell.py script when ./pyspark is used interactively:
 export OLD_PYTHONSTARTUP="$PYTHONSTARTUP"
-export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py"
-
-# Build up arguments list manually to preserve quotes and backslashes.
-# We export Spark submit arguments as an environment variable because shell.py must run as a
-# PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
-SUBMIT_USAGE_FUNCTION=usage
-gatherSparkSubmitOpts "$@"
-PYSPARK_SUBMIT_ARGS=""
-whitespace="[[:space:]]"
-for i in "${SUBMISSION_OPTS[@]}"; do
-  if [[ $i =~ \" ]]; then i=$(echo $i | sed 's/\"/\\\"/g'); fi
-  if [[ $i =~ $whitespace ]]; then i=\"$i\"; fi
-  PYSPARK_SUBMIT_ARGS="$PYSPARK_SUBMIT_ARGS $i"
-done
-export PYSPARK_SUBMIT_ARGS
--- End diff --

Actually `pyspark-shell` shouldn't be in PYSPARK_SUBMIT_ARGS at all. It probably works out of luck when you add it at the end :-), but it's handled internally by the launcher library. (We should probably file a bug to remove this testing stuff from `bin/pyspark`. That would allow other cleanups in these scripts.)

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26361718

--- Diff: bin/pyspark --- (same hunk as quoted in the first comment above) --- End diff --

Looks like `pyspark-shell` has to be at the end of `PYSPARK_SUBMIT_ARGS`; I changed the ordering to `--jars ${KAFKA_ASSEMBLY_JAR} pyspark-shell` and it works. I'm not sure whether that is the intended behavior or a bug.
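For reference, a minimal sketch of the env-var approach discussed in this thread, using a made-up jar path. Per the comments above, extra submit options go before the `pyspark-shell` token, and quoting the whole value avoids word splitting; the path here is purely illustrative.

```shell
# Hypothetical jar path for illustration only.
KAFKA_ASSEMBLY_JAR="/tmp/spark-streaming-kafka-assembly.jar"
# Options precede the pyspark-shell token; quote the whole value so
# embedded spaces survive word splitting.
export PYSPARK_SUBMIT_ARGS="--jars ${KAFKA_ASSEMBLY_JAR} pyspark-shell"
echo "$PYSPARK_SUBMIT_ARGS"
```

The shell.py startup script reads this variable because a PYTHONSTARTUP script cannot receive command-line arguments.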
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26320982

--- Diff: bin/pyspark --- (same hunk as quoted in the first comment above) --- End diff --

It depends on how the unit tests are run. Are they run through spark-submit? Then you just pass `--jars` to spark-submit. If they're run by the code under the `$SPARK_TESTING` check in this file, then setting that env variable will probably work.
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26358490

--- Diff: bin/pyspark --- (same hunk as quoted in the first comment above) --- End diff --

Thanks @vanzin for your reply. Actually the code is running under `$SPARK_TESTING`. I tried to use `PYSPARK_SUBMIT_ARGS` to expose the environment, like this:

```
export PYSPARK_SUBMIT_ARGS="pyspark-shell --jars ${KAFKA_ASSEMBLY_JAR}"
```

I'm not sure whether this is the right way, but the jar does not seem to get added to the SparkContext. Is there anything else I should take care of?
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78219718

Okay cool - LGTM, I will pull this in. I just did some local sanity tests, built with Maven and ran (a few) Maven tests. We'll need to keep an eye on the Maven build tomorrow to see if this is succeeding there, since it's not caught by the PRB. Thanks @vanzin for all the work on this.
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3916
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26279970

--- Diff: bin/pyspark --- (same hunk as quoted in the first comment above) --- End diff --

Hi @vanzin, I'm now adding the Kafka Python unit test, and it requires adding the Kafka assembly jar with the `--jars` argument. I see you removed this code, so how should jars now be added to pyspark for unit tests? Using `PYSPARK_SUBMIT_ARGS`? Thanks a lot.
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78182670

[Test build #28447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28447/consoleFull) for PR 3916 at commit [`18c7e4d`](https://github.com/apache/spark/commit/18c7e4db3b9713c4bc13487e3a15e59b6bf2dc58).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78182678

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28447/
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78143279

Yeah this LGTM. There are a few other minor comments I left regarding visibility but they don't have to block this PR from going in. I haven't tested whether all the cases (Windows, YARN, PySpark...) are maintained as before and I won't have the bandwidth to do so this week, but we can always do that separately.
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78003405

@andrewor14 are you good with this one? I'd like to merge it soon!
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78158391

FYI I ran our test suite with this change (covers standalone/client, yarn/both, spark-shell and pyspark, Linux only) and all looks ok. Also did some manual testing on Windows.
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78171114

[Test build #28447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28447/consoleFull) for PR 3916 at commit [`18c7e4d`](https://github.com/apache/spark/commit/18c7e4db3b9713c4bc13487e3a15e59b6bf2dc58).

* This patch merges cleanly.
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079948

--- Diff: bin/spark-sql ---
@@ -43,15 +46,12 @@ function usage {
   echo
   echo "CLI options:"
   $FWDIR/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
+  exit $2
--- End diff --

```
exit "$2"
```
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079926

--- Diff: bin/spark-sql ---
@@ -25,12 +25,15 @@ set -o posix

 # NOTE: This exact class name is matched downstream by SparkSubmit.
 # Any changes need to be reflected there.
-CLASS=org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver
+export CLASS=org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver

 # Figure out where Spark is installed
-FWDIR=$(cd `dirname $0`/..; pwd)
+export FWDIR=$(cd `dirname $0`/..; pwd)

 function usage {
+  if [ -n $1 ]; then
+    echo $1
--- End diff --

```
echo "$1"
```
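A short sketch of why the unquoted `[ -n $1 ]` test flagged above is a real pitfall, not just a style nit: with an empty or unset argument, the unquoted expansion vanishes entirely, leaving `[ -n ]`, which tests the literal string `-n` and is therefore always true. The helper function names below are made up for illustration.

```shell
# With an empty argument, [ -n $1 ] collapses to [ -n ], which tests
# whether the string "-n" is non-empty -- always true, so the check
# silently misfires.
check_unquoted() { if [ -n $1 ]; then echo nonempty; else echo empty; fi; }
check_quoted()   { if [ -n "$1" ]; then echo nonempty; else echo empty; fi; }

unq=$(check_unquoted "")   # wrongly reports nonempty
q=$(check_quoted "")       # correctly reports empty
echo "$unq $q"
```

Quoting the expansion, as the review suggestion does, restores the intended emptiness test.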
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079987

--- Diff: bin/spark-submit ---
@@ -17,58 +17,18 @@
 # limitations under the License.
 #

-# NOTE: Any changes in this file must be reflected in SparkSubmitDriverBootstrapper.scala!
-
-export SPARK_HOME=$(cd `dirname $0`/..; pwd)
-ORIG_ARGS=("$@")
-
-# Set COLUMNS for progress bar
-export COLUMNS=`tput cols`
-
-while (($#)); do
-  if [ "$1" = --deploy-mode ]; then
-    SPARK_SUBMIT_DEPLOY_MODE=$2
-  elif [ "$1" = --properties-file ]; then
-    SPARK_SUBMIT_PROPERTIES_FILE=$2
-  elif [ "$1" = --driver-memory ]; then
-    export SPARK_SUBMIT_DRIVER_MEMORY=$2
-  elif [ "$1" = --driver-library-path ]; then
-    export SPARK_SUBMIT_LIBRARY_PATH=$2
-  elif [ "$1" = --driver-class-path ]; then
-    export SPARK_SUBMIT_CLASSPATH=$2
-  elif [ "$1" = --driver-java-options ]; then
-    export SPARK_SUBMIT_OPTS=$2
-  elif [ "$1" = --master ]; then
-    export MASTER=$2
-  fi
-  shift
-done
-
-if [ -z "$SPARK_CONF_DIR" ]; then
-  export SPARK_CONF_DIR="$SPARK_HOME/conf"
-fi
-DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
-if [ "$MASTER" == "yarn-cluster" ]; then
-  SPARK_SUBMIT_DEPLOY_MODE=cluster
+SPARK_HOME=$(cd `dirname $0`/..; pwd)
+
+# Only define a usage function if an upstream script hasn't done so.
+if ! type -t usage >/dev/null 2>&1; then
+  usage() {
+    if [ -n $1 ]; then
+      echo $1
+    fi
+    $SPARK_HOME/bin/spark-class org.apache.spark.deploy.SparkSubmit --help
+    exit $2
--- End diff --

```
exit "$2"
```
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079978

--- Diff: bin/spark-submit --- (same hunk as quoted above) --- End diff --

```
echo "$1"
```
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26080561

--- Diff: sbin/spark-daemon.sh ---
@@ -121,45 +121,63 @@ if [ "$SPARK_NICENESS" = "" ]; then
     export SPARK_NICENESS=0
 fi

+run_command() {
+  mode=$1
+  shift

-case $option in
-
-  (start|spark-submit)
-
-    mkdir -p "$SPARK_PID_DIR"
+  mkdir -p "$SPARK_PID_DIR"

-    if [ -f $pid ]; then
-      TARGET_ID=$(cat $pid)
-      if [[ $(ps -p $TARGET_ID -o args=) =~ $command ]]; then
-        echo "$command running as process $TARGET_ID. Stop it first."
-        exit 1
-      fi
+  if [ -f $pid ]; then
+    TARGET_ID=$(cat $pid)
+    if [[ $(ps -p $TARGET_ID -o args=) =~ $command ]]; then
+      echo "$command running as process $TARGET_ID. Stop it first."
+      exit 1
     fi
+  fi

-    if [ "$SPARK_MASTER" != "" ]; then
-      echo rsync from "$SPARK_MASTER"
-      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $SPARK_MASTER/ "$SPARK_HOME"
-    fi
+  if [ "$SPARK_MASTER" != "" ]; then
+    echo rsync from "$SPARK_MASTER"
+    rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $SPARK_MASTER/ "$SPARK_HOME"
--- End diff --

```
"$SPARK_MASTER/"
```

Unless this variable should potentially be interpreted as multiple arguments to `rsync`.
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26080617

--- Diff: sbin/spark-daemon.sh --- (same hunk as quoted above, continuing:)

-    spark_rotate_log "$log"
-    echo "starting $command, logging to $log"
-    if [ $option == spark-submit ]; then
-      source "$SPARK_HOME"/bin/utils.sh
-      gatherSparkSubmitOpts "$@"
-      nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-submit --class $command \
-        ${SUBMISSION_OPTS[@]} spark-internal ${APPLICATION_OPTS[@]} >> "$log" 2>&1 < /dev/null &
-    else
+  spark_rotate_log "$log"
+  echo "starting $command, logging to $log"
+
+  case $mode in
--- End diff --

```
case "$mode" in
```
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77929101

Hey Marcelo, I think this is really close to being ready to merge. I noticed some extra output (a '0') when running usage on `spark-shell`:

```
./bin/spark-shell --help
0
Usage: ./bin/spark-shell [options]
Spark assembly has been built with Hive, including Datanucleus jars in classpath.
```

I think whether this should be an exposed API is still a question. However, I'm fine to merge v1 of this as an exposed API and get feedback. It would be helpful to hear from potential users whether this is useful to them... so far I haven't heard much in that direction.
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26080422

--- Diff: sbin/spark-daemon.sh --- (same hunk as quoted above) --- End diff --

```
mode="$1"
```
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26081687

--- Diff: bin/pyspark2.cmd ---
@@ -17,59 +17,22 @@
 rem See the License for the specific language governing permissions and
 rem limitations under the License.
 rem

-set SCALA_VERSION=2.10
-
 rem Figure out where the Spark framework is installed
-set FWDIR=%~dp0..\
-
-rem Export this as SPARK_HOME
-set SPARK_HOME=%FWDIR%
-
-rem Test whether the user has built Spark
-if exist %FWDIR%RELEASE goto skip_build_test
-set FOUND_JAR=0
-for %%d in (%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assembly*hadoop*.jar) do (
-  set FOUND_JAR=1
-)
-if [%FOUND_JAR%] == [0] (
-  echo Failed to find Spark assembly JAR.
-  echo You need to build Spark before running this program.
-  goto exit
-)
-:skip_build_test
+set SPARK_HOME=%~dp0..

 rem Load environment variables from conf\spark-env.cmd, if it exists
-if exist %FWDIR%conf\spark-env.cmd call %FWDIR%conf\spark-env.cmd
+if exist %SPARK_HOME%\conf\spark-env.cmd call %SPARK_HOME%\conf\spark-env.cmd

 rem Figure out which Python to use.
-if [%PYSPARK_PYTHON%] == [] set PYSPARK_PYTHON=python
+if x%PYSPARK_DRIVER_PYTHON%==x (
+  set PYSPARK_DRIVER_PYTHON=python
+  if not [%PYSPARK_PYTHON%] == [] set PYSPARK_DRIVER_PYTHON=%PYSPARK_PYTHON%
+)

-set PYTHONPATH=%FWDIR%python;%PYTHONPATH%
-set PYTHONPATH=%FWDIR%python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH%
+set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH%
+set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH%

 set OLD_PYTHONSTARTUP=%PYTHONSTARTUP%
-set PYTHONSTARTUP=%FWDIR%python\pyspark\shell.py
-set PYSPARK_SUBMIT_ARGS=%*
-
-echo Running %PYSPARK_PYTHON% with PYTHONPATH=%PYTHONPATH%
-
-rem Check whether the argument is a file
-for /f %%i in ('echo %1^| findstr /R "\.py"') do (
-  set PYTHON_FILE=%%i
-)
-
-if [%PYTHON_FILE%] == [] (
-  if [%IPYTHON%] == [1] (
-    ipython %IPYTHON_OPTS%
-  ) else (
-    %PYSPARK_PYTHON%
-  )
-) else (
-  echo.
-  echo WARNING: Running python applications through ./bin/pyspark.cmd is deprecated as of Spark 1.0.
-  echo Use ./bin/spark-submit ^<python file^>
-  echo.
-  %FWDIR%\bin\spark-submit.cmd %PYSPARK_SUBMIT_ARGS%
-)
+set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py
--- End diff --

OK, good to know.
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77954145

Sorry to spam the PR with comments about Bash word splitting, @vanzin. I think what you have Bash-wise looks good, but as a matter of consistency I just went through and marked the places where Bash word-splitting bugs are theoretically possible. As a habit, I think we should always quote Bash output (whether of variables, subshells, etc.), though I can see that in some cases you may feel it's unnecessary.
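A minimal demonstration of the word-splitting behavior these review comments are about: an unquoted expansion of a value containing whitespace is split into multiple words by the shell, while a quoted expansion stays a single word. The variable value is made up for illustration.

```shell
# Any value with embedded whitespace, e.g. a path with a space in it.
path="spark logs"

set -- $path        # unquoted expansion: split into two words
unquoted=$#

set -- "$path"      # quoted expansion: stays one word
quoted=$#

echo "unquoted=$unquoted quoted=$quoted"
```

This is why passing an unquoted `$SPARK_MASTER/` or `$1` to a command can silently turn one argument into several.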
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26079243 --- Diff: bin/spark-class --- @@ -110,83 +39,48 @@ else exit 1 fi fi -JAVA_VERSION=$($RUNNER -version 21 | grep 'version' | sed 's/.* version \(.*\)\.\(.*\)\..*/\1\2/; 1q') - -# Set JAVA_OPTS to be able to load native libraries and to set heap size -if [ $JAVA_VERSION -ge 18 ]; then - JAVA_OPTS=$OUR_JAVA_OPTS -else - JAVA_OPTS=-XX:MaxPermSize=128m $OUR_JAVA_OPTS -fi -JAVA_OPTS=$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM - -# Load extra JAVA_OPTS from conf/java-opts, if it exists -if [ -e $SPARK_CONF_DIR/java-opts ] ; then - JAVA_OPTS=$JAVA_OPTS `cat $SPARK_CONF_DIR/java-opts` -fi - -# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala! - -TOOLS_DIR=$FWDIR/tools -SPARK_TOOLS_JAR= -if [ -e $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the SBT build - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar` -fi -if [ -e $TOOLS_DIR/target/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the Maven build - # TODO: this also needs to become an assembly! - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/spark-tools*[0-9Tg].jar` -fi -# Compute classpath using external script -classpath_output=$($FWDIR/bin/compute-classpath.sh) -if [[ $? != 0 ]]; then - echo $classpath_output - exit 1 -else - CLASSPATH=$classpath_output -fi +# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath +# instead of looking for a jar file. +SPARK_LAUNCHER_CP= +if [ -f $SPARK_HOME/RELEASE ]; then + LAUNCHER_DIR=$SPARK_HOME/lib + num_jars=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ | wc -l) --- End diff -- Programmatically consuming the output of `ls` is [generally discouraged](http://mywiki.wooledge.org/ParsingLs). 
The previous approach using a loop and globbing, or alternate approaches using `find`, are safer. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
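The glob-based alternative the comment alludes to can be sketched as follows; this is an illustrative example with made-up jar names in a scratch directory, not the script's actual code:

```shell
# Count matching jars with a shell glob instead of parsing `ls` output.
# A glob never mangles unusual filenames the way `ls`-parsing can.
dir=$(mktemp -d)
touch "$dir/spark-launcher-1.3.0.jar" "$dir/spark-launcher-1.4.0.jar" "$dir/notes.txt"

num_jars=0
launcher_jar=
for jar in "$dir"/spark-launcher*.jar; do
  [ -e "$jar" ] || continue      # the glob may match nothing at all
  num_jars=$((num_jars + 1))
  launcher_jar=$jar
done
```

A build script could then error out when `num_jars` is 0 or greater than 1, mirroring the checks in `bin/spark-class`.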
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26078786 --- Diff: bin/pyspark2.cmd --- @@ -17,59 +17,22 @@ rem See the License for the specific language governing permissions and rem limitations under the License. rem -set SCALA_VERSION=2.10 - rem Figure out where the Spark framework is installed -set FWDIR=%~dp0..\ - -rem Export this as SPARK_HOME -set SPARK_HOME=%FWDIR% - -rem Test whether the user has built Spark -if exist %FWDIR%RELEASE goto skip_build_test -set FOUND_JAR=0 -for %%d in (%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assembly*hadoop*.jar) do ( - set FOUND_JAR=1 -) -if [%FOUND_JAR%] == [0] ( - echo Failed to find Spark assembly JAR. - echo You need to build Spark before running this program. - goto exit -) -:skip_build_test +set SPARK_HOME=%~dp0.. rem Load environment variables from conf\spark-env.cmd, if it exists -if exist %FWDIR%conf\spark-env.cmd call %FWDIR%conf\spark-env.cmd +if exist %SPARK_HOME%\conf\spark-env.cmd call %SPARK_HOME%\conf\spark-env.cmd rem Figure out which Python to use. -if [%PYSPARK_PYTHON%] == [] set PYSPARK_PYTHON=python +if x%PYSPARK_DRIVER_PYTHON%==x ( + set PYSPARK_DRIVER_PYTHON=python + if not [%PYSPARK_PYTHON%] == [] set PYSPARK_DRIVER_PYTHON=%PYSPARK_PYTHON% +) -set PYTHONPATH=%FWDIR%python;%PYTHONPATH% -set PYTHONPATH=%FWDIR%python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH% +set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH% +set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH% set OLD_PYTHONSTARTUP=%PYTHONSTARTUP% -set PYTHONSTARTUP=%FWDIR%python\pyspark\shell.py -set PYSPARK_SUBMIT_ARGS=%* - -echo Running %PYSPARK_PYTHON% with PYTHONPATH=%PYTHONPATH% - -rem Check whether the argument is a file -for /f %%i in ('echo %1^| findstr /R \.py') do ( - set PYTHON_FILE=%%i -) - -if [%PYTHON_FILE%] == [] ( - if [%IPYTHON%] == [1] ( - ipython %IPYTHON_OPTS% - ) else ( - %PYSPARK_PYTHON% - ) -) else ( - echo. 
- echo WARNING: Running python applications through ./bin/pyspark.cmd is deprecated as of Spark 1.0. - echo Use ./bin/spark-submit ^python file^ - echo. - %FWDIR%\bin\spark-submit.cmd %PYSPARK_SUBMIT_ARGS% -) +set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py --- End diff -- Do Windows batch scripts suffer from the same types of word-splitting problems that Bash does? For example, if the value of `%SPARK_HOME%` contains a space, will this line still work correctly?
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26078620 --- Diff: bin/pyspark --- @@ -18,36 +18,24 @@ # # Figure out where Spark is installed -FWDIR=$(cd `dirname $0`/..; pwd) +export SPARK_HOME=$(cd `dirname $0`/..; pwd) -# Export this as SPARK_HOME -export SPARK_HOME=$FWDIR - -source $FWDIR/bin/utils.sh - -source $FWDIR/bin/load-spark-env.sh +source $SPARK_HOME/bin/load-spark-env.sh function usage() { + if [ -n $1 ]; then +echo $1 + fi echo Usage: ./bin/pyspark [options] 1>&2 - $FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2 - exit 0 + $SPARK_HOME/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2 + exit $2 --- End diff -- Similarly, regarding word splitting, I suggest: ``` exit "$2" ```
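The word-splitting hazard behind this suggestion is easy to demonstrate: with an empty or unset argument, the unquoted `[ -n $1 ]` collapses to the one-argument form `[ -n ]`, which is always true, the opposite of the intent. A small sketch, not Spark's actual code:

```shell
# Unquoted vs. quoted -n test against an empty argument.
check_unquoted() { if [ -n $1 ]; then echo yes; else echo no; fi; }
check_quoted()   { if [ -n "$1" ]; then echo yes; else echo no; fi; }

unquoted_result=$(check_unquoted "")   # [ -n ] is true: wrong answer
quoted_result=$(check_quoted "")       # [ -n "" ] is false: correct
```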
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26079792 --- Diff: bin/spark-class --- @@ -110,83 +39,48 @@ else exit 1 fi fi -JAVA_VERSION=$($RUNNER -version 21 | grep 'version' | sed 's/.* version \(.*\)\.\(.*\)\..*/\1\2/; 1q') - -# Set JAVA_OPTS to be able to load native libraries and to set heap size -if [ $JAVA_VERSION -ge 18 ]; then - JAVA_OPTS=$OUR_JAVA_OPTS -else - JAVA_OPTS=-XX:MaxPermSize=128m $OUR_JAVA_OPTS -fi -JAVA_OPTS=$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM - -# Load extra JAVA_OPTS from conf/java-opts, if it exists -if [ -e $SPARK_CONF_DIR/java-opts ] ; then - JAVA_OPTS=$JAVA_OPTS `cat $SPARK_CONF_DIR/java-opts` -fi - -# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala! - -TOOLS_DIR=$FWDIR/tools -SPARK_TOOLS_JAR= -if [ -e $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the SBT build - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar` -fi -if [ -e $TOOLS_DIR/target/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the Maven build - # TODO: this also needs to become an assembly! - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/spark-tools*[0-9Tg].jar` -fi -# Compute classpath using external script -classpath_output=$($FWDIR/bin/compute-classpath.sh) -if [[ $? != 0 ]]; then - echo $classpath_output - exit 1 -else - CLASSPATH=$classpath_output -fi +# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath +# instead of looking for a jar file. +SPARK_LAUNCHER_CP= +if [ -f $SPARK_HOME/RELEASE ]; then + LAUNCHER_DIR=$SPARK_HOME/lib + num_jars=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ | wc -l) + if [ $num_jars -eq 0 -a -z $SPARK_LAUNCHER_CP ]; then +echo Failed to find Spark launcher in $LAUNCHER_DIR. 12 +echo You need to build Spark before running this program. 
12 +exit 1 + fi -if [[ $1 =~ org.apache.spark.tools.* ]]; then - if test -z $SPARK_TOOLS_JAR; then -echo Failed to find Spark Tools Jar in $FWDIR/tools/target/scala-$SPARK_SCALA_VERSION/ 12 -echo You need to run \build/sbt tools/package\ before running $1. 12 + LAUNCHER_JARS=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ || true) + if [ $num_jars -gt 1 ]; then +echo Found multiple Spark launcher jars in $LAUNCHER_DIR: 12 +echo $LAUNCHER_JARS 12 +echo Please remove all but one jar. 12 exit 1 fi - CLASSPATH=$CLASSPATH:$SPARK_TOOLS_JAR -fi -if $cygwin; then - CLASSPATH=`cygpath -wp $CLASSPATH` - if [ $1 == org.apache.spark.tools.JavaAPICompletenessChecker ]; then -export SPARK_TOOLS_JAR=`cygpath -w $SPARK_TOOLS_JAR` + SPARK_LAUNCHER_CP=${LAUNCHER_DIR}/${LAUNCHER_JARS} +else + LAUNCHER_DIR=$SPARK_HOME/launcher/target/scala-$SPARK_SCALA_VERSION + if [ ! -d $LAUNCHER_DIR/classes ]; then +echo Failed to find Spark launcher classes in $LAUNCHER_DIR. 12 +echo You need to build Spark before running this program. 12 +exit 1 fi + SPARK_LAUNCHER_CP=$LAUNCHER_DIR/classes fi -export CLASSPATH -# In Spark submit client mode, the driver is launched in the same JVM as Spark submit itself. -# Here we must parse the properties file for relevant spark.driver.* configs before launching -# the driver JVM itself. Instead of handling this complexity in Bash, we launch a separate JVM -# to prepare the launch environment of this driver JVM. +# The launcher library will print arguments separated by a NULL character, to allow arguments with +# characters that would be otherwise interpreted by the shell. Read that in a while loop, populating +# an array that will be used to exec the final command. +CMD=() +while IFS= read -d '' -r ARG; do + CMD+=($ARG) +done ($RUNNER -cp $SPARK_LAUNCHER_CP org.apache.spark.launcher.Main $@) --- End diff -- Is `$RUNNER` safe to leave unquoted here? 
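The NUL-delimited handoff that `bin/spark-class` performs can be sketched in isolation. Here `produce` is a stand-in for the launcher JVM, which in the real script is the `org.apache.spark.launcher.Main` invocation (this sketch requires bash):

```shell
# Rebuild an argument array from NUL-separated output, preserving embedded
# spaces and shell metacharacters that whitespace-splitting would destroy.
produce() { printf '%s\0' --master 'local[2]' 'my app.py'; }

CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(produce)
```

Because each argument lands in its own array element, `"${CMD[@]}"` can later be passed to `exec` without re-splitting.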
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26079694 --- Diff: bin/spark-class --- @@ -110,83 +39,48 @@ else exit 1 fi fi -JAVA_VERSION=$($RUNNER -version 21 | grep 'version' | sed 's/.* version \(.*\)\.\(.*\)\..*/\1\2/; 1q') - -# Set JAVA_OPTS to be able to load native libraries and to set heap size -if [ $JAVA_VERSION -ge 18 ]; then - JAVA_OPTS=$OUR_JAVA_OPTS -else - JAVA_OPTS=-XX:MaxPermSize=128m $OUR_JAVA_OPTS -fi -JAVA_OPTS=$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM - -# Load extra JAVA_OPTS from conf/java-opts, if it exists -if [ -e $SPARK_CONF_DIR/java-opts ] ; then - JAVA_OPTS=$JAVA_OPTS `cat $SPARK_CONF_DIR/java-opts` -fi - -# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala! - -TOOLS_DIR=$FWDIR/tools -SPARK_TOOLS_JAR= -if [ -e $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the SBT build - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar` -fi -if [ -e $TOOLS_DIR/target/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the Maven build - # TODO: this also needs to become an assembly! - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/spark-tools*[0-9Tg].jar` -fi -# Compute classpath using external script -classpath_output=$($FWDIR/bin/compute-classpath.sh) -if [[ $? != 0 ]]; then - echo $classpath_output - exit 1 -else - CLASSPATH=$classpath_output -fi +# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath +# instead of looking for a jar file. +SPARK_LAUNCHER_CP= +if [ -f $SPARK_HOME/RELEASE ]; then + LAUNCHER_DIR=$SPARK_HOME/lib + num_jars=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ | wc -l) --- End diff -- While I understand what the article in the link you posted says, I fail to understand how it applies to this line. 
The grep command filters for exactly what we're looking for, and anything that would trigger the cases described in the article would either be filtered out, or you could say you already have a busted build directory that doesn't match the expected layout.
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26079633 --- Diff: bin/spark-class --- @@ -110,83 +39,48 @@ else exit 1 fi fi -JAVA_VERSION=$($RUNNER -version 21 | grep 'version' | sed 's/.* version \(.*\)\.\(.*\)\..*/\1\2/; 1q') - -# Set JAVA_OPTS to be able to load native libraries and to set heap size -if [ $JAVA_VERSION -ge 18 ]; then - JAVA_OPTS=$OUR_JAVA_OPTS -else - JAVA_OPTS=-XX:MaxPermSize=128m $OUR_JAVA_OPTS -fi -JAVA_OPTS=$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM - -# Load extra JAVA_OPTS from conf/java-opts, if it exists -if [ -e $SPARK_CONF_DIR/java-opts ] ; then - JAVA_OPTS=$JAVA_OPTS `cat $SPARK_CONF_DIR/java-opts` -fi - -# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala! - -TOOLS_DIR=$FWDIR/tools -SPARK_TOOLS_JAR= -if [ -e $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the SBT build - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar` -fi -if [ -e $TOOLS_DIR/target/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the Maven build - # TODO: this also needs to become an assembly! - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/spark-tools*[0-9Tg].jar` -fi -# Compute classpath using external script -classpath_output=$($FWDIR/bin/compute-classpath.sh) -if [[ $? != 0 ]]; then - echo $classpath_output - exit 1 -else - CLASSPATH=$classpath_output -fi +# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath +# instead of looking for a jar file. +SPARK_LAUNCHER_CP= +if [ -f $SPARK_HOME/RELEASE ]; then + LAUNCHER_DIR=$SPARK_HOME/lib + num_jars=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ | wc -l) + if [ $num_jars -eq 0 -a -z $SPARK_LAUNCHER_CP ]; then +echo Failed to find Spark launcher in $LAUNCHER_DIR. 12 +echo You need to build Spark before running this program. 
1>&2 +exit 1 + fi -if [[ $1 =~ org.apache.spark.tools.* ]]; then - if test -z $SPARK_TOOLS_JAR; then -echo Failed to find Spark Tools Jar in $FWDIR/tools/target/scala-$SPARK_SCALA_VERSION/ 1>&2 -echo You need to run \build/sbt tools/package\ before running $1. 1>&2 + LAUNCHER_JARS=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ || true) --- End diff -- Same here. This is definitely a minor point relative to the scope of this patch, but I would prefer if we didn't use `ls` like this.
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26079837 --- Diff: bin/spark-shell --- @@ -28,25 +28,24 @@ esac # Enter posix mode for bash set -o posix -## Global script variables -FWDIR=$(cd `dirname $0`/..; pwd) +export FWDIR=$(cd `dirname $0`/..; pwd) -function usage() { +usage() { + if [ -n $1 ]; then +echo $1 --- End diff -- Same as above. I'd prefer: ``` echo "$1" ```
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26079752 --- Diff: bin/spark-class --- @@ -110,83 +39,48 @@ else exit 1 fi fi -JAVA_VERSION=$($RUNNER -version 21 | grep 'version' | sed 's/.* version \(.*\)\.\(.*\)\..*/\1\2/; 1q') - -# Set JAVA_OPTS to be able to load native libraries and to set heap size -if [ $JAVA_VERSION -ge 18 ]; then - JAVA_OPTS=$OUR_JAVA_OPTS -else - JAVA_OPTS=-XX:MaxPermSize=128m $OUR_JAVA_OPTS -fi -JAVA_OPTS=$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM - -# Load extra JAVA_OPTS from conf/java-opts, if it exists -if [ -e $SPARK_CONF_DIR/java-opts ] ; then - JAVA_OPTS=$JAVA_OPTS `cat $SPARK_CONF_DIR/java-opts` -fi - -# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala! - -TOOLS_DIR=$FWDIR/tools -SPARK_TOOLS_JAR= -if [ -e $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the SBT build - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar` -fi -if [ -e $TOOLS_DIR/target/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the Maven build - # TODO: this also needs to become an assembly! - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/spark-tools*[0-9Tg].jar` -fi -# Compute classpath using external script -classpath_output=$($FWDIR/bin/compute-classpath.sh) -if [[ $? != 0 ]]; then - echo $classpath_output - exit 1 -else - CLASSPATH=$classpath_output -fi +# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath +# instead of looking for a jar file. +SPARK_LAUNCHER_CP= +if [ -f $SPARK_HOME/RELEASE ]; then + LAUNCHER_DIR=$SPARK_HOME/lib + num_jars=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ | wc -l) + if [ $num_jars -eq 0 -a -z $SPARK_LAUNCHER_CP ]; then +echo Failed to find Spark launcher in $LAUNCHER_DIR. 12 +echo You need to build Spark before running this program. 
1>&2 +exit 1 + fi -if [[ $1 =~ org.apache.spark.tools.* ]]; then - if test -z $SPARK_TOOLS_JAR; then -echo Failed to find Spark Tools Jar in $FWDIR/tools/target/scala-$SPARK_SCALA_VERSION/ 1>&2 -echo You need to run \build/sbt tools/package\ before running $1. 1>&2 + LAUNCHER_JARS=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ || true) --- End diff -- Same explanation. The `grep` is there to perform the desired match.
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26080472 --- Diff: sbin/spark-daemon.sh --- @@ -121,45 +121,63 @@ if [ "$SPARK_NICENESS" = "" ]; then export SPARK_NICENESS=0 fi +run_command() { + mode=$1 + shift -case $option in - - (start|spark-submit) - -mkdir -p $SPARK_PID_DIR + mkdir -p $SPARK_PID_DIR -if [ -f $pid ]; then - TARGET_ID=$(cat $pid) - if [[ $(ps -p $TARGET_ID -o args=) =~ $command ]]; then -echo $command running as process $TARGET_ID. Stop it first. -exit 1 - fi + if [ -f $pid ]; then --- End diff -- ``` if [ -f "$pid" ]; then ```
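The difference the suggested quoting makes shows up as soon as the pidfile path contains a space: the unquoted test splits into multiple arguments and `[` reports an error instead of an answer. An illustrative sketch with made-up paths:

```shell
# Quoted vs. unquoted -f test against a path containing a space.
dir=$(mktemp -d)
pid="$dir/my app.pid"
echo 1234 > "$pid"

quoted=ok
[ -f "$pid" ] || quoted=fail              # single argument: works

unquoted=ok
[ -f $pid ] 2>/dev/null || unquoted=fail  # splits into two words: errors out

rm -rf "$dir"
```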
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26081000 --- Diff: bin/spark-class --- @@ -110,83 +39,48 @@ else exit 1 fi fi -JAVA_VERSION=$($RUNNER -version 21 | grep 'version' | sed 's/.* version \(.*\)\.\(.*\)\..*/\1\2/; 1q') - -# Set JAVA_OPTS to be able to load native libraries and to set heap size -if [ $JAVA_VERSION -ge 18 ]; then - JAVA_OPTS=$OUR_JAVA_OPTS -else - JAVA_OPTS=-XX:MaxPermSize=128m $OUR_JAVA_OPTS -fi -JAVA_OPTS=$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM - -# Load extra JAVA_OPTS from conf/java-opts, if it exists -if [ -e $SPARK_CONF_DIR/java-opts ] ; then - JAVA_OPTS=$JAVA_OPTS `cat $SPARK_CONF_DIR/java-opts` -fi - -# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala! - -TOOLS_DIR=$FWDIR/tools -SPARK_TOOLS_JAR= -if [ -e $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the SBT build - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar` -fi -if [ -e $TOOLS_DIR/target/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the Maven build - # TODO: this also needs to become an assembly! - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/spark-tools*[0-9Tg].jar` -fi -# Compute classpath using external script -classpath_output=$($FWDIR/bin/compute-classpath.sh) -if [[ $? != 0 ]]; then - echo $classpath_output - exit 1 -else - CLASSPATH=$classpath_output -fi +# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath +# instead of looking for a jar file. +SPARK_LAUNCHER_CP= +if [ -f $SPARK_HOME/RELEASE ]; then + LAUNCHER_DIR=$SPARK_HOME/lib + num_jars=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ | wc -l) + if [ $num_jars -eq 0 -a -z $SPARK_LAUNCHER_CP ]; then +echo Failed to find Spark launcher in $LAUNCHER_DIR. 12 +echo You need to build Spark before running this program. 
1>&2 +exit 1 + fi -if [[ $1 =~ org.apache.spark.tools.* ]]; then - if test -z $SPARK_TOOLS_JAR; then -echo Failed to find Spark Tools Jar in $FWDIR/tools/target/scala-$SPARK_SCALA_VERSION/ 1>&2 -echo You need to run \build/sbt tools/package\ before running $1. 1>&2 + LAUNCHER_JARS=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ || true) --- End diff -- Agreed.
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26080991 --- Diff: bin/spark-class --- @@ -110,83 +39,48 @@ else exit 1 fi fi -JAVA_VERSION=$($RUNNER -version 21 | grep 'version' | sed 's/.* version \(.*\)\.\(.*\)\..*/\1\2/; 1q') - -# Set JAVA_OPTS to be able to load native libraries and to set heap size -if [ $JAVA_VERSION -ge 18 ]; then - JAVA_OPTS=$OUR_JAVA_OPTS -else - JAVA_OPTS=-XX:MaxPermSize=128m $OUR_JAVA_OPTS -fi -JAVA_OPTS=$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM - -# Load extra JAVA_OPTS from conf/java-opts, if it exists -if [ -e $SPARK_CONF_DIR/java-opts ] ; then - JAVA_OPTS=$JAVA_OPTS `cat $SPARK_CONF_DIR/java-opts` -fi - -# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala! - -TOOLS_DIR=$FWDIR/tools -SPARK_TOOLS_JAR= -if [ -e $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the SBT build - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar` -fi -if [ -e $TOOLS_DIR/target/spark-tools*[0-9Tg].jar ]; then - # Use the JAR from the Maven build - # TODO: this also needs to become an assembly! - export SPARK_TOOLS_JAR=`ls $TOOLS_DIR/target/spark-tools*[0-9Tg].jar` -fi -# Compute classpath using external script -classpath_output=$($FWDIR/bin/compute-classpath.sh) -if [[ $? != 0 ]]; then - echo $classpath_output - exit 1 -else - CLASSPATH=$classpath_output -fi +# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath +# instead of looking for a jar file. +SPARK_LAUNCHER_CP= +if [ -f $SPARK_HOME/RELEASE ]; then + LAUNCHER_DIR=$SPARK_HOME/lib + num_jars=$(ls -1 $LAUNCHER_DIR | grep ^spark-launcher.*\.jar$ | wc -l) --- End diff -- That makes sense. Using `ls` in a general context and using it here where the world is very limited in what we expect are different things, so I agree. 
It's fine to treat it as a busted build directory if things really go wrong with `ls`'s output.
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26079852 --- Diff: bin/spark-shell --- @@ -28,25 +28,24 @@ esac # Enter posix mode for bash set -o posix -## Global script variables -FWDIR=$(cd `dirname $0`/..; pwd) +export FWDIR=$(cd `dirname $0`/..; pwd) -function usage() { +usage() { + if [ -n $1 ]; then +echo $1 + fi echo Usage: ./bin/spark-shell [options] $FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2 - exit 0 + exit $2 --- End diff -- ``` exit "$2" ```
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26081238 --- Diff: sbin/start-thriftserver.sh --- @@ -52,4 +52,4 @@ fi export SUBMIT_USAGE_FUNCTION=usage -exec $FWDIR/sbin/spark-daemon.sh spark-submit $CLASS 1 $@ +exec $FWDIR/sbin/spark-daemon.sh submit $CLASS 1 $@ --- End diff -- ``` "$CLASS" ```
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26078579 --- Diff: bin/pyspark --- @@ -18,36 +18,24 @@ # # Figure out where Spark is installed -FWDIR=$(cd `dirname $0`/..; pwd) +export SPARK_HOME=$(cd `dirname $0`/..; pwd) -# Export this as SPARK_HOME -export SPARK_HOME=$FWDIR - -source $FWDIR/bin/utils.sh - -source $FWDIR/bin/load-spark-env.sh +source $SPARK_HOME/bin/load-spark-env.sh function usage() { + if [ -n $1 ]; then +echo $1 --- End diff -- Word splitting with unintended side effects is possible here. I suggest changing this to: ``` echo "$1" ```
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26079464 --- Diff: bin/pyspark2.cmd --- @@ -17,59 +17,22 @@ rem See the License for the specific language governing permissions and rem limitations under the License. rem -set SCALA_VERSION=2.10 - rem Figure out where the Spark framework is installed -set FWDIR=%~dp0..\ - -rem Export this as SPARK_HOME -set SPARK_HOME=%FWDIR% - -rem Test whether the user has built Spark -if exist %FWDIR%RELEASE goto skip_build_test -set FOUND_JAR=0 -for %%d in (%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assembly*hadoop*.jar) do ( - set FOUND_JAR=1 -) -if [%FOUND_JAR%] == [0] ( - echo Failed to find Spark assembly JAR. - echo You need to build Spark before running this program. - goto exit -) -:skip_build_test +set SPARK_HOME=%~dp0.. rem Load environment variables from conf\spark-env.cmd, if it exists -if exist %FWDIR%conf\spark-env.cmd call %FWDIR%conf\spark-env.cmd +if exist %SPARK_HOME%\conf\spark-env.cmd call %SPARK_HOME%\conf\spark-env.cmd rem Figure out which Python to use. -if [%PYSPARK_PYTHON%] == [] set PYSPARK_PYTHON=python +if x%PYSPARK_DRIVER_PYTHON%==x ( + set PYSPARK_DRIVER_PYTHON=python + if not [%PYSPARK_PYTHON%] == [] set PYSPARK_DRIVER_PYTHON=%PYSPARK_PYTHON% +) -set PYTHONPATH=%FWDIR%python;%PYTHONPATH% -set PYTHONPATH=%FWDIR%python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH% +set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH% +set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH% set OLD_PYTHONSTARTUP=%PYTHONSTARTUP% -set PYTHONSTARTUP=%FWDIR%python\pyspark\shell.py -set PYSPARK_SUBMIT_ARGS=%* - -echo Running %PYSPARK_PYTHON% with PYTHONPATH=%PYTHONPATH% - -rem Check whether the argument is a file -for /f %%i in ('echo %1^| findstr /R \.py') do ( - set PYTHON_FILE=%%i -) - -if [%PYTHON_FILE%] == [] ( - if [%IPYTHON%] == [1] ( - ipython %IPYTHON_OPTS% - ) else ( - %PYSPARK_PYTHON% - ) -) else ( - echo. 
- echo WARNING: Running python applications through ./bin/pyspark.cmd is deprecated as of Spark 1.0. - echo Use ./bin/spark-submit ^python file^ - echo. - %FWDIR%\bin\spark-submit.cmd %PYSPARK_SUBMIT_ARGS% -) +set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py --- End diff -- Windows has all sorts of weird rules about spaces; they have absolutely no correlation to what happens in the bash world. This line works correctly as is.
Github user nchammas commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26081219

--- Diff: sbin/spark-daemon.sh ---

@@ -121,45 +121,63 @@

 if [ "$SPARK_NICENESS" = "" ]; then
     export SPARK_NICENESS=0
 fi

+run_command() {
+  mode=$1
+  shift

-case $option in
-
-  (start|spark-submit)
-
-    mkdir -p $SPARK_PID_DIR
+  mkdir -p $SPARK_PID_DIR

-    if [ -f $pid ]; then
-      TARGET_ID=$(cat $pid)
-      if [[ $(ps -p $TARGET_ID -o args=) =~ $command ]]; then
-        echo "$command running as process $TARGET_ID. Stop it first."
-        exit 1
-      fi
+  if [ -f $pid ]; then
+    TARGET_ID=$(cat $pid)
+    if [[ $(ps -p $TARGET_ID -o args=) =~ $command ]]; then
+      echo "$command running as process $TARGET_ID. Stop it first."
+      exit 1
     fi
+  fi

-    if [ "$SPARK_MASTER" != "" ]; then
-      echo rsync from "$SPARK_MASTER"
-      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' "$SPARK_MASTER/" "$SPARK_HOME"
-    fi
+  if [ "$SPARK_MASTER" != "" ]; then
+    echo rsync from "$SPARK_MASTER"
+    rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' "$SPARK_MASTER/" "$SPARK_HOME"
+  fi

-    spark_rotate_log "$log"
-    echo "starting $command, logging to $log"
-    if [ "$option" == "spark-submit" ]; then
-      source "$SPARK_HOME"/bin/utils.sh
-      gatherSparkSubmitOpts "$@"
-      nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-submit --class $command \
-        "${SUBMISSION_OPTS[@]}" spark-internal "${APPLICATION_OPTS[@]}" >> "$log" 2>&1 < /dev/null &
-    else
+  spark_rotate_log "$log"
+  echo "starting $command, logging to $log"
+
+  case "$mode" in
+    (class)
       nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-class $command "$@" >> "$log" 2>&1 < /dev/null &
-    fi
-    newpid=$!
-    echo $newpid > $pid
-    sleep 2
-    # Check if the process has died; in that case we'll tail the log so the user can see
-    if [[ ! $(ps -p "$newpid" -o args=) =~ $command ]]; then
-      echo "failed to launch $command:"
-      tail -2 "$log" | sed 's/^/  /'
-      echo "full log in $log"
-    fi
+      newpid=$!
+      ;;
+
+    (submit)
+      nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-submit --class $command "$@" >> "$log" 2>&1 < /dev/null &

--- End diff --

For consistency, and out of paranoia, I suggest for this block:

```
"$SPARK_NICENESS"
"$!"
"$newpid" > "$pid"
```

And so forth. I know most of these variables are never supposed to contain a space, but this counts as good Bash hygiene, as annoying as it is.
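nchammas's quoting point is easy to demonstrate: an unquoted expansion goes through word splitting, so a value containing a space turns into two arguments. A minimal sketch (the log path is hypothetical, not from the Spark scripts):

```shell
# Count how many arguments an expansion produces.
count_args() { echo "$#"; }

log="/tmp/spark logs/master.out"   # hypothetical log path containing a space

unquoted="$(count_args $log)"      # unquoted: word splitting yields two arguments
quoted="$(count_args "$log")"      # quoted: stays one argument

echo "$unquoted vs $quoted"
```

This is exactly the failure mode that quoting `$newpid`, `$pid`, and friends guards against, even when the values "should" never contain whitespace.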
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77936105

[Test build #28401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28401/consoleFull) for PR 3916 at commit [`897141f`](https://github.com/apache/spark/commit/897141fa7bc521f95f42729fca89467b46fbf412).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77935980

[Test build #28401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28401/consoleFull) for PR 3916 at commit [`897141f`](https://github.com/apache/spark/commit/897141fa7bc521f95f42729fca89467b46fbf412).

* This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77937027

[Test build #28403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28403/consoleFull) for PR 3916 at commit [`a1b8af1`](https://github.com/apache/spark/commit/a1b8af1830fc9fada698098a235691e12bd519dd).

* This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77937171

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28403/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77937990

[Test build #28404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28404/consoleFull) for PR 3916 at commit [`3b28a75`](https://github.com/apache/spark/commit/3b28a7537ed184c88994be176fa1c66d47b43a8e).

* This patch merges cleanly.
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26065985

--- Diff: launcher/src/main/java/org/apache/spark/launcher/Main.java ---

+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Command line interface for the Spark launcher. Used internally by Spark scripts.
+ */
+class Main {
+
+  /**
+   * Usage: Main [class] [class args]
+   * <p/>
+   * This CLI works in two different modes:
+   * <ul>
+   *   <li>"spark-submit": if <i>class</i> is "org.apache.spark.deploy.SparkSubmit", the
+   *   {@link SparkLauncher} class is used to launch a Spark application.</li>
+   *   <li>"spark-class": if another class is provided, an internal Spark class is run.</li>
+   * </ul>
+   *
+   * This class works in tandem with the "bin/spark-class" script on Unix-like systems, and
+   * "bin/spark-class2.cmd" batch script on Windows to execute the final command.
+   * <p/>
+   * On Unix-like systems, the output is a list of command arguments, separated by the NULL
+   * character. On Windows, the output is a single command line suitable for direct execution
+   * form the script.

--- End diff --

from
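The NULL-separated output format described in this Javadoc can be consumed from bash roughly like this (a sketch only; the `printf` stands in for the launcher's actual output, and the classpath and class name are made up):

```shell
# Read a NUL-delimited argument list into an array, preserving embedded
# spaces and glob characters. Requires bash (read -d, process substitution).
CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")
done < <(printf '%s\0' java -cp "/opt/spark/conf:/opt/spark/jars/*" org.example.Main)

echo "${#CMD[@]} args"   # each NUL-delimited chunk becomes one array element
echo "${CMD[2]}"         # the classpath survives intact, '*' unexpanded
```

Using NUL as the separator is the one choice that needs no escaping at all, since NUL cannot appear inside a command argument; that is why the Unix side of the launcher can sidestep the quoting gymnastics the Windows side has to perform.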
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77936109

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28401/
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26067056

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---

+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
+ * <p/>
+ * This launcher extends SparkLauncher to add command line parsing compatible with

--- End diff --

I'm not sure this comment is correct. I think a `SparkLauncher` encapsulates a `SparkSubmitCommandBuilder` rather than the latter extending the former.
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r26067102

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---

+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.

--- End diff --

I don't think this is still a sub type of `Launcher`
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77937167

[Test build #28403 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28403/consoleFull) for PR 3916 at commit [`a1b8af1`](https://github.com/apache/spark/commit/a1b8af1830fc9fada698098a235691e12bd519dd).

* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77970428

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28407/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77970418

[Test build #28407 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28407/consoleFull) for PR 3916 at commit [`2ce741f`](https://github.com/apache/spark/commit/2ce741ff59c17f64ffea1a0fd91eb922e4e87204).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77958210

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28404/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77957236

[Test build #28407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28407/consoleFull) for PR 3916 at commit [`2ce741f`](https://github.com/apache/spark/commit/2ce741ff59c17f64ffea1a0fd91eb922e4e87204).

* This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77958202

[Test build #28404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28404/consoleFull) for PR 3916 at commit [`3b28a75`](https://github.com/apache/spark/commit/3b28a7537ed184c88994be176fa1c66d47b43a8e).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-77180148

FYI: There is [a question on SO](http://stackoverflow.com/q/28841940/877069), I believe, about the type of functionality being added in this PR.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76849555

[Test build #28200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28200/consoleFull) for PR 3916 at commit [`28cd35e`](https://github.com/apache/spark/commit/28cd35eb7d7c065f679ef4749e599ad4d31ee5cf).

* This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76875689

[Test build #28205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28205/consoleFull) for PR 3916 at commit [`e2367d2`](https://github.com/apache/spark/commit/e2367d26eab2601e5256da2169ea6ad0c12d959d).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76875697

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28205/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76864166

[Test build #28205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28205/consoleFull) for PR 3916 at commit [`e2367d2`](https://github.com/apache/spark/commit/e2367d26eab2601e5256da2169ea6ad0c12d959d).

* This patch merges cleanly.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76862929

[Test build #28200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28200/consoleFull) for PR 3916 at commit [`28cd35e`](https://github.com/apache/spark/commit/28cd35eb7d7c065f679ef4749e599ad4d31ee5cf).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76862941

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28200/
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76293224

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28016/
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76188802

Sorry, I haven't had time to look at this in detail; one thing I was looking for, though, was some documentation for the high-level API. I realize javadoc will document most of it, but are we going to add some documentation saying this is available and pointing users to the javadoc? Perhaps in the deploy guide.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76230854

I'm OK with it being separate, although in general I think it's bad practice. I've seen some features go in without documentation, which, as a user, I find to be a major pain.
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76227779

@tgravescs that can definitely be added, but I'd rather do it separately at this point. Also, regarding javadoc, I'm not sure how it's generally published for Spark: the unidocs generated from java classes seem to all be empty (while scala documentation is fine, and separately the generated javadocs are also fine). Is the unidoc stuff broken for java code?
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76275450

[Test build #28016 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28016/consoleFull) for PR 3916 at commit [`5f4ddcc`](https://github.com/apache/spark/commit/5f4ddccbcd15e712cbd588ee958c6302ce95e352).

* This patch **does not merge cleanly**.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76276321

[Test build #28017 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28017/consoleFull) for PR 3916 at commit [`00505f9`](https://github.com/apache/spark/commit/00505f99ca5b1a7ee8d7955bde54f7c1fcf27689).

* This patch **does not merge cleanly**.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76278354

[Test build #28018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28018/consoleFull) for PR 3916 at commit [`b1d86b0`](https://github.com/apache/spark/commit/b1d86b00fba76d3e700fc5e8319f071139d3156d).

* This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76296105

Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28017/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76296090

[Test build #28017 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28017/consoleFull) for PR 3916 at commit [`00505f9`](https://github.com/apache/spark/commit/00505f99ca5b1a7ee8d7955bde54f7c1fcf27689).

* This patch **passes all tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76296505 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28018/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76296499 [Test build #28018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28018/consoleFull) for PR 3916 at commit [`b1d86b0`](https://github.com/apache/spark/commit/b1d86b00fba76d3e700fc5e8319f071139d3156d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76293215 [Test build #28016 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28016/consoleFull) for PR 3916 at commit [`5f4ddcc`](https://github.com/apache/spark/commit/5f4ddccbcd15e712cbd588ee958c6302ce95e352). * This patch **passes all tests**. * This patch **does not merge cleanly**. * This patch adds no public classes.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76054123 [Test build #27958 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27958/consoleFull) for PR 3916 at commit [`7e66c18`](https://github.com/apache/spark/commit/7e66c18bad6e672c60ec995bd968b3a7db32ee73). * This patch merges cleanly.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25394351

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitOptionParser.java ---
@@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Parser for spark-submit command line options.
+ * <p>
+ * This class, although public, is not designed to be used outside of Spark.

--- End diff --

wait, is it still public? I thought it's package private now
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25396504

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
+ * <p>
+ * This launcher extends SparkLauncher to add command line parsing compatible with
+ * SparkSubmit. It handles setting driver-side options and special parsing needed
+ * for the different specialClasses.
+ * <p>
+ * This class has also some special features to aid PySparkLauncher.
+ */
+class SparkSubmitCommandBuilder extends AbstractCommandBuilder {
+
+  /**
+   * Name of the app resource used to identify the PySpark shell. The command line parser expects
+   * the resource name to be the very first argument to spark-submit in this case.
+   *
+   * NOTE: this cannot be "pyspark-shell" since that identifies the PySpark shell to SparkSubmit
+   * (see java_gateway.py), and can cause this code to enter into an infinite loop.
+   */
+  static final String PYSPARK_SHELL = "pyspark-shell-main";
+
+  /**
+   * This is the actual resource name that identifies the PySpark shell to SparkSubmit.
+   */
+  static final String PYSPARK_SHELL_RESOURCE = "pyspark-shell";
+
+  /**
+   * This map must match the class names for available special classes, since this modifies the way
+   * command line parsing works. This maps the class name to the resource to use when calling
+   * spark-submit.
+   */
+  private static final Map<String, String> specialClasses = new HashMap<String, String>();
+  static {
+    specialClasses.put("org.apache.spark.repl.Main", "spark-shell");
+    specialClasses.put("org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver",
+      "spark-internal");
+  }
+
+  private final List<String> sparkArgs;
+  private boolean hasMixedArguments;
+
+  SparkSubmitCommandBuilder() {
+    this.sparkArgs = new ArrayList<String>();
+  }
+
+  SparkSubmitCommandBuilder(List<String> args) {
+    this(false, args);
+  }
+
+  SparkSubmitCommandBuilder(boolean hasMixedArguments, List<String> args) {
+    this();
+    List<String> submitArgs = args;
+    if (args.size() > 0 && args.get(0).equals(PYSPARK_SHELL)) {
+      this.hasMixedArguments = true;
+      appResource = PYSPARK_SHELL_RESOURCE;
+      submitArgs = args.subList(1, args.size());
+    } else {
+      this.hasMixedArguments = hasMixedArguments;
+    }
+
+    new OptionParser().parse(submitArgs);
+  }
+
+  @Override
+  public List<String> buildCommand(Map<String, String> env) throws IOException {
+    if (PYSPARK_SHELL_RESOURCE.equals(appResource)) {
+      return buildPySparkShellCommand(env);
+    } else {
+      return buildSparkSubmitCommand(env);
+    }
+  }
+
+  List<String> buildSparkSubmitArgs() {
+    List<String> args = new ArrayList<String>();
+    SparkSubmitOptionParser parser = new SparkSubmitOptionParser();
+
+    if (verbose) {
+      args.add(parser.VERBOSE);
+    }
+
+    if (master != null) {
+      args.add(parser.MASTER);
+      args.add(master);
+    }
+
+    if (deployMode != null) {
+      args.add(parser.DEPLOY_MODE);
+      args.add(deployMode);
+    }
+
+    if (appName != null) {
+      args.add(parser.NAME);
+      args.add(appName);
+    }
+
+    for (Map.Entry<String, String> e : conf.entrySet()) {
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25400181

--- Diff: launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileFilter;
+import java.io.FileInputStream;
+import java.io.InputStreamReader;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.jar.JarFile;
+import java.util.regex.Pattern;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Abstract command builder that defines common functionality for all builders.
+ */
+abstract class AbstractCommandBuilder {
+
+  boolean verbose;
+  String appName;
+  String appResource;
+  String deployMode;
+  String javaHome;
+  String mainClass;
+  String master;
+  String propertiesFile;
+  final List<String> appArgs;
+  final List<String> jars;
+  final List<String> files;
+  final List<String> pyFiles;
+  final Map<String, String> childEnv;
+  final Map<String, String> conf;
+
+  public AbstractCommandBuilder() {
+    this(Collections.<String, String>emptyMap());
+  }
+
+  public AbstractCommandBuilder(Map<String, String> env) {
+    this.appArgs = new ArrayList<String>();
+    this.childEnv = new HashMap<String, String>(env);
+    this.conf = new HashMap<String, String>();
+    this.files = new ArrayList<String>();
+    this.jars = new ArrayList<String>();
+    this.pyFiles = new ArrayList<String>();
+  }
+
+  /**
+   * Builds the command line to execute.
+   *
+   * @param env A map containing environment variables for the child process. It may already
+   *            contain entries defined by the user (such as SPARK_HOME, or those defined by the
+   *            SparkLauncher constructor that takes an environment), and may be modified to
+   *            include other variables needed by the process to be executed.
+   */
+  abstract List<String> buildCommand(Map<String, String> env) throws IOException;
+
+  List<String> buildJavaCommand(String extraClassPath) throws IOException {
+    List<String> cmd = new ArrayList<String>();
+    if (javaHome == null) {
+      cmd.add(join(File.separator, System.getProperty("java.home"), "bin", "java"));
+    } else {
+      cmd.add(join(File.separator, javaHome, "bin", "java"));
+    }
+
+    // Load extra JAVA_OPTS from conf/java-opts, if it exists.
+    File javaOpts = new File(join(File.separator, getConfDir(), "java-opts"));
+    if (javaOpts.isFile()) {
+      BufferedReader br = new BufferedReader(new InputStreamReader(
+          new FileInputStream(javaOpts), "UTF-8"));
+      try {
+        String line;
+        while ((line = br.readLine()) != null) {
+          addOptionString(cmd, line);
+        }
+      } finally {
+        br.close();
+      }
+    }
+
+    cmd.add("-cp");
+    cmd.add(join(File.pathSeparator, buildClassPath(extraClassPath)));
+    return cmd;
+  }
+
+  /**
+   * Adds the default perm gen size option for Spark if the VM requires it and the user hasn't
+   * set it.
+   */
+  void addPermGenSizeOpt(List<String> cmd) {
+    // Don't set MaxPermSize for Java 8 and later.
+    String[] version = System.getProperty("java.version").split("\\.");
+    if (Integer.parseInt(version[0]) > 1 || Integer.parseInt(version[1]) > 7) {
+      return;
+    }
+
+    for (String arg : cmd) {
+      if (arg.startsWith("-XX:MaxPermSize=")) {
+        return;
+      }
+    }
+
+    cmd.add("-XX:MaxPermSize=128m");
+  }
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25400174

--- Diff: launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java ---
@@ -0,0 +1,228 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Helper methods for command builders.
+ */
+class CommandBuilderUtils {
+
+  static final String DEFAULT_MEM = "512m";
+  static final String DEFAULT_PROPERTIES_FILE = "spark-defaults.conf";
+  static final String ENV_SPARK_HOME = "SPARK_HOME";
+
+  /** Returns whether the given string is null or empty. */
+  static boolean isEmpty(String s) {
+    return s == null || s.isEmpty();
+  }
+
+  /** Joins a list of strings using the given separator. */
+  static String join(String sep, String... elements) {
+    StringBuilder sb = new StringBuilder();
+    for (String e : elements) {

--- End diff --

I see. I would just call it on `Arrays.asList(elements)` to avoid the duplicate code. Not a big deal if you leave this.
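The deduplication suggested above — having the varargs overload simply wrap its arguments with `Arrays.asList` and delegate to a collection-based overload — can be sketched as follows. This is an illustrative reconstruction under assumptions (the class and method names mirror the quoted `CommandBuilderUtils`, and the skip-empty-elements behavior is assumed), not the merged Spark code:

```java
import java.util.Arrays;

// Sketch of the varargs-to-List delegation: one overload wraps its
// arguments and forwards to the other, so the join loop exists only once.
class JoinSketch {

  // Varargs overload: delegates instead of duplicating the loop.
  static String join(String sep, String... elements) {
    return join(sep, Arrays.asList(elements));
  }

  // Single implementation of the join logic (empty-element skipping assumed).
  static String join(String sep, Iterable<String> elements) {
    StringBuilder sb = new StringBuilder();
    for (String e : elements) {
      if (e != null && !e.isEmpty()) {
        if (sb.length() > 0) {
          sb.append(sep);
        }
        sb.append(e);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(join("/", "usr", "bin", "java")); // usr/bin/java
  }
}
```

The benefit is that any later fix to the join semantics lands in exactly one place.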
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25400811

--- Diff: bin/pyspark ---
@@ -95,26 +80,13 @@ export PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH

# Load the PySpark shell.py script when ./pyspark is used interactively:
export OLD_PYTHONSTARTUP=$PYTHONSTARTUP
-export PYTHONSTARTUP=$FWDIR/python/pyspark/shell.py
-
-# Build up arguments list manually to preserve quotes and backslashes.
-# We export Spark submit arguments as an environment variable because shell.py must run as a
-# PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
-SUBMIT_USAGE_FUNCTION=usage
-gatherSparkSubmitOpts $@
-PYSPARK_SUBMIT_ARGS=
-whitespace=[[:space:]]
-for i in ${SUBMISSION_OPTS[@]}; do
-  if [[ $i =~ \ ]]; then i=$(echo $i | sed 's/\/\\\/g'); fi
-  if [[ $i =~ $whitespace ]]; then i=\$i\; fi
-  PYSPARK_SUBMIT_ARGS=$PYSPARK_SUBMIT_ARGS $i
-done
-export PYSPARK_SUBMIT_ARGS
+export PYTHONSTARTUP=$SPARK_HOME/python/pyspark/shell.py

# For pyspark tests
if [[ -n $SPARK_TESTING ]]; then
  unset YARN_CONF_DIR
  unset HADOOP_CONF_DIR
+  export PYSPARK_SUBMIT_ARGS=pyspark-shell

--- End diff --

why do we need to do this? Don't we just overwrite this in `SparkSubmitCommandBuilder` anyway?
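The ordering sensitivity reported for `PYSPARK_SUBMIT_ARGS` (options such as `--jars` must precede `pyspark-shell`) follows from how a spark-submit-style parser works: it consumes `--option value` pairs until it hits the first non-option token, which it treats as the app resource, and everything after that goes to the application itself. The following is a toy sketch of that behavior under assumptions — it is not Spark's actual `SparkSubmitOptionParser`, and the class and method names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of spark-submit-style argument splitting: options are consumed
// until the first non-option token (the app resource); the rest are app args.
class SubmitArgsSketch {

  static String splitDescription(String value) {
    List<String> opts = new ArrayList<String>();
    String resource = null;
    List<String> appArgs = new ArrayList<String>();
    String[] tokens = value.split("\\s+");
    int i = 0;
    while (i < tokens.length) {
      if (tokens[i].startsWith("--")) {
        opts.add(tokens[i]);
        opts.add(tokens[i + 1]);  // option plus its value
        i += 2;
      } else {
        resource = tokens[i];  // first non-option token is the app resource
        appArgs = Arrays.asList(tokens).subList(i + 1, tokens.length);
        break;
      }
    }
    return "opts=" + opts + " resource=" + resource + " appArgs=" + appArgs;
  }

  public static void main(String[] args) {
    // Resource last: --jars is parsed as a submit option.
    System.out.println(splitDescription("--jars kafka-assembly.jar pyspark-shell"));
    // Resource first: --jars is no longer seen as a submit option at all.
    System.out.println(splitDescription("pyspark-shell --jars kafka-assembly.jar"));
  }
}
```

This is why putting `pyspark-shell` anywhere but last silently changes how `--jars` is interpreted.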
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76114272 retest this please
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76114251 @vanzin @pwendell As of the latest changes I think this PR is almost ready to be merged. An obvious blocker is that the `SparkLauncher` needs to also handle Windows, but we can fix that easily. One bigger thing I wanted to clarify is the following: The scope of the public API this PR currently exposes is narrowed to the extent that `SparkLauncher` calls a particular distribution's `spark-submit`. IIUC the only backward compatibility requirement this entails is that we don't remove or change any of the fields that the `SparkLauncher` sets (e.g. version, master, app name). Is this correct? This seems like a narrow enough interface to keep unchanged in future versions of Spark such that merging this will be safe.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76115100 [Test build #27971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27971/consoleFull) for PR 3916 at commit [`4c19196`](https://github.com/apache/spark/commit/4c19196f495813dc2dcd6116b5831932ec406803). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76115110 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27971/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76107021 [Test build #27972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27972/consoleFull) for PR 3916 at commit [`6184c07`](https://github.com/apache/spark/commit/6184c07d1f7ddb9bf74e898155f28340b7a33f53). * This patch merges cleanly.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25399842

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
+ * <p>
+ * This launcher extends SparkLauncher to add command line parsing compatible with
+ * SparkSubmit. It handles setting driver-side options and special parsing needed
+ * for the different specialClasses.
+ * <p>
+ * This class has also some special features to aid PySparkLauncher.
+ */
+class SparkSubmitCommandBuilder extends AbstractCommandBuilder {
+
+  /**
+   * Name of the app resource used to identify the PySpark shell. The command line parser expects
+   * the resource name to be the very first argument to spark-submit in this case.
+   *
+   * NOTE: this cannot be "pyspark-shell" since that identifies the PySpark shell to SparkSubmit
+   * (see java_gateway.py), and can cause this code to enter into an infinite loop.
+   */
+  static final String PYSPARK_SHELL = "pyspark-shell-main";

--- End diff --

I wonder whether we should just stop supporting running python applications through `bin/pyspark`. It's been deprecated since 1.0 and that will already have been 4 minor releases by the time this is merged. If we just stop supporting this then we can simplify this convoluted logic a great deal. Maybe @pwendell has an opinion one way or the other.
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25400927

--- Diff: core/src/main/scala/org/apache/spark/launcher/WorkerCommandBuilder.scala ---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher
+
+import java.io.File
+import java.util.{HashMap => JHashMap, List => JList, Map => JMap}
+
+import scala.collection.JavaConversions._
+
+import org.apache.spark.deploy.Command
+
+/**
+ * This class is used by CommandUtils. It uses some package-private APIs in SparkLauncher, and since
+ * Java doesn't have a feature similar to `private[spark]`, and we don't want that class to be
+ * public, needs to live in the same package as the rest of the library.
+ */
+private[spark] class WorkerCommandBuilder(sparkHome: String, memoryMb: Int, command: Command)
+    extends AbstractCommandBuilder {

--- End diff --

unindent
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25401107 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java --- @@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Launcher for Spark applications.
+ * <p/>
+ * Use this class to start Spark applications programmatically. The class uses a builder pattern
+ * to allow clients to configure the Spark application and launch it as a child process.
+ * <p/>
+ * Note that launching Spark applications using this class will not automatically load environment
+ * variables from the spark-env.sh or spark-env.cmd scripts in the configuration directory.
+ */
+public class SparkLauncher {
+
+  /** The Spark master. */
+  public static final String SPARK_MASTER = "spark.master";
+
+  /** Configuration key for the driver memory. */
+  public static final String DRIVER_MEMORY = "spark.driver.memory";
+  /** Configuration key for the driver class path. */
+  public static final String DRIVER_EXTRA_CLASSPATH = "spark.driver.extraClassPath";
+  /** Configuration key for the driver VM options. */
+  public static final String DRIVER_EXTRA_JAVA_OPTIONS = "spark.driver.extraJavaOptions";
+  /** Configuration key for the driver native library path. */
+  public static final String DRIVER_EXTRA_LIBRARY_PATH = "spark.driver.extraLibraryPath";
+
+  /** Configuration key for the executor memory. */
+  public static final String EXECUTOR_MEMORY = "spark.executor.memory";
+  /** Configuration key for the executor class path. */
+  public static final String EXECUTOR_EXTRA_CLASSPATH = "spark.executor.extraClassPath";
+  /** Configuration key for the executor VM options. */
+  public static final String EXECUTOR_EXTRA_JAVA_OPTIONS = "spark.executor.extraJavaOptions";
+  /** Configuration key for the executor native library path. */
+  public static final String EXECUTOR_EXTRA_LIBRARY_PATH = "spark.executor.extraLibraryOptions";
+  /** Configuration key for the number of executor CPU cores. */
+  public static final String EXECUTOR_CORES = "spark.executor.cores";
+
+  private final SparkSubmitCommandBuilder builder;
+
+  public SparkLauncher() {
+    this(null);
+  }
+
+  /**
+   * Creates a launcher that will set the given environment variables in the child.
+   *
+   * @param env Environment variables to set.
+   */
+  public SparkLauncher(Map<String, String> env) {
+    this.builder = new SparkSubmitCommandBuilder();
+    if (env != null) {
+      this.builder.childEnv.putAll(env);
+    }
+  }
+
+  /**
+   * Set a custom JAVA_HOME for launching the Spark application.
+   *
+   * @param javaHome Path to the JAVA_HOME to use.
+   * @return This launcher.
+   */
+  public SparkLauncher setJavaHome(String javaHome) {
+    checkNotNull(javaHome, "javaHome");
+    builder.javaHome = javaHome;
+    return this;
+  }
+
+  /**
+   * Set a custom Spark installation location for the application.
+   *
+   * @param sparkHome Path to the Spark installation to use.
+   * @return This launcher.
+   */
+  public SparkLauncher setSparkHome(String sparkHome) {
+    checkNotNull(sparkHome, "sparkHome");
+    builder.childEnv.put(ENV_SPARK_HOME, sparkHome);
+    return this;
+  }
+
+  /**
+   * Set a custom properties file with Spark configuration for the application.
+   *
+   * @param path Path to custom properties file to use.
+   * @return This launcher.
+   */
+  public SparkLauncher setPropertiesFile(String path) {
+    checkNotNull(path, "path");
+
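The chained setters quoted above are the core of the builder pattern this class exposes. The following is a minimal, self-contained sketch of that pattern using a hypothetical `MiniLauncher` class (not the PR's actual API, which also validates input via `checkNotNull` and ultimately spawns a child process):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in mirroring SparkLauncher's chained-setter style.
public class MiniLauncher {
  final Map<String, String> childEnv = new HashMap<String, String>();
  String javaHome;

  public MiniLauncher setJavaHome(String javaHome) {
    if (javaHome == null) {
      throw new IllegalArgumentException("javaHome must not be null");
    }
    this.javaHome = javaHome;
    return this; // returning "this" is what makes the calls chainable
  }

  public MiniLauncher setSparkHome(String sparkHome) {
    if (sparkHome == null) {
      throw new IllegalArgumentException("sparkHome must not be null");
    }
    // Like the PR, SPARK_HOME is stored as a child-process environment variable.
    childEnv.put("SPARK_HOME", sparkHome);
    return this;
  }

  public static void main(String[] args) {
    MiniLauncher l = new MiniLauncher()
        .setJavaHome("/usr/lib/jvm/java-7")
        .setSparkHome("/opt/spark");
    System.out.println(l.childEnv.get("SPARK_HOME")); // prints "/opt/spark"
  }
}
```

The design choice worth noting is that each setter both mutates internal state and returns the launcher, so configuration reads as a single expression.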
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76113760 [Test build #27972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27972/consoleFull) for PR 3916 at commit [`6184c07`](https://github.com/apache/spark/commit/6184c07d1f7ddb9bf74e898155f28340b7a33f53). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76113766 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27972/ Test FAILed.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76114901 `SparkLauncher` works on Windows. Did you see anything that indicates otherwise? I even managed to get all unit tests under launcher/ to pass on Windows (although I didn't test after the latest changes). The only things that need to be kept for compatibility going forward are: - SparkLauncher's public API. You can change the internal fields all you want. - spark-submit's public API. It needs to keep accepting command lines the way it does today.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25399597 --- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java --- @@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
+ * <p/>
+ * This launcher extends SparkLauncher to add command line parsing compatible with
+ * SparkSubmit. It handles setting driver-side options and special parsing needed
+ * for the different special classes.
+ * <p/>
+ * This class has also some special features to aid PySparkLauncher.
+ */
+class SparkSubmitCommandBuilder extends AbstractCommandBuilder {
+
+  /**
+   * Name of the app resource used to identify the PySpark shell. The command line parser expects
+   * the resource name to be the very first argument to spark-submit in this case.
+   *
+   * NOTE: this cannot be "pyspark-shell" since that identifies the PySpark shell to SparkSubmit
+   * (see java_gateway.py), and can cause this code to enter into an infinite loop.
+   */
+  static final String PYSPARK_SHELL = "pyspark-shell-main";
+
+  /**
+   * This is the actual resource name that identifies the PySpark shell to SparkSubmit.
+   */
+  static final String PYSPARK_SHELL_RESOURCE = "pyspark-shell";
+
+  /**
+   * This map must match the class names for available special classes, since this modifies the way
+   * command line parsing works. This maps the class name to the resource to use when calling
+   * spark-submit.
+   */
+  private static final Map<String, String> specialClasses = new HashMap<String, String>();
+  static {
+    specialClasses.put("org.apache.spark.repl.Main", "spark-shell");
+    specialClasses.put("org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver",
+      "spark-internal");
+  }
+
+  private final List<String> sparkArgs;
+  private boolean hasMixedArguments;
+
+  SparkSubmitCommandBuilder() {
+    this.sparkArgs = new ArrayList<String>();
+  }
+
+  SparkSubmitCommandBuilder(List<String> args) {
+    this(false, args);
+  }
+
+  SparkSubmitCommandBuilder(boolean hasMixedArguments, List<String> args) {
+    this();
+    List<String> submitArgs = args;
+    if (args.size() > 0 && args.get(0).equals(PYSPARK_SHELL)) {
+      this.hasMixedArguments = true;
+      appResource = PYSPARK_SHELL_RESOURCE;
+      submitArgs = args.subList(1, args.size());
+    } else {
+      this.hasMixedArguments = hasMixedArguments;
+    }
+
+    new OptionParser().parse(submitArgs);
+  }
+
+  @Override
+  public List<String> buildCommand(Map<String, String> env) throws IOException {
+    if (PYSPARK_SHELL_RESOURCE.equals(appResource)) {
+      return buildPySparkShellCommand(env);
+    } else {
+      return buildSparkSubmitCommand(env);
+    }
+  }
+
+  List<String> buildSparkSubmitArgs() {
+    List<String> args = new ArrayList<String>();
+    SparkSubmitOptionParser parser = new SparkSubmitOptionParser();
+
+    if (verbose) {
+      args.add(parser.VERBOSE);
+    }
+
+    if (master != null) {
+      args.add(parser.MASTER);
+      args.add(master);
+    }
+
+    if (deployMode != null) {
+      args.add(parser.DEPLOY_MODE);
+      args.add(deployMode);
+    }
+
+    if (appName != null) {
+      args.add(parser.NAME);
+      args.add(appName);
+    }
+
+    for (Map.Entry<String, String> e : conf.entrySet()) {
+
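The `buildSparkSubmitArgs` method quoted above is a field-to-flag mapping: every builder field that was set becomes a `--flag value` pair on the spark-submit command line. A self-contained sketch of the same idea (the flag strings are hard-coded here for illustration; the PR reads them from `SparkSubmitOptionParser` constants):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of turning builder state into spark-submit-style CLI arguments.
public class SubmitArgsSketch {
  String master;
  String deployMode;
  String appName;

  // Emit a "--flag value" pair only for fields that were actually set.
  List<String> buildArgs() {
    List<String> args = new ArrayList<String>();
    if (master != null) {
      args.add("--master");
      args.add(master);
    }
    if (deployMode != null) {
      args.add("--deploy-mode");
      args.add(deployMode);
    }
    if (appName != null) {
      args.add("--name");
      args.add(appName);
    }
    return args;
  }

  public static void main(String[] args) {
    SubmitArgsSketch s = new SubmitArgsSketch();
    s.master = "local[2]";
    s.appName = "demo";
    System.out.println(s.buildArgs()); // prints [--master, local[2], --name, demo]
  }
}
```

Unset fields simply produce no arguments, which is why the real method is a sequence of null checks rather than an unconditional dump of every option.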
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25400090 --- Diff: launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java --- @@ -0,0 +1,228 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Helper methods for command builders.
+ */
+class CommandBuilderUtils {
+
+  static final String DEFAULT_MEM = "512m";
+  static final String DEFAULT_PROPERTIES_FILE = "spark-defaults.conf";
+  static final String ENV_SPARK_HOME = "SPARK_HOME";
+
+  /** Returns whether the given string is null or empty. */
+  static boolean isEmpty(String s) {
+    return s == null || s.isEmpty();
+  }
+
+  /** Joins a list of strings using the given separator. */
+  static String join(String sep, String... elements) {
+    StringBuilder sb = new StringBuilder();
+    for (String e : elements) {
+      if (e != null) {
+        if (sb.length() > 0) {
+          sb.append(sep);
+        }
+        sb.append(e);
+      }
+    }
+    return sb.toString();
+  }
+
+  /** Joins a list of strings using the given separator. */
+  static String join(String sep, Iterable<String> elements) {
+    StringBuilder sb = new StringBuilder();
+    for (String e : elements) {
+      if (e != null) {
+        if (sb.length() > 0) {
+          sb.append(sep);
+        }
+        sb.append(e);
+      }
+    }
+    return sb.toString();
+  }
+
+  /** Returns the first value mapped to the given key in the given maps. */
+  static String find(String key, Map<?, ?>... maps) {
+    for (Map<?, ?> map : maps) {
+      String value = (String) map.get(key);
+      if (!isEmpty(value)) {
+        return value;
+      }
+    }
+    return null;
+  }
+
+  /** Returns the first non-empty, non-null string in the given list. */
+  static String firstNonEmpty(String... candidates) {
+    for (String s : candidates) {
+      if (!isEmpty(s)) {
+        return s;
+      }
+    }
+    return null;
+  }
+
+  /** Returns the name of the env variable that holds the native library path. */
+  static String getLibPathEnvName() {
+    if (isWindows()) {
+      return "PATH";
+    }
+
+    String os = System.getProperty("os.name");
+    if (os.startsWith("Mac OS X")) {
+      return "DYLD_LIBRARY_PATH";
+    } else {
+      return "LD_LIBRARY_PATH";
+    }
+  }
+
+  /** Returns whether the OS is Windows. */
+  static boolean isWindows() {
+    String os = System.getProperty("os.name");
+    return os.startsWith("Windows");
+  }
+
+  /**
+   * Updates the user environment to contain the merged value of envKey after appending
+   * the given path list.
+   */
+  static void mergeEnvPathList(Map<String, String> userEnv, String envKey, String pathList) {
+    if (!isEmpty(pathList)) {
+      String current = firstNonEmpty(userEnv.get(envKey), System.getenv(envKey));
+      userEnv.put(envKey, join(File.pathSeparator, current, pathList));
+    }
--- End diff --

Sorry, what do you mean? Your comment says:
```
Updates the user environment, appending the given pathList to the existing value
of the given environment variable (or setting it if it hasn't yet been set).
```
When `pathList` is empty, shouldn't we just append the value of the system env var to an empty string, i.e. just store the system env var in `userEnv`? If that's not the intended behavior then we need to update the docs...
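The behavior under debate can be pinned down with a standalone re-implementation of the helper (a sketch of the quoted code, not the PR's exact class; the system-env fallback is reduced to a map lookup so it can run anywhere): when `pathList` is empty the map is left untouched, which is the subtlety the javadoc arguably fails to state.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

public class MergeEnvSketch {
  static boolean isEmpty(String s) { return s == null || s.isEmpty(); }

  // Join non-null elements with the separator, skipping nulls (mirrors join() above).
  static String join(String sep, String... elements) {
    StringBuilder sb = new StringBuilder();
    for (String e : elements) {
      if (e != null) {
        if (sb.length() > 0) sb.append(sep);
        sb.append(e);
      }
    }
    return sb.toString();
  }

  // Append pathList to the current value of envKey; do nothing when pathList is empty.
  static void mergeEnvPathList(Map<String, String> userEnv, String envKey, String pathList) {
    if (!isEmpty(pathList)) {
      String current = userEnv.get(envKey); // the PR also falls back to System.getenv(envKey)
      userEnv.put(envKey, join(File.pathSeparator, current, pathList));
    }
  }

  public static void main(String[] args) {
    Map<String, String> env = new HashMap<String, String>();
    env.put("LD_LIBRARY_PATH", "/a");
    mergeEnvPathList(env, "LD_LIBRARY_PATH", "/b");
    System.out.println(env.get("LD_LIBRARY_PATH")); // "/a" + File.pathSeparator + "/b"
    mergeEnvPathList(env, "LD_LIBRARY_PATH", "");   // empty pathList: map is unchanged
  }
}
```

Because `join` skips nulls, a previously unset variable is simply set to `pathList`, while an empty `pathList` is a no-op rather than a copy of the system value into `userEnv`.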
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76114681 [Test build #27977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27977/consoleFull) for PR 3916 at commit [`6184c07`](https://github.com/apache/spark/commit/6184c07d1f7ddb9bf74e898155f28340b7a33f53). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25400989 --- Diff: launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java --- @@ -0,0 +1,362 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileFilter;
+import java.io.FileInputStream;
+import java.io.InputStreamReader;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.jar.JarFile;
+import java.util.regex.Pattern;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Abstract Spark command builder that defines common functionality.
+ */
+abstract class AbstractCommandBuilder {
+
+  boolean verbose;
+  String appName;
+  String appResource;
+  String deployMode;
+  String javaHome;
+  String mainClass;
+  String master;
+  String propertiesFile;
+  final List<String> appArgs;
+  final List<String> jars;
+  final List<String> files;
+  final List<String> pyFiles;
+  final Map<String, String> childEnv;
+  final Map<String, String> conf;
+
+  public AbstractCommandBuilder() {
+    this.appArgs = new ArrayList<String>();
+    this.childEnv = new HashMap<String, String>();
+    this.conf = new HashMap<String, String>();
+    this.files = new ArrayList<String>();
+    this.jars = new ArrayList<String>();
+    this.pyFiles = new ArrayList<String>();
+  }
+
+  /**
+   * Builds the command to execute.
+   *
+   * @param env A map containing environment variables for the child process. It may already contain
+   *            entries defined by the user (such as SPARK_HOME, or those defined by the
+   *            SparkLauncher constructor that takes an environment), and may be modified to
+   *            include other variables needed by the process to be executed.
+   */
+  abstract List<String> buildCommand(Map<String, String> env) throws IOException;
+
+  /**
+   * Builds a list of arguments to run java.
+   *
+   * This method finds the java executable to use and appends JVM-specific options for running a
+   * class with Spark in the classpath. It also loads options from the "java-opts" file in the
+   * configuration directory being used.
+   *
+   * Callers should still add at least the class to run, as well as any arguments to pass to the
+   * class.
+   */
+  List<String> buildJavaCommand(String extraClassPath) throws IOException {
+    List<String> cmd = new ArrayList<String>();
+    if (javaHome == null) {
+      cmd.add(join(File.separator, System.getProperty("java.home"), "bin", "java"));
+    } else {
+      cmd.add(join(File.separator, javaHome, "bin", "java"));
+    }
+
+    // Load extra JAVA_OPTS from conf/java-opts, if it exists.
+    File javaOpts = new File(join(File.separator, getConfDir(), "java-opts"));
+    if (javaOpts.isFile()) {
+      BufferedReader br = new BufferedReader(new InputStreamReader(
+          new FileInputStream(javaOpts), "UTF-8"));
+      try {
+        String line;
+        while ((line = br.readLine()) != null) {
+          addOptionString(cmd, line);
+        }
+      } finally {
+        br.close();
+      }
+    }
+
+    cmd.add("-cp");
+    cmd.add(join(File.pathSeparator, buildClassPath(extraClassPath)));
+    return cmd;
+  }
+
+  /**
+   * Adds the default perm gen size option for Spark if the VM requires it and the user hasn't
+   * set it.
+   */
+  void addPermGenSizeOpt(List<String> cmd) {
+    // Don't set MaxPermSize for Java 8 and later.
+    String[] version =
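The java-executable resolution at the top of `buildJavaCommand` reduces to joining path components with the platform separator, preferring an explicit `javaHome` over the running JVM's own `java.home`. A minimal standalone sketch (the `/opt/jdk7` path is illustrative):

```java
import java.io.File;

public class JavaCmdSketch {
  // Resolve the java binary: an explicit javaHome wins, else fall back to the
  // java.home system property of the JVM running this code.
  static String javaExecutable(String javaHome) {
    String home = (javaHome != null) ? javaHome : System.getProperty("java.home");
    return home + File.separator + "bin" + File.separator + "java";
  }

  public static void main(String[] args) {
    System.out.println(javaExecutable("/opt/jdk7")); // explicit home
    System.out.println(javaExecutable(null));        // falls back to the running JVM
  }
}
```

Using `File.separator` rather than a hard-coded `/` is what lets the same logic produce valid paths on both Unix and Windows.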
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76076202 [Test build #27958 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27958/consoleFull) for PR 3916 at commit [`7e66c18`](https://github.com/apache/spark/commit/7e66c18bad6e672c60ec995bd968b3a7db32ee73). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3916#issuecomment-76076214 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27958/ Test PASSed.
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3916#discussion_r25388445 --- Diff: launcher/src/main/java/org/apache/spark/launcher/Main.java --- @@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Command line interface for the Spark launcher. Used internally by Spark scripts.
+ */
+public class Main extends LauncherCommon {
+
+  /**
+   * Usage: Main [class] [class args]
+   * <p/>
+   * This CLI works in two different modes:
+   * <ul>
+   *   <li>"spark-submit": if <i>class</i> is "org.apache.spark.deploy.SparkSubmit", the
+   *   {@link SparkLauncher} class is used to launch a Spark application.</li>
+   *   <li>"spark-class": if another class is provided, an internal Spark class is run.</li>
+   * </ul>
+   *
+   * The ultimate command will not be run in the same process. Instead, the command to be executed
+   * will be printed to stdout. On Unix systems, this will be one argument per line. On Windows
+   * systems, this will be a single line containing the command to be executed.
+   */
+  public static void main(String[] argsArray) throws Exception {
+    checkArgument(argsArray.length > 0, "Not enough arguments: missing class name.");
+
+    List<String> args = new ArrayList<String>(Arrays.asList(argsArray));
+    String className = args.remove(0);
+
+    boolean printLaunchCommand = false;
+    AbstractLauncher<?> launcher;
+    try {
+      if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
+        launcher = new SparkSubmitCliLauncher(args);
+        printLaunchCommand = !isEmpty(System.getenv("SPARK_PRINT_LAUNCH_COMMAND"));
+      } else if (className.equals("pyspark")) {
+        launcher = new PySparkLauncher(args);
+      } else {
+        launcher = new SparkClassLauncher(className, args);
+        printLaunchCommand = !isEmpty(System.getenv("SPARK_PRINT_LAUNCH_COMMAND"));
+      }
+    } catch (IllegalArgumentException e) {
+      launcher = new UsageLauncher();
+    }
+
+    List<String> cmd = launcher.buildLauncherCommand();
+    if (printLaunchCommand) {
+      System.err.println("Spark Command: " + join(" ", cmd));
+      System.err.println();
+    }
+
+    for (String c : cmd) {
+      System.out.println(c);
+    }
+  }
+
+  /**
+   * Internal launcher used when command line parsing fails. This will behave differently depending
+   * on the platform:
+   *
+   * - On Unix-like systems, it will print a call to the "usage" function with argument "1". The
+   *   function is expected to print the command's usage and exit with the provided exit code.
+   *   The script should use "export -f usage" after declaring a function called "usage", so that
+   *   the function is available to downstream scripts.
+   *
+   * - On Windows it will set the variable "SPARK_LAUNCHER_USAGE_ERROR" to "1". The batch script
+   *   should check for this variable and print its usage, since batch scripts don't really support
+   *   the "export -f" functionality used in bash.
+   */
+  private static class UsageLauncher extends AbstractLauncher<UsageLauncher> {
+
+    @Override
+    protected List<String> buildLauncherCommand() {
+      if (isWindows()) {
+        return Arrays.asList("set SPARK_LAUNCHER_USAGE_ERROR=1");
+      } else {
+        return Arrays.asList("usage 1");
+      }
--- End diff --

Anything here will need to be interpreted by the scripts in some way, so the synchronization you talk about will be there one way or another. "Empty output" meaning "print usage message" seems a bit weird. The values are already
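The platform branch being discussed is small enough to exercise directly. A standalone sketch of the same decision, with the `os.name` value passed in as a parameter (an assumption made here so both branches are testable; the PR reads it from a system property inside `isWindows()`):

```java
import java.util.Arrays;
import java.util.List;

public class UsageCommandSketch {
  // Mirrors UsageLauncher.buildLauncherCommand, parameterized on the OS name.
  static List<String> usageCommand(String osName) {
    if (osName.startsWith("Windows")) {
      // Batch scripts check this variable and print their own usage text,
      // since cmd.exe has no equivalent of bash's "export -f".
      return Arrays.asList("set SPARK_LAUNCHER_USAGE_ERROR=1");
    } else {
      // Bash scripts export a "usage" function that prints usage and exits with code 1.
      return Arrays.asList("usage 1");
    }
  }

  public static void main(String[] args) {
    System.out.println(usageCommand("Windows 8")); // prints [set SPARK_LAUNCHER_USAGE_ERROR=1]
    System.out.println(usageCommand("Linux"));     // prints [usage 1]
  }
}
```

Either way the launcher's stdout is a command for the calling script to evaluate, which is the synchronization point vanzin's comment refers to.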
[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...
Github user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3916#discussion_r25394241

--- Diff: launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java ---
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
+ * <p/>
+ * This launcher extends SparkLauncher to add command line parsing compatible with
+ * SparkSubmit. It handles setting driver-side options and special parsing needed
+ * for the different specialClasses.
+ * <p/>
+ * This class has also some special features to aid PySparkLauncher.
+ */
+class SparkSubmitCommandBuilder extends AbstractCommandBuilder {
+
+  /**
+   * Name of the app resource used to identify the PySpark shell. The command line parser expects
+   * the resource name to be the very first argument to spark-submit in this case.
+   *
+   * NOTE: this cannot be "pyspark-shell" since that identifies the PySpark shell to SparkSubmit
+   * (see java_gateway.py), and can cause this code to enter into an infinite loop.
+   */
+  static final String PYSPARK_SHELL = "pyspark-shell-main";
+
+  /**
+   * This is the actual resource name that identifies the PySpark shell to SparkSubmit.
+   */
+  static final String PYSPARK_SHELL_RESOURCE = "pyspark-shell";
+
+  /**
+   * This map must match the class names for available special classes, since this modifies the way
+   * command line parsing works. This maps the class name to the resource to use when calling
+   * spark-submit.
+   */
+  private static final Map<String, String> specialClasses = new HashMap<String, String>();
+  static {
+    specialClasses.put("org.apache.spark.repl.Main", "spark-shell");
+    specialClasses.put("org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver",
+      "spark-internal");
+  }
+
+  private final List<String> sparkArgs;
+  private boolean hasMixedArguments;
+
+  SparkSubmitCommandBuilder() {
+    this.sparkArgs = new ArrayList<String>();
+  }
+
+  SparkSubmitCommandBuilder(List<String> args) {
+    this(false, args);
+  }
+
+  SparkSubmitCommandBuilder(boolean hasMixedArguments, List<String> args) {
+    this();
+    List<String> submitArgs = args;
+    if (args.size() > 0 && args.get(0).equals(PYSPARK_SHELL)) {
+      this.hasMixedArguments = true;
+      appResource = PYSPARK_SHELL_RESOURCE;
+      submitArgs = args.subList(1, args.size());
+    } else {
+      this.hasMixedArguments = hasMixedArguments;
+    }
+
+    new OptionParser().parse(submitArgs);
+  }
+
+  @Override
+  public List<String> buildCommand(Map<String, String> env) throws IOException {
+    if (PYSPARK_SHELL_RESOURCE.equals(appResource)) {
+      return buildPySparkShellCommand(env);
+    } else {
+      return buildSparkSubmitCommand(env);
+    }
+  }
+
+  List<String> buildSparkSubmitArgs() {
+    List<String> args = new ArrayList<String>();
+    SparkSubmitOptionParser parser = new SparkSubmitOptionParser();
+
+    if (verbose) {
+      args.add(parser.VERBOSE);
+    }
+
+    if (master != null) {
+      args.add(parser.MASTER);
+      args.add(master);
+    }
+
+    if (deployMode != null) {
+      args.add(parser.DEPLOY_MODE);
+      args.add(deployMode);
+    }
+
+    if (appName != null) {
+      args.add(parser.NAME);
+      args.add(appName);
+    }
+
+    for (Map.Entry<String, String> e : conf.entrySet()) {
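The `PYSPARK_SHELL` marker handling in the constructor above, which also explains the `pyspark-shell` ordering confusion discussed in this thread, can be illustrated with a small stand-alone sketch. The constants mirror the diff; `resourceFor` is a hypothetical helper written for this illustration, not part of the library:

```java
import java.util.Arrays;
import java.util.List;

public class PySparkArgSketch {
    // Marker the launcher scripts put first on the spark-submit argument list.
    static final String PYSPARK_SHELL = "pyspark-shell-main";
    // Resource name that java_gateway.py / SparkSubmit actually understand.
    static final String PYSPARK_SHELL_RESOURCE = "pyspark-shell";

    // Mirrors the constructor logic above: a leading pyspark-shell-main marker
    // is consumed and replaced by the real resource name; otherwise no special
    // app resource is implied by the arguments.
    static String resourceFor(List<String> args) {
        if (args.size() > 0 && args.get(0).equals(PYSPARK_SHELL)) {
            return PYSPARK_SHELL_RESOURCE;
        }
        return null;
    }

    public static void main(String[] args) {
        // prints "pyspark-shell": marker recognized only in first position
        System.out.println(resourceFor(Arrays.asList(PYSPARK_SHELL, "--master", "local")));
        // prints "null": no marker, so the builder falls through to normal parsing
        System.out.println(resourceFor(Arrays.asList("--master", "local")));
    }
}
```

This is why, as noted in the `bin/pyspark` discussion, user-supplied flags such as `--jars` belong in `PYSPARK_SUBMIT_ARGS` while the shell resource itself is handled internally by the launcher.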