[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-13 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26401382
  
--- Diff: bin/pyspark ---
@@ -95,26 +83,13 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
 
 # Load the PySpark shell.py script when ./pyspark is used interactively:
 export OLD_PYTHONSTARTUP="$PYTHONSTARTUP"
-export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py"
-
-# Build up arguments list manually to preserve quotes and backslashes.
-# We export Spark submit arguments as an environment variable because shell.py must run as a
-# PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
-SUBMIT_USAGE_FUNCTION=usage
-gatherSparkSubmitOpts "$@"
-PYSPARK_SUBMIT_ARGS=""
-whitespace="[[:space:]]"
-for i in "${SUBMISSION_OPTS[@]}"; do
-  if [[ $i =~ \" ]]; then i=$(echo $i | sed 's/\"/\\\"/g'); fi
-  if [[ $i =~ $whitespace ]]; then i=\"$i\"; fi
-  PYSPARK_SUBMIT_ARGS="$PYSPARK_SUBMIT_ARGS $i"
-done
-export PYSPARK_SUBMIT_ARGS
--- End diff --

Actually `pyspark-shell` shouldn't be in PYSPARK_SUBMIT_ARGS at all. It probably only works by luck when you add it at the end :-), because it's handled internally by the launcher library.

(We should probably file a bug to remove this testing stuff from 
`bin/pyspark`. That would allow other cleanups in these scripts.)
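
(For illustration only, a minimal sketch of what that implies; the jar path is a made-up example, not code from this PR. With the launcher library handling the `pyspark-shell` resource internally, the variable would carry only genuine spark-submit options:)

```bash
# Hypothetical example: pass only real spark-submit options;
# the launcher library appends the pyspark-shell resource itself.
export PYSPARK_SUBMIT_ARGS="--jars /path/to/spark-streaming-kafka-assembly.jar"
```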





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26361718
  
--- Diff: bin/pyspark ---
@@ -95,26 +83,13 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
 
 # Load the PySpark shell.py script when ./pyspark is used interactively:
 export OLD_PYTHONSTARTUP="$PYTHONSTARTUP"
-export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py"
-
-# Build up arguments list manually to preserve quotes and backslashes.
-# We export Spark submit arguments as an environment variable because shell.py must run as a
-# PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
-SUBMIT_USAGE_FUNCTION=usage
-gatherSparkSubmitOpts "$@"
-PYSPARK_SUBMIT_ARGS=""
-whitespace="[[:space:]]"
-for i in "${SUBMISSION_OPTS[@]}"; do
-  if [[ $i =~ \" ]]; then i=$(echo $i | sed 's/\"/\\\"/g'); fi
-  if [[ $i =~ $whitespace ]]; then i=\"$i\"; fi
-  PYSPARK_SUBMIT_ARGS="$PYSPARK_SUBMIT_ARGS $i"
-done
-export PYSPARK_SUBMIT_ARGS
--- End diff --

Looks like `pyspark-shell` has to be at the end of `PYSPARK_SUBMIT_ARGS`. I changed the ordering to `--jars ${KAFKA_ASSEMBLY_JAR} pyspark-shell` and it works. I'm not sure whether that's the right behavior or just a bug?





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-12 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26320982
  
--- Diff: bin/pyspark ---
@@ -95,26 +83,13 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
 
 # Load the PySpark shell.py script when ./pyspark is used interactively:
 export OLD_PYTHONSTARTUP="$PYTHONSTARTUP"
-export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py"
-
-# Build up arguments list manually to preserve quotes and backslashes.
-# We export Spark submit arguments as an environment variable because shell.py must run as a
-# PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
-SUBMIT_USAGE_FUNCTION=usage
-gatherSparkSubmitOpts "$@"
-PYSPARK_SUBMIT_ARGS=""
-whitespace="[[:space:]]"
-for i in "${SUBMISSION_OPTS[@]}"; do
-  if [[ $i =~ \" ]]; then i=$(echo $i | sed 's/\"/\\\"/g'); fi
-  if [[ $i =~ $whitespace ]]; then i=\"$i\"; fi
-  PYSPARK_SUBMIT_ARGS="$PYSPARK_SUBMIT_ARGS $i"
-done
-export PYSPARK_SUBMIT_ARGS
--- End diff --

It depends on how the unit tests are run. Are they run through spark-submit? Then you just pass `--jars` to spark-submit.

If they're run by the code under the `$SPARK_TESTING` check in this file, then setting that env variable will probably work.
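
For concreteness, a sketch of both approaches (the `KAFKA_ASSEMBLY_JAR` variable and the test-file path are hypothetical examples, not code from this PR):

```bash
# Approach 1: run the tests through spark-submit and pass the jar directly.
./bin/spark-submit --jars "$KAFKA_ASSEMBLY_JAR" path/to/kafka_tests.py

# Approach 2: under the $SPARK_TESTING code path, export the submit args instead.
export PYSPARK_SUBMIT_ARGS="--jars $KAFKA_ASSEMBLY_JAR"
```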





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-12 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26358490
  
--- Diff: bin/pyspark ---
@@ -95,26 +83,13 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
 
 # Load the PySpark shell.py script when ./pyspark is used interactively:
 export OLD_PYTHONSTARTUP="$PYTHONSTARTUP"
-export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py"
-
-# Build up arguments list manually to preserve quotes and backslashes.
-# We export Spark submit arguments as an environment variable because shell.py must run as a
-# PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
-SUBMIT_USAGE_FUNCTION=usage
-gatherSparkSubmitOpts "$@"
-PYSPARK_SUBMIT_ARGS=""
-whitespace="[[:space:]]"
-for i in "${SUBMISSION_OPTS[@]}"; do
-  if [[ $i =~ \" ]]; then i=$(echo $i | sed 's/\"/\\\"/g'); fi
-  if [[ $i =~ $whitespace ]]; then i=\"$i\"; fi
-  PYSPARK_SUBMIT_ARGS="$PYSPARK_SUBMIT_ARGS $i"
-done
-export PYSPARK_SUBMIT_ARGS
--- End diff --

Thanks @vanzin for your reply. Actually the code is running under `$SPARK_TESTING`. I tried to use `PYSPARK_SUBMIT_ARGS` to expose the environment, like this:

```
export PYSPARK_SUBMIT_ARGS="pyspark-shell --jars ${KAFKA_ASSEMBLY_JAR}"
```

I'm not sure whether this is the right way, but it seems the jar cannot be added into the SparkContext. Is there anything else I should take care of?





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-11 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78219718
  
Okay cool - LGTM, I will pull this in. I just did some local sanity tests: built with Maven and ran (a few) Maven tests. We'll need to keep an eye on the Maven build tomorrow to see if it's succeeding there, since it's not caught by the PRB. Thanks @vanzin for all the work on this.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-11 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3916





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-11 Thread jerryshao
Github user jerryshao commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26279970
  
--- Diff: bin/pyspark ---
@@ -95,26 +83,13 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
 
 # Load the PySpark shell.py script when ./pyspark is used interactively:
 export OLD_PYTHONSTARTUP="$PYTHONSTARTUP"
-export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py"
-
-# Build up arguments list manually to preserve quotes and backslashes.
-# We export Spark submit arguments as an environment variable because shell.py must run as a
-# PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
-SUBMIT_USAGE_FUNCTION=usage
-gatherSparkSubmitOpts "$@"
-PYSPARK_SUBMIT_ARGS=""
-whitespace="[[:space:]]"
-for i in "${SUBMISSION_OPTS[@]}"; do
-  if [[ $i =~ \" ]]; then i=$(echo $i | sed 's/\"/\\\"/g'); fi
-  if [[ $i =~ $whitespace ]]; then i=\"$i\"; fi
-  PYSPARK_SUBMIT_ARGS="$PYSPARK_SUBMIT_ARGS $i"
-done
-export PYSPARK_SUBMIT_ARGS
--- End diff --

Hi @vanzin, I'm now adding the Kafka Python unit tests, and they require adding the Kafka assembly jar with the `--jars` argument. Since you've now removed this code, how should jars be added to pyspark for unit tests? Using `PYSPARK_SUBMIT_ARGS`? Thanks a lot.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78182670
  
  [Test build #28447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28447/consoleFull) for PR 3916 at commit [`18c7e4d`](https://github.com/apache/spark/commit/18c7e4db3b9713c4bc13487e3a15e59b6bf2dc58).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78182678
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28447/





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-10 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78143279
  
Yeah, this LGTM. There are a few other minor comments I left regarding visibility, but they don't have to block this PR from going in. I haven't tested whether all the cases (Windows, YARN, PySpark...) behave as before, and I won't have the bandwidth to do so this week, but we can always do that separately.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-10 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78003405
  
@andrewor14 are you good with this one? I'd like to merge it soon!





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-10 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78158391
  
FYI I ran our test suite with this change (covers standalone/client, 
yarn/both, spark-shell and pyspark, Linux only) and all looks ok. Also did some 
manual testing on Windows.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-78171114
  
  [Test build #28447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28447/consoleFull) for PR 3916 at commit [`18c7e4d`](https://github.com/apache/spark/commit/18c7e4db3b9713c4bc13487e3a15e59b6bf2dc58).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079948
  
--- Diff: bin/spark-sql ---
@@ -43,15 +46,12 @@ function usage {
   echo
   echo "CLI options:"
   "$FWDIR"/bin/spark-class $CLASS --help 2>&1 | grep -v "$pattern" 1>&2
+  exit $2
--- End diff --

```
exit "$2"
```





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079926
  
--- Diff: bin/spark-sql ---
@@ -25,12 +25,15 @@ set -o posix
 
 # NOTE: This exact class name is matched downstream by SparkSubmit.
 # Any changes need to be reflected there.
-CLASS="org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"
+export CLASS="org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"
 
 # Figure out where Spark is installed
-FWDIR="$(cd "`dirname $0`"/..; pwd)"
+export FWDIR="$(cd "`dirname $0`"/..; pwd)"
 
 function usage {
+  if [ -n "$1" ]; then
+    echo $1
--- End diff --

```
echo "$1"
```





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079987
  
--- Diff: bin/spark-submit ---
@@ -17,58 +17,18 @@
 # limitations under the License.
 #
 
-# NOTE: Any changes in this file must be reflected in SparkSubmitDriverBootstrapper.scala!
-
-export SPARK_HOME="$(cd "`dirname $0`"/..; pwd)"
-ORIG_ARGS=("$@")
-
-# Set COLUMNS for progress bar
-export COLUMNS=`tput cols`
-
-while (($#)); do
-  if [ "$1" = "--deploy-mode" ]; then
-    SPARK_SUBMIT_DEPLOY_MODE=$2
-  elif [ "$1" = "--properties-file" ]; then
-    SPARK_SUBMIT_PROPERTIES_FILE=$2
-  elif [ "$1" = "--driver-memory" ]; then
-    export SPARK_SUBMIT_DRIVER_MEMORY=$2
-  elif [ "$1" = "--driver-library-path" ]; then
-    export SPARK_SUBMIT_LIBRARY_PATH=$2
-  elif [ "$1" = "--driver-class-path" ]; then
-    export SPARK_SUBMIT_CLASSPATH=$2
-  elif [ "$1" = "--driver-java-options" ]; then
-    export SPARK_SUBMIT_OPTS=$2
-  elif [ "$1" = "--master" ]; then
-    export MASTER=$2
-  fi
-  shift
-done
-
-if [ -z "$SPARK_CONF_DIR" ]; then
-  export SPARK_CONF_DIR="$SPARK_HOME/conf"
-fi
-DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
-if [ "$MASTER" == "yarn-cluster" ]; then
-  SPARK_SUBMIT_DEPLOY_MODE=cluster
+SPARK_HOME="$(cd "`dirname $0`"/..; pwd)"
+
+# Only define a usage function if an upstream script hasn't done so.
+if ! type -t usage >/dev/null 2>&1; then
+  usage() {
+    if [ -n "$1" ]; then
+      echo $1
+    fi
+    "$SPARK_HOME"/bin/spark-class org.apache.spark.deploy.SparkSubmit --help
+    exit $2
--- End diff --

```
exit "$2"
```





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079978
  
--- Diff: bin/spark-submit ---
@@ -17,58 +17,18 @@
 # limitations under the License.
 #
 
-# NOTE: Any changes in this file must be reflected in SparkSubmitDriverBootstrapper.scala!
-
-export SPARK_HOME="$(cd "`dirname $0`"/..; pwd)"
-ORIG_ARGS=("$@")
-
-# Set COLUMNS for progress bar
-export COLUMNS=`tput cols`
-
-while (($#)); do
-  if [ "$1" = "--deploy-mode" ]; then
-    SPARK_SUBMIT_DEPLOY_MODE=$2
-  elif [ "$1" = "--properties-file" ]; then
-    SPARK_SUBMIT_PROPERTIES_FILE=$2
-  elif [ "$1" = "--driver-memory" ]; then
-    export SPARK_SUBMIT_DRIVER_MEMORY=$2
-  elif [ "$1" = "--driver-library-path" ]; then
-    export SPARK_SUBMIT_LIBRARY_PATH=$2
-  elif [ "$1" = "--driver-class-path" ]; then
-    export SPARK_SUBMIT_CLASSPATH=$2
-  elif [ "$1" = "--driver-java-options" ]; then
-    export SPARK_SUBMIT_OPTS=$2
-  elif [ "$1" = "--master" ]; then
-    export MASTER=$2
-  fi
-  shift
-done
-
-if [ -z "$SPARK_CONF_DIR" ]; then
-  export SPARK_CONF_DIR="$SPARK_HOME/conf"
-fi
-DEFAULT_PROPERTIES_FILE="$SPARK_CONF_DIR/spark-defaults.conf"
-if [ "$MASTER" == "yarn-cluster" ]; then
-  SPARK_SUBMIT_DEPLOY_MODE=cluster
+SPARK_HOME="$(cd "`dirname $0`"/..; pwd)"
+
+# Only define a usage function if an upstream script hasn't done so.
+if ! type -t usage >/dev/null 2>&1; then
+  usage() {
+    if [ -n "$1" ]; then
+      echo $1
--- End diff --

```
echo "$1"
```





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26080561
  
--- Diff: sbin/spark-daemon.sh ---
@@ -121,45 +121,63 @@ if [ "$SPARK_NICENESS" = "" ]; then
     export SPARK_NICENESS=0
 fi
 
+run_command() {
+  mode=$1
+  shift
 
-case $option in
-
-  (start|spark-submit)
-
-    mkdir -p "$SPARK_PID_DIR"
+  mkdir -p "$SPARK_PID_DIR"
 
-    if [ -f $pid ]; then
-      TARGET_ID="$(cat "$pid")"
-      if [[ $(ps -p "$TARGET_ID" -o args=) =~ $command ]]; then
-        echo "$command running as process $TARGET_ID.  Stop it first."
-        exit 1
-      fi
+  if [ -f $pid ]; then
+    TARGET_ID="$(cat "$pid")"
+    if [[ $(ps -p "$TARGET_ID" -o args=) =~ $command ]]; then
+      echo "$command running as process $TARGET_ID.  Stop it first."
+      exit 1
     fi
+  fi
 
-    if [ "$SPARK_MASTER" != "" ]; then
-      echo rsync from "$SPARK_MASTER"
-      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $SPARK_MASTER/ "$SPARK_HOME"
-    fi
+  if [ "$SPARK_MASTER" != "" ]; then
+    echo rsync from "$SPARK_MASTER"
+    rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $SPARK_MASTER/ "$SPARK_HOME"
--- End diff --

```
"$SPARK_MASTER/"
```

Unless this variable should potentially be interpreted as multiple 
arguments to `rsync`.
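
To make the concern concrete (illustrative values only, not from this PR): if the variable ever contained whitespace, the unquoted expansion would be split into separate `rsync` arguments.

```bash
SPARK_MASTER="host:/srv/spark extra"
rsync -a $SPARK_MASTER/ /tmp/dest    # word splitting: rsync sees two sources, "host:/srv/spark" and "extra/"
rsync -a "$SPARK_MASTER/" /tmp/dest  # quoted: rsync sees one source, "host:/srv/spark extra/"
```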





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26080617
  
--- Diff: sbin/spark-daemon.sh ---
@@ -121,45 +121,63 @@ if [ "$SPARK_NICENESS" = "" ]; then
     export SPARK_NICENESS=0
 fi
 
+run_command() {
+  mode=$1
+  shift
 
-case $option in
-
-  (start|spark-submit)
-
-    mkdir -p "$SPARK_PID_DIR"
+  mkdir -p "$SPARK_PID_DIR"
 
-    if [ -f $pid ]; then
-      TARGET_ID="$(cat "$pid")"
-      if [[ $(ps -p "$TARGET_ID" -o args=) =~ $command ]]; then
-        echo "$command running as process $TARGET_ID.  Stop it first."
-        exit 1
-      fi
+  if [ -f $pid ]; then
+    TARGET_ID="$(cat "$pid")"
+    if [[ $(ps -p "$TARGET_ID" -o args=) =~ $command ]]; then
+      echo "$command running as process $TARGET_ID.  Stop it first."
+      exit 1
     fi
+  fi
 
-    if [ "$SPARK_MASTER" != "" ]; then
-      echo rsync from "$SPARK_MASTER"
-      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $SPARK_MASTER/ "$SPARK_HOME"
-    fi
+  if [ "$SPARK_MASTER" != "" ]; then
+    echo rsync from "$SPARK_MASTER"
+    rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $SPARK_MASTER/ "$SPARK_HOME"
+  fi
 
-    spark_rotate_log "$log"
-    echo starting $command, logging to $log
-    if [ "$option" == "spark-submit" ]; then
-      source "$SPARK_HOME"/bin/utils.sh
-      gatherSparkSubmitOpts "$@"
-      nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-submit --class $command \
-        "${SUBMISSION_OPTS[@]}" spark-internal "${APPLICATION_OPTS[@]}" >> "$log" 2>&1 < /dev/null &
-    else
+  spark_rotate_log "$log"
+  echo starting $command, logging to $log
+
+  case $mode in
--- End diff --

```
case "$mode" in
```





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77929101
  
Hey Marcelo,

I think this is really close to being ready to merge. I noticed some extra 
output (a '0') when running usage on `spark-shell`:

```
 ./bin/spark-shell --help
0
Usage: ./bin/spark-shell [options]
Spark assembly has been built with Hive, including Datanucleus jars in classpath.
```

I think whether this should be an exposed API is still a question. However, 
I'm fine to merge v1 of this as an exposed API and get feedback. It would be 
helpful to hear from potential users whether this is useful to them... so far I 
haven't heard much in that direction.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26080422
  
--- Diff: sbin/spark-daemon.sh ---
@@ -121,45 +121,63 @@ if [ "$SPARK_NICENESS" = "" ]; then
 export SPARK_NICENESS=0
 fi
 
+run_command() {
+  mode=$1
--- End diff --

```
mode="$1"
```





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26081687
  
--- Diff: bin/pyspark2.cmd ---
@@ -17,59 +17,22 @@ rem See the License for the specific language governing permissions and
 rem limitations under the License.
 rem
 
-set SCALA_VERSION=2.10
-
 rem Figure out where the Spark framework is installed
-set FWDIR=%~dp0..\
-
-rem Export this as SPARK_HOME
-set SPARK_HOME=%FWDIR%
-
-rem Test whether the user has built Spark
-if exist %FWDIR%RELEASE goto skip_build_test
-set FOUND_JAR=0
-for %%d in ("%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assembly*hadoop*.jar") do (
-  set FOUND_JAR=1
-)
-if [%FOUND_JAR%] == [0] (
-  echo Failed to find Spark assembly JAR.
-  echo You need to build Spark before running this program.
-  goto exit
-)
-:skip_build_test
+set SPARK_HOME=%~dp0..
 
 rem Load environment variables from conf\spark-env.cmd, if it exists
-if exist %FWDIR%conf\spark-env.cmd call %FWDIR%conf\spark-env.cmd
+if exist %SPARK_HOME%\conf\spark-env.cmd call %SPARK_HOME%\conf\spark-env.cmd
 
 rem Figure out which Python to use.
-if [%PYSPARK_PYTHON%] == [] set PYSPARK_PYTHON=python
+if "x%PYSPARK_DRIVER_PYTHON%"=="x" (
+  set PYSPARK_DRIVER_PYTHON=python
+  if not [%PYSPARK_PYTHON%] == [] set PYSPARK_DRIVER_PYTHON=%PYSPARK_PYTHON%
+)
 
-set PYTHONPATH=%FWDIR%python;%PYTHONPATH%
-set PYTHONPATH=%FWDIR%python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH%
+set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH%
+set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH%
 
 set OLD_PYTHONSTARTUP=%PYTHONSTARTUP%
-set PYTHONSTARTUP=%FWDIR%python\pyspark\shell.py
-set PYSPARK_SUBMIT_ARGS=%*
-
-echo Running %PYSPARK_PYTHON% with PYTHONPATH=%PYTHONPATH%
-
-rem Check whether the argument is a file
-for /f %%i in ('echo %1^| findstr /R "\.py"') do (
-  set PYTHON_FILE=%%i
-)
-
-if [%PYTHON_FILE%] == [] (
-  if [%IPYTHON%] == [1] (
-    ipython %IPYTHON_OPTS%
-  ) else (
-    %PYSPARK_PYTHON%
-  )
-) else (
-  echo.
-  echo WARNING: Running python applications through ./bin/pyspark.cmd is deprecated as of Spark 1.0.
-  echo Use ./bin/spark-submit ^<python file^>
-  echo.
-  %FWDIR%\bin\spark-submit.cmd %PYSPARK_SUBMIT_ARGS%
-)
+set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py
--- End diff --

OK, good to know.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77954145
  
Sorry to spam the PR with comments about Bash word splitting, @vanzin. I think what you have Bash-wise looks good, but as a matter of consistency I just went through and marked the places where Bash word-splitting bugs are theoretically possible.

As a habit, I think we should always quote Bash expansions (whether of variables, subshells, etc.), though I understand if in some cases you feel it's unnecessary.
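
A minimal demonstration of the failure mode (the directory name is made up):

```bash
dir="My Documents"
ls $dir     # unquoted: word splitting passes two arguments, "My" and "Documents"
ls "$dir"   # quoted: passes a single argument, "My Documents"
```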





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079243
  
--- Diff: bin/spark-class ---
@@ -110,83 +39,48 @@ else
     exit 1
   fi
 fi
-JAVA_VERSION=$("$RUNNER" -version 2>&1 | grep 'version' | sed 's/.* version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
-
-# Set JAVA_OPTS to be able to load native libraries and to set heap size
-if [ "$JAVA_VERSION" -ge 18 ]; then
-  JAVA_OPTS="$OUR_JAVA_OPTS"
-else
-  JAVA_OPTS="-XX:MaxPermSize=128m $OUR_JAVA_OPTS"
-fi
-JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
-
-# Load extra JAVA_OPTS from conf/java-opts, if it exists
-if [ -e "$SPARK_CONF_DIR/java-opts" ] ; then
-  JAVA_OPTS="$JAVA_OPTS `cat "$SPARK_CONF_DIR"/java-opts`"
-fi
-
-# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala!
-
-TOOLS_DIR="$FWDIR"/tools
-SPARK_TOOLS_JAR=""
-if [ -e "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the SBT build
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar`
-fi
-if [ -e "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the Maven build
-  # TODO: this also needs to become an assembly!
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar`
-fi
 
-# Compute classpath using external script
-classpath_output=$("$FWDIR"/bin/compute-classpath.sh)
-if [[ "$?" != "0" ]]; then
-  echo "$classpath_output"
-  exit 1
-else
-  CLASSPATH="$classpath_output"
-fi
+# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath
+# instead of looking for a jar file.
+SPARK_LAUNCHER_CP=
+if [ -f $SPARK_HOME/RELEASE ]; then
+  LAUNCHER_DIR="$SPARK_HOME/lib"
+  num_jars=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" | wc -l)
--- End diff --

Programmatically consuming the output of `ls` is [generally discouraged](http://mywiki.wooledge.org/ParsingLs).

The previous approach using a loop and globbing, or alternate approaches using `find`, are safer; see the sketch below.
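
For instance, a glob-based count might look like this (a sketch only, not code from this PR; `LAUNCHER_DIR` as defined in the diff above):

```bash
# Count matching jars with a glob instead of parsing ls output.
num_jars=0
for jar in "$LAUNCHER_DIR"/spark-launcher*.jar; do
  [ -e "$jar" ] && num_jars=$((num_jars + 1))
done
```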





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26078786
  
--- Diff: bin/pyspark2.cmd ---
@@ -17,59 +17,22 @@ rem See the License for the specific language governing permissions and
 rem limitations under the License.
 rem
 
-set SCALA_VERSION=2.10
-
 rem Figure out where the Spark framework is installed
-set FWDIR=%~dp0..\
-
-rem Export this as SPARK_HOME
-set SPARK_HOME=%FWDIR%
-
-rem Test whether the user has built Spark
-if exist %FWDIR%RELEASE goto skip_build_test
-set FOUND_JAR=0
-for %%d in ("%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assembly*hadoop*.jar") do (
-  set FOUND_JAR=1
-)
-if [%FOUND_JAR%] == [0] (
-  echo Failed to find Spark assembly JAR.
-  echo You need to build Spark before running this program.
-  goto exit
-)
-:skip_build_test
+set SPARK_HOME=%~dp0..
 
 rem Load environment variables from conf\spark-env.cmd, if it exists
-if exist %FWDIR%conf\spark-env.cmd call %FWDIR%conf\spark-env.cmd
+if exist %SPARK_HOME%\conf\spark-env.cmd call %SPARK_HOME%\conf\spark-env.cmd
 
 rem Figure out which Python to use.
-if [%PYSPARK_PYTHON%] == [] set PYSPARK_PYTHON=python
+if "x%PYSPARK_DRIVER_PYTHON%"=="x" (
+  set PYSPARK_DRIVER_PYTHON=python
+  if not [%PYSPARK_PYTHON%] == [] set PYSPARK_DRIVER_PYTHON=%PYSPARK_PYTHON%
+)
 
-set PYTHONPATH=%FWDIR%python;%PYTHONPATH%
-set PYTHONPATH=%FWDIR%python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH%
+set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH%
+set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH%
 
 set OLD_PYTHONSTARTUP=%PYTHONSTARTUP%
-set PYTHONSTARTUP=%FWDIR%python\pyspark\shell.py
-set PYSPARK_SUBMIT_ARGS=%*
-
-echo Running %PYSPARK_PYTHON% with PYTHONPATH=%PYTHONPATH%
-
-rem Check whether the argument is a file
-for /f %%i in ('echo %1^| findstr /R "\.py"') do (
-  set PYTHON_FILE=%%i
-)
-
-if [%PYTHON_FILE%] == [] (
-  if [%IPYTHON%] == [1] (
-    ipython %IPYTHON_OPTS%
-  ) else (
-    %PYSPARK_PYTHON%
-  )
-) else (
-  echo.
-  echo WARNING: Running python applications through ./bin/pyspark.cmd is deprecated as of Spark 1.0.
-  echo Use ./bin/spark-submit ^<python file^>
-  echo.
-  %FWDIR%\bin\spark-submit.cmd %PYSPARK_SUBMIT_ARGS%
-)
+set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py
--- End diff --

Does Windows batch script suffer from the same types of word splitting 
problems that Bash does?

For example, if the value of `%SPARK_HOME%` contains a space, will this 
line still work correctly?





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26078620
  
--- Diff: bin/pyspark ---
@@ -18,36 +18,24 @@
 #
 
 # Figure out where Spark is installed
-FWDIR="$(cd "`dirname $0`"/..; pwd)"
+export SPARK_HOME="$(cd "`dirname $0`"/..; pwd)"
 
-# Export this as SPARK_HOME
-export SPARK_HOME="$FWDIR"
-
-source "$FWDIR"/bin/utils.sh
-
-source "$FWDIR"/bin/load-spark-env.sh
+source "$SPARK_HOME"/bin/load-spark-env.sh
 
 function usage() {
+  if [ -n "$1" ]; then
+    echo $1
+  fi
   echo "Usage: ./bin/pyspark [options]" 1>&2
-  "$FWDIR"/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
-  exit 0
+  "$SPARK_HOME"/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
+  exit $2
--- End diff --

Similarly, regarding word splitting, I suggest:

```
exit "$2"
```





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079792
  
--- Diff: bin/spark-class ---
@@ -110,83 +39,48 @@ else
     exit 1
   fi
 fi
-JAVA_VERSION=$("$RUNNER" -version 2>&1 | grep 'version' | sed 's/.* version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
-
-# Set JAVA_OPTS to be able to load native libraries and to set heap size
-if [ "$JAVA_VERSION" -ge 18 ]; then
-  JAVA_OPTS="$OUR_JAVA_OPTS"
-else
-  JAVA_OPTS="-XX:MaxPermSize=128m $OUR_JAVA_OPTS"
-fi
-JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
-
-# Load extra JAVA_OPTS from conf/java-opts, if it exists
-if [ -e "$SPARK_CONF_DIR/java-opts" ] ; then
-  JAVA_OPTS="$JAVA_OPTS `cat "$SPARK_CONF_DIR"/java-opts`"
-fi
-
-# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala!
-
-TOOLS_DIR="$FWDIR"/tools
-SPARK_TOOLS_JAR=""
-if [ -e "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the SBT build
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar`
-fi
-if [ -e "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the Maven build
-  # TODO: this also needs to become an assembly!
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar`
-fi
 
-# Compute classpath using external script
-classpath_output=$("$FWDIR"/bin/compute-classpath.sh)
-if [[ "$?" != "0" ]]; then
-  echo "$classpath_output"
-  exit 1
-else
-  CLASSPATH="$classpath_output"
-fi
+# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath
+# instead of looking for a jar file.
+SPARK_LAUNCHER_CP=
+if [ -f $SPARK_HOME/RELEASE ]; then
+  LAUNCHER_DIR="$SPARK_HOME/lib"
+  num_jars=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" | wc -l)
+  if [ "$num_jars" -eq "0" -a -z "$SPARK_LAUNCHER_CP" ]; then
+    echo "Failed to find Spark launcher in $LAUNCHER_DIR." 1>&2
+    echo "You need to build Spark before running this program." 1>&2
+    exit 1
+  fi
 
-if [[ "$1" =~ org.apache.spark.tools.* ]]; then
-  if test -z "$SPARK_TOOLS_JAR"; then
-    echo "Failed to find Spark Tools Jar in $FWDIR/tools/target/scala-$SPARK_SCALA_VERSION/" 1>&2
-    echo "You need to run \"build/sbt tools/package\" before running $1." 1>&2
+  LAUNCHER_JARS=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" || true)
+  if [ "$num_jars" -gt "1" ]; then
+    echo "Found multiple Spark launcher jars in $LAUNCHER_DIR:" 1>&2
+    echo "$LAUNCHER_JARS" 1>&2
+    echo "Please remove all but one jar." 1>&2
     exit 1
   fi
-  CLASSPATH="$CLASSPATH:$SPARK_TOOLS_JAR"
-fi
 
-if $cygwin; then
-  CLASSPATH=`cygpath -wp "$CLASSPATH"`
-  if [ "$1" == "org.apache.spark.tools.JavaAPICompletenessChecker" ]; then
-    export SPARK_TOOLS_JAR=`cygpath -w "$SPARK_TOOLS_JAR"`
+  SPARK_LAUNCHER_CP="${LAUNCHER_DIR}/${LAUNCHER_JARS}"
+else
+  LAUNCHER_DIR="$SPARK_HOME/launcher/target/scala-$SPARK_SCALA_VERSION"
+  if [ ! -d "$LAUNCHER_DIR/classes" ]; then
+    echo "Failed to find Spark launcher classes in $LAUNCHER_DIR." 1>&2
+    echo "You need to build Spark before running this program." 1>&2
+    exit 1
   fi
+  SPARK_LAUNCHER_CP="$LAUNCHER_DIR/classes"
 fi
-export CLASSPATH
 
-# In Spark submit client mode, the driver is launched in the same JVM as Spark submit itself.
-# Here we must parse the properties file for relevant spark.driver.* configs before launching
-# the driver JVM itself. Instead of handling this complexity in Bash, we launch a separate JVM
-# to prepare the launch environment of this driver JVM.
+# The launcher library will print arguments separated by a NULL character, to allow arguments with
+# characters that would be otherwise interpreted by the shell. Read that in a while loop, populating
+# an array that will be used to exec the final command.
+CMD=()
+while IFS= read -d '' -r ARG; do
+  CMD+=("$ARG")
+done < <($RUNNER -cp "$SPARK_LAUNCHER_CP" org.apache.spark.launcher.Main "$@")
--- End diff --

Is `$RUNNER` safe to leave unquoted here?





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079694
  
--- Diff: bin/spark-class ---
@@ -110,83 +39,48 @@ else
     exit 1
   fi
 fi
-JAVA_VERSION=$("$RUNNER" -version 2>&1 | grep 'version' | sed 's/.* version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
-
-# Set JAVA_OPTS to be able to load native libraries and to set heap size
-if [ "$JAVA_VERSION" -ge 18 ]; then
-  JAVA_OPTS="$OUR_JAVA_OPTS"
-else
-  JAVA_OPTS="-XX:MaxPermSize=128m $OUR_JAVA_OPTS"
-fi
-JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
-
-# Load extra JAVA_OPTS from conf/java-opts, if it exists
-if [ -e "$SPARK_CONF_DIR/java-opts" ] ; then
-  JAVA_OPTS="$JAVA_OPTS `cat "$SPARK_CONF_DIR"/java-opts`"
-fi
-
-# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala!
-
-TOOLS_DIR="$FWDIR"/tools
-SPARK_TOOLS_JAR=""
-if [ -e "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the SBT build
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar`
-fi
-if [ -e "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the Maven build
-  # TODO: this also needs to become an assembly!
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar`
-fi
 
-# Compute classpath using external script
-classpath_output=$("$FWDIR"/bin/compute-classpath.sh)
-if [[ "$?" != "0" ]]; then
-  echo "$classpath_output"
-  exit 1
-else
-  CLASSPATH="$classpath_output"
-fi
+# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath
+# instead of looking for a jar file.
+SPARK_LAUNCHER_CP=
+if [ -f $SPARK_HOME/RELEASE ]; then
+  LAUNCHER_DIR="$SPARK_HOME/lib"
+  num_jars=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" | wc -l)
--- End diff --

While I understand what the article you linked says, I fail to see how it applies to this line. The grep command filters for what we're looking for, and if you have things that would trigger the cases described in the article, they'd either be filtered out, or you could say you already have a busted build directory that doesn't match what's expected.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079633
  
--- Diff: bin/spark-class ---
@@ -110,83 +39,48 @@ else
     exit 1
   fi
 fi
-JAVA_VERSION=$("$RUNNER" -version 2>&1 | grep 'version' | sed 's/.* version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
-
-# Set JAVA_OPTS to be able to load native libraries and to set heap size
-if [ "$JAVA_VERSION" -ge 18 ]; then
-  JAVA_OPTS="$OUR_JAVA_OPTS"
-else
-  JAVA_OPTS="-XX:MaxPermSize=128m $OUR_JAVA_OPTS"
-fi
-JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
-
-# Load extra JAVA_OPTS from conf/java-opts, if it exists
-if [ -e "$SPARK_CONF_DIR/java-opts" ] ; then
-  JAVA_OPTS="$JAVA_OPTS `cat "$SPARK_CONF_DIR"/java-opts`"
-fi
-
-# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala!
-
-TOOLS_DIR="$FWDIR"/tools
-SPARK_TOOLS_JAR=""
-if [ -e "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the SBT build
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar`
-fi
-if [ -e "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the Maven build
-  # TODO: this also needs to become an assembly!
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar`
-fi
 
-# Compute classpath using external script
-classpath_output=$("$FWDIR"/bin/compute-classpath.sh)
-if [[ "$?" != "0" ]]; then
-  echo "$classpath_output"
-  exit 1
-else
-  CLASSPATH="$classpath_output"
-fi
+# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath
+# instead of looking for a jar file.
+SPARK_LAUNCHER_CP=
+if [ -f $SPARK_HOME/RELEASE ]; then
+  LAUNCHER_DIR="$SPARK_HOME/lib"
+  num_jars=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" | wc -l)
+  if [ "$num_jars" -eq "0" -a -z "$SPARK_LAUNCHER_CP" ]; then
+    echo "Failed to find Spark launcher in $LAUNCHER_DIR." 1>&2
+    echo "You need to build Spark before running this program." 1>&2
+    exit 1
+  fi
 
-if [[ "$1" =~ org.apache.spark.tools.* ]]; then
-  if test -z "$SPARK_TOOLS_JAR"; then
-    echo "Failed to find Spark Tools Jar in $FWDIR/tools/target/scala-$SPARK_SCALA_VERSION/" 1>&2
-    echo "You need to run \"build/sbt tools/package\" before running $1." 1>&2
+  LAUNCHER_JARS=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" || true)
--- End diff --

Same here. This is definitely a minor point relative to the scope of this 
patch, but I would prefer if we didn't use `ls` like this.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079837
  
--- Diff: bin/spark-shell ---
@@ -28,25 +28,24 @@ esac
 # Enter posix mode for bash
 set -o posix
 
-## Global script variables
-FWDIR="$(cd "`dirname $0`"/..; pwd)"
+export FWDIR="$(cd "`dirname $0`"/..; pwd)"
 
-function usage() {
+usage() {
+  if [ -n "$1" ]; then
+    echo $1
--- End diff --

Same as above. I'd prefer:

```
echo "$1"
```





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079752
  
--- Diff: bin/spark-class ---
@@ -110,83 +39,48 @@ else
     exit 1
   fi
 fi
-JAVA_VERSION=$("$RUNNER" -version 2>&1 | grep 'version' | sed 's/.* version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
-
-# Set JAVA_OPTS to be able to load native libraries and to set heap size
-if [ "$JAVA_VERSION" -ge 18 ]; then
-  JAVA_OPTS="$OUR_JAVA_OPTS"
-else
-  JAVA_OPTS="-XX:MaxPermSize=128m $OUR_JAVA_OPTS"
-fi
-JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
-
-# Load extra JAVA_OPTS from conf/java-opts, if it exists
-if [ -e "$SPARK_CONF_DIR/java-opts" ] ; then
-  JAVA_OPTS="$JAVA_OPTS `cat "$SPARK_CONF_DIR"/java-opts`"
-fi
-
-# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala!
-
-TOOLS_DIR="$FWDIR"/tools
-SPARK_TOOLS_JAR=""
-if [ -e "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the SBT build
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar`
-fi
-if [ -e "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the Maven build
-  # TODO: this also needs to become an assembly!
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar`
-fi
 
-# Compute classpath using external script
-classpath_output=$("$FWDIR"/bin/compute-classpath.sh)
-if [[ "$?" != "0" ]]; then
-  echo "$classpath_output"
-  exit 1
-else
-  CLASSPATH="$classpath_output"
-fi
+# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath
+# instead of looking for a jar file.
+SPARK_LAUNCHER_CP=
+if [ -f $SPARK_HOME/RELEASE ]; then
+  LAUNCHER_DIR="$SPARK_HOME/lib"
+  num_jars=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" | wc -l)
+  if [ "$num_jars" -eq "0" -a -z "$SPARK_LAUNCHER_CP" ]; then
+    echo "Failed to find Spark launcher in $LAUNCHER_DIR." 1>&2
+    echo "You need to build Spark before running this program." 1>&2
+    exit 1
+  fi
 
-if [[ "$1" =~ org.apache.spark.tools.* ]]; then
-  if test -z "$SPARK_TOOLS_JAR"; then
-    echo "Failed to find Spark Tools Jar in $FWDIR/tools/target/scala-$SPARK_SCALA_VERSION/" 1>&2
-    echo "You need to run \"build/sbt tools/package\" before running $1." 1>&2
+  LAUNCHER_JARS=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" || true)
--- End diff --

Same explanation. The `grep` is there to perform the desired match.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26080472
  
--- Diff: sbin/spark-daemon.sh ---
@@ -121,45 +121,63 @@ if [ "$SPARK_NICENESS" = "" ]; then
     export SPARK_NICENESS=0
 fi
 
+run_command() {
+  mode=$1
+  shift
 
-case $option in
-
-  (start|spark-submit)
-
-    mkdir -p "$SPARK_PID_DIR"
+  mkdir -p "$SPARK_PID_DIR"
 
-    if [ -f $pid ]; then
-      TARGET_ID="$(cat "$pid")"
-      if [[ $(ps -p "$TARGET_ID" -o args=) =~ $command ]]; then
-        echo "$command running as process $TARGET_ID.  Stop it first."
-        exit 1
-      fi
+  if [ -f $pid ]; then
--- End diff --

```
if [ -f "$pid" ]; then
```





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26081000
  
--- Diff: bin/spark-class ---
@@ -110,83 +39,48 @@ else
     exit 1
   fi
 fi
-JAVA_VERSION=$("$RUNNER" -version 2>&1 | grep 'version' | sed 's/.* version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
-
-# Set JAVA_OPTS to be able to load native libraries and to set heap size
-if [ "$JAVA_VERSION" -ge 18 ]; then
-  JAVA_OPTS="$OUR_JAVA_OPTS"
-else
-  JAVA_OPTS="-XX:MaxPermSize=128m $OUR_JAVA_OPTS"
-fi
-JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
-
-# Load extra JAVA_OPTS from conf/java-opts, if it exists
-if [ -e "$SPARK_CONF_DIR/java-opts" ] ; then
-  JAVA_OPTS="$JAVA_OPTS `cat "$SPARK_CONF_DIR"/java-opts`"
-fi
-
-# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala!
-
-TOOLS_DIR="$FWDIR"/tools
-SPARK_TOOLS_JAR=""
-if [ -e "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the SBT build
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar`
-fi
-if [ -e "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the Maven build
-  # TODO: this also needs to become an assembly!
-  export SPARK_TOOLS_JAR=`ls "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar`
-fi
 
-# Compute classpath using external script
-classpath_output=$("$FWDIR"/bin/compute-classpath.sh)
-if [[ "$?" != "0" ]]; then
-  echo "$classpath_output"
-  exit 1
-else
-  CLASSPATH="$classpath_output"
-fi
+# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath
+# instead of looking for a jar file.
+SPARK_LAUNCHER_CP=
+if [ -f $SPARK_HOME/RELEASE ]; then
+  LAUNCHER_DIR="$SPARK_HOME/lib"
+  num_jars=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" | wc -l)
+  if [ "$num_jars" -eq "0" -a -z "$SPARK_LAUNCHER_CP" ]; then
+    echo "Failed to find Spark launcher in $LAUNCHER_DIR." 1>&2
+    echo "You need to build Spark before running this program." 1>&2
+    exit 1
+  fi
 
-if [[ "$1" =~ org.apache.spark.tools.* ]]; then
-  if test -z "$SPARK_TOOLS_JAR"; then
-    echo "Failed to find Spark Tools Jar in $FWDIR/tools/target/scala-$SPARK_SCALA_VERSION/" 1>&2
-    echo "You need to run \"build/sbt tools/package\" before running $1." 1>&2
+  LAUNCHER_JARS=$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" || true)
--- End diff --

Agreed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26080991
  
--- Diff: bin/spark-class ---
@@ -110,83 +39,48 @@ else
     exit 1
   fi
 fi
-JAVA_VERSION=$("$RUNNER" -version 2>&1 | grep 'version' | sed 's/.* version "\(.*\)\.\(.*\)\..*"/\1\2/; 1q')
-
-# Set JAVA_OPTS to be able to load native libraries and to set heap size
-if [ "$JAVA_VERSION" -ge 18 ]; then
-  JAVA_OPTS="$OUR_JAVA_OPTS"
-else
-  JAVA_OPTS="-XX:MaxPermSize=128m $OUR_JAVA_OPTS"
-fi
-JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
-
-# Load extra JAVA_OPTS from conf/java-opts, if it exists
-if [ -e "$SPARK_CONF_DIR/java-opts" ] ; then
-  JAVA_OPTS="$JAVA_OPTS `cat "$SPARK_CONF_DIR"/java-opts`"
-fi
-
-# Attention: when changing the way the JAVA_OPTS are assembled, the change must be reflected in CommandUtils.scala!
-
-TOOLS_DIR="$FWDIR"/tools
-SPARK_TOOLS_JAR=""
-if [ -e "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the SBT build
-  export SPARK_TOOLS_JAR="`ls "$TOOLS_DIR"/target/scala-$SPARK_SCALA_VERSION/spark-tools*[0-9Tg].jar`"
-fi
-if [ -e "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar ]; then
-  # Use the JAR from the Maven build
-  # TODO: this also needs to become an assembly!
-  export SPARK_TOOLS_JAR="`ls "$TOOLS_DIR"/target/spark-tools*[0-9Tg].jar`"
-fi
 
-# Compute classpath using external script
-classpath_output=$("$FWDIR"/bin/compute-classpath.sh)
-if [[ "$?" != "0" ]]; then
-  echo "$classpath_output"
-  exit 1
-else
-  CLASSPATH="$classpath_output"
-fi
+# Look for the launcher. In non-release mode, add the compiled classes directly to the classpath
+# instead of looking for a jar file.
+SPARK_LAUNCHER_CP=
+if [ -f $SPARK_HOME/RELEASE ]; then
+  LAUNCHER_DIR="$SPARK_HOME/lib"
+  num_jars="$(ls -1 "$LAUNCHER_DIR" | grep "^spark-launcher.*\.jar$" | wc -l)"
--- End diff --

That makes sense. Using `ls` in a general context is different from using it here, where what we expect is very limited, so I agree. If `ls`'s output really does go wrong, that just means a busted build directory, and it's fine to leave things at that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079852
  
--- Diff: bin/spark-shell ---
@@ -28,25 +28,24 @@ esac
 # Enter posix mode for bash
 set -o posix
 
-## Global script variables
-FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
+export FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
 
-function usage() {
+usage() {
+  if [ -n "$1" ]; then
+    echo "$1"
+  fi
   echo "Usage: ./bin/spark-shell [options]"
   "$FWDIR"/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
-  exit 0
+  exit $2
--- End diff --

```
exit "$2"
```
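
(Illustration, not part of the review: a sketch of why the quoting matters for `exit` specifically. When `$2` is empty, the unquoted form silently degrades to a bare `exit`.)

```
#!/usr/bin/env bash
# Demo: behavior of exit with an empty positional parameter.
status=""

# Unquoted: word splitting removes the empty argument entirely, so this
# runs a bare `exit`, which reuses the last command's exit status.
( false; exit $status ); echo "unquoted exit -> $?"   # prints 1

# Quoted: the empty string is passed through, so the mistake surfaces
# as an explicit "numeric argument required" error instead.
( false; exit "$status" ) 2>/dev/null; echo "quoted exit -> $?"
```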


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26081238
  
--- Diff: sbin/start-thriftserver.sh ---
@@ -52,4 +52,4 @@ fi
 
 export SUBMIT_USAGE_FUNCTION=usage
 
-exec "$FWDIR"/sbin/spark-daemon.sh spark-submit $CLASS 1 "$@"
+exec "$FWDIR"/sbin/spark-daemon.sh submit $CLASS 1 "$@"
--- End diff --

```
"$CLASS"
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26078579
  
--- Diff: bin/pyspark ---
@@ -18,36 +18,24 @@
 #
 
 # Figure out where Spark is installed
-FWDIR="$(cd "`dirname "$0"`"/..; pwd)"
+export SPARK_HOME="$(cd "`dirname "$0"`"/..; pwd)"
 
-# Export this as SPARK_HOME
-export SPARK_HOME="$FWDIR"
-
-source "$FWDIR"/bin/utils.sh
-
-source "$FWDIR"/bin/load-spark-env.sh
+source "$SPARK_HOME"/bin/load-spark-env.sh
 
 function usage() {
+  if [ -n "$1" ]; then
+    echo $1
--- End diff --

Word splitting with unintended side effects is possible here.

I suggest changing this to:

```
echo "$1"
```
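
(A minimal demo of the word-splitting hazard, mine rather than the reviewer's:)

```
#!/usr/bin/env bash
# Demo: what unquoted `echo $1` does to a message.
set -- "ERROR:   two spaces kept?  and a glob: *"

echo $1     # word-split and glob-expanded: spacing collapses, `*` lists files
echo "$1"   # printed verbatim
```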


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26079464
  
--- Diff: bin/pyspark2.cmd ---
@@ -17,59 +17,22 @@ rem See the License for the specific language governing 
permissions and
 rem limitations under the License.
 rem
 
-set SCALA_VERSION=2.10
-
 rem Figure out where the Spark framework is installed
-set FWDIR=%~dp0..\
-
-rem Export this as SPARK_HOME
-set SPARK_HOME=%FWDIR%
-
-rem Test whether the user has built Spark
-if exist %FWDIR%RELEASE goto skip_build_test
-set FOUND_JAR=0
-for %%d in 
(%FWDIR%assembly\target\scala-%SCALA_VERSION%\spark-assembly*hadoop*.jar) do (
-  set FOUND_JAR=1
-)
-if [%FOUND_JAR%] == [0] (
-  echo Failed to find Spark assembly JAR.
-  echo You need to build Spark before running this program.
-  goto exit
-)
-:skip_build_test
+set SPARK_HOME=%~dp0..
 
 rem Load environment variables from conf\spark-env.cmd, if it exists
-if exist %FWDIR%conf\spark-env.cmd call %FWDIR%conf\spark-env.cmd
+if exist %SPARK_HOME%\conf\spark-env.cmd call 
%SPARK_HOME%\conf\spark-env.cmd
 
 rem Figure out which Python to use.
-if [%PYSPARK_PYTHON%] == [] set PYSPARK_PYTHON=python
+if "x%PYSPARK_DRIVER_PYTHON%"=="x" (
+  set PYSPARK_DRIVER_PYTHON=python
+  if not [%PYSPARK_PYTHON%] == [] set PYSPARK_DRIVER_PYTHON=%PYSPARK_PYTHON%
+)
 
-set PYTHONPATH=%FWDIR%python;%PYTHONPATH%
-set PYTHONPATH=%FWDIR%python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH%
+set PYTHONPATH=%SPARK_HOME%\python;%PYTHONPATH%
+set PYTHONPATH=%SPARK_HOME%\python\lib\py4j-0.8.2.1-src.zip;%PYTHONPATH%
 
 set OLD_PYTHONSTARTUP=%PYTHONSTARTUP%
-set PYTHONSTARTUP=%FWDIR%python\pyspark\shell.py
-set PYSPARK_SUBMIT_ARGS=%*
-
-echo Running %PYSPARK_PYTHON% with PYTHONPATH=%PYTHONPATH%
-
-rem Check whether the argument is a file
-for /f %%i in ('echo %1^| findstr /R "\.py"') do (
-  set PYTHON_FILE=%%i
-)
-
-if [%PYTHON_FILE%] == [] (
-  if [%IPYTHON%] == [1] (
-   ipython %IPYTHON_OPTS%
-  ) else (
-   %PYSPARK_PYTHON%
-  ) 
-) else (
-  echo.
-  echo WARNING: Running python applications through ./bin/pyspark.cmd is 
deprecated as of Spark 1.0.
-  echo Use ./bin/spark-submit ^<python file^>
-  echo.
-  %FWDIR%\bin\spark-submit.cmd %PYSPARK_SUBMIT_ARGS%
-)
+set PYTHONSTARTUP=%SPARK_HOME%\python\pyspark\shell.py
--- End diff --

Windows has all sorts of weird rules about spaces; they have absolutely no correlation to what happens in the bash world. This line works correctly as is.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread nchammas
Github user nchammas commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26081219
  
--- Diff: sbin/spark-daemon.sh ---
@@ -121,45 +121,63 @@ if [ "$SPARK_NICENESS" = "" ]; then
     export SPARK_NICENESS=0
 fi
 
+run_command() {
+  mode="$1"
+  shift
 
-case $option in
-
-  (start|spark-submit)
-
-    mkdir -p "$SPARK_PID_DIR"
+  mkdir -p "$SPARK_PID_DIR"
 
-    if [ -f $pid ]; then
-      TARGET_ID=$(cat $pid)
-      if [[ $(ps -p $TARGET_ID -o args=) =~ $command ]]; then
-        echo "$command running as process $TARGET_ID.  Stop it first."
-        exit 1
-      fi
+  if [ -f $pid ]; then
+    TARGET_ID=$(cat $pid)
+    if [[ $(ps -p $TARGET_ID -o args=) =~ $command ]]; then
+      echo "$command running as process $TARGET_ID.  Stop it first."
+      exit 1
     fi
+  fi
 
-    if [ "$SPARK_MASTER" != "" ]; then
-      echo rsync from "$SPARK_MASTER"
-      rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' "$SPARK_MASTER/" "$SPARK_HOME"
-    fi
+  if [ "$SPARK_MASTER" != "" ]; then
+    echo rsync from "$SPARK_MASTER"
+    rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' "$SPARK_MASTER/" "$SPARK_HOME"
+  fi
 
-    spark_rotate_log "$log"
-    echo "starting $command, logging to $log"
-    if [ "$option" == "spark-submit" ]; then
-      source "$SPARK_HOME"/bin/utils.sh
-      gatherSparkSubmitOpts "$@"
-      nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-submit --class $command \
-        "${SUBMISSION_OPTS[@]}" spark-internal "${APPLICATION_OPTS[@]}" >> "$log" 2>&1 < /dev/null &
-    else
+  spark_rotate_log "$log"
+  echo "starting $command, logging to $log"
+
+  case "$mode" in
+    (class)
       nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-class $command "$@" >> "$log" 2>&1 < /dev/null &
-    fi
-    newpid=$!
-    echo $newpid > $pid
-    sleep 2
-    # Check if the process has died; in that case we'll tail the log so the user can see
-    if [[ ! $(ps -p $newpid -o args=) =~ $command ]]; then
-      echo "failed to launch $command:"
-      tail -2 "$log" | sed 's/^/  /'
-      echo "full log in $log"
-    fi
+      newpid=$!
+      ;;
+
+    (submit)
+      nohup nice -n $SPARK_NICENESS "$SPARK_PREFIX"/bin/spark-submit --class $command "$@" >> "$log" 2>&1 < /dev/null &
--- End diff --

For consistency, and out of paranoia, I suggest for this block:

```
"$SPARK_NICENESS"
"$!"
"$newpid"
"$pid"
```

And so forth.

I know most of these variables are never supposed to contain a space, but 
this counts as good Bash hygiene, as annoying as it is.
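
(To make the paranoia concrete: a demo of mine, not from the thread. An empty unquoted variable vanishes from the command line entirely, which can flip a test's meaning.)

```
#!/usr/bin/env bash
# Demo: an empty unquoted variable disappears during expansion.
pid=""

# `[ -f $pid ]` collapses to `[ -f ]`, a one-argument test that just
# checks that the string "-f" is non-empty -- always true.
if [ -f $pid ]; then echo "unquoted: claims the pid file exists"; fi

# `[ -f "$pid" ]` keeps the empty argument, so the test is false.
if [ -f "$pid" ]; then echo "quoted: never printed"; fi
```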


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77936105
  
  [Test build #28401 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28401/consoleFull)
 for   PR 3916 at commit 
[`897141f`](https://github.com/apache/spark/commit/897141fa7bc521f95f42729fca89467b46fbf412).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77935980
  
  [Test build #28401 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28401/consoleFull)
 for   PR 3916 at commit 
[`897141f`](https://github.com/apache/spark/commit/897141fa7bc521f95f42729fca89467b46fbf412).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77937027
  
  [Test build #28403 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28403/consoleFull)
 for   PR 3916 at commit 
[`a1b8af1`](https://github.com/apache/spark/commit/a1b8af1830fc9fada698098a235691e12bd519dd).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77937171
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28403/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77937990
  
  [Test build #28404 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28404/consoleFull)
 for   PR 3916 at commit 
[`3b28a75`](https://github.com/apache/spark/commit/3b28a7537ed184c88994be176fa1c66d47b43a8e).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26065985
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/Main.java ---
@@ -0,0 +1,173 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Command line interface for the Spark launcher. Used internally by Spark 
scripts.
+ */
+class Main {
+
+  /**
+   * Usage: Main [class] [class args]
+   * <p/>
+   * This CLI works in two different modes:
+   * <ul>
+   *   <li>"spark-submit": if <i>class</i> is "org.apache.spark.deploy.SparkSubmit", the
+   *   {@link SparkLauncher} class is used to launch a Spark application.</li>
+   *   <li>"spark-class": if another class is provided, an internal Spark class is run.</li>
+   * </ul>
+   *
+   * This class works in tandem with the "bin/spark-class" script on Unix-like systems, and
+   * "bin/spark-class2.cmd" batch script on Windows to execute the final command.
+   * <p/>
+   * On Unix-like systems, the output is a list of command arguments, separated by the NULL
+   * character. On Windows, the output is single command line suitable for direct execution
+   * form the script.
--- End diff --

from
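
(Side note, a sketch of mine rather than anything in the patch: roughly how a shell wrapper can consume the NULL-separated output described in the Javadoc above; `$RUNNER` and `$LAUNCHER_CP` are assumed names.)

```
#!/usr/bin/env bash
# Sketch: read NUL-delimited arguments emitted by the launcher's Main
# class into an array, then exec the resulting command.
CMD=()
while IFS= read -d '' -r ARG; do
  CMD+=("$ARG")   # each NUL-terminated token becomes one argv entry
done < <("$RUNNER" -cp "$LAUNCHER_CP" org.apache.spark.launcher.Main "$@")

exec "${CMD[@]}"
```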


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77936109
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28401/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26067056
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java 
---
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
+ * <p/>
+ * This launcher extends SparkLauncher to add command line parsing compatible with
--- End diff --

I'm not sure this comment is correct. I think a `SparkLauncher` 
encapsulates a `SparkSubmitCommandBuilder` rather than the latter extending the 
former.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r26067102
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java 
---
@@ -0,0 +1,327 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
--- End diff --

I don't think this is still a subtype of `Launcher`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77937167
  
  [Test build #28403 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28403/consoleFull)
 for   PR 3916 at commit 
[`a1b8af1`](https://github.com/apache/spark/commit/a1b8af1830fc9fada698098a235691e12bd519dd).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77970428
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28407/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77970418
  
  [Test build #28407 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28407/consoleFull)
 for   PR 3916 at commit 
[`2ce741f`](https://github.com/apache/spark/commit/2ce741ff59c17f64ffea1a0fd91eb922e4e87204).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77958210
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28404/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77957236
  
  [Test build #28407 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28407/consoleFull)
 for   PR 3916 at commit 
[`2ce741f`](https://github.com/apache/spark/commit/2ce741ff59c17f64ffea1a0fd91eb922e4e87204).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-09 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77958202
  
  [Test build #28404 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28404/consoleFull)
 for   PR 3916 at commit 
[`3b28a75`](https://github.com/apache/spark/commit/3b28a7537ed184c88994be176fa1c66d47b43a8e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-04 Thread nchammas
Github user nchammas commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-77180148
  
FYI: There is [a question on SO](http://stackoverflow.com/q/28841940/877069) that is, I believe, about the type of functionality being added in this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76849555
  
  [Test build #28200 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28200/consoleFull)
 for   PR 3916 at commit 
[`28cd35e`](https://github.com/apache/spark/commit/28cd35eb7d7c065f679ef4749e599ad4d31ee5cf).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76875689
  
  [Test build #28205 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28205/consoleFull)
 for   PR 3916 at commit 
[`e2367d2`](https://github.com/apache/spark/commit/e2367d26eab2601e5256da2169ea6ad0c12d959d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76875697
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28205/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76864166
  
  [Test build #28205 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28205/consoleFull)
 for   PR 3916 at commit 
[`e2367d2`](https://github.com/apache/spark/commit/e2367d26eab2601e5256da2169ea6ad0c12d959d).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-02 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76862929
  
  [Test build #28200 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28200/consoleFull)
 for   PR 3916 at commit 
[`28cd35e`](https://github.com/apache/spark/commit/28cd35eb7d7c065f679ef4749e599ad4d31ee5cf).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-03-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76862941
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28200/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76293224
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28016/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76188802
  
Sorry, I haven't had time to look at this in detail; one thing I was looking for, though, was some documentation for the high-level API. I realize javadoc will document most of it, but are we going to add some documentation saying this is available and pointing users to the javadoc? Perhaps in the deploy guide.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread tgravescs
Github user tgravescs commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76230854
  
I'm OK with it being separate, although in general I think it's bad practice. I've seen some features go in without documentation, which as a user I think is a major pain.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76227779
  
@tgravescs that can definitely be added, but I'd rather do it separately at 
this point. Also, regarding javadoc, I'm not sure how it's generally published 
for Spark: the unidocs generated from java classes seem to all be empty 
(while scala documentation is fine, and separately the generated javadocs are 
also fine). Is the unidoc stuff broken for java code?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76275450
  
  [Test build #28016 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28016/consoleFull)
 for   PR 3916 at commit 
[`5f4ddcc`](https://github.com/apache/spark/commit/5f4ddccbcd15e712cbd588ee958c6302ce95e352).
 * This patch **does not merge cleanly**.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76276321
  
  [Test build #28017 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28017/consoleFull)
 for   PR 3916 at commit 
[`00505f9`](https://github.com/apache/spark/commit/00505f99ca5b1a7ee8d7955bde54f7c1fcf27689).
 * This patch **does not merge cleanly**.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76278354
  
  [Test build #28018 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28018/consoleFull)
 for   PR 3916 at commit 
[`b1d86b0`](https://github.com/apache/spark/commit/b1d86b00fba76d3e700fc5e8319f071139d3156d).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76296105
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28017/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76296090
  
  [Test build #28017 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28017/consoleFull)
 for   PR 3916 at commit 
[`00505f9`](https://github.com/apache/spark/commit/00505f99ca5b1a7ee8d7955bde54f7c1fcf27689).
 * This patch **passes all tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76296505
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/28018/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76296499
  
  [Test build #28018 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28018/consoleFull)
 for   PR 3916 at commit 
[`b1d86b0`](https://github.com/apache/spark/commit/b1d86b00fba76d3e700fc5e8319f071139d3156d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76293215
  
  [Test build #28016 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/28016/consoleFull)
 for   PR 3916 at commit 
[`5f4ddcc`](https://github.com/apache/spark/commit/5f4ddccbcd15e712cbd588ee958c6302ce95e352).
 * This patch **passes all tests**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76054123
  
  [Test build #27958 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27958/consoleFull)
 for   PR 3916 at commit 
[`7e66c18`](https://github.com/apache/spark/commit/7e66c18bad6e672c60ec995bd968b3a7db32ee73).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25394351
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitOptionParser.java 
---
@@ -0,0 +1,226 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.util.List;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+/**
+ * Parser for spark-submit command line options.
+ * <p/>
+ * This class, although public, is not designed to be used outside of 
Spark.
--- End diff --

Wait, is it still public? I thought it was package-private now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25396504
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java 
---
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
+ * <p/>
+ * This launcher extends SparkLauncher to add command line parsing compatible with
+ * SparkSubmit. It handles setting driver-side options and special parsing needed
+ * for the different specialClasses.
+ * <p/>
+ * This class has also some special features to aid PySparkLauncher.
+ */
+class SparkSubmitCommandBuilder extends AbstractCommandBuilder {
+
+  /**
+   * Name of the app resource used to identify the PySpark shell. The command line parser expects
+   * the resource name to be the very first argument to spark-submit in this case.
+   *
+   * NOTE: this cannot be "pyspark-shell" since that identifies the PySpark shell to SparkSubmit
+   * (see java_gateway.py), and can cause this code to enter into an infinite loop.
+   */
+  static final String PYSPARK_SHELL = "pyspark-shell-main";
+
+  /**
+   * This is the actual resource name that identifies the PySpark shell to SparkSubmit.
+   */
+  static final String PYSPARK_SHELL_RESOURCE = "pyspark-shell";
+
+  /**
+   * This map must match the class names for available special classes, since this modifies the way
+   * command line parsing works. This maps the class name to the resource to use when calling
+   * spark-submit.
+   */
+  private static final Map<String, String> specialClasses = new HashMap<String, String>();
+  static {
+    specialClasses.put("org.apache.spark.repl.Main", "spark-shell");
+    specialClasses.put("org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver",
+      "spark-internal");
+  }
+
+  private final List<String> sparkArgs;
+  private boolean hasMixedArguments;
+
+  SparkSubmitCommandBuilder() {
+    this.sparkArgs = new ArrayList<String>();
+  }
+
+  SparkSubmitCommandBuilder(List<String> args) {
+    this(false, args);
+  }
+
+  SparkSubmitCommandBuilder(boolean hasMixedArguments, List<String> args) {
+    this();
+    List<String> submitArgs = args;
+    if (args.size() > 0 && args.get(0).equals(PYSPARK_SHELL)) {
+      this.hasMixedArguments = true;
+      appResource = PYSPARK_SHELL_RESOURCE;
+      submitArgs = args.subList(1, args.size());
+    } else {
+      this.hasMixedArguments = hasMixedArguments;
+    }
+
+    new OptionParser().parse(submitArgs);
+  }
+
+  @Override
+  public List<String> buildCommand(Map<String, String> env) throws IOException {
+    if (PYSPARK_SHELL_RESOURCE.equals(appResource)) {
+      return buildPySparkShellCommand(env);
+    } else {
+      return buildSparkSubmitCommand(env);
+    }
+  }
+
+  List<String> buildSparkSubmitArgs() {
+    List<String> args = new ArrayList<String>();
+    SparkSubmitOptionParser parser = new SparkSubmitOptionParser();
+
+    if (verbose) {
+      args.add(parser.VERBOSE);
+    }
+
+    if (master != null) {
+      args.add(parser.MASTER);
+      args.add(master);
+    }
+
+    if (deployMode != null) {
+      args.add(parser.DEPLOY_MODE);
+      args.add(deployMode);
+    }
+
+    if (appName != null) {
+      args.add(parser.NAME);
+      args.add(appName);
+    }
+
+    for (Map.Entry<String, String> e : conf.entrySet()) {
+  
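
For orientation while reading the truncated excerpt above, a minimal sketch of the round trip it implements: options parsed by the constructor come back out of `buildSparkSubmitArgs()` in canonical spark-submit form. The option values below are invented for illustration; only the class and method names come from the quoted diff.

```java
List<String> args = Arrays.asList("--master", "local[2]", "--name", "demo");
SparkSubmitCommandBuilder builder = new SparkSubmitCommandBuilder(args);
// Expected to contain, per the method above: --master local[2] --name demo
List<String> submitArgs = builder.buildSparkSubmitArgs();
```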

[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25400181
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -0,0 +1,359 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileFilter;
+import java.io.FileInputStream;
+import java.io.InputStreamReader;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.jar.JarFile;
+import java.util.regex.Pattern;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Abstract command builder that defines common functionality for all builders.
+ */
+abstract class AbstractCommandBuilder {
+
+  boolean verbose;
+  String appName;
+  String appResource;
+  String deployMode;
+  String javaHome;
+  String mainClass;
+  String master;
+  String propertiesFile;
+  final List<String> appArgs;
+  final List<String> jars;
+  final List<String> files;
+  final List<String> pyFiles;
+  final Map<String, String> childEnv;
+  final Map<String, String> conf;
+
+  public AbstractCommandBuilder() {
+    this(Collections.<String, String>emptyMap());
+  }
+
+  public AbstractCommandBuilder(Map<String, String> env) {
+    this.appArgs = new ArrayList<String>();
+    this.childEnv = new HashMap<String, String>(env);
+    this.conf = new HashMap<String, String>();
+    this.files = new ArrayList<String>();
+    this.jars = new ArrayList<String>();
+    this.pyFiles = new ArrayList<String>();
+  }
+
+  /**
+   * Builds the command line to execute.
+   *
+   * @param env A map containing environment variables for the child process. It may already
+   *            contain entries defined by the user (such as SPARK_HOME, or those defined by the
+   *            SparkLauncher constructor that takes an environment), and may be modified to
+   *            include other variables needed by the process to be executed.
+   */
+  abstract List<String> buildCommand(Map<String, String> env) throws IOException;
+
+  List<String> buildJavaCommand(String extraClassPath) throws IOException {
+    List<String> cmd = new ArrayList<String>();
+    if (javaHome == null) {
+      cmd.add(join(File.separator, System.getProperty("java.home"), "bin", "java"));
+    } else {
+      cmd.add(join(File.separator, javaHome, "bin", "java"));
+    }
+
+    // Load extra JAVA_OPTS from conf/java-opts, if it exists.
+    File javaOpts = new File(join(File.separator, getConfDir(), "java-opts"));
+    if (javaOpts.isFile()) {
+      BufferedReader br = new BufferedReader(new InputStreamReader(
+          new FileInputStream(javaOpts), "UTF-8"));
+      try {
+        String line;
+        while ((line = br.readLine()) != null) {
+          addOptionString(cmd, line);
+        }
+      } finally {
+        br.close();
+      }
+    }
+
+    cmd.add("-cp");
+    cmd.add(join(File.pathSeparator, buildClassPath(extraClassPath)));
+    return cmd;
+  }
+
+  /**
+   * Adds the default perm gen size option for Spark if the VM requires it and the user hasn't
+   * set it.
+   */
+  void addPermGenSizeOpt(List<String> cmd) {
+    // Don't set MaxPermSize for Java 8 and later.
+    String[] version = System.getProperty("java.version").split("\\.");
+    if (Integer.parseInt(version[0]) > 1 || Integer.parseInt(version[1]) > 7) {
+      return;
+    }
+
+    for (String arg : cmd) {
+      if (arg.startsWith("-XX:MaxPermSize=")) {
+        return;
+      }
+    }
+
+    cmd.add("-XX:MaxPermSize=128m");
+  }
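
A worked example of the version check in `addPermGenSizeOpt` above (version strings illustrative):

```java
// "1.7.0_71".split("\\.") -> ["1", "7", "0_71"]: neither 1 > 1 nor 7 > 7 holds,
// so the method falls through and appends -XX:MaxPermSize=128m (unless the
// caller already set one). For "1.8.0_25", version[1] is 8 > 7, so the method
// returns early and adds nothing.
String[] v = "1.7.0_71".split("\\.");
boolean java8OrLater = Integer.parseInt(v[0]) > 1 || Integer.parseInt(v[1]) > 7;
```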

[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25400174
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java ---
@@ -0,0 +1,228 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Helper methods for command builders.
+ */
+class CommandBuilderUtils {
+
+  static final String DEFAULT_MEM = "512m";
+  static final String DEFAULT_PROPERTIES_FILE = "spark-defaults.conf";
+  static final String ENV_SPARK_HOME = "SPARK_HOME";
+
+  /** Returns whether the given string is null or empty. */
+  static boolean isEmpty(String s) {
+    return s == null || s.isEmpty();
+  }
+
+  /** Joins a list of strings using the given separator. */
+  static String join(String sep, String... elements) {
+    StringBuilder sb = new StringBuilder();
+    for (String e : elements) {
--- End diff --

I see. I would just call it on `Arrays.asList(elements)` to avoid the 
duplicate code. Not a big deal if you leave this.
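
The suggestion amounts to something like this (a sketch of the review comment, not code from the PR):

```java
/** Joins a list of strings using the given separator. */
static String join(String sep, String... elements) {
  // Delegate to the Iterable overload instead of duplicating the loop.
  return join(sep, Arrays.asList(elements));
}
```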





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25400811
  
--- Diff: bin/pyspark ---
@@ -95,26 +80,13 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
 
 # Load the PySpark shell.py script when ./pyspark is used interactively:
 export OLD_PYTHONSTARTUP="$PYTHONSTARTUP"
-export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py"
-
-# Build up arguments list manually to preserve quotes and backslashes.
-# We export Spark submit arguments as an environment variable because shell.py must run as a
-# PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
-SUBMIT_USAGE_FUNCTION=usage
-gatherSparkSubmitOpts "$@"
-PYSPARK_SUBMIT_ARGS=""
-whitespace="[[:space:]]"
-for i in "${SUBMISSION_OPTS[@]}"; do
-  if [[ $i =~ \" ]]; then i=$(echo $i | sed 's/\"/\\\"/g'); fi
-  if [[ $i =~ $whitespace ]]; then i=\"$i\"; fi
-  PYSPARK_SUBMIT_ARGS="$PYSPARK_SUBMIT_ARGS $i"
-done
-export PYSPARK_SUBMIT_ARGS
+export PYTHONSTARTUP="$SPARK_HOME/python/pyspark/shell.py"
 
 # For pyspark tests
 if [[ -n "$SPARK_TESTING" ]]; then
   unset YARN_CONF_DIR
   unset HADOOP_CONF_DIR
+  export PYSPARK_SUBMIT_ARGS="pyspark-shell"
--- End diff --

why do we need to do this? Don't we just overwrite this in 
`SparkSubmitCommandBuilder` anyway?





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76114272
  
retest this please





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76114251
  
@vanzin @pwendell As of the latest changes I think this PR is almost ready 
to be merged. An obvious blocker is that the `SparkLauncher` needs to also 
handle Windows, but we can fix that easily.

One bigger thing I wanted to clarify is the following: The scope of the 
public API this PR currently exposes is narrowed to the extent that 
`SparkLauncher` calls a particular distribution's `spark-submit`. IIUC the only 
backward compatibility requirement this entails is that we don't remove or 
change any of the fields that the `SparkLauncher` sets (e.g. version, master, 
app name). Is this correct? This seems like a narrow enough interface to keep 
unchanged in future versions of Spark such that merging this will be safe.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76115100
  
  [Test build #27971 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27971/consoleFull)
 for   PR 3916 at commit 
[`4c19196`](https://github.com/apache/spark/commit/4c19196f495813dc2dcd6116b5831932ec406803).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76115110
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27971/
Test PASSed.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76107021
  
  [Test build #27972 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27972/consoleFull)
 for   PR 3916 at commit 
[`6184c07`](https://github.com/apache/spark/commit/6184c07d1f7ddb9bf74e898155f28340b7a33f53).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25399842
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java 
---
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
+ * <p/>
+ * This launcher extends SparkLauncher to add command line parsing compatible with
+ * SparkSubmit. It handles setting driver-side options and special parsing needed
+ * for the different specialClasses.
+ * <p/>
+ * This class has also some special features to aid PySparkLauncher.
+ */
+class SparkSubmitCommandBuilder extends AbstractCommandBuilder {
+
+  /**
+   * Name of the app resource used to identify the PySpark shell. The command line parser expects
+   * the resource name to be the very first argument to spark-submit in this case.
+   *
+   * NOTE: this cannot be "pyspark-shell" since that identifies the PySpark shell to SparkSubmit
+   * (see java_gateway.py), and can cause this code to enter into an infinite loop.
+   */
+  static final String PYSPARK_SHELL = "pyspark-shell-main";
--- End diff --

I wonder whether we should just stop supporting running python applications 
through `bin/pyspark`. It's been deprecated since 1.0 and that will already 
have been 4 minor releases by the time this is merged. If we just stop 
supporting this then we can simplify this convoluted logic a great deal. Maybe 
@pwendell has an opinion one way or the other.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25400927
  
--- Diff: 
core/src/main/scala/org/apache/spark/launcher/WorkerCommandBuilder.scala ---
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher
+
+import java.io.File
+import java.util.{HashMap => JHashMap, List => JList, Map => JMap}
+
+import scala.collection.JavaConversions._
+
+import org.apache.spark.deploy.Command
+
+/**
+ * This class is used by CommandUtils. It uses some package-private APIs 
in SparkLauncher, and since
+ * Java doesn't have a feature similar to `private[spark]`, and we don't 
want that class to be
+ * public, needs to live in the same package as the rest of the library.
+ */
+private[spark] class WorkerCommandBuilder(sparkHome: String, memoryMb: 
Int, command: Command)
+extends AbstractCommandBuilder {
--- End diff --

unindent





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25401107
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Launcher for Spark applications.
+ * <p/>
+ * Use this class to start Spark applications programmatically. The class uses a builder pattern
+ * to allow clients to configure the Spark application and launch it as a child process.
+ * <p/>
+ * Note that launching Spark applications using this class will not automatically load environment
+ * variables from the spark-env.sh or spark-env.cmd scripts in the configuration directory.
+ */
+public class SparkLauncher {
+
+  /** The Spark master. */
+  public static final String SPARK_MASTER = "spark.master";
+
+  /** Configuration key for the driver memory. */
+  public static final String DRIVER_MEMORY = "spark.driver.memory";
+  /** Configuration key for the driver class path. */
+  public static final String DRIVER_EXTRA_CLASSPATH = "spark.driver.extraClassPath";
+  /** Configuration key for the driver VM options. */
+  public static final String DRIVER_EXTRA_JAVA_OPTIONS = "spark.driver.extraJavaOptions";
+  /** Configuration key for the driver native library path. */
+  public static final String DRIVER_EXTRA_LIBRARY_PATH = "spark.driver.extraLibraryPath";
+
+  /** Configuration key for the executor memory. */
+  public static final String EXECUTOR_MEMORY = "spark.executor.memory";
+  /** Configuration key for the executor class path. */
+  public static final String EXECUTOR_EXTRA_CLASSPATH = "spark.executor.extraClassPath";
+  /** Configuration key for the executor VM options. */
+  public static final String EXECUTOR_EXTRA_JAVA_OPTIONS = "spark.executor.extraJavaOptions";
+  /** Configuration key for the executor native library path. */
+  public static final String EXECUTOR_EXTRA_LIBRARY_PATH = "spark.executor.extraLibraryOptions";
+  /** Configuration key for the number of executor CPU cores. */
+  public static final String EXECUTOR_CORES = "spark.executor.cores";
+
+  private final SparkSubmitCommandBuilder builder;
+
+  public SparkLauncher() {
+    this(null);
+  }
+
+  /**
+   * Creates a launcher that will set the given environment variables in the child.
+   *
+   * @param env Environment variables to set.
+   */
+  public SparkLauncher(Map<String, String> env) {
+    this.builder = new SparkSubmitCommandBuilder();
+    if (env != null) {
+      this.builder.childEnv.putAll(env);
+    }
+  }
+
+  /**
+   * Set a custom JAVA_HOME for launching the Spark application.
+   *
+   * @param javaHome Path to the JAVA_HOME to use.
+   * @return This launcher.
+   */
+  public SparkLauncher setJavaHome(String javaHome) {
+    checkNotNull(javaHome, "javaHome");
+    builder.javaHome = javaHome;
+    return this;
+  }
+
+  /**
+   * Set a custom Spark installation location for the application.
+   *
+   * @param sparkHome Path to the Spark installation to use.
+   * @return This launcher.
+   */
+  public SparkLauncher setSparkHome(String sparkHome) {
+    checkNotNull(sparkHome, "sparkHome");
+    builder.childEnv.put(ENV_SPARK_HOME, sparkHome);
+    return this;
+  }
+
+  /**
+   * Set a custom properties file with Spark configuration for the application.
+   *
+   * @param path Path to custom properties file to use.
+   * @return This launcher.
+   */
+  public SparkLauncher setPropertiesFile(String path) {
+    checkNotNull(path, "path");
+
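
The key constants above are meant to be fed back into the builder as configuration keys. A hypothetical sketch (the excerpt is cut off before the configuration methods, so `setConf` is an assumption here; only the key constants come from the quoted code):

```java
SparkLauncher launcher = new SparkLauncher()
    .setConf(SparkLauncher.DRIVER_MEMORY, "2g")   // setConf assumed; key is real
    .setConf(SparkLauncher.EXECUTOR_CORES, "4");
```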

[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76113760
  
  [Test build #27972 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27972/consoleFull)
 for   PR 3916 at commit 
[`6184c07`](https://github.com/apache/spark/commit/6184c07d1f7ddb9bf74e898155f28340b7a33f53).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76113766
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27972/
Test FAILed.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76114901
  
`SparkLauncher` works on Windows. Did you see anything that indicates 
otherwise? I even managed to get all unit tests under launcher/ to pass on 
Windows (although I didn't test after the latest changes).

The only things that need to be kept for compatibility going forward are:
- SparkLauncher's public API. You can change the internal fields all you 
want.
- spark-submit's public API. It needs to keep accepting command lines the 
way it does today.
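
For reference, the public surface in question is the builder-style API quoted earlier in this thread; a minimal sketch of the kind of caller code that has to keep working (paths are placeholders, setters as quoted in the diffs):

```java
SparkLauncher launcher = new SparkLauncher()
    .setJavaHome("/usr/lib/jvm/java-7")
    .setSparkHome("/opt/spark")
    .setPropertiesFile("/etc/spark/spark-defaults.conf");
```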





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25399597
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java 
---
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
+ * <p/>
+ * This launcher extends SparkLauncher to add command line parsing compatible with
+ * SparkSubmit. It handles setting driver-side options and special parsing needed
+ * for the different specialClasses.
+ * <p/>
+ * This class has also some special features to aid PySparkLauncher.
+ */
+class SparkSubmitCommandBuilder extends AbstractCommandBuilder {
+
+  /**
+   * Name of the app resource used to identify the PySpark shell. The command line parser expects
+   * the resource name to be the very first argument to spark-submit in this case.
+   *
+   * NOTE: this cannot be "pyspark-shell" since that identifies the PySpark shell to SparkSubmit
+   * (see java_gateway.py), and can cause this code to enter into an infinite loop.
+   */
+  static final String PYSPARK_SHELL = "pyspark-shell-main";
+
+  /**
+   * This is the actual resource name that identifies the PySpark shell to SparkSubmit.
+   */
+  static final String PYSPARK_SHELL_RESOURCE = "pyspark-shell";
+
+  /**
+   * This map must match the class names for available special classes, since this modifies the way
+   * command line parsing works. This maps the class name to the resource to use when calling
+   * spark-submit.
+   */
+  private static final Map<String, String> specialClasses = new HashMap<String, String>();
+  static {
+    specialClasses.put("org.apache.spark.repl.Main", "spark-shell");
+    specialClasses.put("org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver",
+      "spark-internal");
+  }
+
+  private final List<String> sparkArgs;
+  private boolean hasMixedArguments;
+
+  SparkSubmitCommandBuilder() {
+    this.sparkArgs = new ArrayList<String>();
+  }
+
+  SparkSubmitCommandBuilder(List<String> args) {
+    this(false, args);
+  }
+
+  SparkSubmitCommandBuilder(boolean hasMixedArguments, List<String> args) {
+    this();
+    List<String> submitArgs = args;
+    if (args.size() > 0 && args.get(0).equals(PYSPARK_SHELL)) {
+      this.hasMixedArguments = true;
+      appResource = PYSPARK_SHELL_RESOURCE;
+      submitArgs = args.subList(1, args.size());
+    } else {
+      this.hasMixedArguments = hasMixedArguments;
+    }
+
+    new OptionParser().parse(submitArgs);
+  }
+
+  @Override
+  public List<String> buildCommand(Map<String, String> env) throws IOException {
+    if (PYSPARK_SHELL_RESOURCE.equals(appResource)) {
+      return buildPySparkShellCommand(env);
+    } else {
+      return buildSparkSubmitCommand(env);
+    }
+  }
+
+  List<String> buildSparkSubmitArgs() {
+    List<String> args = new ArrayList<String>();
+    SparkSubmitOptionParser parser = new SparkSubmitOptionParser();
+
+    if (verbose) {
+      args.add(parser.VERBOSE);
+    }
+
+    if (master != null) {
+      args.add(parser.MASTER);
+      args.add(master);
+    }
+
+    if (deployMode != null) {
+      args.add(parser.DEPLOY_MODE);
+      args.add(deployMode);
+    }
+
+    if (appName != null) {
+      args.add(parser.NAME);
+      args.add(appName);
+    }
+
+    for (Map.Entry<String, String> e : conf.entrySet()) {
+  
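
To make the special-casing in the constructor concrete, a sketch of the handshake the NOTE above describes (argument list invented for illustration):

```java
// "pyspark-shell-main" as the first argument flips the builder into
// mixed-arguments mode and swaps in the real resource name before parsing
// the remaining spark-submit options.
List<String> args = Arrays.asList(
    SparkSubmitCommandBuilder.PYSPARK_SHELL, "--master", "local[2]");
SparkSubmitCommandBuilder builder = new SparkSubmitCommandBuilder(args);
// appResource is now "pyspark-shell", so buildCommand() takes the
// buildPySparkShellCommand(env) branch.
```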

[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25400090
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/CommandBuilderUtils.java ---
@@ -0,0 +1,228 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Helper methods for command builders.
+ */
+class CommandBuilderUtils {
+
+  static final String DEFAULT_MEM = "512m";
+  static final String DEFAULT_PROPERTIES_FILE = "spark-defaults.conf";
+  static final String ENV_SPARK_HOME = "SPARK_HOME";
+
+  /** Returns whether the given string is null or empty. */
+  static boolean isEmpty(String s) {
+    return s == null || s.isEmpty();
+  }
+
+  /** Joins a list of strings using the given separator. */
+  static String join(String sep, String... elements) {
+    StringBuilder sb = new StringBuilder();
+    for (String e : elements) {
+      if (e != null) {
+        if (sb.length() > 0) {
+          sb.append(sep);
+        }
+        sb.append(e);
+      }
+    }
+    return sb.toString();
+  }
+
+  /** Joins a list of strings using the given separator. */
+  static String join(String sep, Iterable<String> elements) {
+    StringBuilder sb = new StringBuilder();
+    for (String e : elements) {
+      if (e != null) {
+        if (sb.length() > 0) {
+          sb.append(sep);
+        }
+        sb.append(e);
+      }
+    }
+    return sb.toString();
+  }
+
+  /** Returns the first value mapped to the given key in the given maps. */
+  static String find(String key, Map<?, ?>... maps) {
+    for (Map<?, ?> map : maps) {
+      String value = (String) map.get(key);
+      if (!isEmpty(value)) {
+        return value;
+      }
+    }
+    return null;
+  }
+
+  /** Returns the first non-empty, non-null string in the given list. */
+  static String firstNonEmpty(String... candidates) {
+    for (String s : candidates) {
+      if (!isEmpty(s)) {
+        return s;
+      }
+    }
+    return null;
+  }
+
+  /** Returns the name of the env variable that holds the native library path. */
+  static String getLibPathEnvName() {
+    if (isWindows()) {
+      return "PATH";
+    }
+
+    String os = System.getProperty("os.name");
+    if (os.startsWith("Mac OS X")) {
+      return "DYLD_LIBRARY_PATH";
+    } else {
+      return "LD_LIBRARY_PATH";
+    }
+  }
+
+  /** Returns whether the OS is Windows. */
+  static boolean isWindows() {
+    String os = System.getProperty("os.name");
+    return os.startsWith("Windows");
+  }
+
+  /**
+   * Updates the user environment to contain the merged value of "envKey" after appending
+   * the given path list.
+   */
+  static void mergeEnvPathList(Map<String, String> userEnv, String envKey, String pathList) {
+    if (!isEmpty(pathList)) {
+      String current = firstNonEmpty(userEnv.get(envKey), System.getenv(envKey));
+      userEnv.put(envKey, join(File.pathSeparator, current, pathList));
+    }
--- End diff --

sorry, what do you mean? Your comment says:
```
Updates the user environment, appending the given pathList to the existing
value of the given environment variable (or setting it if it hasn't yet 
been set).
```
When `pathList` is empty shouldn't we just append the value of the system 
env var to an empty string, i.e. just store the system env var in `userEnv`? If 
that's not the intended behavior then we need to update the docs...
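
Spelling out the behavior under discussion (a sketch against the quoted implementation; the env var name is just an example):

```java
Map<String, String> userEnv = new HashMap<String, String>();
// Empty pathList: the isEmpty guard makes this a no-op, so userEnv is NOT
// seeded with the system value -- the mismatch with the doc noted above.
mergeEnvPathList(userEnv, "LD_LIBRARY_PATH", "");
// Non-empty pathList: appends to the current value (userEnv first, falling
// back to the system environment) using the platform path separator.
mergeEnvPathList(userEnv, "LD_LIBRARY_PATH", "/opt/native/lib");
```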



[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76114681
  
  [Test build #27977 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27977/consoleFull)
 for   PR 3916 at commit 
[`6184c07`](https://github.com/apache/spark/commit/6184c07d1f7ddb9bf74e898155f28340b7a33f53).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25400728
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java ---
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Launcher for Spark applications.
+ * <p/>
+ * Use this class to start Spark applications programmatically. The class uses a builder pattern
+ * to allow clients to configure the Spark application and launch it as a child process.
+ * <p/>
+ * Note that launching Spark applications using this class will not automatically load environment
+ * variables from the spark-env.sh or spark-env.cmd scripts in the configuration directory.
+ */
+public class SparkLauncher {
+
+  /** The Spark master. */
+  public static final String SPARK_MASTER = "spark.master";
+
+  /** Configuration key for the driver memory. */
+  public static final String DRIVER_MEMORY = "spark.driver.memory";
+  /** Configuration key for the driver class path. */
+  public static final String DRIVER_EXTRA_CLASSPATH = "spark.driver.extraClassPath";
+  /** Configuration key for the driver VM options. */
+  public static final String DRIVER_EXTRA_JAVA_OPTIONS = "spark.driver.extraJavaOptions";
+  /** Configuration key for the driver native library path. */
+  public static final String DRIVER_EXTRA_LIBRARY_PATH = "spark.driver.extraLibraryPath";
+
+  /** Configuration key for the executor memory. */
+  public static final String EXECUTOR_MEMORY = "spark.executor.memory";
+  /** Configuration key for the executor class path. */
+  public static final String EXECUTOR_EXTRA_CLASSPATH = "spark.executor.extraClassPath";
+  /** Configuration key for the executor VM options. */
+  public static final String EXECUTOR_EXTRA_JAVA_OPTIONS = "spark.executor.extraJavaOptions";
+  /** Configuration key for the executor native library path. */
+  public static final String EXECUTOR_EXTRA_LIBRARY_PATH = "spark.executor.extraLibraryOptions";
+  /** Configuration key for the number of executor CPU cores. */
+  public static final String EXECUTOR_CORES = "spark.executor.cores";
+
+  private final SparkSubmitCommandBuilder builder;
+
+  public SparkLauncher() {
+    this(null);
+  }
+
+  /**
+   * Creates a launcher that will set the given environment variables in the child.
+   *
+   * @param env Environment variables to set.
+   */
+  public SparkLauncher(Map<String, String> env) {
+    this.builder = new SparkSubmitCommandBuilder();
+    if (env != null) {
+      this.builder.childEnv.putAll(env);
+    }
+  }
+
+  /**
+   * Set a custom JAVA_HOME for launching the Spark application.
+   *
+   * @param javaHome Path to the JAVA_HOME to use.
+   * @return This launcher.
+   */
+  public SparkLauncher setJavaHome(String javaHome) {
+    checkNotNull(javaHome, "javaHome");
+    builder.javaHome = javaHome;
+    return this;
+  }
+
+  /**
+   * Set a custom Spark installation location for the application.
+   *
+   * @param sparkHome Path to the Spark installation to use.
+   * @return This launcher.
+   */
+  public SparkLauncher setSparkHome(String sparkHome) {
+    checkNotNull(sparkHome, "sparkHome");
+    builder.childEnv.put(ENV_SPARK_HOME, sparkHome);
+    return this;
+  }
+
+  /**
+   * Set a custom properties file with Spark configuration for the application.
+   *
+   * @param path Path to custom properties file to use.
+   * @return This launcher.
+   */
+  public SparkLauncher setPropertiesFile(String path) {
+    checkNotNull(path, "path");
+

[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25400989
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -0,0 +1,362 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileFilter;
+import java.io.FileInputStream;
+import java.io.InputStreamReader;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.jar.JarFile;
+import java.util.regex.Pattern;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Abstract Spark command builder that defines common functionality.
+ */
+abstract class AbstractCommandBuilder {
+
+  boolean verbose;
+  String appName;
+  String appResource;
+  String deployMode;
+  String javaHome;
+  String mainClass;
+  String master;
+  String propertiesFile;
+  final List<String> appArgs;
+  final List<String> jars;
+  final List<String> files;
+  final List<String> pyFiles;
+  final Map<String, String> childEnv;
+  final Map<String, String> conf;
+
+  public AbstractCommandBuilder() {
+    this.appArgs = new ArrayList<String>();
+    this.childEnv = new HashMap<String, String>();
+    this.conf = new HashMap<String, String>();
+    this.files = new ArrayList<String>();
+    this.jars = new ArrayList<String>();
+    this.pyFiles = new ArrayList<String>();
+  }
+
+  /**
+   * Builds the command to execute.
+   *
+   * @param env A map containing environment variables for the child process. It may already
+   *            contain entries defined by the user (such as SPARK_HOME, or those defined by the
+   *            SparkLauncher constructor that takes an environment), and may be modified to
+   *            include other variables needed by the process to be executed.
+   */
+  abstract List<String> buildCommand(Map<String, String> env) throws IOException;
+
+  /**
+   * Builds a list of arguments to run java.
+   *
+   * This method finds the java executable to use and appends JVM-specific options for running a
+   * class with Spark in the classpath. It also loads options from the java-opts file in the
+   * configuration directory being used.
+   *
+   * Callers should still add at least the class to run, as well as any arguments to pass to the
+   * class.
+   */
+  List<String> buildJavaCommand(String extraClassPath) throws IOException {
+    List<String> cmd = new ArrayList<String>();
+    if (javaHome == null) {
+      cmd.add(join(File.separator, System.getProperty("java.home"), "bin", "java"));
+    } else {
+      cmd.add(join(File.separator, javaHome, "bin", "java"));
+    }
+
+    // Load extra JAVA_OPTS from conf/java-opts, if it exists.
+    File javaOpts = new File(join(File.separator, getConfDir(), "java-opts"));
+    if (javaOpts.isFile()) {
+      BufferedReader br = new BufferedReader(new InputStreamReader(
+          new FileInputStream(javaOpts), "UTF-8"));
+      try {
+        String line;
+        while ((line = br.readLine()) != null) {
+          addOptionString(cmd, line);
+        }
+      } finally {
+        br.close();
+      }
+    }
+
+    cmd.add("-cp");
+    cmd.add(join(File.pathSeparator, buildClassPath(extraClassPath)));
+    return cmd;
+  }
+
+  /**
+   * Adds the default perm gen size option for Spark if the VM requires it and the user hasn't
+   * set it.
+   */
+  void addPermGenSizeOpt(List<String> cmd) {
+    // Don't set MaxPermSize for Java 8 and later.
+    String[] version =
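
The calling pattern the `buildJavaCommand` Javadoc above describes, sketched; the class name and helper are borrowed from elsewhere in this PR for illustration:

```java
// A subclass builds the JVM portion, then appends the class to run and its
// arguments, per the Javadoc's contract.
List<String> cmd = buildJavaCommand(extraClassPath);
cmd.add("org.apache.spark.deploy.SparkSubmit");  // the class to run (example)
cmd.addAll(buildSparkSubmitArgs());              // arguments for that class
```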

[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25400987
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractCommandBuilder.java ---
@@ -0,0 +1,362 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileFilter;
+import java.io.FileInputStream;
+import java.io.InputStreamReader;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.jar.JarFile;
+import java.util.regex.Pattern;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Abstract Spark command builder that defines common functionality.
+ */
+abstract class AbstractCommandBuilder {
+
+  boolean verbose;
+  String appName;
+  String appResource;
+  String deployMode;
+  String javaHome;
+  String mainClass;
+  String master;
+  String propertiesFile;
+  final List<String> appArgs;
+  final List<String> jars;
+  final List<String> files;
+  final List<String> pyFiles;
+  final Map<String, String> childEnv;
+  final Map<String, String> conf;
+
+  public AbstractCommandBuilder() {
+    this.appArgs = new ArrayList<String>();
+    this.childEnv = new HashMap<String, String>();
+    this.conf = new HashMap<String, String>();
+    this.files = new ArrayList<String>();
+    this.jars = new ArrayList<String>();
+    this.pyFiles = new ArrayList<String>();
+  }
+
+  /**
+   * Builds the command to execute.
+   *
+   * @param env A map containing environment variables for the child process. It may already
+   *            contain entries defined by the user (such as SPARK_HOME, or those defined by the
+   *            SparkLauncher constructor that takes an environment), and may be modified to
+   *            include other variables needed by the process to be executed.
+   */
+  abstract List<String> buildCommand(Map<String, String> env) throws IOException;
+
+  /**
+   * Builds a list of arguments to run java.
+   *
+   * This method finds the java executable to use and appends JVM-specific options for running a
+   * class with Spark in the classpath. It also loads options from the java-opts file in the
+   * configuration directory being used.
+   *
+   * Callers should still add at least the class to run, as well as any arguments to pass to the
+   * class.
+   */
+  List<String> buildJavaCommand(String extraClassPath) throws IOException {
+    List<String> cmd = new ArrayList<String>();
+    if (javaHome == null) {
+      cmd.add(join(File.separator, System.getProperty("java.home"), "bin", "java"));
+    } else {
+      cmd.add(join(File.separator, javaHome, "bin", "java"));
+    }
+
+    // Load extra JAVA_OPTS from conf/java-opts, if it exists.
+    File javaOpts = new File(join(File.separator, getConfDir(), "java-opts"));
+    if (javaOpts.isFile()) {
+      BufferedReader br = new BufferedReader(new InputStreamReader(
+          new FileInputStream(javaOpts), "UTF-8"));
+      try {
+        String line;
+        while ((line = br.readLine()) != null) {
+          addOptionString(cmd, line);
+        }
+      } finally {
+        br.close();
+      }
+    }
+
+    cmd.add("-cp");
+    cmd.add(join(File.pathSeparator, buildClassPath(extraClassPath)));
+    return cmd;
+  }
+
+  /**
+   * Adds the default perm gen size option for Spark if the VM requires it and the user hasn't
+   * set it.
+   */
+  void addPermGenSizeOpt(List<String> cmd) {
+    // Don't set MaxPermSize for Java 8 and later.
+    String[] version =

[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76076202
  
  [Test build #27958 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27958/consoleFull)
 for   PR 3916 at commit 
[`7e66c18`](https://github.com/apache/spark/commit/7e66c18bad6e672c60ec995bd968b3a7db32ee73).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-76076214
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27958/
Test PASSed.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25388445
  
--- Diff: launcher/src/main/java/org/apache/spark/launcher/Main.java ---
@@ -0,0 +1,106 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Command line interface for the Spark launcher. Used internally by Spark scripts.
+ */
+public class Main extends LauncherCommon {
+
+  /**
+   * Usage: Main [class] [class args]
+   * <p/>
+   * This CLI works in two different modes:
+   * <ul>
+   *   <li>"spark-submit": if <i>class</i> is "org.apache.spark.deploy.SparkSubmit", the
+   *   {@link SparkLauncher} class is used to launch a Spark application.</li>
+   *   <li>"spark-class": if another class is provided, an internal Spark class is run.</li>
+   * </ul>
+   *
+   * The ultimate command will not be run in the same process. Instead, the command to be executed
+   * will be printed to stdout. On Unix systems, this will be one argument per line. On Windows
+   * systems, this will be a single line containing the command to be executed.
+   */
+  public static void main(String[] argsArray) throws Exception {
+    checkArgument(argsArray.length > 0, "Not enough arguments: missing class name.");
+
+    List<String> args = new ArrayList<String>(Arrays.asList(argsArray));
+    String className = args.remove(0);
+
+    boolean printLaunchCommand = false;
+    AbstractLauncher<?> launcher;
+    try {
+      if (className.equals("org.apache.spark.deploy.SparkSubmit")) {
+        launcher = new SparkSubmitCliLauncher(args);
+        printLaunchCommand = !isEmpty(System.getenv("SPARK_PRINT_LAUNCH_COMMAND"));
+      } else if (className.equals("pyspark")) {
+        launcher = new PySparkLauncher(args);
+      } else {
+        launcher = new SparkClassLauncher(className, args);
+        printLaunchCommand = !isEmpty(System.getenv("SPARK_PRINT_LAUNCH_COMMAND"));
+      }
+    } catch (IllegalArgumentException e) {
+      launcher = new UsageLauncher();
+    }
+
+    List<String> cmd = launcher.buildLauncherCommand();
+    if (printLaunchCommand) {
+      System.err.println("Spark Command: " + join(" ", cmd));
+      System.err.println();
+    }
+
+    for (String c : cmd) {
+      System.out.println(c);
+    }
+  }
+
+  /**
+   * Internal launcher used when command line parsing fails. This will behave differently depending
+   * on the platform:
+   *
+   * - On Unix-like systems, it will print a call to the "usage" function with argument "1". The
+   *   function is expected to print the command's usage and exit with the provided exit code.
+   *   The script should use "export -f usage" after declaring a function called "usage", so that
+   *   the function is available to downstream scripts.
+   *
+   * - On Windows it will set the variable "SPARK_LAUNCHER_USAGE_ERROR" to "1". The batch script
+   *   should check for this variable and print its usage, since batch scripts don't really support
+   *   the "export -f" functionality used in bash.
+   */
+  private static class UsageLauncher extends AbstractLauncher<UsageLauncher> {
+
+    @Override
+    protected List<String> buildLauncherCommand() {
+      if (isWindows()) {
+        return Arrays.asList("set SPARK_LAUNCHER_USAGE_ERROR=1");
+      } else {
+        return Arrays.asList("usage 1");
+      }
--- End diff --

Anything here will need to be interpreted by the scripts in some way, so 
the synchronization you talk about will be there one way or another. "Empty 
output meaning print usage message" seems a bit weird.

The values are already 

[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-02-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r25394241
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/SparkSubmitCommandBuilder.java 
---
@@ -0,0 +1,335 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+
+import static org.apache.spark.launcher.CommandBuilderUtils.*;
+
+/**
+ * Special launcher for handling a CLI invocation of SparkSubmit.
+ * <p/>
+ * This launcher extends SparkLauncher to add command line parsing compatible with
+ * SparkSubmit. It handles setting driver-side options and special parsing needed
+ * for the different specialClasses.
+ * <p/>
+ * This class also has some special features to aid PySparkLauncher.
+ */
+class SparkSubmitCommandBuilder extends AbstractCommandBuilder {
+
+  /**
+   * Name of the app resource used to identify the PySpark shell. The command line parser expects
+   * the resource name to be the very first argument to spark-submit in this case.
+   *
+   * NOTE: this cannot be "pyspark-shell" since that identifies the PySpark shell to SparkSubmit
+   * (see java_gateway.py), and can cause this code to enter into an infinite loop.
+   */
+  static final String PYSPARK_SHELL = "pyspark-shell-main";
+
+  /**
+   * This is the actual resource name that identifies the PySpark shell to SparkSubmit.
+   */
+  static final String PYSPARK_SHELL_RESOURCE = "pyspark-shell";
+
+  /**
+   * This map must match the class names for available special classes, since this modifies the way
+   * command line parsing works. This maps the class name to the resource to use when calling
+   * spark-submit.
+   */
+  private static final Map<String, String> specialClasses = new HashMap<String, String>();
+  static {
+    specialClasses.put("org.apache.spark.repl.Main", "spark-shell");
+    specialClasses.put("org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver",
+      "spark-internal");
+  }
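+  // For example, "org.apache.spark.repl.Main" (the scala shell's entry point)
+  // is invoked via the "spark-shell" resource when calling spark-submit.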
+
+  private final List<String> sparkArgs;
+  private boolean hasMixedArguments;
+
+  SparkSubmitCommandBuilder() {
+    this.sparkArgs = new ArrayList<String>();
+  }
+
+  SparkSubmitCommandBuilder(List<String> args) {
+    this(false, args);
+  }
+
+  SparkSubmitCommandBuilder(boolean hasMixedArguments, List<String> args) {
+    this();
+    List<String> submitArgs = args;
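+    // A first argument of "pyspark-shell-main" marks a PySpark shell launch:
+    // swap in the real resource name and parse the remaining arguments as a
+    // regular spark-submit command line.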
+    if (args.size() > 0 && args.get(0).equals(PYSPARK_SHELL)) {
+      this.hasMixedArguments = true;
+      appResource = PYSPARK_SHELL_RESOURCE;
+      submitArgs = args.subList(1, args.size());
+    } else {
+      this.hasMixedArguments = hasMixedArguments;
+    }
+
+    new OptionParser().parse(submitArgs);
+  }
+
+  @Override
+  public List<String> buildCommand(Map<String, String> env) throws IOException {
+    if (PYSPARK_SHELL_RESOURCE.equals(appResource)) {
+      return buildPySparkShellCommand(env);
+    } else {
+      return buildSparkSubmitCommand(env);
+    }
+  }
+
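+  // Re-serializes the builder's parsed state into spark-submit CLI form.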
+  List<String> buildSparkSubmitArgs() {
+    List<String> args = new ArrayList<String>();
+    SparkSubmitOptionParser parser = new SparkSubmitOptionParser();
+
+    if (verbose) {
+      args.add(parser.VERBOSE);
+    }
+
+    if (master != null) {
+      args.add(parser.MASTER);
+      args.add(master);
+    }
+
+    if (deployMode != null) {
+      args.add(parser.DEPLOY_MODE);
+      args.add(deployMode);
+    }
+
+    if (appName != null) {
+      args.add(parser.NAME);
+      args.add(appName);
+    }
+
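+    // Forward each Spark configuration entry on the spark-submit command line.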
+    for (Map.Entry<String, String> e : conf.entrySet()) {
+  

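To make the serialization above concrete, an illustrative expectation (not taken from the patch; it assumes the usual spark-submit flag values behind parser.VERBOSE, parser.MASTER and parser.NAME):

    // With verbose = true, master = "yarn-client", deployMode = null and
    // appName = "MyApp", buildSparkSubmitArgs() would begin with:
    //   ["--verbose", "--master", "yarn-client", "--name", "MyApp", ...]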