[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-12-14 Thread tigerquoll
Github user tigerquoll commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-66909268
  
No probs, it was actually a nice way of starting to poke through the code 
to figure out how things are put together. I'll stick to smaller jobs from now on.
Regards, Dale.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-12-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2516





[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-12-09 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-66398367
  
Hey @tigerquoll usually for large patches like this we require a design doc 
on the JIRA. Especially because the priority of this is not super important, I 
would recommend that we close this issue for now, and maybe open a new one 
later once there is a consensus on how we should restructure Spark submit. 
Thanks for your work so far.





[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-12-04 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-65682500
  
@pwendell I was less interested in the refactoring part than in formalizing 
the precedence for the options in a more obvious manner in the code. Right now 
that's a little confusing.

But yeah, this patch is rather large, and a lot has changed since it was 
last updated...





[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-11-04 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r19793754
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -83,216 +79,163 @@ object SparkSubmit {
*   (4) the main class for the child
*/
   private[spark] def createLaunchEnv(args: SparkSubmitArguments)
-      : (ArrayBuffer[String], ArrayBuffer[String], Map[String, String], String) = {
+      : (mutable.ArrayBuffer[String], mutable.ArrayBuffer[String], Map[String, String], String) = {
 
     // Values to return
-    val childArgs = new ArrayBuffer[String]()
-    val childClasspath = new ArrayBuffer[String]()
-    val sysProps = new HashMap[String, String]()
+    val childArgs = new mutable.ArrayBuffer[String]()
+    val childClasspath = new mutable.ArrayBuffer[String]()
+    val sysProps = new mutable.HashMap[String, String]()
     var childMainClass = ""
 
-    // Set the cluster manager
-    val clusterManager: Int = args.master match {
-      case m if m.startsWith("yarn") => YARN
-      case m if m.startsWith("spark") => STANDALONE
-      case m if m.startsWith("mesos") => MESOS
-      case m if m.startsWith("local") => LOCAL
-      case _ => printErrorAndExit("Master must start with yarn, spark, mesos, or local"); -1
-    }
-
-    // Set the deploy mode; default is client mode
-    var deployMode: Int = args.deployMode match {
-      case "client" | null => CLIENT
-      case "cluster" => CLUSTER
-      case _ => printErrorAndExit("Deploy mode must be either client or cluster"); -1
-    }
-
-    // Because "yarn-cluster" and "yarn-client" encapsulate both the master
-    // and deploy mode, we have some logic to infer the master and deploy mode
-    // from each other if only one is specified, or exit early if they are at odds.
-    if (clusterManager == YARN) {
-      if (args.master == "yarn-standalone") {
-        printWarning("\"yarn-standalone\" is deprecated. Use \"yarn-cluster\" instead.")
-        args.master = "yarn-cluster"
-      }
-      (args.master, args.deployMode) match {
-        case ("yarn-cluster", null) =>
-          deployMode = CLUSTER
-        case ("yarn-cluster", "client") =>
-          printErrorAndExit("Client deploy mode is not compatible with master \"yarn-cluster\"")
-        case ("yarn-client", "cluster") =>
-          printErrorAndExit("Cluster deploy mode is not compatible with master \"yarn-client\"")
-        case (_, mode) =>
-          args.master = "yarn-" + Option(mode).getOrElse("client")
-      }
-
+    if (args.clusterManagerFlag == CM_YARN) {
       // Make sure YARN is included in our build if we're trying to use it
       if (!Utils.classIsLoadable("org.apache.spark.deploy.yarn.Client") && !Utils.isTesting) {
         printErrorAndExit(
           "Could not load YARN classes. " +
           "This copy of Spark may not have been compiled with YARN support.")
       }
-    }
-
-    // The following modes are not supported or applicable
-    (clusterManager, deployMode) match {
-      case (MESOS, CLUSTER) =>
-        printErrorAndExit("Cluster deploy mode is currently not supported for Mesos clusters.")
-      case (_, CLUSTER) if args.isPython =>
-        printErrorAndExit("Cluster deploy mode is currently not supported for python applications.")
-      case (_, CLUSTER) if isShell(args.primaryResource) =>
-        printErrorAndExit("Cluster deploy mode is not applicable to Spark shells.")
-      case _ =>
+      val hasHadoopEnv = sys.env.contains("HADOOP_CONF_DIR") || sys.env.contains("YARN_CONF_DIR")
+      if (!hasHadoopEnv && !Utils.isTesting) {
+        throw new Exception("When running with master '" + args.master + "' " +
+          "either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the environment.")
+      }
     }
 
     // If we're running a python app, set the main class to our specific python runner
     if (args.isPython) {
       if (args.primaryResource == PYSPARK_SHELL) {
-        args.mainClass = "py4j.GatewayServer"
-        args.childArgs = ArrayBuffer("--die-on-broken-pipe", "0")
+        args.mainClass = PY4J_GATEWAYSERVER
+        args.childArgs = mutable.ArrayBuffer("--die-on-broken-pipe", "0")
       } else {
         // If a python file is provided, add it to the child arguments and list of files to deploy.
         // Usage: PythonAppRunner <main python file> <extra python files> [app arguments]
-        args.mainClass = "org.apache.spark.deploy.PythonRunner"
-        args.childArgs = ArrayBuffer(args.primaryResource, args.pyFiles) ++ args.childArgs
-        args.files = mergeFileLists(args.files, args.primaryResource)
+        args.mainClass = PYTHON_RUNNER

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-11-04 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r19793870
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -33,30 +34,25 @@ import org.apache.spark.util.Utils
  * a layer over the different cluster managers and deploy modes that Spark supports.
  */
 object SparkSubmit {
-
-  // Cluster managers
-  private val YARN = 1
-  private val STANDALONE = 2
-  private val MESOS = 4
-  private val LOCAL = 8
-  private val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
-
-  // Deploy modes
-  private val CLIENT = 1
-  private val CLUSTER = 2
-  private val ALL_DEPLOY_MODES = CLIENT | CLUSTER
-
   // A special jar name that indicates the class being run is inside of Spark itself, and therefore
   // no user jar is needed.
-  private val SPARK_INTERNAL = "spark-internal"
+  val SPARK_INTERNAL = "spark-internal"
 
   // Special primary resource names that represent shells rather than application jars.
-  private val SPARK_SHELL = "spark-shell"
-  private val PYSPARK_SHELL = "pyspark-shell"
+  val SPARK_SHELL = "spark-shell"
+  val PYSPARK_SHELL = "pyspark-shell"
+
+  // Special python classes
+  val PY4J_GATEWAYSERVER: String = "py4j.GatewayServer"
+  val PYTHON_RUNNER: String = "org.apache.spark.deploy.PythonRunner"
--- End diff --

done.





[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-11-04 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r19794246
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-11-04 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r19794638
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,201 +17,286 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io._
 import java.util.jar.JarFile
 
-import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import scala.collection._
 
 import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
+import org.apache.spark.deploy.SparkSubmitArguments._
 
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- * The env argument is used for testing.
- */
-private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, String] = sys.env) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-    val defaultProperties = new HashMap[String, String]()
-    if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-    Option(propertiesFile).foreach { filename =>
-      val file = new File(filename)
-      SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-        if (k.startsWith("spark")) {
-          defaultProperties(k) = v
-          if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-        } else {
-          SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-        }
-      }
-    }
-    defaultProperties
+ * Pulls and validates configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. Entries specified on the command line (except for --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Legacy environment variables
+ * 4. SPARK_DEFAULT_CONF/spark-defaults.conf or SPARK_HOME/conf/spark-defaults.conf if either exists
+ * 5. Hard coded defaults
+ */
+private[spark] class SparkSubmitArguments(args: Seq[String]) {
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala.
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master = conf(SPARK_MASTER)
+  def master_= (value: String): Unit = conf.put(SPARK_MASTER, value)
+
+  def executorMemory = conf(SPARK_EXECUTOR_MEMORY)
+  def executorMemory_= (value: String): Unit = conf.put(SPARK_EXECUTOR_MEMORY, value)
+
+  def executorCores = conf(SPARK_EXECUTOR_CORES)
+  def executorCores_= (value: String): Unit = conf.put(SPARK_EXECUTOR_CORES, value)
+
+  def totalExecutorCores = conf.get(SPARK_CORES_MAX)
+  def totalExecutorCores_= (value: String): Unit = conf.put(SPARK_CORES_MAX, value)
+
+  def driverMemory = conf(SPARK_DRIVER_MEMORY)
+  def driverMemory_= (value: String): Unit = conf.put(SPARK_DRIVER_MEMORY, value)
+
+  def driverExtraClassPath = conf.get(SPARK_DRIVER_EXTRA_CLASSPATH)
+  def driverExtraClassPath_= (value: String): Unit = conf.put(SPARK_DRIVER_EXTRA_CLASSPATH, value)
+
+  def driverExtraLibraryPath = conf.get(SPARK_DRIVER_EXTRA_LIBRARY_PATH)
+  def driverExtraLibraryPath_= (value: String): Unit = conf.put(SPARK_DRIVER_EXTRA_LIBRARY_PATH, value)
+
+  def driverExtraJavaOptions = conf.get(SPARK_DRIVER_EXTRA_JAVA_OPTIONS)
+  def driverExtraJavaOptions_= (value: String): Unit = conf.put(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, value)
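To make the priority list in the Scaladoc above concrete, here is a minimal standalone sketch of layering the five sources so that higher-priority entries win; the maps and keys below are placeholders for illustration, not the patch's actual fields.

    import scala.collection.mutable

    object ConfPrioritySketch {
      def main(args: Array[String]): Unit = {
        // Placeholder layers, highest priority first, mirroring the Scaladoc's ordering.
        val cmdLineOptions    = Map("spark.master" -> "yarn-cluster")
        val cmdLineConfFlags  = Map("spark.executor.memory" -> "4g")
        val legacyEnvVars     = Map("spark.executor.memory" -> "2g", "spark.app.name" -> "fromEnv")
        val defaultsFile      = Map("spark.app.name" -> "fromDefaultsFile")
        val hardCodedDefaults = Map("spark.submit.deployMode" -> "client")

        val conf = new mutable.HashMap[String, String]()
        // Walk from lowest to highest priority so higher layers overwrite lower ones.
        Seq(hardCodedDefaults, defaultsFile, legacyEnvVars, cmdLineConfFlags, cmdLineOptions)
          .foreach(layer => conf ++= layer)

        // spark.executor.memory resolves to "4g": the --conf layer beats the legacy environment variable.
        conf.toSeq.sortBy(_._1).foreach { case (k, v) => println(s"$k = $v") }
      }
    }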

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-11-04 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r19795115
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-11-04 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r19795245
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -227,91 +312,92 @@ private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, St
    */
   def parse(opts: Seq[String]): Unit = opts match {
     case ("--name") :: value :: tail =>
-      name = value
+      cmdLineConfig.put(SPARK_APP_NAME, value)
       parse(tail)
 
     case ("--master") :: value :: tail =>
-      master = value
+      cmdLineConfig.put(SPARK_MASTER, value)
       parse(tail)
 
     case ("--class") :: value :: tail =>
-      mainClass = value
+      cmdLineConfig.put(SPARK_APP_CLASS, value)
       parse(tail)
 
     case ("--deploy-mode") :: value :: tail =>
-      if (value != "client" && value != "cluster") {
-        SparkSubmit.printErrorAndExit("--deploy-mode must be either \"client\" or \"cluster\"")
-      }
-      deployMode = value
+      cmdLineConfig.put(SPARK_DEPLOY_MODE, value)
       parse(tail)
 
     case ("--num-executors") :: value :: tail =>
-      numExecutors = value
+      cmdLineConfig.put(SPARK_EXECUTOR_INSTANCES, value)
      parse(tail)
 
     case ("--total-executor-cores") :: value :: tail =>
-      totalExecutorCores = value
+      cmdLineConfig.put(SPARK_CORES_MAX, value)
       parse(tail)
 
     case ("--executor-cores") :: value :: tail =>
-      executorCores = value
+      cmdLineConfig.put(SPARK_EXECUTOR_CORES, value)
       parse(tail)
 
     case ("--executor-memory") :: value :: tail =>
-      executorMemory = value
+      cmdLineConfig.put(SPARK_EXECUTOR_MEMORY, value)
       parse(tail)
 
     case ("--driver-memory") :: value :: tail =>
-      driverMemory = value
+      cmdLineConfig.put(SPARK_DRIVER_MEMORY, value)
       parse(tail)
 
     case ("--driver-cores") :: value :: tail =>
-      driverCores = value
+      cmdLineConfig.put(SPARK_DRIVER_CORES, value)
       parse(tail)
 
     case ("--driver-class-path") :: value :: tail =>
-      driverExtraClassPath = value
+      cmdLineConfig.put(SPARK_DRIVER_EXTRA_CLASSPATH, value)
       parse(tail)
 
     case ("--driver-java-options") :: value :: tail =>
-      driverExtraJavaOptions = value
+      cmdLineConfig.put(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, value)
       parse(tail)
 
     case ("--driver-library-path") :: value :: tail =>
-      driverExtraLibraryPath = value
+      cmdLineConfig.put(SPARK_DRIVER_EXTRA_LIBRARY_PATH, value)
       parse(tail)
 
     case ("--properties-file") :: value :: tail =>
-      propertiesFile = value
+      /*  We merge the property file config options into the rest of the command line options
+       *  after we have finished the rest of the command line processing, as property files
+       *  cannot override explicit command line options.
+       */
+      cmdLinePropertyFileValues ++= Utils.getPropertyValuesFromFile(value)
--- End diff --

I've changed back to the old behaviour with the most recent merge from current, 
but I will print a warning to the user about what is going on if we detect 
multiple --properties-file options.
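
As context for that warning, here is a minimal standalone sketch of detecting repeated --properties-file options during the recursive parse while keeping the old last-one-wins behaviour; the object name, the accumulator, and the warning path are hypothetical, not code from the patch.

    import scala.collection.mutable

    object PropertiesFileWarningSketch {
      // Hypothetical accumulator for every --properties-file value seen so far.
      private val propertiesFiles = mutable.ArrayBuffer[String]()

      /** Recursive option walk in the same style as SparkSubmitArguments.parse. */
      def parse(opts: List[String]): Unit = opts match {
        case "--properties-file" :: value :: tail =>
          propertiesFiles += value
          if (propertiesFiles.size > 1) {
            // Hypothetical warning path; the real code would go through SparkSubmit.printWarning.
            Console.err.println(
              s"Warning: multiple --properties-file options given; using the last one (${propertiesFiles.last}).")
          }
          parse(tail)
        case _ :: tail =>
          parse(tail)
        case Nil =>
      }

      def main(args: Array[String]): Unit = {
        parse(List("--properties-file", "a.conf", "--verbose", "--properties-file", "b.conf"))
      }
    }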





[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18908910
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18909075
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -33,30 +34,25 @@ import org.apache.spark.util.Utils
  * a layer over the different cluster managers and deploy modes that Spark supports.
  */
 object SparkSubmit {
-
-  // Cluster managers
-  private val YARN = 1
-  private val STANDALONE = 2
-  private val MESOS = 4
-  private val LOCAL = 8
-  private val ALL_CLUSTER_MGRS = YARN | STANDALONE | MESOS | LOCAL
-
-  // Deploy modes
-  private val CLIENT = 1
-  private val CLUSTER = 2
-  private val ALL_DEPLOY_MODES = CLIENT | CLUSTER
-
   // A special jar name that indicates the class being run is inside of Spark itself, and therefore
   // no user jar is needed.
-  private val SPARK_INTERNAL = "spark-internal"
+  val SPARK_INTERNAL = "spark-internal"
 
   // Special primary resource names that represent shells rather than application jars.
-  private val SPARK_SHELL = "spark-shell"
-  private val PYSPARK_SHELL = "pyspark-shell"
+  val SPARK_SHELL = "spark-shell"
+  val PYSPARK_SHELL = "pyspark-shell"
+
+  // Special python classes
+  val PY4J_GATEWAYSERVER: String = "py4j.GatewayServer"
+  val PYTHON_RUNNER: String = "org.apache.spark.deploy.PythonRunner"
--- End diff --

Instead, I'd use `PythonRunner.getClass.getName.stripSuffix("$")`.
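
To illustrate why the stripSuffix is needed (a standalone sketch, not code from the patch): the JVM class backing a Scala object carries a trailing `$` in its name, so deriving the runner's class name from the object requires dropping that suffix.

    object PythonRunnerNameSketch {
      // Stand-in for org.apache.spark.deploy.PythonRunner, just to show the naming behaviour.
      object PythonRunner

      def main(args: Array[String]): Unit = {
        val raw = PythonRunner.getClass.getName   // ends in "$" because it is the object's class
        val cleaned = raw.stripSuffix("$")        // the name you would actually launch
        println(s"$raw -> $cleaned")
      }
    }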





[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18909183
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18909253
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18909359
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18909554
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18909952
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18910093
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -397,9 +347,9 @@ object SparkSubmit {
  * Provides an indirection layer for passing arguments as system properties or flags to
  * the user's driver program or to downstream launcher tools.
  */
-private[spark] case class OptionAssigner(
-    value: String,
-    clusterManager: Int,
-    deployMode: Int,
-    clOption: String = null,
-    sysProp: String = null)
+private[spark] case class OptionAssigner(configKey: String,
+                                         clusterManager: Int,
--- End diff --

Still broken here too. Style is:

    def foo(
        arg1: Blah,
        arg2: Blah) {
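
For comparison, a sketch of the OptionAssigner declaration reflowed to that style; the parameters after configKey are assumed to carry over unchanged from the old definition, so treat this as illustrative rather than the patch's final form.

    package org.apache.spark.deploy

    // Sketch only: parameters after configKey are assumed unchanged from the old definition.
    private[spark] case class OptionAssigner(
        configKey: String,
        clusterManager: Int,
        deployMode: Int,
        clOption: String = null,
        sysProp: String = null)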






[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18910401
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,201 +17,286 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io._
 import java.util.jar.JarFile
 
-import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import scala.collection._
 
 import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
+import org.apache.spark.deploy.SparkSubmitArguments._
 
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- * The env argument is used for testing.
- */
-private[spark] class SparkSubmitArguments(args: Seq[String], env: 
Map[String, String] = sys.env) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
+ * Pulls and validates configuration information together in order of 
priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Legacy environment variables
+ * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 5. hard coded defaults
+ *
+*/
+private[spark] class SparkSubmitArguments(args: Seq[String]) {
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala.
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SPARK_MASTER)
+  def master_= (value: String):Unit = conf.put(SPARK_MASTER, value)
+
+  def executorMemory = conf(SPARK_EXECUTOR_MEMORY)
+  def executorMemory_= (value: String):Unit = 
conf.put(SPARK_EXECUTOR_MEMORY, value)
+
+  def executorCores = conf(SPARK_EXECUTOR_CORES)
+  def executorCores_= (value: String):Unit = 
conf.put(SPARK_EXECUTOR_CORES, value)
+
+  def totalExecutorCores = conf.get(SPARK_CORES_MAX)
+  def totalExecutorCores_= (value: String):Unit = 
conf.put(SPARK_CORES_MAX, value)
+
+  def driverMemory = conf(SPARK_DRIVER_MEMORY)
+  def driverMemory_= (value: String):Unit = conf.put(SPARK_DRIVER_MEMORY, 
value)
+
+  def driverExtraClassPath = conf.get(SPARK_DRIVER_EXTRA_CLASSPATH)
+  def driverExtraClassPath_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_CLASSPATH, value)
+
+  def driverExtraLibraryPath = conf.get(SPARK_DRIVER_EXTRA_LIBRARY_PATH)
+  def driverExtraLibraryPath_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_LIBRARY_PATH, value)
+
+  def driverExtraJavaOptions = conf.get(SPARK_DRIVER_EXTRA_JAVA_OPTIONS)
+  def driverExtraJavaOptions_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, value)
+
 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18910527
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,201 +17,286 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io._
 import java.util.jar.JarFile
 
-import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import scala.collection._
 
 import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
+import org.apache.spark.deploy.SparkSubmitArguments._
 
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- * The env argument is used for testing.
- */
-private[spark] class SparkSubmitArguments(args: Seq[String], env: 
Map[String, String] = sys.env) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
+ * Pulls and validates configuration information together in order of 
priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Legacy environment variables
+ * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 5. hard coded defaults
+ *
+*/
+private[spark] class SparkSubmitArguments(args: Seq[String]) {
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala.
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SPARK_MASTER)
+  def master_= (value: String):Unit = conf.put(SPARK_MASTER, value)
+
+  def executorMemory = conf(SPARK_EXECUTOR_MEMORY)
+  def executorMemory_= (value: String):Unit = 
conf.put(SPARK_EXECUTOR_MEMORY, value)
+
+  def executorCores = conf(SPARK_EXECUTOR_CORES)
+  def executorCores_= (value: String):Unit = 
conf.put(SPARK_EXECUTOR_CORES, value)
+
+  def totalExecutorCores = conf.get(SPARK_CORES_MAX)
+  def totalExecutorCores_= (value: String):Unit = 
conf.put(SPARK_CORES_MAX, value)
+
+  def driverMemory = conf(SPARK_DRIVER_MEMORY)
+  def driverMemory_= (value: String):Unit = conf.put(SPARK_DRIVER_MEMORY, 
value)
+
+  def driverExtraClassPath = conf.get(SPARK_DRIVER_EXTRA_CLASSPATH)
+  def driverExtraClassPath_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_CLASSPATH, value)
+
+  def driverExtraLibraryPath = conf.get(SPARK_DRIVER_EXTRA_LIBRARY_PATH)
+  def driverExtraLibraryPath_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_LIBRARY_PATH, value)
+
+  def driverExtraJavaOptions = conf.get(SPARK_DRIVER_EXTRA_JAVA_OPTIONS)
+  def driverExtraJavaOptions_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, value)
+
 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18910722
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,201 +17,286 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io._
 import java.util.jar.JarFile
 
-import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import scala.collection._
 
 import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
+import org.apache.spark.deploy.SparkSubmitArguments._
 
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- * The env argument is used for testing.
- */
-private[spark] class SparkSubmitArguments(args: Seq[String], env: 
Map[String, String] = sys.env) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
+ * Pulls and validates configuration information together in order of 
priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Legacy environment variables
+ * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 5. hard coded defaults
+ *
+*/
+private[spark] class SparkSubmitArguments(args: Seq[String]) {
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala.
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SPARK_MASTER)
+  def master_= (value: String):Unit = conf.put(SPARK_MASTER, value)
+
+  def executorMemory = conf(SPARK_EXECUTOR_MEMORY)
+  def executorMemory_= (value: String):Unit = 
conf.put(SPARK_EXECUTOR_MEMORY, value)
+
+  def executorCores = conf(SPARK_EXECUTOR_CORES)
+  def executorCores_= (value: String):Unit = 
conf.put(SPARK_EXECUTOR_CORES, value)
+
+  def totalExecutorCores = conf.get(SPARK_CORES_MAX)
+  def totalExecutorCores_= (value: String):Unit = 
conf.put(SPARK_CORES_MAX, value)
+
+  def driverMemory = conf(SPARK_DRIVER_MEMORY)
+  def driverMemory_= (value: String):Unit = conf.put(SPARK_DRIVER_MEMORY, 
value)
+
+  def driverExtraClassPath = conf.get(SPARK_DRIVER_EXTRA_CLASSPATH)
+  def driverExtraClassPath_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_CLASSPATH, value)
+
+  def driverExtraLibraryPath = conf.get(SPARK_DRIVER_EXTRA_LIBRARY_PATH)
+  def driverExtraLibraryPath_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_LIBRARY_PATH, value)
+
+  def driverExtraJavaOptions = conf.get(SPARK_DRIVER_EXTRA_JAVA_OPTIONS)
+  def driverExtraJavaOptions_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, value)
+
 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18911158
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -227,91 +312,92 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, St
  */
 def parse(opts: Seq[String]): Unit = opts match {
   case ("--name") :: value :: tail =>
-name = value
+cmdLineConfig.put(SPARK_APP_NAME, value)
 parse(tail)

   case ("--master") :: value :: tail =>
-master = value
+cmdLineConfig.put(SPARK_MASTER, value)
 parse(tail)

   case ("--class") :: value :: tail =>
-mainClass = value
+cmdLineConfig.put(SPARK_APP_CLASS, value)
 parse(tail)

   case ("--deploy-mode") :: value :: tail =>
-if (value != "client" && value != "cluster") {
-  SparkSubmit.printErrorAndExit("--deploy-mode must be either \"client\" or \"cluster\"")
-}
-deployMode = value
+cmdLineConfig.put(SPARK_DEPLOY_MODE, value)
 parse(tail)

   case ("--num-executors") :: value :: tail =>
-numExecutors = value
+cmdLineConfig.put(SPARK_EXECUTOR_INSTANCES, value)
 parse(tail)

   case ("--total-executor-cores") :: value :: tail =>
-totalExecutorCores = value
+cmdLineConfig.put(SPARK_CORES_MAX, value)
 parse(tail)

   case ("--executor-cores") :: value :: tail =>
-executorCores = value
+cmdLineConfig.put(SPARK_EXECUTOR_CORES, value)
 parse(tail)

   case ("--executor-memory") :: value :: tail =>
-executorMemory = value
+cmdLineConfig.put(SPARK_EXECUTOR_MEMORY, value)
 parse(tail)

   case ("--driver-memory") :: value :: tail =>
-driverMemory = value
+cmdLineConfig.put(SPARK_DRIVER_MEMORY, value)
 parse(tail)

   case ("--driver-cores") :: value :: tail =>
-driverCores = value
+cmdLineConfig.put(SPARK_DRIVER_CORES, value)
 parse(tail)

   case ("--driver-class-path") :: value :: tail =>
-driverExtraClassPath = value
+cmdLineConfig.put(SPARK_DRIVER_EXTRA_CLASSPATH, value)
 parse(tail)

   case ("--driver-java-options") :: value :: tail =>
-driverExtraJavaOptions = value
+cmdLineConfig.put(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, value)
 parse(tail)

   case ("--driver-library-path") :: value :: tail =>
-driverExtraLibraryPath = value
+cmdLineConfig.put(SPARK_DRIVER_EXTRA_LIBRARY_PATH, value)
 parse(tail)

   case ("--properties-file") :: value :: tail =>
-propertiesFile = value
+/*  We merge the property file config options into the rest of the command lines options
--- End diff --

nit: see previous comment about multi-line comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18911121
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,201 +17,286 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io._
 import java.util.jar.JarFile
 
-import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import scala.collection._
 
 import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
+import org.apache.spark.deploy.SparkSubmitArguments._
 
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- * The env argument is used for testing.
- */
-private[spark] class SparkSubmitArguments(args: Seq[String], env: 
Map[String, String] = sys.env) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
+ * Pulls and validates configuration information together in order of 
priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Legacy environment variables
+ * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 5. hard coded defaults
+ *
+*/
+private[spark] class SparkSubmitArguments(args: Seq[String]) {
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala.
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SPARK_MASTER)
+  def master_= (value: String):Unit = conf.put(SPARK_MASTER, value)
+
+  def executorMemory = conf(SPARK_EXECUTOR_MEMORY)
+  def executorMemory_= (value: String):Unit = 
conf.put(SPARK_EXECUTOR_MEMORY, value)
+
+  def executorCores = conf(SPARK_EXECUTOR_CORES)
+  def executorCores_= (value: String):Unit = 
conf.put(SPARK_EXECUTOR_CORES, value)
+
+  def totalExecutorCores = conf.get(SPARK_CORES_MAX)
+  def totalExecutorCores_= (value: String):Unit = 
conf.put(SPARK_CORES_MAX, value)
+
+  def driverMemory = conf(SPARK_DRIVER_MEMORY)
+  def driverMemory_= (value: String):Unit = conf.put(SPARK_DRIVER_MEMORY, 
value)
+
+  def driverExtraClassPath = conf.get(SPARK_DRIVER_EXTRA_CLASSPATH)
+  def driverExtraClassPath_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_CLASSPATH, value)
+
+  def driverExtraLibraryPath = conf.get(SPARK_DRIVER_EXTRA_LIBRARY_PATH)
+  def driverExtraLibraryPath_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_LIBRARY_PATH, value)
+
+  def driverExtraJavaOptions = conf.get(SPARK_DRIVER_EXTRA_JAVA_OPTIONS)
+  def driverExtraJavaOptions_= (value: String):Unit = 
conf.put(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, value)
+
 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18911187
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -227,91 +312,92 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, St
  */
 def parse(opts: Seq[String]): Unit = opts match {
   case ("--name") :: value :: tail =>
-name = value
+cmdLineConfig.put(SPARK_APP_NAME, value)
 parse(tail)

   case ("--master") :: value :: tail =>
-master = value
+cmdLineConfig.put(SPARK_MASTER, value)
 parse(tail)

   case ("--class") :: value :: tail =>
-mainClass = value
+cmdLineConfig.put(SPARK_APP_CLASS, value)
 parse(tail)

   case ("--deploy-mode") :: value :: tail =>
-if (value != "client" && value != "cluster") {
-  SparkSubmit.printErrorAndExit("--deploy-mode must be either \"client\" or \"cluster\"")
-}
-deployMode = value
+cmdLineConfig.put(SPARK_DEPLOY_MODE, value)
 parse(tail)

   case ("--num-executors") :: value :: tail =>
-numExecutors = value
+cmdLineConfig.put(SPARK_EXECUTOR_INSTANCES, value)
 parse(tail)

   case ("--total-executor-cores") :: value :: tail =>
-totalExecutorCores = value
+cmdLineConfig.put(SPARK_CORES_MAX, value)
 parse(tail)

   case ("--executor-cores") :: value :: tail =>
-executorCores = value
+cmdLineConfig.put(SPARK_EXECUTOR_CORES, value)
 parse(tail)

   case ("--executor-memory") :: value :: tail =>
-executorMemory = value
+cmdLineConfig.put(SPARK_EXECUTOR_MEMORY, value)
 parse(tail)

   case ("--driver-memory") :: value :: tail =>
-driverMemory = value
+cmdLineConfig.put(SPARK_DRIVER_MEMORY, value)
 parse(tail)

   case ("--driver-cores") :: value :: tail =>
-driverCores = value
+cmdLineConfig.put(SPARK_DRIVER_CORES, value)
 parse(tail)

   case ("--driver-class-path") :: value :: tail =>
-driverExtraClassPath = value
+cmdLineConfig.put(SPARK_DRIVER_EXTRA_CLASSPATH, value)
 parse(tail)

   case ("--driver-java-options") :: value :: tail =>
-driverExtraJavaOptions = value
+cmdLineConfig.put(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, value)
 parse(tail)

   case ("--driver-library-path") :: value :: tail =>
-driverExtraLibraryPath = value
+cmdLineConfig.put(SPARK_DRIVER_EXTRA_LIBRARY_PATH, value)
 parse(tail)

   case ("--properties-file") :: value :: tail =>
-propertiesFile = value
+/*  We merge the property file config options into the rest of the command lines options
+ *  after we have finished the rest of the command line processing as property files
+ *  cannot override explicit command line options .
--- End diff --

nit: extra space before `.`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18911284
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -227,91 +312,92 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, St
  */
 def parse(opts: Seq[String]): Unit = opts match {
   case ("--name") :: value :: tail =>
-name = value
+cmdLineConfig.put(SPARK_APP_NAME, value)
 parse(tail)

   case ("--master") :: value :: tail =>
-master = value
+cmdLineConfig.put(SPARK_MASTER, value)
 parse(tail)

   case ("--class") :: value :: tail =>
-mainClass = value
+cmdLineConfig.put(SPARK_APP_CLASS, value)
 parse(tail)

   case ("--deploy-mode") :: value :: tail =>
-if (value != "client" && value != "cluster") {
-  SparkSubmit.printErrorAndExit("--deploy-mode must be either \"client\" or \"cluster\"")
-}
-deployMode = value
+cmdLineConfig.put(SPARK_DEPLOY_MODE, value)
 parse(tail)

   case ("--num-executors") :: value :: tail =>
-numExecutors = value
+cmdLineConfig.put(SPARK_EXECUTOR_INSTANCES, value)
 parse(tail)

   case ("--total-executor-cores") :: value :: tail =>
-totalExecutorCores = value
+cmdLineConfig.put(SPARK_CORES_MAX, value)
 parse(tail)

   case ("--executor-cores") :: value :: tail =>
-executorCores = value
+cmdLineConfig.put(SPARK_EXECUTOR_CORES, value)
 parse(tail)

   case ("--executor-memory") :: value :: tail =>
-executorMemory = value
+cmdLineConfig.put(SPARK_EXECUTOR_MEMORY, value)
 parse(tail)

   case ("--driver-memory") :: value :: tail =>
-driverMemory = value
+cmdLineConfig.put(SPARK_DRIVER_MEMORY, value)
 parse(tail)

   case ("--driver-cores") :: value :: tail =>
-driverCores = value
+cmdLineConfig.put(SPARK_DRIVER_CORES, value)
 parse(tail)

   case ("--driver-class-path") :: value :: tail =>
-driverExtraClassPath = value
+cmdLineConfig.put(SPARK_DRIVER_EXTRA_CLASSPATH, value)
 parse(tail)

   case ("--driver-java-options") :: value :: tail =>
-driverExtraJavaOptions = value
+cmdLineConfig.put(SPARK_DRIVER_EXTRA_JAVA_OPTIONS, value)
 parse(tail)

   case ("--driver-library-path") :: value :: tail =>
-driverExtraLibraryPath = value
+cmdLineConfig.put(SPARK_DRIVER_EXTRA_LIBRARY_PATH, value)
 parse(tail)

   case ("--properties-file") :: value :: tail =>
-propertiesFile = value
+/*  We merge the property file config options into the rest of the command lines options
+ *  after we have finished the rest of the command line processing as property files
+ *  cannot override explicit command line options .
+ */
+cmdLinePropertyFileValues ++= Utils.getPropertyValuesFromFile(value)
--- End diff --

So, this is actually introducing different behavior from before.

A command line like this:

--properties-file foo --properties-file bar

Would only load bar before, but now it's loading both.
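
To make the behavioural difference concrete, here is a small self-contained sketch (hypothetical names, not code from the patch) contrasting "last file wins" with "load and merge every file" for repeated --properties-file flags:

object PropertiesFlagDemo {
  // Old behaviour: each --properties-file simply replaces the previous choice,
  // so only the last file named on the command line is ever read.
  def lastWins(files: Seq[String]): Option[String] = files.lastOption

  // Behaviour described above: every listed file is loaded and its properties
  // merged together (the merge order used here is illustrative).
  def mergeAll(files: Seq[String], load: String => Map[String, String]): Map[String, String] =
    files.foldLeft(Map.empty[String, String])((acc, f) => acc ++ load(f))

  def main(args: Array[String]): Unit = {
    val fakeLoad = Map(
      "foo" -> Map("spark.master" -> "local[2]"),
      "bar" -> Map("spark.master" -> "yarn-client", "spark.app.name" -> "demo"))
    println(lastWins(Seq("foo", "bar")))           // Some(bar): only bar would be loaded
    println(mergeAll(Seq("foo", "bar"), fakeLoad)) // keys from both foo and bar
  }
}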


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18911428
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -398,22 +478,117 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, St
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Default property values - string literals are defined in 
ConfigConstants.scala
+   */
+  val DEFAULTS = Map(
+SPARK_MASTER -> "local[*]",
+SPARK_VERBOSE -> "false",
+SPARK_DEPLOY_MODE -> "client",
+SPARK_EXECUTOR_MEMORY -> "1g",
+SPARK_EXECUTOR_CORES -> "1" ,
+SPARK_EXECUTOR_INSTANCES -> "2",
+SPARK_DRIVER_MEMORY -> "512m",
+SPARK_DRIVER_CORES -> "1",
+SPARK_DRIVER_SUPERVISE -> "false",
+SPARK_YARN_QUEUE -> "default",
+SPARK_EXECUTOR_INSTANCES -> "2"
+  )
+
+  /**
+   * Config items that should only be set from the command line
+   */
+  val CMD_LINE_ONLY_KEYS = Set (
+SPARK_VERBOSE,
+SPARK_APP_CLASS,
+SPARK_APP_PRIMARY_RESOURCE
+  )
+
+  /**
+   * Used to support legacy environment variable mappings
+   */
+  val LEGACY_ENV_VARS = Map (
+"MASTER" -> SPARK_MASTER,
+"DEPLOY_MODE" -> SPARK_DEPLOY_MODE,
+"SPARK_DRIVER_MEMORY" -> SPARK_DRIVER_MEMORY,
+"SPARK_EXECUTOR_MEMORY" -> SPARK_EXECUTOR_MEMORY
+  )
+
+  /**
+   * Function returns the spark submit default config map (Map[configName->ConfigValue])
+   * Function is over-writable to allow for easier debugging
+   */
+  private[spark] var getHardCodedDefaultValues: () => Map[String, String] = () => {
--- End diff --

This feels like a long way around to achieve something simple: why not just 
an argument to `mergeSparkProperties()` with the default values?
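
As a sketch of that alternative (names are illustrative, not the actual patch), the defaults could simply be a parameter with a default value rather than an overridable function-valued var:

object MergeDefaultsSketch {
  val DEFAULTS: Map[String, String] = Map(
    "spark.master" -> "local[*]",
    "spark.executor.memory" -> "1g")

  // Production code calls mergeSparkProperties(cmdLine); a test can pass its own defaults.
  def mergeSparkProperties(
      cmdLine: Map[String, String],
      defaults: Map[String, String] = DEFAULTS): Map[String, String] = {
    defaults ++ cmdLine // command-line entries take precedence over the defaults
  }
}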


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18911516
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -398,22 +478,117 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, St
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Default property values - string literals are defined in 
ConfigConstants.scala
+   */
+  val DEFAULTS = Map(
+SPARK_MASTER -> "local[*]",
+SPARK_VERBOSE -> "false",
+SPARK_DEPLOY_MODE -> "client",
+SPARK_EXECUTOR_MEMORY -> "1g",
+SPARK_EXECUTOR_CORES -> "1" ,
+SPARK_EXECUTOR_INSTANCES -> "2",
+SPARK_DRIVER_MEMORY -> "512m",
+SPARK_DRIVER_CORES -> "1",
+SPARK_DRIVER_SUPERVISE -> "false",
+SPARK_YARN_QUEUE -> "default",
+SPARK_EXECUTOR_INSTANCES -> "2"
+  )
+
+  /**
+   * Config items that should only be set from the command line
+   */
+  val CMD_LINE_ONLY_KEYS = Set (
+SPARK_VERBOSE,
+SPARK_APP_CLASS,
+SPARK_APP_PRIMARY_RESOURCE
+  )
+
+  /**
+   * Used to support legacy environment variable mappings
+   */
+  val LEGACY_ENV_VARS = Map (
+"MASTER" -> SPARK_MASTER,
+"DEPLOY_MODE" -> SPARK_DEPLOY_MODE,
+"SPARK_DRIVER_MEMORY" -> SPARK_DRIVER_MEMORY,
+"SPARK_EXECUTOR_MEMORY" -> SPARK_EXECUTOR_MEMORY
+  )
+
+  /**
+   * Function returns the spark submit default config map (Map[configName->ConfigValue])
+   * Function is over-writable to allow for easier debugging
+   */
+  private[spark] var getHardCodedDefaultValues: () => Map[String, String] = () => {
+DEFAULTS
+  }
+
+  /**
+   * System environment variables.
+   * Function is over-writable to allow for easier debugging
+   */
+  private[spark] var genEnvVars: () => Map[String, String] = () =>
--- End diff --

Similar to previous. You can use arguments with default values if you want 
to encapsulate the actual default implementation within this class.
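
A short sketch of that shape applied to the environment-variable source (hypothetical helper, not code from the patch):

object EnvSourceSketch {
  // Production callers use the default argument (the real environment);
  // a test can inject a canned map instead of touching sys.env.
  def sparkEnvVars(env: Map[String, String] = sys.env): Map[String, String] =
    env.filter { case (k, _) => k.toLowerCase.startsWith("spark") }
}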


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18911832
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -398,22 +478,117 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String], env: Map[String, St
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Default property values - string literals are defined in 
ConfigConstants.scala
+   */
+  val DEFAULTS = Map(
+SPARK_MASTER -> "local[*]",
+SPARK_VERBOSE -> "false",
+SPARK_DEPLOY_MODE -> "client",
+SPARK_EXECUTOR_MEMORY -> "1g",
+SPARK_EXECUTOR_CORES -> "1" ,
+SPARK_EXECUTOR_INSTANCES -> "2",
+SPARK_DRIVER_MEMORY -> "512m",
+SPARK_DRIVER_CORES -> "1",
+SPARK_DRIVER_SUPERVISE -> "false",
+SPARK_YARN_QUEUE -> "default",
+SPARK_EXECUTOR_INSTANCES -> "2"
+  )
+
+  /**
+   * Config items that should only be set from the command line
+   */
+  val CMD_LINE_ONLY_KEYS = Set (
+SPARK_VERBOSE,
+SPARK_APP_CLASS,
+SPARK_APP_PRIMARY_RESOURCE
+  )
+
+  /**
+   * Used to support legacy environment variable mappings
+   */
+  val LEGACY_ENV_VARS = Map (
+"MASTER" -> SPARK_MASTER,
+"DEPLOY_MODE" -> SPARK_DEPLOY_MODE,
+"SPARK_DRIVER_MEMORY" -> SPARK_DRIVER_MEMORY,
+"SPARK_EXECUTOR_MEMORY" -> SPARK_EXECUTOR_MEMORY
+  )
+
+  /**
+   * Function returns the spark submit default config map (Map[configName->ConfigValue])
+   * Function is over-writable to allow for easier debugging
+   */
+  private[spark] var getHardCodedDefaultValues: () => Map[String, String] = () => {
+DEFAULTS
+  }
+
+  /**
+   * System environment variables.
+   * Function is over-writable to allow for easier debugging
+   */
+  private[spark] var genEnvVars: () => Map[String, String] = () =>
+sys.env.filterKeys( x => x.toLowerCase.startsWith("spark") )
+
+  /**
+   * Gets configuration from reading SPARK_CONF_DIR/spark-defaults.conf if 
it exists
+   * otherwise reads SPARK_HOME/conf/spark-defaults.conf if it exists
+   * otherwise returns an empty config structure
+   * Function is over-writable to allow for easier debugging
--- End diff --

So, third time you're adding these... yet I don't see you using the 
overridability feature anywhere. I'd understand if you were using this in the 
tests, but what's your goal here?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18911938
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala 
---
@@ -50,71 +51,69 @@ private[spark] object SparkSubmitDriverBootstrapper {
 val javaOpts = sys.env("JAVA_OPTS")
 val defaultDriverMemory = sys.env("OUR_JAVA_MEM")

-// Spark submit specific environment variables
-val deployMode = sys.env("SPARK_SUBMIT_DEPLOY_MODE")
-val propertiesFile = sys.env("SPARK_SUBMIT_PROPERTIES_FILE")
+// SPARK_SUBMIT_BOOTSTRAP_DRIVER is used for runtime validation
 val bootstrapDriver = sys.env("SPARK_SUBMIT_BOOTSTRAP_DRIVER")
-val submitDriverMemory = sys.env.get("SPARK_SUBMIT_DRIVER_MEMORY")
-val submitLibraryPath = sys.env.get("SPARK_SUBMIT_LIBRARY_PATH")
-val submitClasspath = sys.env.get("SPARK_SUBMIT_CLASSPATH")
-val submitJavaOpts = sys.env.get("SPARK_SUBMIT_OPTS")
+
+// list of environment variables that override differently named 
properties
+val envOverides = Map( "OUR_JAVA_MEM" -> SPARK_DRIVER_MEMORY,
+  "SPARK_SUBMIT_DEPLOY_MODE" -> SPARK_DEPLOY_MODE,
+  "SPARK_SUBMIT_DRIVER_MEMORY" -> SPARK_DRIVER_MEMORY,
+  "SPARK_SUBMIT_LIBRARY_PATH" -> SPARK_DRIVER_EXTRA_LIBRARY_PATH,
+  "SPARK_SUBMIT_CLASSPATH" -> SPARK_DRIVER_EXTRA_CLASSPATH,
+  "SPARK_SUBMIT_OPTS" -> SPARK_DRIVER_EXTRA_JAVA_OPTIONS
+)
+
+/* SPARK_SUBMIT environment variables are treated as the highest 
priority source
--- End diff --

nit: use single-line comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18911947
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala 
---
@@ -50,71 +51,69 @@ private[spark] object SparkSubmitDriverBootstrapper {
 val javaOpts = sys.env("JAVA_OPTS")
 val defaultDriverMemory = sys.env("OUR_JAVA_MEM")

-// Spark submit specific environment variables
-val deployMode = sys.env("SPARK_SUBMIT_DEPLOY_MODE")
-val propertiesFile = sys.env("SPARK_SUBMIT_PROPERTIES_FILE")
+// SPARK_SUBMIT_BOOTSTRAP_DRIVER is used for runtime validation
 val bootstrapDriver = sys.env("SPARK_SUBMIT_BOOTSTRAP_DRIVER")
-val submitDriverMemory = sys.env.get("SPARK_SUBMIT_DRIVER_MEMORY")
-val submitLibraryPath = sys.env.get("SPARK_SUBMIT_LIBRARY_PATH")
-val submitClasspath = sys.env.get("SPARK_SUBMIT_CLASSPATH")
-val submitJavaOpts = sys.env.get("SPARK_SUBMIT_OPTS")
+
+// list of environment variables that override differently named 
properties
+val envOverides = Map( "OUR_JAVA_MEM" -> SPARK_DRIVER_MEMORY,
--- End diff --

nit: no space after `(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18911964
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala 
---
@@ -50,71 +51,69 @@ private[spark] object SparkSubmitDriverBootstrapper {
 val javaOpts = sys.env("JAVA_OPTS")
 val defaultDriverMemory = sys.env("OUR_JAVA_MEM")

-// Spark submit specific environment variables
-val deployMode = sys.env("SPARK_SUBMIT_DEPLOY_MODE")
-val propertiesFile = sys.env("SPARK_SUBMIT_PROPERTIES_FILE")
+// SPARK_SUBMIT_BOOTSTRAP_DRIVER is used for runtime validation
 val bootstrapDriver = sys.env("SPARK_SUBMIT_BOOTSTRAP_DRIVER")
-val submitDriverMemory = sys.env.get("SPARK_SUBMIT_DRIVER_MEMORY")
-val submitLibraryPath = sys.env.get("SPARK_SUBMIT_LIBRARY_PATH")
-val submitClasspath = sys.env.get("SPARK_SUBMIT_CLASSPATH")
-val submitJavaOpts = sys.env.get("SPARK_SUBMIT_OPTS")
+
+// list of environment variables that override differently named 
properties
+val envOverides = Map( "OUR_JAVA_MEM" -> SPARK_DRIVER_MEMORY,
+  "SPARK_SUBMIT_DEPLOY_MODE" -> SPARK_DEPLOY_MODE,
+  "SPARK_SUBMIT_DRIVER_MEMORY" -> SPARK_DRIVER_MEMORY,
+  "SPARK_SUBMIT_LIBRARY_PATH" -> SPARK_DRIVER_EXTRA_LIBRARY_PATH,
+  "SPARK_SUBMIT_CLASSPATH" -> SPARK_DRIVER_EXTRA_CLASSPATH,
+  "SPARK_SUBMIT_OPTS" -> SPARK_DRIVER_EXTRA_JAVA_OPTIONS
+)
+
+/* SPARK_SUBMIT environment variables are treated as the highest 
priority source
+ *  of config information for their respective config variable (as 
listed in envOverrides)
+ */
+val submitEnvVars = new HashMap() ++ envOverides
+  .map { case(varName, propName) => (sys.env.get(varName), propName) }
+  .filter { case(variable, _) => variable.isDefined }
+  .map { case(variable, propName) => propName -> variable.get }
+
+// Property file loading comes after all SPARK* env variables are 
processed and should not
+// overwrite existing SPARK env variables
+sys.env.get(SPARK_SUBMIT_PROPERTIES_FILE)
+.flatMap ( Utils.getFileIfExists )
+.map ( Utils.loadPropFile )
+.getOrElse(Map.empty)
+.foreach { case(k,v) =>
+  submitEnvVars.getOrElseUpdate(k,v)
+}
+
+ /* See docco for SparkSubmitArguments to see the various config 
sources and their priority.
--- End diff --

alignment is wrong, also, use single-line comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18912066
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1479,6 +1479,14 @@ private[spark] object Utils extends Logging {
 PropertyConfigurator.configure(pro)
   }
 
+  /**
+   * Flatten a map of maps out into a single map, later maps in the 
propList
--- End diff --

Code and comment still seem to disagree.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-15 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-59251504
  
@tigerquoll you'll need to merge this with current master, since there are 
conflicts. You may be able to clean up some code since the PR I mentioned 
before is now checked in.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-14 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-59116654
  
For our use cases, it should be either one `/` or three `///`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-13 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-58925305
  
@tigerquoll `file://foo/bar` is not a valid URL (or at least it's not what 
you think it is). `file:/foo/bar` is a valid one.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-13 Thread tigerquoll
Github user tigerquoll commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-58943431
  
Interesting, can you give any references as to what single-slash file URIs 
mean? Neither RFC 1738 nor RFC 1630 seems to mention them, and 
http://en.wikipedia.org/wiki/File_URI_scheme only mentions them in passing, noting 
that some web browsers allow them even though it is against the spec.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-13 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-58944199
  
This is not specific to file URLs, this is how URLs are defined.

`file:/foo/bar` means path `/foo/bar` on the local server.

`file://foo/bar` means path `/bar` on server `foo`.
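
This is easy to check with java.net.URI (a quick sketch, not part of the patch):

import java.net.URI

object FileUriDemo {
  def main(args: Array[String]): Unit = {
    val oneSlash = new URI("file:/foo/bar")
    println(oneSlash.getAuthority) // null -> no host, the path is local
    println(oneSlash.getPath)      // /foo/bar

    val twoSlashes = new URI("file://foo/bar")
    println(twoSlashes.getAuthority) // foo -> "foo" is parsed as a host name
    println(twoSlashes.getPath)      // /bar

    val threeSlashes = new URI("file:///foo/bar")
    println(threeSlashes.getAuthority) // null -> empty authority, i.e. the local machine
    println(threeSlashes.getPath)      // /foo/bar
  }
}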


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-12 Thread tigerquoll
Github user tigerquoll commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-58775682
  
Hi @vanzin, I've implemented your suggestions, tidied up the code more, and 
also added more unit tests to flesh out the test coverage.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-12 Thread tigerquoll
Github user tigerquoll commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-58775741
  
@andrewor14 we could be stepping on each other's toes soon.  Have a query 
about your work on Utils.ResolveUris.  I notice that you produce and test for 
file URIs with a single fwd slash (file:/foo/bar).
Shouldn't they have two fwd slashes (file://foo/bar) ?  I've left my unit 
tests expecting a single fwd slash for now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-10-11 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18745897
  
--- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
@@ -1479,6 +1479,14 @@ private[spark] object Utils extends Logging {
 PropertyConfigurator.configure(pro)
   }
 
+  /**
+   * Flatten a map of maps out into a single map, later maps in the 
propList
--- End diff --

fixed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-30 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18209006
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-30 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18229704
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-29 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18148421
  
--- Diff: 
core/src/main/resources/org/apache/spark/deploy/spark-submit-defaults.prop ---
@@ -0,0 +1,18 @@
+
+spark.master = local[*]
--- End diff --

Ok, will do


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-29 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18148846
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
--- End diff --

OK, I've nuked the property file that was read in from the classpath and am now using a 
default value map instead.
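
(For reference, a minimal sketch of what that default value map might contain. Only the
spark.master entry is taken from the quoted diff; anything beyond that would be a guess,
so it is left out.)

    // Sketch only: a hard-coded default value map standing in for the
    // spark-submit-defaults.prop file that was previously read from the classpath.
    // spark.master -> local[*] is the one default visible in the quoted diff;
    // further entries are deliberately omitted rather than invented.
    val hardCodedDefaults: Map[String, String] = Map(
      "spark.master" -> "local[*]"
    )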


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-29 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18148885
  
--- Diff: 
core/src/test/scala/org/apache/spark/deploy/MergablePropertiesTest.scala ---
@@ -0,0 +1,55 @@
+package org.apache.spark.deploy
--- End diff --

added
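
(For context, the file under discussion tests the property-merge behaviour. A minimal
FunSuite sketch of that kind of check, written against plain Scala maps rather than the
PR's actual helper, whose API is not shown in this thread, could look like the following.)

    import org.scalatest.FunSuite

    // Sketch only: exercises the "earlier source wins" merge semantics discussed in
    // this thread, using an inline fold instead of the PR's helper class.
    class MergeBehaviourSketch extends FunSuite {
      // Merge sources given in priority order (highest first); a key already present
      // from a higher-priority source is never overwritten by a later one.
      private def merge(sources: Seq[Map[String, String]]): Map[String, String] =
        sources.foldLeft(Map.empty[String, String]) { (acc, src) => src ++ acc }

      test("a higher-priority source overrides a lower-priority one") {
        val commandLine = Map("spark.master" -> "yarn-cluster")
        val defaults    = Map("spark.master" -> "local[*]", "spark.app.name" -> "demo")
        val merged      = merge(Seq(commandLine, defaults))
        assert(merged("spark.master") === "yarn-cluster")
        assert(merged("spark.app.name") === "demo")
      }
    }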


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18172865
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
+  is = Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =>
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =>
+SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq("MASTER" -> SparkMaster, "DEPLOY_MODE" -> SparkDeployMode,
+  "SPARK_DRIVER_MEMORY" -> SparkDriverMemory, "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
+val propsWithEnvVars : mutable.Map[String,String] = new mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) => (environmentConfig.get(varName), propName) })
+  .filter( {case(varVariable, _) => varVariable.isDefined && !varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) => (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Seq (
+  environmentConfig,
+  propsWithEnvVars,
+  sparkDefaultConfig,
+  hardCodedDefaultConfig.get
+)
+
+// Load properties file at 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18173009
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-29 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18190759
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k = (k, 
properties(k).trim))
-} catch {
-  case e: IOException =
-val message = sFailed when loading Spark properties file $file
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(sDefault values not found at 
classpath $ClassPathSparkSubmitDefaults)
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq(MASTER-SparkMaster, 
DEPLOY_MODE-SparkDeployMode,
+  SPARK_DRIVER_MEMORY-SparkDriverMemory, 
SPARK_EXECUTOR_MEMORY-SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
+val propsWithEnvVars : mutable.Map[String,String] = new 
mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) = (environmentConfig.get(varName), 
propName) })
+  .filter( {case(varVariable, _) = varVariable.isDefined  
!varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) = (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Seq (
+  environmentConfig,
+  propsWithEnvVars,
+  sparkDefaultConfig,
+  hardCodedDefaultConfig.get
+)
+
+// Load properties file at 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-29 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18190884
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k = (k, 
properties(k).trim))
-} catch {
-  case e: IOException =
-val message = sFailed when loading Spark properties file $file
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(sDefault values not found at 
classpath $ClassPathSparkSubmitDefaults)
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq(MASTER-SparkMaster, 
DEPLOY_MODE-SparkDeployMode,
+  SPARK_DRIVER_MEMORY-SparkDriverMemory, 
SPARK_EXECUTOR_MEMORY-SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
+val propsWithEnvVars : mutable.Map[String,String] = new 
mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) = (environmentConfig.get(varName), 
propName) })
+  .filter( {case(varVariable, _) = varVariable.isDefined  
!varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) = (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Seq (
+  environmentConfig,
+  propsWithEnvVars,
+  sparkDefaultConfig,
+  hardCodedDefaultConfig.get
+)
+
+// Load properties file at 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-29 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18191705
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k = (k, 
properties(k).trim))
-} catch {
-  case e: IOException =
-val message = sFailed when loading Spark properties file $file
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(sDefault values not found at 
classpath $ClassPathSparkSubmitDefaults)
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq(MASTER-SparkMaster, 
DEPLOY_MODE-SparkDeployMode,
+  SPARK_DRIVER_MEMORY-SparkDriverMemory, 
SPARK_EXECUTOR_MEMORY-SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
+val propsWithEnvVars : mutable.Map[String,String] = new 
mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) = (environmentConfig.get(varName), 
propName) })
+  .filter( {case(varVariable, _) = varVariable.isDefined  
!varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) = (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Seq (
+  environmentConfig,
+  propsWithEnvVars,
+  sparkDefaultConfig,
+  hardCodedDefaultConfig.get
+)
+
+// Load properties file at 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-28 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18128789
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k = (k, 
properties(k).trim))
-} catch {
-  case e: IOException =
-val message = sFailed when loading Spark properties file $file
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(sDefault values not found at 
classpath $ClassPathSparkSubmitDefaults)
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq(MASTER-SparkMaster, 
DEPLOY_MODE-SparkDeployMode,
+  SPARK_DRIVER_MEMORY-SparkDriverMemory, 
SPARK_EXECUTOR_MEMORY-SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
+val propsWithEnvVars : mutable.Map[String,String] = new 
mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) = (environmentConfig.get(varName), 
propName) })
+  .filter( {case(varVariable, _) = varVariable.isDefined  
!varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) = (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Seq (
+  environmentConfig,
+  propsWithEnvVars,
+  sparkDefaultConfig,
+  hardCodedDefaultConfig.get
+)
+
+// Load properties file at 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-28 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18128856
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k = (k, 
properties(k).trim))
-} catch {
-  case e: IOException =
-val message = sFailed when loading Spark properties file $file
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(sDefault values not found at 
classpath $ClassPathSparkSubmitDefaults)
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq(MASTER-SparkMaster, 
DEPLOY_MODE-SparkDeployMode,
+  SPARK_DRIVER_MEMORY-SparkDriverMemory, 
SPARK_EXECUTOR_MEMORY-SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
+val propsWithEnvVars : mutable.Map[String,String] = new 
mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) = (environmentConfig.get(varName), 
propName) })
+  .filter( {case(varVariable, _) = varVariable.isDefined  
!varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) = (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Seq (
+  environmentConfig,
+  propsWithEnvVars,
+  sparkDefaultConfig,
+  hardCodedDefaultConfig.get
+)
+
+// Load properties file at 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-28 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18128941
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k = (k, 
properties(k).trim))
-} catch {
-  case e: IOException =
-val message = sFailed when loading Spark properties file $file
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(sDefault values not found at 
classpath $ClassPathSparkSubmitDefaults)
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq("MASTER" -> SparkMaster, "DEPLOY_MODE" -> SparkDeployMode,
+  "SPARK_DRIVER_MEMORY" -> SparkDriverMemory, "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
--- End diff --

OK, my way was even getting me confused. Let's use your suggested code and treat legacy
env variables at the same priority as normal environment variables.
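
(A rough sketch of that arrangement, for the archive: the legacy variables are remapped
onto property names and then folded into the same environment-level map as everything
else. The constant values below are assumptions standing in for this PR's ConfigConstants,
not its actual definitions.)

    // Sketch only: placeholder constants; the real names live in ConfigConstants.
    val SparkMaster         = "spark.master"
    val SparkDeployMode     = "spark.deploy.mode"      // assumed property name
    val SparkDriverMemory   = "spark.driver.memory"
    val SparkExecutorMemory = "spark.executor.memory"

    val legacyEnvVars = Seq(
      "MASTER" -> SparkMaster,
      "DEPLOY_MODE" -> SparkDeployMode,
      "SPARK_DRIVER_MEMORY" -> SparkDriverMemory,
      "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)

    // Remap whichever legacy variables are actually set and non-empty...
    val remappedLegacy: Map[String, String] = legacyEnvVars
      .filter { case (envName, _) => sys.env.get(envName).exists(_.nonEmpty) }
      .map { case (envName, propName) => propName -> sys.env(envName) }
      .toMap

    // ...and fold them in at the same priority as the rest of the environment,
    // rather than promoting them to system-property priority.
    val environmentLevelConfig: Map[String, String] = sys.env ++ remappedLegacy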


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-28 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18133582
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-28 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18133601
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k = (k, 
properties(k).trim))
-} catch {
-  case e: IOException =
-val message = sFailed when loading Spark properties file $file
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(sDefault values not found at 
classpath $ClassPathSparkSubmitDefaults)
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq(MASTER-SparkMaster, 
DEPLOY_MODE-SparkDeployMode,
+  SPARK_DRIVER_MEMORY-SparkDriverMemory, 
SPARK_EXECUTOR_MEMORY-SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
+val propsWithEnvVars : mutable.Map[String,String] = new 
mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) = (environmentConfig.get(varName), 
propName) })
+  .filter( {case(varVariable, _) = varVariable.isDefined  
!varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) = (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Seq (
--- End diff --

changed


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-28 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18134461
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k = (k, 
properties(k).trim))
-} catch {
-  case e: IOException =
-val message = sFailed when loading Spark properties file $file
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(sDefault values not found at 
classpath $ClassPathSparkSubmitDefaults)
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq(MASTER-SparkMaster, 
DEPLOY_MODE-SparkDeployMode,
+  SPARK_DRIVER_MEMORY-SparkDriverMemory, 
SPARK_EXECUTOR_MEMORY-SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
+val propsWithEnvVars : mutable.Map[String,String] = new 
mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) = (environmentConfig.get(varName), 
propName) })
+  .filter( {case(varVariable, _) = varVariable.isDefined  
!varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) = (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Seq (
+  environmentConfig,
+  propsWithEnvVars,
+  sparkDefaultConfig,
+  hardCodedDefaultConfig.get
+)
+
+// Load properties file at 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18112667
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =>
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =>
+SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq("MASTER" -> SparkMaster, "DEPLOY_MODE" -> SparkDeployMode,
+  "SPARK_DRIVER_MEMORY" -> SparkDriverMemory, "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
--- End diff --

What caller? Your code is the caller.

The issue I raised was based on your comment. If you want to treat env 
variables as system properties, the code is much cleaner the way I proposed. 
Otherwise, remove the comment, and have this instead:

var envVarConfig = legacyEnvVars
  .filter { case (k, v) => sys.env.contains(k) }
  .map { case (k, v) => (v, sys.env(k)) }
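Spelled out as a self-contained sketch (SparkMaster and SparkDeployMode stand in for the constants from ConfigConstants, and "spark.submit.deployMode" is only an assumed key for illustration):

    val SparkMaster = "spark.master"
    val SparkDeployMode = "spark.submit.deployMode"  // assumed key, illustration only
    val legacyEnvVars = Seq("MASTER" -> SparkMaster, "DEPLOY_MODE" -> SparkDeployMode)

    // Keep only the legacy variables that are actually set and non-empty,
    // re-keyed by the Spark property name so they merge at system-property priority.
    val envVarConfig: Map[String, String] = legacyEnvVars
      .filter { case (envName, _) => sys.env.get(envName).exists(_.nonEmpty) }
      .map { case (envName, propName) => propName -> sys.env(envName) }
      .toMap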


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-26 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18112935
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =>
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =>
+SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq("MASTER" -> SparkMaster, "DEPLOY_MODE" -> SparkDeployMode,
+  "SPARK_DRIVER_MEMORY" -> SparkDriverMemory, "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
+val propsWithEnvVars : mutable.Map[String,String] = new mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) => (environmentConfig.get(varName), propName) })
+  .filter( {case(varVariable, _) => varVariable.isDefined && !varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) => (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Seq (
+  environmentConfig,
+  propsWithEnvVars,
+  sparkDefaultConfig,
+  hardCodedDefaultConfig.get
+)
+
+// Load properties file at 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread tigerquoll
Github user tigerquoll commented on the pull request:

https://github.com/apache/spark/pull/2516#issuecomment-56777816
  
vanzin - great feedback. thanks for the effort of going through the code.

I've implemented all the requested changes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18058651
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/configConstants.scala 
---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+/**
+ * Created by Dale on 19/09/2014.
+ * File is used to centralize references to configuration variables
+ */
+object ConfigConstants {
+  /**
+   * The name of your application. This will appear in the UI and in log 
data.
+   */
+  val SparkAppName: String = "spark.app.name"
--- End diff --

It's not my project - I can't even commit your code. But I can post 
annoying comments about it. :-)

Anyway, Spark has its own code conventions that are not necessarily the 
same as Scala. I seem to remember a document or page somewhere but can't 
reference it now. In any case, the whole rest of the code base uses the 
Java-style ALL_CAPS for constants. Adding a different style will just add 
confusion.
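For illustration, the same constant written in the two styles being discussed (the key spark.app.name is real; only the naming convention differs):

    val SparkAppName = "spark.app.name"      // Scala-style, as in this patch
    val SPARK_APP_NAME = "spark.app.name"    // Java-style ALL_CAPS, as used elsewhere in the code base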


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18058687
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,164 +17,205 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStream, File, FileInputStream}
 import java.util.jar.JarFile
-
+import java.util.Properties
+import java.util.{Map => JavaMap}
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+import scala.collection._
+import scala.collection.JavaConverters._
+import scala.collection.{mutable=m}
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new m.HashMap[String, String]()
+
+  def master = conf.get(SparkMaster).get
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
--- End diff --

Ah, Scala magic. Sometimes Scala is too magic for my tastes.
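For readers unfamiliar with the idiom: defining foo alongside foo_= lets plain-looking assignments route through the map. A small self-contained sketch (SparkMaster here is just a local stand-in for the constant):

    object SetterDemo {
      val SparkMaster = "spark.master"
      val conf = scala.collection.mutable.HashMap[String, String]()

      // `x.master = "..."` is compiled into a call to `master_=("...")`.
      def master: String = conf(SparkMaster)
      def master_=(value: String): Unit = conf.put(SparkMaster, value)
    }

    SetterDemo.master = "local[2]"           // routed through master_=
    assert(SetterDemo.master == "local[2]")  // read back out of the map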


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18058767
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -394,8 +404,8 @@ object SparkSubmit {
  * the user's driver program or to downstream launcher tools.
  */
 private[spark] case class OptionAssigner(
-value: String,
-clusterManager: Int,
-deployMode: Int,
-clOption: String = null,
-sysProp: String = null)
+  value: String,
--- End diff --

Yeah it's probably spaces vs indentation


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18058868
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,164 +17,205 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStream, File, FileInputStream}
 import java.util.jar.JarFile
-
+import java.util.Properties
+import java.util.{Map => JavaMap}
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+import scala.collection._
+import scala.collection.JavaConverters._
+import scala.collection.{mutable=m}
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(sUsing properties file: 
$propertiesFile)
-Option(propertiesFile).foreach { filename =
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, 
v) =
-if (k.startsWith(spark)) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(sAdding default 
property: $k=$v)
-} else {
-  SparkSubmit.printWarning(sIgnoring non-spark config property: 
$k=$v)
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new m.HashMap[String, String]()
+
+  def master = conf.get(SparkMaster).get
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf.get(SparkDeployMode).get
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf.get(SparkExecutorMemory).get
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf.get(SparkExecutorCores).get
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf.get(SparkDriverMemory).get
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18059106
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +413,166 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
+
 object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), sProperties file $file does not exist)
-require(file.isFile(), sProperties file $file is not a normal file)
-val inputStream = new FileInputStream(file)
-try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
-} finally {
-  inputStream.close()
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs additional Map[ConfigName-ConfigValue] in 
order of highest
+   *  priority to lowest
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Vector[Map[String,String]]) 
= {
+
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+val is = Option(Thread.currentThread().getContextClassLoader()
+  .getResourceAsStream(SparkSubmitDefaults))
+
+val hardCodedDefaultConfig = is.flatMap{x =>
+  Some(SparkSubmitArguments.getPropertyValuesFromStream(x))}
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $SparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))
+{
+SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = List("MASTER" -> SparkMaster, "DEPLOY_MODE" -> SparkDeployMode,
+  "SPARK_DRIVER_MEMORY" -> SparkDriverMemory, "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)
+
+
+// legacy variables act at the priority of a system property
+systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) => (environmentConfig.get(varName), propName) })
+  .filter( {case(varVariable, _) => varVariable.isDefined && !varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) => (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Vector (
+  environmentConfig,
+  systemPropertyConfig,
+  sparkDefaultConfig,
+  hardCodedDefaultConfig.get
+)
+
+// Load properties file at priority level of source that specified the 
property file
+// loaded property file configs will not override existing configs at 
the priority
+// level the property file was specified at
+val processedConfigSource = ConfigSources
+  .map( configMap => getFileBasedPropertiesIfSpecified(configMap) ++ configMap)
+
+val test = MergedPropertyMap.mergePropertyMaps(processedConfigSource)
+
+test
+  }
+
+  /**
+   * Returns a map of config values from a property file if
+   * the passed configMap has a SparkPropertiesFile defined pointing to a 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18060296
  
--- Diff: 
core/src/main/resources/org/apache/spark/deploy/spark-submit-defaults.prop ---
@@ -0,0 +1,18 @@
+
+spark.master = local[*]
--- End diff --

Hmmm... now that this file has been cleaned up, it looks a lot smaller than 
it did before. I'm not sure it justifies the extra code to load it. It could 
very well live in a `val DEFAULTS = Map(...)` where it's used.
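Concretely, the handful of hard coded defaults could just be declared where they are consumed, along the lines of the sketch below (only spark.master appears in this hunk; the second entry is taken from elsewhere in the same file and shown for illustration):

    // Lowest-priority source: merged in last, so every other source overrides it.
    val Defaults: Map[String, String] = Map(
      "spark.master"  -> "local[*]",
      "spark.verbose" -> "false")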


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18060382
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -188,18 +188,18 @@ class SparkConf(loadDefaults: Boolean) extends 
Cloneable with Logging {
   def getExecutorEnv: Seq[(String, String)] = {
 val prefix = "spark.executorEnv."
 getAll.filter{case (k, v) => k.startsWith(prefix)}
-  .map{case (k, v) => (k.substring(prefix.length), v)}
+  .map{case (k, v) => (k.substring(prefix.length), v)}
   }
 
   /** Get all akka conf variables set on this SparkConf */
   def getAkkaConf: Seq[(String, String)] =
-/* This is currently undocumented. If we want to make this public we 
should consider
- * nesting options under the spark namespace to avoid conflicts with 
user akka options.
- * Otherwise users configuring their own akka code via system 
properties could mess up
- * spark's akka options.
- *
- *   E.g. spark.akka.option.x.y.x = value
- */
+  /* This is currently undocumented. If we want to make this public we 
should consider
--- End diff --

The previous indentation was actually correct.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18060501
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -330,13 +330,13 @@ private[spark] object SparkConf {
*/
   def isExecutorStartupConf(name: String): Boolean = {
 isAkkaConf(name) ||
-name.startsWith("spark.akka") ||
-name.startsWith("spark.auth") ||
-isSparkPortConf(name)
+  name.startsWith("spark.akka") ||
+  name.startsWith("spark.auth") ||
+  isSparkPortConf(name)
   }
 
   /**
* Return whether the given config is a Spark port config.
*/
  def isSparkPortConf(name: String): Boolean = name.startsWith("spark.") && name.endsWith(".port")
-}
+}
--- End diff --

There are no code changes to this file, you're just making indentation 
changes and things like that. Better to leave it unmodified.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18060656
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -57,6 +57,10 @@ object SparkSubmit {
   private val CLASS_NOT_FOUND_EXIT_STATUS = 101
 
   // Exposed for testing
+  // testing currently disabled exitFn() from working, so we need to stop 
execution
--- End diff --

As a general suggestion, try to make comments proper sentences / paragraphs, i.e., start them with a capital letter and end them with a period.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18060770
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -83,12 +89,12 @@ object SparkSubmit {
*   (4) the main class for the child
*/
   private[spark] def createLaunchEnv(args: SparkSubmitArguments)
-  : (ArrayBuffer[String], ArrayBuffer[String], Map[String, String], 
String) = {
+  : (mutable.ArrayBuffer[String], mutable.ArrayBuffer[String], 
Map[String, String], String) = {
--- End diff --

So, it's ok to import `ArrayBuffer` and reference it directly. You don't 
need to do `mutable.ArrayBuffer`.

`mutable.Foo` is useful when there are conflicts (e.g. 
`scala.collection.Map` vs. `scala.collection.mutable.Map`).
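A short sketch of the two situations (direct import when the name is unambiguous, the mutable. prefix only where immutable and mutable collections would otherwise collide):

    import scala.collection.Map                  // read-only interface used in signatures
    import scala.collection.mutable              // prefix style for the mutable variants
    import scala.collection.mutable.ArrayBuffer  // no conflicting name in scope, so import it directly

    val childArgs = new ArrayBuffer[String]()
    val conf: Map[String, String] = mutable.HashMap("spark.master" -> "local[*]")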


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18062451
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -288,11 +291,11 @@ object SparkSubmit {
   }
 
   private def launch(
-  childArgs: ArrayBuffer[String],
-  childClasspath: ArrayBuffer[String],
-  sysProps: Map[String, String],
-  childMainClass: String,
-  verbose: Boolean = false) {
+  childArgs: mutable.ArrayBuffer[String],
--- End diff --

Yeah, this is definitely a tab. You should check your editor's 
configuration.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18062586
  
--- Diff: 
core/src/main/resources/org/apache/spark/deploy/spark-submit-defaults.prop ---
@@ -0,0 +1,18 @@
+
+spark.master = local[*]
+
+spark.verbose = false
--- End diff --

BTW this is also something that probably just makes sense in the command 
line, and probably shouldn't be turned into a `spark.*` conf.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18063554
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/configConstants.scala 
---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the License); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an AS IS BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.deploy
+
+/**
+ * Created by Dale on 19/09/2014.
+ * File is used to centralize references to configuration variables
+ */
+object ConfigConstants {
+  /**
+   * The name of your application. This will appear in the UI and in log 
data.
+   */
+  val SparkAppName: String = "spark.app.name"
--- End diff --

For reference: 
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide

This would fall into the If in Doubt section.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18065537
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(sUsing properties file: 
$propertiesFile)
-Option(propertiesFile).foreach { filename =
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, 
v) =
-if (k.startsWith(spark)) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(sAdding default 
property: $k=$v)
-} else {
-  SparkSubmit.printWarning(sIgnoring non-spark config property: 
$k=$v)
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18065577
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(sUsing properties file: 
$propertiesFile)
-Option(propertiesFile).foreach { filename =
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, 
v) =
-if (k.startsWith(spark)) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(sAdding default 
property: $k=$v)
-} else {
-  SparkSubmit.printWarning(sIgnoring non-spark config property: 
$k=$v)
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066251
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(sUsing properties file: 
$propertiesFile)
-Option(propertiesFile).foreach { filename =
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, 
v) =
-if (k.startsWith(spark)) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(sAdding default 
property: $k=$v)
-} else {
-  SparkSubmit.printWarning(sIgnoring non-spark config property: 
$k=$v)
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066259
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(sUsing properties file: 
$propertiesFile)
-Option(propertiesFile).foreach { filename =
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, 
v) =
-if (k.startsWith(spark)) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(sAdding default 
property: $k=$v)
-} else {
-  SparkSubmit.printWarning(sIgnoring non-spark config property: 
$k=$v)
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066419
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066401
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066533
  
--- Diff: 
core/src/main/resources/org/apache/spark/deploy/spark-submit-defaults.prop ---
@@ -0,0 +1,18 @@
+
+spark.master = local[*]
--- End diff --

It could, but if this PR gets accepted I'm thinking of extending the 
concept to other configuration properties as well, many of which have default 
values sitting buried in other parts of the code


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
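
For reference, a minimal sketch (not code from the PR; the resource name and helper name are assumptions) of the mechanism being discussed above, i.e. reading hard-coded defaults from a properties file on the classpath:

    import java.io.InputStreamReader
    import java.util.Properties
    import scala.collection.JavaConverters._

    // Load a defaults file such as spark-submit-defaults.prop from the classpath
    // and expose it as an immutable Map; a missing resource yields an empty map.
    def loadClasspathDefaults(resource: String = "spark-submit-defaults.prop"): Map[String, String] = {
      val stream = Option(Thread.currentThread().getContextClassLoader.getResourceAsStream(resource))
      stream.map { s =>
        val reader = new InputStreamReader(s, "UTF-8")
        try {
          val props = new Properties()
          props.load(reader)
          props.stringPropertyNames().asScala.map(k => k -> props.getProperty(k).trim).toMap
        } finally {
          reader.close()
        }
      }.getOrElse(Map.empty)
    }

Whether defaults should live in such a file at all, rather than being hard-coded, is the question the reviewers return to further down the thread.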



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066548
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066585
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066576
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066600
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066609
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066623
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -188,41 +228,16 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   either HADOOP_CONF_DIR or YARN_CONF_DIR must be set in the 
environment.)
   }
 }
+
   }
 
   override def toString =  {
-s"""Parsed arguments:
-|  master  $master
-|  deployMode  $deployMode
-|  executorMemory  $executorMemory
-|  executorCores   $executorCores
-|  totalExecutorCores  $totalExecutorCores
-|  propertiesFile  $propertiesFile
-|  extraSparkProperties$sparkProperties
-|  driverMemory$driverMemory
-|  driverCores $driverCores
-|  driverExtraClassPath$driverExtraClassPath
-|  driverExtraLibraryPath  $driverExtraLibraryPath
-|  driverExtraJavaOptions  $driverExtraJavaOptions
-|  supervise   $supervise
-|  queue   $queue
-|  numExecutors$numExecutors
-|  files   $files
-|  pyFiles $pyFiles
-|  archives$archives
-|  mainClass   $mainClass
-|  primaryResource $primaryResource
-|  name$name
-|  childArgs   [${childArgs.mkString(" ")}]
-|  jars$jars
-|  verbose $verbose
-|
-|Default properties from $propertiesFile:
-|${defaultSparkProperties.mkString("  ", "\n  ", "\n")}
-""".stripMargin
+conf.mkString("\n")
   }
 
-  /** Fill in values by parsing user options. */
+  /**
--- End diff --

Previous formatting was correct.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066712
  
--- Diff: 
core/src/main/resources/org/apache/spark/deploy/spark-submit-defaults.prop ---
@@ -0,0 +1,18 @@
+
+spark.master = local[*]
--- End diff --

I'd cross that bridge when you get there. I've had push back before when 
trying to consolidate options into a single location like this.

For now I feel it's cleaner to just keep this in `SparkSubmitArguments`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18066741
  
--- Diff: core/src/main/scala/org/apache/spark/SparkConf.scala ---
@@ -330,13 +330,13 @@ private[spark] object SparkConf {
*/
   def isExecutorStartupConf(name: String): Boolean = {
 isAkkaConf(name) ||
-name.startsWith("spark.akka") ||
-name.startsWith("spark.auth") ||
-isSparkPortConf(name)
+  name.startsWith("spark.akka") ||
+  name.startsWith("spark.auth") ||
+  isSparkPortConf(name)
   }
 
   /**
* Return whether the given config is a Spark port config.
*/
  def isSparkPortConf(name: String): Boolean = name.startsWith("spark.") && name.endsWith(".port")
-}
+}
--- End diff --

reverted the file


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067374
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +413,166 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
+
 object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
-try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
-} finally {
-  inputStream.close()
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs additional Map[ConfigName-ConfigValue] in 
order of highest
+   *  priority to lowest
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Vector[Map[String,String]]) 
= {
+
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+val is = Option(Thread.currentThread().getContextClassLoader()
+  .getResourceAsStream(SparkSubmitDefaults))
+
+val hardCodedDefaultConfig = is.flatMap{x =>
+  Some(SparkSubmitArguments.getPropertyValuesFromStream(x))}
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $SparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))
+{
+SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = List("MASTER"->SparkMaster, "DEPLOY_MODE"->SparkDeployMode,
+  "SPARK_DRIVER_MEMORY"->SparkDriverMemory, "SPARK_EXECUTOR_MEMORY"->SparkExecutorMemory)
+
+
+// legacy variables act at the priority of a system property
+systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) => (environmentConfig.get(varName), propName) })
+  .filter( {case(varVariable, _) => varVariable.isDefined && !varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) => (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Vector (
+  environmentConfig,
+  systemPropertyConfig,
+  sparkDefaultConfig,
+  hardCodedDefaultConfig.get
+)
+
+// Load properties file at priority level of source that specified the 
property file
+// loaded property file configs will not override existing configs at 
the priority
+// level the property file was specified at
+val processedConfigSource = ConfigSources
+  .map( configMap => getFileBasedPropertiesIfSpecified(configMap) ++ configMap)
+
+val test = MergedPropertyMap.mergePropertyMaps(processedConfigSource)
+
+test
+  }
+
+  /**
+   * Returns a map of config values from a property file if
+   * the passed configMap has a SparkPropertiesFile defined pointing 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067347
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
--- End diff --

I was going to suggest a cleanup here but I really believe just hardcoding 
the default config map is a better solution at this point.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
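
A minimal sketch of what that suggestion amounts to: drop the classpath resource and make the lowest-priority defaults an ordinary hard-coded map (only spark.master is taken from the quoted defaults file; anything else would be a placeholder):

    // Lowest-priority defaults, hard-coded instead of loaded from
    // spark-submit-defaults.prop on the classpath.
    val hardCodedDefaultConfig: Map[String, String] = Map(
      "spark.master" -> "local[*]"
      // further defaults would be listed here
    )

This removes the resource lookup, the stream handling, and the "defaults not found" failure mode in one go.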



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067408
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -57,6 +57,10 @@ object SparkSubmit {
   private val CLASS_NOT_FOUND_EXIT_STATUS = 101
 
   // Exposed for testing
+  // testing currently disabled exitFn() from working, so we need to stop 
execution
--- End diff --

Done


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067446
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =>
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =>
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
--- End diff --

`sys.props`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067463
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =>
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =>
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
--- End diff --

`sys.props`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
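
For illustration, `sys.props` and `sys.env` already expose both of these sources as Scala maps, so the two lookups in the quoted code could be written roughly as follows (a sketch, not the PR's code):

    // JVM system properties (e.g. set with -Dspark.master=yarn) as a plain Scala map
    val systemPropertyConfig: Map[String, String] = sys.props.toMap

    // Environment variables as an immutable Map[String, String]
    val environmentConfig: Map[String, String] = sys.env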



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067545
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =>
+new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =>
+
SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size 
== 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
--- End diff --

This is not being assigned to anything? I think you mean something like 
this:

    val config = SparkSubmitArguments.getPropertyValuesFromFile(
      conf.get(SparkPropertiesFile).getOrElse(SparkSubmitArguments.getSparkDefaultFileConfig))


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
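
Put differently, the if/else quoted above computes a map and then discards it; the value needs to be bound to a name. A self-contained sketch of the intended shape, with `readProps` standing in for the PR's getPropertyValuesFromFile helper:

    // Resolve the properties contributed by a spark-defaults style file, if the
    // config gathered so far names one; otherwise contribute nothing.
    def defaultFileProperties(
        sparkDefaultConfig: Map[String, String],
        propertiesFileKey: String,
        readProps: String => Map[String, String]): Map[String, String] =
      sparkDefaultConfig.get(propertiesFileKey) match {
        case Some(file) => readProps(file)
        case None => Map.empty
      }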



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067561
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional 
Map[ConfigName-ConfigValue] in order of highest
+   *  priority to lowest this will have priority 
over internal sources
+   * @return Map[propName-propFile] containing values merged from all 
sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =>
+    new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =>
+    SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq("MASTER"->SparkMaster, "DEPLOY_MODE"->SparkDeployMode,
--- End diff --

Could you put each mapping on its own line, for readability? Also, you need spaces around `->`.
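
Applying both suggestions, the mapping would look roughly like the sketch below; the property-name constants are placeholders standing in for the PR's ConfigConstants values and are not taken from the patch itself.

    object LegacyEnvVarLayout {
      // Placeholder constants standing in for the PR's ConfigConstants values.
      val SparkMaster = "spark.master"
      val SparkDeployMode = "spark.deploy.mode"
      val SparkDriverMemory = "spark.driver.memory"
      val SparkExecutorMemory = "spark.executor.memory"

      // One legacy environment variable per line, with spaces around `->`:
      val legacyEnvVars = Seq(
        "MASTER" -> SparkMaster,
        "DEPLOY_MODE" -> SparkDeployMode,
        "SPARK_DRIVER_MEMORY" -> SparkDriverMemory,
        "SPARK_EXECUTOR_MEMORY" -> SparkExecutorMemory)
    }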





[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067608
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala ---
@@ -288,11 +291,11 @@ object SparkSubmit {
   }
 
   private def launch(
-  childArgs: ArrayBuffer[String],
-  childClasspath: ArrayBuffer[String],
-  sysProps: Map[String, String],
-  childMainClass: String,
-  verbose: Boolean = false) {
+  childArgs: mutable.ArrayBuffer[String],
--- End diff --

Double-checked that IntelliJ's Scala plugin is set to 2-space tabs. I've deleted the spaces and re-indented - let's see if that solves it.





[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067705
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 
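
The quoted diff is cut off by the digest here. For readers unfamiliar with the `foo_=` syntax above, the following is a minimal, self-contained sketch of the map-backed getter/setter pattern the PR adopts; the key string is a placeholder, not the PR's actual constant.

    import scala.collection.mutable

    // Illustration of Scala's custom setter syntax (`def x_=`) backed by a HashMap,
    // mirroring how SparkSubmitArguments stores every option under a string key in `conf`.
    class ConfBackedArgs {
      val conf = new mutable.HashMap[String, String]()

      // Placeholder key; the PR uses constants from ConfigConstants.scala instead.
      private val SparkMaster = "spark.master"

      def master: String = conf(SparkMaster)                            // getter: throws if unset
      def master_=(value: String): Unit = conf.put(SparkMaster, value)  // setter: args.master = "..."
    }

    object ConfBackedArgsDemo extends App {
      val args = new ConfBackedArgs
      args.master = "local[*]"   // calls master_=
      println(args.master)       // prints local[*]
    }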

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067693
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional Map[ConfigName->ConfigValue] in order of highest
+   *  priority to lowest this will have priority over internal sources
+   * @return Map[propName->propFile] containing values merged from all sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =>
+    new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =>
+    SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq("MASTER"->SparkMaster, "DEPLOY_MODE"->SparkDeployMode,
+  "SPARK_DRIVER_MEMORY"->SparkDriverMemory, "SPARK_EXECUTOR_MEMORY"->SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
--- End diff --

Hmm... then how about just updating the system properties?

legacyEnvVars.foreach { case (k, v) =>
  sys.env.get(k).foreach { envValue => sys.props(v) = envValue }
}
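
A self-contained sketch of the approach suggested above, not the PR's code; the env-var/property pairs below are placeholders, and whether an explicit -D value should survive is left as a comment because that is exactly the precedence point under discussion.

    // Copy legacy environment variables onto the corresponding Spark system properties
    // so later code only has to consult sys.props.
    object LegacyEnvOverrideSketch extends App {
      // Placeholder mapping from legacy env-var name to Spark property name
      // (the PR references constants from ConfigConstants.scala instead).
      val legacyEnvVars = Seq(
        "MASTER" -> "spark.master",
        "SPARK_DRIVER_MEMORY" -> "spark.driver.memory",
        "SPARK_EXECUTOR_MEMORY" -> "spark.executor.memory")

      legacyEnvVars.foreach { case (envName, propName) =>
        // Note: as written this overwrites any explicit -D value; whether it should
        // is the precedence question being debated in the thread.
        sys.env.get(envName).filter(_.nonEmpty).foreach { envValue => sys.props(propName) = envValue }
      }

      legacyEnvVars.foreach { case (_, propName) =>
        println(s"$propName = ${sys.props.getOrElse(propName, "<unset>")}")
      }
    }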




[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067718
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 

[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067736
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -406,22 +412,173 @@ private[spark] class SparkSubmitArguments(args: 
Seq[String]) {
   }
 }
 
-object SparkSubmitArguments {
-  /** Load properties present in the given file. */
-  def getPropertiesFromFile(file: File): Seq[(String, String)] = {
-require(file.exists(), s"Properties file $file does not exist")
-require(file.isFile(), s"Properties file $file is not a normal file")
-val inputStream = new FileInputStream(file)
+private[spark] object SparkSubmitArguments {
+  /**
+   * Resolves Configuration sources in order of highest to lowest
+   * 1. Each map passed in as additionalConfig from first to last
+   * 2. Environment variables (including legacy variable mappings)
+   * 3. System config variables (eg by using -Dspark.var.name)
+   * 4  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf
+   * 5. hard coded defaults in class path at spark-submit-defaults.prop
+   *
+   * A property file specified by one of the means listed above gets read 
in and the properties are
+   * considered to be at the priority of the method that specified the 
files.
+   * A property specified in a property file will not override an existing
+   * config value at that same level
+   *
+   * @param additionalConfigs Seq of additional Map[ConfigName->ConfigValue] in order of highest
+   *  priority to lowest this will have priority over internal sources
+   * @return Map[propName->propFile] containing values merged from all sources in order of priority
+   */
+  def mergeSparkProperties(additionalConfigs: Seq [Map[String,String]]) = {
+// Configuration read in from spark-submit-defaults.prop file found on 
the classpath
+var hardCodedDefaultConfig: Option[Map[String,String]] = None
+var is: InputStream = null
+var isr: Option[InputStreamReader] = None
 try {
-  val properties = new Properties()
-  properties.load(inputStream)
-  properties.stringPropertyNames().toSeq.map(k => (k, properties(k).trim))
-} catch {
-  case e: IOException =>
-val message = s"Failed when loading Spark properties file $file"
-throw new SparkException(message, e)
+  is = 
Thread.currentThread().getContextClassLoader.getResourceAsStream(ClassPathSparkSubmitDefaults)
+
+  // only open InputStreamReader if InputStream was successfully opened
+  isr = Option(is).map{is: InputStream =>
+    new InputStreamReader(is, CharEncoding.UTF_8)
+  }
+
+  hardCodedDefaultConfig = isr.map( defaultValueStream =>
+    SparkSubmitArguments.getPropertyValuesFromStream(defaultValueStream))
 } finally {
-  inputStream.close()
+  Option(is).foreach(_.close)
+  isr.foreach(_.close)
 }
+
+if (hardCodedDefaultConfig.isEmpty || (hardCodedDefaultConfig.get.size == 0)) {
+  throw new IllegalStateException(s"Default values not found at classpath $ClassPathSparkSubmitDefaults")
+}
+
+// Configuration read in from defaults file if it exists
+var sparkDefaultConfig = SparkSubmitArguments.getSparkDefaultFileConfig
+
+if (sparkDefaultConfig.isDefinedAt(SparkPropertiesFile))  {
+  SparkSubmitArguments.getPropertyValuesFromFile(
+  sparkDefaultConfig.get(SparkPropertiesFile).get)
+} else {
+  Map.empty
+}
+
+// Configuration from java system properties
+val systemPropertyConfig = 
SparkSubmitArguments.getPropertyMap(System.getProperties)
+
+// Configuration variables from the environment
+// support legacy variables
+val environmentConfig = System.getenv().asScala
+
+val legacyEnvVars = Seq("MASTER"->SparkMaster, "DEPLOY_MODE"->SparkDeployMode,
+  "SPARK_DRIVER_MEMORY"->SparkDriverMemory, "SPARK_EXECUTOR_MEMORY"->SparkExecutorMemory)
+
+// legacy variables act at the priority of a system property
+val propsWithEnvVars : mutable.Map[String,String] = new mutable.HashMap() ++ systemPropertyConfig ++ legacyEnvVars
+  .map( {case(varName, propName) => (environmentConfig.get(varName), propName) })
+  .filter( {case(varVariable, _) => varVariable.isDefined && !varVariable.get.isEmpty} )
+  .map{case(varVariable, propName) => (propName, varVariable.get)}
+
+val ConfigSources  = additionalConfigs ++ Seq (
--- End diff --

nit: variable names should not start with capital letters.
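
For readers following the precedence discussion, here is a minimal, self-contained sketch of merging configuration maps so that earlier, higher-priority sources win; it illustrates the idea only and is not the PR's implementation, and the sample keys and values are invented for the example.

    // Merge config maps listed from highest to lowest priority: an entry from an
    // earlier map must not be overwritten by a later one.
    object ConfigPrecedenceSketch extends App {
      def mergeByPriority(sources: Seq[Map[String, String]]): Map[String, String] =
        // Fold lowest-priority first and let later (higher-priority) maps overwrite,
        // which is equivalent to "first listed source wins".
        sources.reverse.foldLeft(Map.empty[String, String])(_ ++ _)

      // Toy sources: command line > environment > defaults file.
      val commandLine  = Map("spark.master" -> "yarn")
      val environment  = Map("spark.master" -> "local[*]", "spark.executor.memory" -> "2g")
      val defaultsFile = Map("spark.executor.memory" -> "1g", "spark.app.name" -> "demo")

      val merged = mergeByPriority(Seq(commandLine, environment, defaultsFile))
      println(merged("spark.master"))          // yarn (command line wins)
      println(merged("spark.executor.memory")) // 2g   (environment beats defaults file)
    }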



[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...

2014-09-25 Thread tigerquoll
Github user tigerquoll commented on a diff in the pull request:

https://github.com/apache/spark/pull/2516#discussion_r18067811
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -17,155 +17,195 @@
 
 package org.apache.spark.deploy
 
-import java.io.{File, FileInputStream, IOException}
-import java.util.Properties
+import java.io.{InputStreamReader, File, FileInputStream, InputStream}
 import java.util.jar.JarFile
+import java.util.Properties
 
+import scala.collection._
+import scala.collection.JavaConverters._
 import scala.collection.JavaConversions._
-import scala.collection.mutable.{ArrayBuffer, HashMap}
+import org.apache.commons.lang3.CharEncoding
 
-import org.apache.spark.SparkException
+import org.apache.spark.deploy.ConfigConstants._
 import org.apache.spark.util.Utils
 
+
+
 /**
- * Parses and encapsulates arguments from the spark-submit script.
- */
+ * Pulls configuration information together in order of priority
+ *
+ * Entries in the conf Map will be filled in the following priority order
+ * 1. entries specified on the command line (except from --conf entries)
+ * 2. Entries specified on the command line with --conf
+ * 3. Environment variables (including legacy variable mappings)
+ * 4. System config variables (eg by using -Dspark.var.name)
+ * 5  SPARK_DEFAULT_CONF/spark-defaults.conf or 
SPARK_HOME/conf/spark-defaults.conf if either exist
+ * 6. hard coded defaults in class path at spark-submit-defaults.prop
+ *
+ * A property file specified by one of the means listed above gets read in 
and the properties are
+ * considered to be at the priority of the method that specified the 
files. A property specified in
+ * a property file will not override an existing config value at that same 
level
+*/
 private[spark] class SparkSubmitArguments(args: Seq[String]) {
-  var master: String = null
-  var deployMode: String = null
-  var executorMemory: String = null
-  var executorCores: String = null
-  var totalExecutorCores: String = null
-  var propertiesFile: String = null
-  var driverMemory: String = null
-  var driverExtraClassPath: String = null
-  var driverExtraLibraryPath: String = null
-  var driverExtraJavaOptions: String = null
-  var driverCores: String = null
-  var supervise: Boolean = false
-  var queue: String = null
-  var numExecutors: String = null
-  var files: String = null
-  var archives: String = null
-  var mainClass: String = null
-  var primaryResource: String = null
-  var name: String = null
-  var childArgs: ArrayBuffer[String] = new ArrayBuffer[String]()
-  var jars: String = null
-  var verbose: Boolean = false
-  var isPython: Boolean = false
-  var pyFiles: String = null
-  val sparkProperties: HashMap[String, String] = new HashMap[String, 
String]()
-
-  /** Default properties present in the currently defined defaults file. */
-  lazy val defaultSparkProperties: HashMap[String, String] = {
-val defaultProperties = new HashMap[String, String]()
-if (verbose) SparkSubmit.printStream.println(s"Using properties file: $propertiesFile")
-Option(propertiesFile).foreach { filename =>
-  val file = new File(filename)
-  SparkSubmitArguments.getPropertiesFromFile(file).foreach { case (k, v) =>
-if (k.startsWith("spark")) {
-  defaultProperties(k) = v
-  if (verbose) SparkSubmit.printStream.println(s"Adding default property: $k=$v")
-} else {
-  SparkSubmit.printWarning(s"Ignoring non-spark config property: $k=$v")
-}
-  }
-}
-defaultProperties
-  }
+  /**
+   * Stores all configuration items except for child arguments,
+   * referenced by the constants defined in ConfigConstants.scala
+   */
+  val conf = new mutable.HashMap[String, String]()
+
+  def master  = conf(SparkMaster)
+  def master_= (value: String):Unit = conf.put(SparkMaster, value)
+
+  def deployMode = conf(SparkDeployMode)
+  def deployMode_= (value: String):Unit = conf.put(SparkDeployMode, value)
+
+  def executorMemory = conf(SparkExecutorMemory)
+  def executorMemory_= (value: String):Unit = 
conf.put(SparkExecutorMemory, value)
+
+  def executorCores = conf(SparkExecutorCores)
+  def executorCores_= (value: String):Unit = conf.put(SparkExecutorCores, 
value)
+
+  def totalExecutorCores = conf.get(SparkCoresMax)
+  def totalExecutorCores_= (value: String):Unit = conf.put(SparkCoresMax, 
value)
+
+  def driverMemory = conf(SparkDriverMemory)
+  def driverMemory_= (value: String):Unit = conf.put(SparkDriverMemory, 
value)
+
+  def driverExtraClassPath = 
