Thanks to everyone for suggestions and explanations.
I've started experimenting with the following scenario, which seems to work
for me:
- Put the properties file on a web server so that it is centrally available
- Pass it to the Spark driver program via --conf 'propertiesFile=http:
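Assuming the driver then reads that URL from a system property named
propertiesFile (the name used later in this thread), loading it could look
like this minimal sketch:

import java.net.URL
import java.util.Properties

// Fetch the centrally hosted properties file over HTTP in the driver.
val props = new Properties()
props.load(new URL(System.getProperty("propertiesFile")).openStream())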
Emre,
Since you are keeping the properties file external to the JAR, you need to
make sure to submit the properties file with an additional --files argument
(or whatever the appropriate CLI switch is) so that all the executors get a
copy of the file along with the JAR.
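For example (a sketch based on the class and path mentioned elsewhere in
this thread; the jar name is illustrative, and --files copies the listed
files into each executor's working directory):

spark-submit --class com.myModule --master local[4] \
  --files /home/emre/data/myModule.properties \
  myModule.jar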
If you know you are going to just put the
+1 for Typesafe Config
Our practice is to include all spark properties under a 'spark' entry in
the config file alongside job-specific configuration:
A config file would look like:
spark {
  master =
  cleaner.ttl = 123456
  ...
}

job {
  context {
    src = foo
    action =
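A minimal sketch of wiring such a file into Spark (assuming Typesafe
Config's ConfigFactory and a standard SparkConf; this is not the poster's
actual code):

import com.typesafe.config.ConfigFactory
import org.apache.spark.SparkConf
import scala.collection.JavaConverters._

// Load application.conf from the classpath and copy every entry under
// the 'spark' block into the SparkConf, re-adding the 'spark.' prefix.
val config = ConfigFactory.load()
val sparkConf = new SparkConf()
config.getConfig("spark").entrySet().asScala.foreach { e =>
  sparkConf.set("spark." + e.getKey, e.getValue.unwrapped.toString)
}

// Job-specific settings live in the same file:
val src = config.getString("job.context.src")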
I've decided to try
spark-submit ... --conf
spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties
But when I try to retrieve the value of propertiesFile via
System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));
I get NULL:
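(For what it's worth, the Spark docs note that in client mode the driver
JVM has already started by the time spark.driver.extraJavaOptions is
applied, and point to the --driver-java-options switch instead. A sketch,
not verified against this exact setup:)

spark-submit --class com.myModule --master local[4] \
  --driver-java-options "-DpropertiesFile=/home/emre/data/myModule.properties" \
  myModule.jar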
We've been using Commons Configuration to pull our properties out of
properties files and system properties (prioritizing system properties over
the others); we add those properties to our Spark conf explicitly, and we
use ArgotParser to get the command-line argument for which property file to
load.
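A rough sketch of that pattern (class names are from Commons Configuration
1.x; the properties file path is illustrative):

import org.apache.commons.configuration.{CompositeConfiguration, PropertiesConfiguration, SystemConfiguration}
import org.apache.spark.SparkConf
import scala.collection.JavaConverters._

// System properties win because they are added to the composite first.
val composite = new CompositeConfiguration()
composite.addConfiguration(new SystemConfiguration())
composite.addConfiguration(new PropertiesConfiguration("myModule.properties"))

// Copy the spark.* entries into the SparkConf explicitly.
val sparkConf = new SparkConf()
composite.getKeys.asScala.foreach { key =>
  if (key.startsWith("spark.")) sparkConf.set(key, composite.getString(key))
}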
Hello,
I'm using Spark 1.2.1 and have a module.properties file that contains
non-Spark properties as well as Spark properties, e.g.:
job.output.dir=file:///home/emre/data/mymodule/out
I'm trying to pass it to spark-submit via:
spark-submit --class com.myModule --master local[4]
Since SparkConf is only for Spark properties, I think it will in
general only pay attention to and preserve spark.* properties. You
could experiment with that. In general I wouldn't rely on Spark
mechanisms for your configuration, and you can use any config
mechanism you like to retain your own properties.
Sean,
I'm trying this as an alternative to what I currently do. Currently I have
my module.properties file in the resources directory; that file is put
inside the über JAR when I build my application with Maven, and then when I
submit it using spark-submit, I can read that
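Reading a bundled file like that typically goes through the classpath; a
minimal sketch, assuming the file sits at the root of src/main/resources:

import java.util.Properties

// The über JAR puts resources on the classpath, so the file can be
// read as a classpath resource rather than from the filesystem.
val props = new Properties()
props.load(getClass.getResourceAsStream("/module.properties"))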
I haven't actually tried mixing non-Spark settings into the Spark
properties. Instead I package my properties into the jar and use the
Typesafe Config [1] (v1.2.1) library (along with Ficus [2], which is
Scala-specific) to get at my properties:
Properties file: src/main/resources/integration.conf
(below
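A minimal sketch of that combination (Typesafe Config plus Ficus; the key
names are illustrative, and the Ficus import path varies by version):

import com.typesafe.config.ConfigFactory
import net.ceedubs.ficus.Ficus._

// Loads integration.conf from the classpath, then reads typed values.
val config = ConfigFactory.load("integration")
val outputDir = config.as[String]("job.output.dir")
val ttl = config.as[Option[Int]]("spark.cleaner.ttl")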