Thanks to everyone for suggestions and explanations.

I've started experimenting with the following setup, which seems to work
for me:

- Put the properties file on a web server so that it is centrally available
- Pass it to the Spark driver program via --conf 'propertiesFile=http:
- And then load the configuration using Apache Commons Configuration:

    PropertiesConfiguration config = new PropertiesConfiguration();

Using the method described above, I no longer need to statically compile my
properties file into the über JAR. I can modify the file on the web server,
and when I submit my application via spark-submit, passing the URL of the
properties file, the driver program reads the contents of that file once,
retrieves the values of the keys and continues.
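
As a rough illustration of the load-once step, here is a minimal sketch that
uses only the JDK's java.util.Properties instead of Apache Commons
Configuration. The class name RemoteProperties and its two methods are
hypothetical, and the URL is assumed to have already reached the driver as a
plain string; job.output.dir is the example key from earlier in this thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Spark or Commons Configuration.
final class RemoteProperties {

    private RemoteProperties() {}

    // Reads a .properties file once from the given URL (http:, file:, ...).
    static Properties load(String location) {
        Properties props = new Properties();
        try (InputStream in = URI.create(location).toURL().openStream()) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot read properties from " + location, e);
        }
        return props;
    }

    // Returns the value for the key, failing loudly if it is missing.
    static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value;
    }
}
```

In the driver this would be called once at startup, for example
RemoteProperties.load(url) followed by
RemoteProperties.require(props, "job.output.dir"), so a missing key fails
early rather than in the middle of a job.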

PS: I've opted for Apache Commons Configuration because it is already part
of the many dependencies I have in my pom.xml, and I did not want to pull in
another library, even though the Typesafe Config library seems to be a
powerful and flexible choice, too.


On Tue, Feb 17, 2015 at 6:12 PM, Charles Feduke <> wrote:

> Emre,
> As you are keeping the properties file external to the JAR you need to
> make sure to submit the properties file as an additional --files (or
> whatever the necessary CLI switch is) so all the executors get a copy of
> the file along with the JAR.
> If you know you are going to just put the properties file on HDFS then why
> don't you define a custom system setting like "properties.url" and pass it
> along:
> (this is for Spark shell, the only CLI string I have available at the
> moment:)
> spark-shell --jars $JAR_NAME \
>     --conf 'properties.url=hdfs://config/' \
>     --conf 'spark.executor.extraJavaOptions=-Dproperties.url=hdfs://config/'
> ... then load the properties file during initialization by examining the
> properties.url system setting.
> I'd still strongly recommend Typesafe Config as it makes this a lot less
> painful, and I know for certain you can place your *.conf at a URL (using
> the -Dconfig.url=) though it probably won't work with an HDFS URL.
> On Tue Feb 17 2015 at 9:53:08 AM Gerard Maas <>
> wrote:
>> +1 for TypeSafe config
>> Our practice is to include all spark properties under a 'spark' entry in
>> the config file alongside job-specific configuration:
>> A config file would look like:
>> spark {
>>      master = ""
>>      cleaner.ttl = 123456
>>      ...
>> }
>> job {
>>     context {
>>         src = "foo"
>>         action = "barAction"
>>     }
>>     prop1 = "val1"
>> }
>> Then, to create our Spark context, we transparently pass the spark
>> section to a SparkConf instance.
>> This idiom will instantiate the context with the spark specific
>> configuration:
>> sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))
>> And we can make use of the config object everywhere else.
>> We use the override model of the typesafe config: reasonable defaults go
>> in the reference.conf (within the jar). Environment-specific overrides go
>> in the application.conf (alongside the job jar) and hacks are passed with
>> -Dprop=value :-)
>> -kr, Gerard.
>> On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc <>
>> wrote:
>>> I've decided to try
>>>   spark-submit ... --conf "spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/"
>>> But when I try to retrieve the value of propertiesFile via
>>>    System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));
>>> I get null:
>>>    propertiesFile : null
>>> Interestingly, when I run spark-submit with --verbose, I see that it
>>> prints:
>>>   spark.driver.extraJavaOptions -> -DpropertiesFile=/home/emre/data/belga/
>>> I couldn't understand why I couldn't get to the value of
>>> "propertiesFile" by using standard System.getProperty method. (I can use
>>> new SparkConf().get("spark.driver.extraJavaOptions")  and manually parse
>>> it, and retrieve the value, but I'd like to know why I cannot retrieve that
>>> value using System.getProperty method).
>>> Any ideas?
>>> If I can achieve what I've described above properly, I plan to pass a
>>> properties file that resides on HDFS, so that it will be available to my
>>> driver program wherever that program runs.
>>> --
>>> Emre
>>> On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke <>
>>> wrote:
>>>> I haven't actually tried mixing non-Spark settings into the Spark
>>>> properties. Instead I package my properties into the jar and use the
>>>> Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
>>>> specific) to get at my properties:
>>>> Properties file: src/main/resources/integration.conf
>>>> (below $ENV might be set to either "integration" or "prod"[3])
>>>> ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
>>>>     --conf 'config.resource=$ENV.conf' \
>>>>     --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"
>>>> Since the properties file is packaged up with the JAR I don't have to
>>>> worry about sending the file separately to all of the slave nodes. Typesafe
>>>> Config is written in Java so it will work if you're not using Scala. (The
>>>> Typesafe Config also has the advantage of being extremely easy to integrate
>>>> with code that is using Java Properties today.)
>>>> If you instead want to send the file separately from the JAR and you
>>>> use the Typesafe Config library, you can specify "config.file" instead of
>>>> "config.resource"; though I'd point you to [3] below if you want to make
>>>> your development life easier.
>>>> 1.
>>>> 2.
>>>> 3.
>>>> On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc <>
>>>> wrote:
>>>>> Hello,
>>>>> I'm using Spark 1.2.1 and have a file, and in it I
>>>>> have non-Spark properties, as well as Spark properties, e.g.:
>>>>>    job.output.dir=file:///home/emre/data/mymodule/out
>>>>> I'm trying to pass it to spark-submit via:
>>>>>    spark-submit --class com.myModule --master local[4] --deploy-mode
>>>>> client --verbose --properties-file /home/emre/data/
>>>>> mymodule.jar
>>>>> And I thought I could read the value of my non-Spark property, namely,
>>>>> job.output.dir by using:
>>>>>     SparkConf sparkConf = new SparkConf();
>>>>>     final String validatedJSONoutputDir = sparkConf.get("job.output.dir");
>>>>> But it gives me an exception:
>>>>>     Exception in thread "main" java.util.NoSuchElementException: job.output.dir
>>>>> Is it not possible to mix Spark and non-Spark properties in a single
>>>>> .properties file, then pass it via --properties-file and then get the
>>>>> values of those non-Spark properties via SparkConf?
>>>>> Or is there another object / method to retrieve the values for those
>>>>> non-Spark properties?
>>>>> --
>>>>> Emre Sevinç
>>> --
>>> Emre Sevinc

Emre Sevinc
