Thanks to everyone for the suggestions and explanations. I've started experimenting with the following scenario, which seems to work for me:

- Put the properties file on a web server so that it is centrally available.
- Pass its URL to the Spark driver program via --conf 'propertiesFile=http://myWebServer.com/mymodule.properties'
- Then load the configuration using Apache Commons Configuration:

    PropertiesConfiguration config = new PropertiesConfiguration();
    config.load(System.getProperty("propertiesFile"));
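For completeness, a minimal end-to-end sketch of the driver side (the class name is illustrative, and it assumes the URL actually reaches the driver JVM as the system property "propertiesFile", e.g. via --driver-java-options):

    import org.apache.commons.configuration.ConfigurationException;
    import org.apache.commons.configuration.PropertiesConfiguration;

    public class ModuleDriver {
        public static void main(String[] args) throws ConfigurationException {
            // e.g. -DpropertiesFile=http://myWebServer.com/mymodule.properties
            String propertiesUrl = System.getProperty("propertiesFile");

            // Commons Configuration can load directly from an http:// or file:// location
            PropertiesConfiguration config = new PropertiesConfiguration();
            config.load(propertiesUrl);

            // Read a non-Spark key, e.g. the one from the original question
            String outputDir = config.getString("job.output.dir");
            System.err.println("job.output.dir : " + outputDir);
        }
    }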
Using the method described above, I don't need to statically compile my
properties file into the über JAR anymore; I can modify the file on the web
server, and when I submit my application via spark-submit, passing the URL of
the properties file, the driver program reads the contents of that file once,
retrieves the values of the keys, and continues.

PS: I've opted for Apache Commons Configuration because it is already among
the many dependencies in my pom.xml and I did not want to pull in another
library, even though the Typesafe Config library seems to be a powerful and
flexible choice, too.

--
Emre

On Tue, Feb 17, 2015 at 6:12 PM, Charles Feduke <charles.fed...@gmail.com> wrote:

> Emre,
>
> As you are keeping the properties file external to the JAR, you need to
> make sure to submit the properties file as an additional --files (or
> whatever the necessary CLI switch is) so all the executors get a copy of
> the file along with the JAR.
>
> If you know you are going to just put the properties file on HDFS, then
> why don't you define a custom system setting like "properties.url" and
> pass it along:
>
> (this is for Spark shell, the only CLI string I have available at the
> moment:)
>
> spark-shell --jars $JAR_NAME \
>   --conf 'properties.url=hdfs://config/stuff.properties' \
>   --conf 'spark.executor.extraJavaOptions=-Dproperties.url=hdfs://config/stuff.properties'
>
> ... then load the properties file during initialization by examining the
> properties.url system setting.
>
> I'd still strongly recommend Typesafe Config, as it makes this a lot less
> painful, and I know for certain you can place your *.conf at a URL (using
> -Dconfig.url=), though it probably won't work with an HDFS URL.
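(A possible shape for that initialization step — a sketch only, not from the mail above; it assumes Hadoop's FileSystem API is on the classpath, that "properties.url" holds an hdfs:// path, and the class and host names are illustrative:)

    import java.io.InputStream;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PropertiesFromHdfs {
        public static Properties load() throws Exception {
            // e.g. -Dproperties.url=hdfs://namenode:8020/config/stuff.properties
            Path path = new Path(System.getProperty("properties.url"));

            // Resolve the filesystem from the URI and stream the file in
            Properties props = new Properties();
            FileSystem fs = FileSystem.get(path.toUri(), new Configuration());
            try (InputStream in = fs.open(path)) {
                props.load(in);
            }
            return props;
        }
    }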
>
> On Tue Feb 17 2015 at 9:53:08 AM Gerard Maas <gerard.m...@gmail.com> wrote:
>
>> +1 for Typesafe Config.
>>
>> Our practice is to include all Spark properties under a 'spark' entry in
>> the config file, alongside job-specific configuration. A config file would
>> look like:
>>
>> spark {
>>   master = ""
>>   cleaner.ttl = 123456
>>   ...
>> }
>> job {
>>   context {
>>     src = "foo"
>>     action = "barAction"
>>   }
>>   prop1 = "val1"
>> }
>>
>> Then, to create our Spark context, we transparently pass the spark
>> section to a SparkConf instance. This idiom instantiates the context with
>> the Spark-specific configuration:
>>
>> sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))
>>
>> And we can make use of the config object everywhere else.
>>
>> We use the override model of the Typesafe config: reasonable defaults go
>> in the reference.conf (within the jar), environment-specific overrides go
>> in the application.conf (alongside the job jar), and hacks are passed
>> with -Dprop=value :-)
>>
>> -kr, Gerard.
>>
>> On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc <emre.sev...@gmail.com> wrote:
>>
>>> I've decided to try
>>>
>>> spark-submit ... --conf
>>> "spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties"
>>>
>>> But when I try to retrieve the value of propertiesFile via
>>>
>>> System.err.println("propertiesFile : " + System.getProperty("propertiesFile"));
>>>
>>> I get null:
>>>
>>> propertiesFile : null
>>>
>>> Interestingly, when I run spark-submit with --verbose, I see that it prints:
>>>
>>> spark.driver.extraJavaOptions -> -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties
>>>
>>> I can't understand why I can't get the value of "propertiesFile" via the
>>> standard System.getProperty method. (I can use
>>> new SparkConf().get("spark.driver.extraJavaOptions") and manually parse it
>>> to retrieve the value, but I'd like to know why I cannot retrieve that
>>> value using the System.getProperty method.)
>>>
>>> Any ideas?
>>>
>>> If I can achieve what I've described above properly, I plan to pass a
>>> properties file that resides on HDFS, so that it will be available to my
>>> driver program wherever that program runs.
>>>
>>> --
>>> Emre
>>>
>>> On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke <charles.fed...@gmail.com> wrote:
>>>
>>>> I haven't actually tried mixing non-Spark settings into the Spark
>>>> properties. Instead I package my properties into the JAR and use the
>>>> Typesafe Config[1] library - v1.2.1 - (along with Ficus[2], which is
>>>> Scala-specific) to get at my properties:
>>>>
>>>> Properties file: src/main/resources/integration.conf
>>>>
>>>> (below, $ENV might be set to either "integration" or "prod"[3]:)
>>>>
>>>> ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
>>>>   --conf 'config.resource=$ENV.conf' \
>>>>   --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"
>>>>
>>>> Since the properties file is packaged up with the JAR, I don't have to
>>>> worry about sending the file separately to all of the slave nodes.
>>>> Typesafe Config is written in Java, so it will work if you're not using
>>>> Scala. (Typesafe Config also has the advantage of being extremely easy
>>>> to integrate with code that is using Java Properties today.)
>>>>
>>>> If you instead want to send the file separately from the JAR and you
>>>> use the Typesafe Config library, you can specify "config.file" instead
>>>> of "config.resource"; though I'd point you to [3] below if you want to
>>>> make your development life easier.
>>>>
>>>> 1. https://github.com/typesafehub/config
>>>> 2. https://github.com/ceedubs/ficus
>>>> 3. http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
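(A rough Java sketch combining the two suggestions above — sketch only, not from either mail. ConfigFactory.load() honors -Dconfig.resource, -Dconfig.file and -Dconfig.url, and the loop is an approximate Java port of Gerard's Scala one-liner; the key names come from his example config:)

    import java.util.Map;
    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;
    import com.typesafe.config.ConfigValue;
    import org.apache.spark.SparkConf;

    public class TypesafeConfigExample {
        public static void main(String[] args) {
            // Honors -Dconfig.resource=$ENV.conf (or -Dconfig.file / -Dconfig.url),
            // falling back to application.conf over reference.conf defaults
            Config config = ConfigFactory.load();

            // Copy every leaf key under the 'spark' section into a SparkConf,
            // re-prefixed with "spark." (e.g. cleaner.ttl -> spark.cleaner.ttl)
            SparkConf sparkConf = new SparkConf();
            for (Map.Entry<String, ConfigValue> e : config.getConfig("spark").entrySet()) {
                sparkConf.set("spark." + e.getKey(), e.getValue().unwrapped().toString());
            }

            // Non-Spark, job-specific settings stay in the config object
            String src = config.getString("job.context.src");
            System.out.println("job.context.src = " + src);
        }
    }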
>>>>
>>>> On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc <emre.sev...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I'm using Spark 1.2.1 and have a module.properties file, and in it I
>>>>> have non-Spark properties as well as Spark properties, e.g.:
>>>>>
>>>>> job.output.dir=file:///home/emre/data/mymodule/out
>>>>>
>>>>> I'm trying to pass it to spark-submit via:
>>>>>
>>>>> spark-submit --class com.myModule --master local[4] --deploy-mode client \
>>>>>   --verbose --properties-file /home/emre/data/mymodule.properties mymodule.jar
>>>>>
>>>>> And I thought I could read the value of my non-Spark property, namely
>>>>> job.output.dir, by using:
>>>>>
>>>>> SparkConf sparkConf = new SparkConf();
>>>>> final String validatedJSONoutputDir = sparkConf.get("job.output.dir");
>>>>>
>>>>> But it gives me an exception:
>>>>>
>>>>> Exception in thread "main" java.util.NoSuchElementException: job.output.dir
>>>>>
>>>>> Is it not possible to mix Spark and non-Spark properties in a single
>>>>> .properties file, pass it via --properties-file, and then get the
>>>>> values of those non-Spark properties via SparkConf?
>>>>>
>>>>> Or is there another object / method to retrieve the values for those
>>>>> non-Spark properties?
>>>>>
>>>>> --
>>>>> Emre Sevinç
>>>
>>> --
>>> Emre Sevinc
>>
--
Emre Sevinc
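(A closing note on the two puzzles in this thread, since neither got a direct answer above: if I read spark-submit's behavior correctly, it copies only keys starting with "spark." from a --properties-file into the SparkConf — warning "Ignoring non-spark config property" for the rest — which is why sparkConf.get("job.output.dir") throws NoSuchElementException. Likewise, spark.driver.extraJavaOptions set via --conf apparently comes too late to add -D flags in client deploy mode, since the driver JVM is already running — hence the null from System.getProperty. A minimal sketch of reading the same mixed file with plain java.util.Properties instead; the path is the one from the question:)

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;

    public class MixedProperties {
        public static void main(String[] args) throws IOException {
            Properties props = new Properties();
            // The same file passed to --properties-file; its non-Spark keys are
            // invisible to SparkConf but perfectly readable this way.
            try (InputStream in = new FileInputStream("/home/emre/data/mymodule.properties")) {
                props.load(in);
            }
            System.out.println("job.output.dir = " + props.getProperty("job.output.dir"));
        }
    }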