Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-18 Thread Emre Sevinc
Thanks to everyone for the suggestions and explanations.

Currently I've started to experiment with the following scenario, which
seems to work for me:

- Put the properties file on a web server so that it is centrally available
- Pass it to the Spark driver program via --conf 'propertiesFile=http://myWebServer.com/mymodule.properties'
- And then load the configuration using Apache Commons Configuration:

PropertiesConfiguration config = new PropertiesConfiguration();
config.load(System.getProperty("propertiesFile"));

Using the method described above, I no longer need to compile my properties
file statically into the über JAR: I can modify the file on the web server,
and when I submit my application via spark-submit, passing the URL of the
properties file, the driver program reads the contents of that file once,
retrieves the values of the keys, and continues.
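
For reference, here is the driver-side code in slightly more detail (Commons
Configuration 1.x; "job.output.dir" is just an example key from my properties
file, and "propertiesFile" is the property name I pass via --conf):

    import org.apache.commons.configuration.PropertiesConfiguration;

    // The URL was passed on the command line, e.g.
    //   --conf 'propertiesFile=http://myWebServer.com/mymodule.properties'
    String propertiesUrl = System.getProperty("propertiesFile");

    // load(String) throws ConfigurationException; in my case it resolves the
    // string as a URL and fetches the file over HTTP
    PropertiesConfiguration config = new PropertiesConfiguration();
    config.load(propertiesUrl);

    String outputDir = config.getString("job.output.dir");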

PS: I've opted for Apache Commons Configuration because it is already among
the many dependencies I have in my pom.xml and I did not want to pull in
another library, even though the Typesafe Config library seems to be a
powerful and flexible choice, too.

--
Emre



On Tue, Feb 17, 2015 at 6:12 PM, Charles Feduke 
wrote:

> Emre,
>
> As you are keeping the properties file external to the JAR you need to
> make sure to submit the properties file as an additional --files (or
> whatever the necessary CLI switch is) so all the executors get a copy of
> the file along with the JAR.
>
> If you know you are going to just put the properties file on HDFS then why
> don't you define a custom system setting like "properties.url" and pass it
> along:
>
> (this is for Spark shell, the only CLI string I have available at the
> moment:)
>
> spark-shell --jars $JAR_NAME \
> --conf 'properties.url=hdfs://config/stuff.properties' \
> --conf 'spark.executor.extraJavaOptions=-Dproperties.url=hdfs://config/stuff.properties'
>
> ... then load the properties file during initialization by examining the
> properties.url system setting.
>
> I'd still strongly recommend Typesafe Config as it makes this a lot less
> painful, and I know for certain you can place your *.conf at a URL (using
> the -Dconfig.url=) though it probably won't work with an HDFS URL.
>
>
>
> On Tue Feb 17 2015 at 9:53:08 AM Gerard Maas 
> wrote:
>
>> +1 for TypeSafe config
>> Our practice is to include all spark properties under a 'spark' entry in
>> the config file alongside job-specific configuration:
>>
>> A config file would look like:
>> spark {
>>  master = ""
>>  cleaner.ttl = 123456
>>  ...
>> }
>> job {
>> context {
>> src = "foo"
>> action = "barAction"
>> }
>> prop1 = "val1"
>> }
>>
>> Then, to create our Spark context, we transparently pass the spark
>> section to a SparkConf instance.
>> This idiom will instantiate the context with the spark specific
>> configuration:
>>
>>
>> sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))
>>
>> And we can make use of the config object everywhere else.
>>
>> We use the override model of the typesafe config: reasonable defaults go
>> in the reference.conf (within the jar). Environment-specific overrides go
>> in the application.conf (alongside the job jar) and hacks are passed with
>> -Dprop=value :-)
>>
>>
>> -kr, Gerard.
>>
>>
>> On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc 
>> wrote:
>>
>>> I've decided to try
>>>
>>>   spark-submit ... --conf
>>> "spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties"
>>>
>>> But when I try to retrieve the value of propertiesFile via
>>>
>>>System.err.println("propertiesFile : " +
>>> System.getProperty("propertiesFile"));
>>>
>>> I get NULL:
>>>
>>>propertiesFile : null
>>>
>>> Interestingly, when I run spark-submit with --verbose, I see that it
>>> prints:
>>>
>>>   spark.driver.extraJavaOptions ->
>>> -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties
>>>
>>> I couldn't understand why I couldn't get to the value of
>>> "propertiesFile" by using standard System.getProperty method. (I can use
>>> new SparkConf().get("spark.driver.extraJavaOptions")  and manually parse
>>> it, and retrieve the value, but I'd like to know why I cannot retrieve that
>>> value using System.getProperty method).
>>>
>>> Any ideas?
>>>
>>> If I can achieve what I've described above properly, I plan to pass a
>>> properties file that resides on HDFS, so that it will be available to my
>>> driver program wherever that program runs.
>>>
>>> --
>>> Emre
>>>
>>>
>>>
>>>
>>> On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke <
>>> charles.fed...@gmail.com> wrote:
>>>
 I haven't actually tried mixing non-Spark settings into the Spark
 properties. Instead I package my properties into the jar and use the
 Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
 specific) to get at my properties:

 Properties file: src/main/resources/integration.conf

 (below $ENV might be set to either "integration" or "prod"[3]) ...

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-17 Thread Charles Feduke
Emre,

As you are keeping the properties file external to the JAR you need to make
sure to submit the properties file as an additional --files (or whatever
the necessary CLI switch is) so all the executors get a copy of the file
along with the JAR.

If you know you are going to just put the properties file on HDFS then why
don't you define a custom system setting like "properties.url" and pass it
along:

(this is for Spark shell, the only CLI string I have available at the
moment:)

spark-shell --jars $JAR_NAME \
--conf 'properties.url=hdfs://config/stuff.properties' \
--conf 'spark.executor.extraJavaOptions=-Dproperties.url=hdfs://config/stuff.properties'

... then load the properties file during initialization by examining the
properties.url system setting.

I'd still strongly recommend Typesafe Config as it makes this a lot less
painful, and I know for certain you can place your *.conf at a URL (using
the -Dconfig.url=) though it probably won't work with an HDFS URL.
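
If you do put the file on HDFS, here is a rough sketch of the loading side in
Java, using the Hadoop FileSystem API (plain java.net.URL won't understand
hdfs:// unless the Hadoop URL stream handler is registered). This assumes
properties.url really does end up as a JVM system property, e.g. via the
extraJavaOptions above:

    import java.io.InputStream;
    import java.util.Properties;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Properties props = new Properties();
    Path path = new Path(System.getProperty("properties.url"));
    FileSystem fs = path.getFileSystem(new Configuration());
    try (InputStream in = fs.open(path)) {
        props.load(in);
    }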


On Tue Feb 17 2015 at 9:53:08 AM Gerard Maas  wrote:

> +1 for TypeSafe config
> Our practice is to include all spark properties under a 'spark' entry in
> the config file alongside job-specific configuration:
>
> A config file would look like:
> spark {
>  master = ""
>  cleaner.ttl = 123456
>  ...
> }
> job {
> context {
> src = "foo"
> action = "barAction"
> }
> prop1 = "val1"
> }
>
> Then, to create our Spark context, we transparently pass the spark section
> to a SparkConf instance.
> This idiom will instantiate the context with the spark specific
> configuration:
>
>
> sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))
>
> And we can make use of the config object everywhere else.
>
> We use the override model of the typesafe config: reasonable defaults go
> in the reference.conf (within the jar). Environment-specific overrides go
> in the application.conf (alongside the job jar) and hacks are passed with
> -Dprop=value :-)
>
>
> -kr, Gerard.
>
>
> On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc 
> wrote:
>
>> I've decided to try
>>
>>   spark-submit ... --conf
>> "spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties"
>>
>> But when I try to retrieve the value of propertiesFile via
>>
>>System.err.println("propertiesFile : " +
>> System.getProperty("propertiesFile"));
>>
>> I get NULL:
>>
>>propertiesFile : null
>>
>> Interestingly, when I run spark-submit with --verbose, I see that it
>> prints:
>>
>>   spark.driver.extraJavaOptions ->
>> -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties
>>
>> I couldn't understand why I couldn't get to the value of "propertiesFile"
>> by using standard System.getProperty method. (I can use new
>> SparkConf().get("spark.driver.extraJavaOptions")  and manually parse it,
>> and retrieve the value, but I'd like to know why I cannot retrieve that
>> value using System.getProperty method).
>>
>> Any ideas?
>>
>> If I can achieve what I've described above properly, I plan to pass a
>> properties file that resides on HDFS, so that it will be available to my
>> driver program wherever that program runs.
>>
>> --
>> Emre
>>
>>
>>
>>
>> On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke wrote:
>>
>>> I haven't actually tried mixing non-Spark settings into the Spark
>>> properties. Instead I package my properties into the jar and use the
>>> Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
>>> specific) to get at my properties:
>>>
>>> Properties file: src/main/resources/integration.conf
>>>
>>> (below $ENV might be set to either "integration" or "prod"[3])
>>>
>>> ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
>>> --conf 'config.resource=$ENV.conf' \
>>> --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"
>>>
>>> Since the properties file is packaged up with the JAR I don't have to
>>> worry about sending the file separately to all of the slave nodes. Typesafe
>>> Config is written in Java so it will work if you're not using Scala. (The
>>> Typesafe Config also has the advantage of being extremely easy to integrate
>>> with code that is using Java Properties today.)
>>>
>>> If you instead want to send the file separately from the JAR and you use
>>> the Typesafe Config library, you can specify "config.file" instead of
>>> ".resource"; though I'd point you to [3] below if you want to make your
>>> development life easier.
>>>
>>> 1. https://github.com/typesafehub/config
>>> 2. https://github.com/ceedubs/ficus
>>> 3.
>>> http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
>>>
>>>
>>>
>>> On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc 
>>> wrote:
>>>
 Hello,

 I'm using Spark 1.2.1 and have a module.properties file, and in it I
 have non-Spark properties, as well as Spark properties, e.g.:

job.output.dir=file:///home/emre/data/mymodule/out

 I'm trying to pass it to spark-submit via --properties-file ...

Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-17 Thread Gerard Maas
+1 for TypeSafe config
Our practice is to include all spark properties under a 'spark' entry in
the config file alongside job-specific configuration:

A config file would look like:
spark {
 master = ""
 cleaner.ttl = 123456
 ...
}
job {
context {
src = "foo"
action = "barAction"
}
prop1 = "val1"
}

Then, to create our Spark context, we transparently pass the spark section
to a SparkConf instance. This idiom instantiates the context with the
Spark-specific configuration:

sparkConfig.setAll(configToStringSeq(config.getConfig("spark").atPath("spark")))

And we can make use of the config object everywhere else.
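
(configToStringSeq above is our own helper; for anyone doing the same thing
against the plain Java APIs, a rough equivalent of the idiom looks like the
sketch below.)

    import java.util.Map;
    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;
    import com.typesafe.config.ConfigValue;
    import org.apache.spark.SparkConf;

    Config config = ConfigFactory.load();
    SparkConf sparkConf = new SparkConf();
    // copy every entry under the "spark" section, re-prefixed so the keys
    // match Spark's own property names (spark.master, spark.cleaner.ttl, ...)
    for (Map.Entry<String, ConfigValue> entry : config.getConfig("spark").entrySet()) {
        sparkConf.set("spark." + entry.getKey(), entry.getValue().unwrapped().toString());
    }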

We use the override model of the typesafe config: reasonable defaults go in
the reference.conf (within the jar). Environment-specific overrides go in
the application.conf (alongside the job jar) and hacks are passed with
-Dprop=value :-)


-kr, Gerard.


On Tue, Feb 17, 2015 at 1:45 PM, Emre Sevinc  wrote:

> I've decided to try
>
>   spark-submit ... --conf
> "spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties"
>
> But when I try to retrieve the value of propertiesFile via
>
>System.err.println("propertiesFile : " +
> System.getProperty("propertiesFile"));
>
> I get NULL:
>
>propertiesFile : null
>
> Interestingly, when I run spark-submit with --verbose, I see that it
> prints:
>
>   spark.driver.extraJavaOptions ->
> -DpropertiesFile=/home/emre/data/belga/schemavalidator.properties
>
> I couldn't understand why I couldn't get to the value of "propertiesFile"
> by using standard System.getProperty method. (I can use new
> SparkConf().get("spark.driver.extraJavaOptions")  and manually parse it,
> and retrieve the value, but I'd like to know why I cannot retrieve that
> value using System.getProperty method).
>
> Any ideas?
>
> If I can achieve what I've described above properly, I plan to pass a
> properties file that resides on HDFS, so that it will be available to my
> driver program wherever that program runs.
>
> --
> Emre
>
>
>
>
> On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke 
> wrote:
>
>> I haven't actually tried mixing non-Spark settings into the Spark
>> properties. Instead I package my properties into the jar and use the
>> Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
>> specific) to get at my properties:
>>
>> Properties file: src/main/resources/integration.conf
>>
>> (below $ENV might be set to either "integration" or "prod"[3])
>>
>> ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
>> --conf 'config.resource=$ENV.conf' \
>> --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"
>>
>> Since the properties file is packaged up with the JAR I don't have to
>> worry about sending the file separately to all of the slave nodes. Typesafe
>> Config is written in Java so it will work if you're not using Scala. (The
>> Typesafe Config also has the advantage of being extremely easy to integrate
>> with code that is using Java Properties today.)
>>
>> If you instead want to send the file separately from the JAR and you use
>> the Typesafe Config library, you can specify "config.file" instead of
>> ".resource"; though I'd point you to [3] below if you want to make your
>> development life easier.
>>
>> 1. https://github.com/typesafehub/config
>> 2. https://github.com/ceedubs/ficus
>> 3.
>> http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
>>
>>
>>
>> On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc 
>> wrote:
>>
>>> Hello,
>>>
>>> I'm using Spark 1.2.1 and have a module.properties file, and in it I
>>> have non-Spark properties, as well as Spark properties, e.g.:
>>>
>>>job.output.dir=file:///home/emre/data/mymodule/out
>>>
>>> I'm trying to pass it to spark-submit via:
>>>
>>>spark-submit --class com.myModule --master local[4] --deploy-mode
>>> client --verbose --properties-file /home/emre/data/mymodule.properties
>>> mymodule.jar
>>>
>>> And I thought I could read the value of my non-Spark property, namely,
>>> job.output.dir by using:
>>>
>>> SparkConf sparkConf = new SparkConf();
>>> final String validatedJSONoutputDir =
>>> sparkConf.get("job.output.dir");
>>>
>>> But it gives me an exception:
>>>
>>> Exception in thread "main" java.util.NoSuchElementException:
>>> job.output.dir
>>>
>>> Is it not possible to mix Spark and non-Spark properties in a single
>>> .properties file, then pass it via --properties-file and then get the
>>> values of those non-Spark properties via SparkConf?
>>>
>>> Or is there another object / method to retrieve the values for those
>>> non-Spark properties?
>>>
>>>
>>> --
>>> Emre Sevinç
>>>
>>
>
>
> --
> Emre Sevinc
>


Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-17 Thread Emre Sevinc
I've decided to try

  spark-submit ... --conf
"spark.driver.extraJavaOptions=-DpropertiesFile=/home/emre/data/myModule.properties"

But when I try to retrieve the value of propertiesFile via

   System.err.println("propertiesFile : " +
System.getProperty("propertiesFile"));

I get NULL:

   propertiesFile : null

Interestingly, when I run spark-submit with --verbose, I see that it prints:

  spark.driver.extraJavaOptions ->
-DpropertiesFile=/home/emre/data/belga/schemavalidator.properties

I couldn't understand why I can't get the value of "propertiesFile" using
the standard System.getProperty method. (I can use new
SparkConf().get("spark.driver.extraJavaOptions") and manually parse it to
retrieve the value, but I'd like to know why I cannot retrieve that value
using System.getProperty.)

Any ideas?

If I can achieve what I've described above properly, I plan to pass a
properties file that resides on HDFS, so that it will be available to my
driver program wherever that program runs.

--
Emre




On Mon, Feb 16, 2015 at 4:41 PM, Charles Feduke 
wrote:

> I haven't actually tried mixing non-Spark settings into the Spark
> properties. Instead I package my properties into the jar and use the
> Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
> specific) to get at my properties:
>
> Properties file: src/main/resources/integration.conf
>
> (below $ENV might be set to either "integration" or "prod"[3])
>
> ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
> --conf 'config.resource=$ENV.conf' \
> --conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"
>
> Since the properties file is packaged up with the JAR I don't have to
> worry about sending the file separately to all of the slave nodes. Typesafe
> Config is written in Java so it will work if you're not using Scala. (The
> Typesafe Config also has the advantage of being extremely easy to integrate
> with code that is using Java Properties today.)
>
> If you instead want to send the file separately from the JAR and you use
> the Typesafe Config library, you can specify "config.file" instead of
> ".resource"; though I'd point you to [3] below if you want to make your
> development life easier.
>
> 1. https://github.com/typesafehub/config
> 2. https://github.com/ceedubs/ficus
> 3.
> http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
>
>
>
> On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc 
> wrote:
>
>> Hello,
>>
>> I'm using Spark 1.2.1 and have a module.properties file, and in it I have
>> non-Spark properties, as well as Spark properties, e.g.:
>>
>>job.output.dir=file:///home/emre/data/mymodule/out
>>
>> I'm trying to pass it to spark-submit via:
>>
>>spark-submit --class com.myModule --master local[4] --deploy-mode
>> client --verbose --properties-file /home/emre/data/mymodule.properties
>> mymodule.jar
>>
>> And I thought I could read the value of my non-Spark property, namely,
>> job.output.dir by using:
>>
>> SparkConf sparkConf = new SparkConf();
>> final String validatedJSONoutputDir = sparkConf.get("job.output.dir");
>>
>> But it gives me an exception:
>>
>> Exception in thread "main" java.util.NoSuchElementException:
>> job.output.dir
>>
>> Is it not possible to mix Spark and non-Spark properties in a single
>> .properties file, then pass it via --properties-file and then get the
>> values of those non-Spark properties via SparkConf?
>>
>> Or is there another object / method to retrieve the values for those
>> non-Spark properties?
>>
>>
>> --
>> Emre Sevinç
>>
>


-- 
Emre Sevinc


Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Corey Nolet
We've been using Commons Configuration to pull our properties out of
properties files and system properties (prioritizing system properties over
the others), and we add those properties to our Spark conf explicitly. We use
an argument parser to get the command-line argument that says which property
file to load, and we've also implicitly added an extra parse-args method to
our SparkConf. In our main method, we do something like this:

val sparkConf = SparkConfFactory.newSparkConf.parseModuleArgs(args)
val sparkContext = new SparkContext(sparkConf)

Now all of our externally parsed properties are in the same spark conf so
we can pull them off anywhere in the program that has access to an
rdd/sparkcontext or the spark conf directly.
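
As a rough Java sketch of the same idea (SparkConfFactory and parseModuleArgs
are our own helpers and aren't shown; the names below are only illustrative):
load the properties file named on the command line, let system properties
take priority, and copy the resulting values into the SparkConf.

    import java.util.Iterator;
    import org.apache.commons.configuration.CompositeConfiguration;
    import org.apache.commons.configuration.PropertiesConfiguration;
    import org.apache.commons.configuration.SystemConfiguration;
    import org.apache.spark.SparkConf;

    // throws ConfigurationException if the file can't be read
    PropertiesConfiguration fileConfig = new PropertiesConfiguration(args[0]);

    CompositeConfiguration config = new CompositeConfiguration();
    config.addConfiguration(new SystemConfiguration()); // added first => highest priority
    config.addConfiguration(fileConfig);

    SparkConf sparkConf = new SparkConf();
    Iterator<String> keys = fileConfig.getKeys();
    while (keys.hasNext()) {
        String key = keys.next();
        // the composite lookup lets a system property win over the file
        sparkConf.set(key, config.getString(key));
    }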

On Mon, Feb 16, 2015 at 10:42 AM, Sean Owen  wrote:

> How about system properties? or something like Typesafe Config which
> lets you at least override something in a built-in config file on the
> command line, with props or other files.
>
> On Mon, Feb 16, 2015 at 3:38 PM, Emre Sevinc 
> wrote:
> > Sean,
> >
> > I'm trying this as an alternative to what I currently do. Currently I
> have
> > my module.properties file for my module in the resources directory, and
> that
> > file is put inside the über JAR file when I build my application with
> Maven,
> > and then when I submit it using spark-submit, I can read that
> > module.properties file via the traditional method:
> >
> >
> >
> properties.load(MyModule.class.getClassLoader().getResourceAsStream("module.properties"));
> >
> > and everything works fine. The disadvantage is that in order to make any
> > changes to that .properties file effective, I have to re-build my
> > application. Therefore I'm trying to find a way to be able to send that
> > module.properties file via spark-submit and read the values in it, so
> that I
> > will not be forced to build my application every time I want to make a
> > change in the module.properties file.
> >
> > I've also checked the "--files" option of spark-submit, but I see that
> it is
> > for sending the listed files to executors (correct me if I'm wrong), what
> > I'm after is being able to pass dynamic properties (key/value pairs) to
> the
> > Driver program of my Spark application. And I still could not find out
> how
> > to do that.
> >
> > --
> > Emre
> >
> >
> >
> >
> >
> > On Mon, Feb 16, 2015 at 4:28 PM, Sean Owen  wrote:
> >>
> >> Since SparkConf is only for Spark properties, I think it will in
> >> general only pay attention to and preserve "spark.*" properties. You
> >> could experiment with that. In general I wouldn't rely on Spark
> >> mechanisms for your configuration, and you can use any config
> >> mechanism you like to retain your own properties.
> >>
> >> On Mon, Feb 16, 2015 at 3:26 PM, Emre Sevinc 
> >> wrote:
> >> > Hello,
> >> >
> >> > I'm using Spark 1.2.1 and have a module.properties file, and in it I
> >> > have
> >> > non-Spark properties, as well as Spark properties, e.g.:
> >> >
> >> >job.output.dir=file:///home/emre/data/mymodule/out
> >> >
> >> > I'm trying to pass it to spark-submit via:
> >> >
> >> >spark-submit --class com.myModule --master local[4] --deploy-mode
> >> > client
> >> > --verbose --properties-file /home/emre/data/mymodule.properties
> >> > mymodule.jar
> >> >
> >> > And I thought I could read the value of my non-Spark property, namely,
> >> > job.output.dir by using:
> >> >
> >> > SparkConf sparkConf = new SparkConf();
> >> > final String validatedJSONoutputDir =
> >> > sparkConf.get("job.output.dir");
> >> >
> >> > But it gives me an exception:
> >> >
> >> > Exception in thread "main" java.util.NoSuchElementException:
> >> > job.output.dir
> >> >
> >> > Is it not possible to mix Spark and non-Spark properties in a single
> >> > .properties file, then pass it via --properties-file and then get the
> >> > values
> >> > of those non-Spark properties via SparkConf?
> >> >
> >> > Or is there another object / method to retrieve the values for those
> >> > non-Spark properties?
> >> >
> >> >
> >> > --
> >> > Emre Sevinç
> >
> >
> >
> >
> > --
> > Emre Sevinc
>
>
>


Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Charles Feduke
I haven't actually tried mixing non-Spark settings into the Spark
properties. Instead I package my properties into the jar and use the
Typesafe Config[1] - v1.2.1 - library (along with Ficus[2] - Scala
specific) to get at my properties:

Properties file: src/main/resources/integration.conf

(below $ENV might be set to either "integration" or "prod"[3])

ssh -t root@$HOST "/root/spark/bin/spark-shell --jars /root/$JAR_NAME \
--conf 'config.resource=$ENV.conf' \
--conf 'spark.executor.extraJavaOptions=-Dconfig.resource=$ENV.conf'"

Since the properties file is packaged up with the JAR I don't have to worry
about sending the file separately to all of the slave nodes. Typesafe
Config is written in Java so it will work if you're not using Scala. (The
Typesafe Config also has the advantage of being extremely easy to integrate
with code that is using Java Properties today.)

If you instead want to send the file separately from the JAR and you use
the Typesafe Config library, you can specify "config.file" instead of
".resource"; though I'd point you to [3] below if you want to make your
development life easier.

1. https://github.com/typesafehub/config
2. https://github.com/ceedubs/ficus
3.
http://deploymentzone.com/2015/01/27/spark-ec2-and-easy-spark-shell-deployment/
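
For completeness, the reading side is tiny. A hedged Java sketch (the
"myapp.job.output.dir" path is only an example, not something from Emre's
file):

    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;

    // ConfigFactory.load() honors -Dconfig.resource / -Dconfig.file / -Dconfig.url,
    // so the same code reads integration.conf or prod.conf depending on $ENV
    Config config = ConfigFactory.load();
    String outputDir = config.getString("myapp.job.output.dir");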



On Mon Feb 16 2015 at 10:27:01 AM Emre Sevinc  wrote:

> Hello,
>
> I'm using Spark 1.2.1 and have a module.properties file, and in it I have
> non-Spark properties, as well as Spark properties, e.g.:
>
>job.output.dir=file:///home/emre/data/mymodule/out
>
> I'm trying to pass it to spark-submit via:
>
>spark-submit --class com.myModule --master local[4] --deploy-mode
> client --verbose --properties-file /home/emre/data/mymodule.properties
> mymodule.jar
>
> And I thought I could read the value of my non-Spark property, namely,
> job.output.dir by using:
>
> SparkConf sparkConf = new SparkConf();
> final String validatedJSONoutputDir = sparkConf.get("job.output.dir");
>
> But it gives me an exception:
>
> Exception in thread "main" java.util.NoSuchElementException:
> job.output.dir
>
> Is it not possible to mix Spark and non-Spark properties in a single
> .properties file, then pass it via --properties-file and then get the
> values of those non-Spark properties via SparkConf?
>
> Or is there another object / method to retrieve the values for those
> non-Spark properties?
>
>
> --
> Emre Sevinç
>


Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Sean Owen
How about system properties? Or something like Typesafe Config, which lets
you at least override something in a built-in config file on the command
line, with props or other files.
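
As a sketch of the override idea: if the reference.conf built into the jar
contains, say,

    job.output.dir = "file:///default/out"

then plain Typesafe Config code on the driver

    import com.typesafe.config.Config;
    import com.typesafe.config.ConfigFactory;

    Config config = ConfigFactory.load();
    String outputDir = config.getString("job.output.dir");

will pick up a -Djob.output.dir=... system property (for example passed with
spark-submit's --driver-java-options) in preference to the file, because
ConfigFactory.load() layers system properties on top of application.conf and
reference.conf. The key name here is just an example.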

On Mon, Feb 16, 2015 at 3:38 PM, Emre Sevinc  wrote:
> Sean,
>
> I'm trying this as an alternative to what I currently do. Currently I have
> my module.properties file for my module in the resources directory, and that
> file is put inside the über JAR file when I build my application with Maven,
> and then when I submit it using spark-submit, I can read that
> module.properties file via the traditional method:
>
>
> properties.load(MyModule.class.getClassLoader().getResourceAsStream("module.properties"));
>
> and everything works fine. The disadvantage is that in order to make any
> changes to that .properties file effective, I have to re-build my
> application. Therefore I'm trying to find a way to be able to send that
> module.properties file via spark-submit and read the values in it, so that I
> will not be forced to build my application every time I want to make a
> change in the module.properties file.
>
> I've also checked the "--files" option of spark-submit, but I see that it is
> for sending the listed files to executors (correct me if I'm wrong), what
> I'm after is being able to pass dynamic properties (key/value pairs) to the
> Driver program of my Spark application. And I still could not find out how
> to do that.
>
> --
> Emre
>
>
>
>
>
> On Mon, Feb 16, 2015 at 4:28 PM, Sean Owen  wrote:
>>
>> Since SparkConf is only for Spark properties, I think it will in
>> general only pay attention to and preserve "spark.*" properties. You
>> could experiment with that. In general I wouldn't rely on Spark
>> mechanisms for your configuration, and you can use any config
>> mechanism you like to retain your own properties.
>>
>> On Mon, Feb 16, 2015 at 3:26 PM, Emre Sevinc 
>> wrote:
>> > Hello,
>> >
>> > I'm using Spark 1.2.1 and have a module.properties file, and in it I
>> > have
>> > non-Spark properties, as well as Spark properties, e.g.:
>> >
>> >job.output.dir=file:///home/emre/data/mymodule/out
>> >
>> > I'm trying to pass it to spark-submit via:
>> >
>> >spark-submit --class com.myModule --master local[4] --deploy-mode
>> > client
>> > --verbose --properties-file /home/emre/data/mymodule.properties
>> > mymodule.jar
>> >
>> > And I thought I could read the value of my non-Spark property, namely,
>> > job.output.dir by using:
>> >
>> > SparkConf sparkConf = new SparkConf();
>> > final String validatedJSONoutputDir =
>> > sparkConf.get("job.output.dir");
>> >
>> > But it gives me an exception:
>> >
>> > Exception in thread "main" java.util.NoSuchElementException:
>> > job.output.dir
>> >
>> > Is it not possible to mix Spark and non-Spark properties in a single
>> > .properties file, then pass it via --properties-file and then get the
>> > values
>> > of those non-Spark properties via SparkConf?
>> >
>> > Or is there another object / method to retrieve the values for those
>> > non-Spark properties?
>> >
>> >
>> > --
>> > Emre Sevinç
>
>
>
>
> --
> Emre Sevinc




Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Emre Sevinc
Sean,

I'm trying this as an alternative to what I currently do. Currently I have
my module.properties file for my module in the resources directory, and
that file is put inside the über JAR file when I build my application with
Maven, and then when I submit it using spark-submit, I can read that
module.properties file via the traditional method:


properties.load(MyModule.class.getClassLoader().getResourceAsStream("module.properties"));

and everything works fine. The disadvantage is that in order to make any
changes to that .properties file effective, I have to re-build my
application. Therefore I'm trying to find a way to be able to send that
module.properties file via spark-submit and read the values in it, so that
I will not be forced to build my application every time I want to make a
change in the module.properties file.

I've also checked the "--files" option of spark-submit, but I see that it
is for sending the listed files to executors (correct me if I'm wrong);
what I'm after is being able to pass dynamic properties (key/value pairs)
to the driver program of my Spark application, and I still could not find
out how to do that.
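
If --files turns out to be enough after all, my understanding is that the
shipped copy can be located via SparkFiles once the SparkContext is up. A
rough sketch I haven't tried yet:

    import java.io.FileInputStream;
    import java.util.Properties;
    import org.apache.spark.SparkFiles;

    Properties props = new Properties();
    try (FileInputStream in =
             new FileInputStream(SparkFiles.get("module.properties"))) {
        props.load(in);
    }

That should find the local copy on the executors, and as far as I can tell on
the driver as well.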

--
Emre





On Mon, Feb 16, 2015 at 4:28 PM, Sean Owen  wrote:

> Since SparkConf is only for Spark properties, I think it will in
> general only pay attention to and preserve "spark.*" properties. You
> could experiment with that. In general I wouldn't rely on Spark
> mechanisms for your configuration, and you can use any config
> mechanism you like to retain your own properties.
>
> On Mon, Feb 16, 2015 at 3:26 PM, Emre Sevinc 
> wrote:
> > Hello,
> >
> > I'm using Spark 1.2.1 and have a module.properties file, and in it I have
> > non-Spark properties, as well as Spark properties, e.g.:
> >
> >job.output.dir=file:///home/emre/data/mymodule/out
> >
> > I'm trying to pass it to spark-submit via:
> >
> >spark-submit --class com.myModule --master local[4] --deploy-mode
> client
> > --verbose --properties-file /home/emre/data/mymodule.properties
> mymodule.jar
> >
> > And I thought I could read the value of my non-Spark property, namely,
> > job.output.dir by using:
> >
> > SparkConf sparkConf = new SparkConf();
> > final String validatedJSONoutputDir =
> sparkConf.get("job.output.dir");
> >
> > But it gives me an exception:
> >
> > Exception in thread "main" java.util.NoSuchElementException:
> > job.output.dir
> >
> > Is it not possible to mix Spark and non-Spark properties in a single
> > .properties file, then pass it via --properties-file and then get the
> values
> > of those non-Spark properties via SparkConf?
> >
> > Or is there another object / method to retrieve the values for those
> > non-Spark properties?
> >
> >
> > --
> > Emre Sevinç
>



-- 
Emre Sevinc


Re: Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Sean Owen
Since SparkConf is only for Spark properties, I think it will in
general only pay attention to and preserve "spark.*" properties. You
could experiment with that. In general I wouldn't rely on Spark
mechanisms for your configuration, and you can use any config
mechanism you like to retain your own properties.
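
One thing you could experiment with along those lines: give the non-Spark
keys a "spark." prefix in the file you pass with --properties-file, since
those should be preserved, e.g. in the file

    spark.job.output.dir=file:///home/emre/data/mymodule/out

and then in the driver

    SparkConf sparkConf = new SparkConf();
    final String validatedJSONoutputDir = sparkConf.get("spark.job.output.dir");

I haven't verified this on 1.2.1, so treat it as something to try rather than
a guarantee.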

On Mon, Feb 16, 2015 at 3:26 PM, Emre Sevinc  wrote:
> Hello,
>
> I'm using Spark 1.2.1 and have a module.properties file, and in it I have
> non-Spark properties, as well as Spark properties, e.g.:
>
>job.output.dir=file:///home/emre/data/mymodule/out
>
> I'm trying to pass it to spark-submit via:
>
>spark-submit --class com.myModule --master local[4] --deploy-mode client
> --verbose --properties-file /home/emre/data/mymodule.properties mymodule.jar
>
> And I thought I could read the value of my non-Spark property, namely,
> job.output.dir by using:
>
> SparkConf sparkConf = new SparkConf();
> final String validatedJSONoutputDir = sparkConf.get("job.output.dir");
>
> But it gives me an exception:
>
> Exception in thread "main" java.util.NoSuchElementException:
> job.output.dir
>
> Is it not possible to mix Spark and non-Spark properties in a single
> .properties file, then pass it via --properties-file and then get the values
> of those non-Spark properties via SparkConf?
>
> Or is there another object / method to retrieve the values for those
> non-Spark properties?
>
>
> --
> Emre Sevinç




Can't I mix non-Spark properties into a .properties file and pass it to spark-submit via --properties-file?

2015-02-16 Thread Emre Sevinc
Hello,

I'm using Spark 1.2.1 and have a module.properties file, and in it I have
non-Spark properties, as well as Spark properties, e.g.:

   job.output.dir=file:///home/emre/data/mymodule/out

I'm trying to pass it to spark-submit via:

   spark-submit --class com.myModule --master local[4] --deploy-mode client
--verbose --properties-file /home/emre/data/mymodule.properties
mymodule.jar

And I thought I could read the value of my non-Spark property, namely,
job.output.dir by using:

SparkConf sparkConf = new SparkConf();
final String validatedJSONoutputDir = sparkConf.get("job.output.dir");

But it gives me an exception:

Exception in thread "main" java.util.NoSuchElementException:
job.output.dir

Is it not possible to mix Spark and non-Spark properties in a single
.properties file, then pass it via --properties-file and then get the
values of those non-Spark properties via SparkConf?

Or is there another object / method to retrieve the values for those
non-Spark properties?


-- 
Emre Sevinç