Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-14 Thread Gourav Sengupta
Hi,

If you start spark-shell or pyspark from the command line with the --jars
option and everything works, then the jar itself is fine. To make it
available permanently, either copy the jar into the jars directory under
SPARK_HOME or modify the spark-env file to include the path where the jar
is stored. That location has to be accessible from all the worker nodes.
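
The first option above (copying the jar into the jars directory under SPARK_HOME) could be scripted roughly as follows. This is only a sketch: the function name `install_jar` and the paths in the usage note are hypothetical, and it assumes a standard Spark 2.x layout with a `jars` directory directly under SPARK_HOME.

```python
import os
import shutil


def install_jar(jar_path, spark_home):
    """Copy jar_path into <spark_home>/jars and return the new file's path.

    Assumes the usual Spark 2.x distribution layout; raises if the
    jars directory is missing rather than creating it silently.
    """
    jars_dir = os.path.join(spark_home, "jars")
    if not os.path.isdir(jars_dir):
        raise FileNotFoundError("no jars directory under " + spark_home)
    # shutil.copy returns the path of the copied file inside jars_dir.
    return shutil.copy(jar_path, jars_dir)
```

A call such as `install_jar("/path/to/client.jar", os.environ["SPARK_HOME"])` would have to be repeated on every node that runs Spark, which is the accessibility requirement mentioned above.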


Regards,
Gourav Sengupta

On Sat, Apr 14, 2018 at 6:02 PM, Jason Boorn  wrote:

> Ok great I’ll give that a shot -
>
> Thanks for all the help


Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-14 Thread Jason Boorn
Ok great I’ll give that a shot -

Thanks for all the help

> On Apr 14, 2018, at 12:08 PM, Gene Pang  wrote:
> 
> Yes, I think that is the case. I haven't tried that before, but it should 
> work.
> 
> Thanks,
> Gene



Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-14 Thread Gene Pang
Yes, I think that is the case. I haven't tried that before, but it should
work.

Thanks,
Gene

On Fri, Apr 13, 2018 at 11:32 AM, Jason Boorn  wrote:

> Hi Gene -
>
> Are you saying that I just need to figure out how to get the Alluxio jar
> into the classpath of my parent application?  If it shows up in the
> classpath then Spark will automatically know that it needs to use it when
> communicating with Alluxio?
>
> Apologies for going back-and-forth on this - I feel like my particular use
> case is clouding what is already a tricky issue.


Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Jason Boorn
Hi Gene - 

Are you saying that I just need to figure out how to get the Alluxio jar into 
the classpath of my parent application?  If it shows up in the classpath then 
Spark will automatically know that it needs to use it when communicating with 
Alluxio?

Apologies for going back-and-forth on this - I feel like my particular use case 
is clouding what is already a tricky issue.

> On Apr 13, 2018, at 2:26 PM, Gene Pang  wrote:
> 
> Hi Jason,
> 
> Alluxio does work with Spark in master=local mode. This is because both 
> spark-submit and spark-shell have command-line options to set the classpath 
> for the JVM that is being started.
> 
> If you are not using spark-submit or spark-shell, you will have to figure out 
> how to configure that JVM instance with the proper properties.
> 
> Thanks,
> Gene



Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Gene Pang
Hi Jason,

Alluxio does work with Spark in master=local mode. This is because both
spark-submit and spark-shell have command-line options to set the classpath
for the JVM that is being started.

If you are not using spark-submit or spark-shell, you will have to figure
out how to configure that JVM instance with the proper properties.
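
For an application that starts its own JVM instead of going through spark-submit, "configuring that JVM instance" mostly means putting the client jar on the classpath at launch. A minimal sketch, assuming the parent application is started with a plain `java -cp` command; the function name, jar names, and main class are all hypothetical, and the application jar is assumed to bring Spark itself along:

```python
import os


def java_command(app_jar, extra_jars, main_class):
    """Return the argv list for launching main_class with extra jars.

    The classpath is fixed when the JVM starts, so every jar the
    application needs (for example the Alluxio client) is listed here.
    """
    # os.pathsep is ':' on Linux/macOS and ';' on Windows.
    classpath = os.pathsep.join([app_jar] + list(extra_jars))
    return ["java", "-cp", classpath, main_class]
```

The resulting list can be handed to `subprocess.run(..., check=True)`. Build tools such as Maven or sbt assemble the same classpath from declared dependencies, which may be the more convenient route for a test pipeline.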

Thanks,
Gene

On Fri, Apr 13, 2018 at 10:47 AM, Jason Boorn  wrote:

> Ok thanks - I was basing my design on this:
>
> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
>
> Wherein it says:
> Once the SparkSession is instantiated, you can configure Spark’s runtime
> config properties.
> Apparently the suite of runtime configs you can change does not include
> classpath.
>
> So the answer to my original question is basically this:
>
> When using local (pseudo-cluster) mode, there is no way to add external
> jars to the spark instance.  This means that Alluxio will not work with
> Spark when Spark is run in master=local mode.
>
> Thanks again - often getting a definitive “no” is almost as good as a
> yes.  Almost ;)


Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Jason Boorn
Ok thanks - I was basing my design on this:

https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html

Wherein it says:
Once the SparkSession is instantiated, you can configure Spark’s runtime config 
properties. 
Apparently the suite of runtime configs you can change does not include 
classpath.  

So the answer to my original question is basically this:

When using local (pseudo-cluster) mode, there is no way to add external jars to 
the spark instance.  This means that Alluxio will not work with Spark when 
Spark is run in master=local mode.

Thanks again - often getting a definitive “no” is almost as good as a yes.  
Almost ;)

> On Apr 13, 2018, at 1:21 PM, Marcelo Vanzin  wrote:
> 
> There are two things you're doing wrong here:
> 
> On Thu, Apr 12, 2018 at 6:32 PM, jb44  wrote:
>> Then I can add the alluxio client library like so:
>> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
> 
> First one, you can't modify JVM configuration after it has already
> started. So this line does nothing since it can't re-launch your
> application with a new JVM.
> 
>> sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)
> 
> There is a lot of configuration that you cannot set after the
> application has already started. For example, after the session is
> created, most probably this option will be ignored, since executors
> will already have started.
> 
> I'm not so sure about what happens when you use dynamic allocation,
> but these post-hoc config changes in general are not expected to take
> effect.
> 
> The documentation could be clearer about this (especially stuff that
> only applies to spark-submit), but that's the gist of it.
> 
> 
> -- 
> Marcelo



Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Marcelo Vanzin
There are two things you're doing wrong here:

On Thu, Apr 12, 2018 at 6:32 PM, jb44  wrote:
> Then I can add the alluxio client library like so:
> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)

First one, you can't modify JVM configuration after it has already
started. So this line does nothing since it can't re-launch your
application with a new JVM.

> sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)

There is a lot of configuration that you cannot set after the
application has already started. For example, after the session is
created, most probably this option will be ignored, since executors
will already have started.

I'm not so sure about what happens when you use dynamic allocation,
but these post-hoc config changes in general are not expected to take
effect.

The documentation could be clearer about this (especially stuff that
only applies to spark-submit), but that's the gist of it.
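
To illustrate the ordering point: classpath settings have to be in place before the JVM is launched, not set on a live session. The sketch below assumes PySpark, which the thread does not confirm; there the JVM is only launched when the first session is created, and the `PYSPARK_SUBMIT_ARGS` environment variable is read at that moment. The function names and jar paths are hypothetical.

```python
import os


def pyspark_submit_args(jars):
    """Build a PYSPARK_SUBMIT_ARGS value that puts `jars` on the classpath.

    --jars takes a comma-separated list, --driver-class-path a
    classpath-style list; the value must end with 'pyspark-shell'.
    """
    return "--jars {} --driver-class-path {} pyspark-shell".format(
        ",".join(jars), os.pathsep.join(jars))


def start_local_session(jars):
    """Set the launch arguments first, then create the local session."""
    os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args(jars)
    # Imported lazily so the helper above works without pyspark installed.
    from pyspark.sql import SparkSession
    return SparkSession.builder.master("local[*]").getOrCreate()
```

Calling `spark.conf.set("spark.driver.extraClassPath", ...)` on the session returned here would still do nothing, for the reason given above. In a Scala or Java application the driver JVM is the application's own JVM, so the equivalent step is the classpath it was launched with.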


-- 
Marcelo

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Jason Boorn
Thanks - I’ve seen this SO post, it covers spark-submit, which I am not using.

Regarding the ALLUXIO_SPARK_CLIENT variable, it is located on the machine that 
is running the job which spawns the master=local spark.  According to the Spark 
documentation, this should be possible, but it appears it is not.

Once again - I’m trying to solve the use case for master=local, NOT for a 
cluster and NOT with spark-submit.  

> On Apr 13, 2018, at 12:47 PM, yohann jardin  wrote:
> 
> Hey Jason,
> It might be related to what your ALLUXIO_SPARK_CLIENT variable points to and
> where the lib is located (is it on HDFS, on the node that submits the job, or
> local to all Spark workers?).
> There is a great post on SO about it: https://stackoverflow.com/a/37348234
>
> We should also check that you are providing the jar correctly for its
> location. I have found it tricky in some cases.
> As a debugging step, if the jar is not on HDFS, you can copy it there and then
> specify the full path in the extraClassPath property.
> Regards,
> Yohann Jardin

Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread yohann jardin
Hey Jason,

It might be related to what your ALLUXIO_SPARK_CLIENT variable points to and
where the lib is located (is it on HDFS, on the node that submits the job, or
local to all Spark workers?).
There is a great post on SO about it: https://stackoverflow.com/a/37348234

We should also check that you are providing the jar correctly for its
location. I have found it tricky in some cases.
As a debugging step, if the jar is not on HDFS, you can copy it there and then
specify the full path in the extraClassPath property.
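
A quick way to see which of those locations a configured value points at is to look at its URI scheme. This is only a rough sketch: the function name and the returned labels are made up for illustration, and it assumes the value behind ALLUXIO_SPARK_CLIENT is a single path or URI.

```python
from urllib.parse import urlparse


def jar_location(path):
    """Classify a jar path by URI scheme (labels are illustrative only)."""
    scheme = urlparse(path).scheme
    if scheme in ("", "file", "local"):
        # Plain paths and file:/local: URIs live on a machine's own disk.
        return "local filesystem"
    if scheme == "hdfs":
        return "HDFS"
    return "other (" + scheme + ")"
```

For example, `jar_location("hdfs://namenode:8020/libs/client.jar")` returns `"HDFS"`, while a plain `/opt/libs/client.jar` is reported as local and then has to exist at that path on whichever machine reads it.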

Regards,

Yohann Jardin

On 4/13/2018 at 5:38 PM, Jason Boorn wrote:
> I do, and this is what I will fall back to if nobody has a better idea :)
>
> I was just hoping to get this working as it is much more convenient for my
> testing pipeline.
>
> Thanks again for the help



Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Jason Boorn
I do, and this is what I will fall back to if nobody has a better idea :)

I was just hoping to get this working as it is much more convenient for my 
testing pipeline.

Thanks again for the help

> On Apr 13, 2018, at 11:33 AM, Geoff Von Allmen  wrote:
> 
> Ok - `LOCAL` makes sense now.
> 
> Do you have the option to still use `spark-submit` in this scenario, but 
> using the following options:
> 
> ```bash
> --master "local[*]" \
> --deploy-mode "client" \
> ...
> ```
> 
> I know that in the past I have set up some options using `.config("Option",
> "value")` when creating the Spark session, and then set other runtime options
> as you describe above with `spark.conf.set`. At this point, though, I've just
> moved everything out into a `spark-submit` script.



Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Geoff Von Allmen
Ok - `LOCAL` makes sense now.

Do you have the option to still use `spark-submit` in this scenario, but
using the following options:

```bash
--master "local[*]" \
--deploy-mode "client" \
...
```

I know that in the past I have set up some options using `.config("Option",
"value")` when creating the Spark session, and then set other runtime options
as you describe above with `spark.conf.set`. At this point, though, I've just
moved everything out into a `spark-submit` script.
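
For a testing pipeline, such a `spark-submit` script could also be generated from the test harness. The sketch below combines the local-mode options above with the classpath options from my earlier message quoted below; the function name, jar paths, and application file are hypothetical.

```python
import os


def spark_submit_command(app, jars, master="local[*]"):
    """Return the argv list for a client-mode spark-submit with extra jars.

    --jars is comma separated, while --driver-class-path and
    spark.executor.extraClassPath use the platform classpath separator
    (':' on Linux/macOS).
    """
    class_path = os.pathsep.join(jars)
    return [
        "spark-submit",
        "--master", master,
        "--deploy-mode", "client",
        "--jars", ",".join(jars),
        "--driver-class-path", class_path,
        "--conf", "spark.executor.extraClassPath=" + class_path,
        app,
    ]
```

Passing the list to `subprocess.run(spark_submit_command("job.py", ["/path/to/client.jar"]), check=True)` avoids shell quoting problems with `local[*]`.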

On Fri, Apr 13, 2018 at 8:18 AM, Jason Boorn  wrote:

> Hi Geoff -
>
> Appreciate the help here - I do understand what you’re saying below.  And
> I am able to get this working when I submit a job to a local cluster.
>
> I think part of the issue here is that there’s ambiguity in the
> terminology.  When I say “LOCAL” spark, I mean an instance of spark that is
> created by my driver program, and is not a cluster itself.  It means that
> my master node is “local”, and this mode is primarily used for testing.
>
> https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html
>
> While I am able to get alluxio working with spark-submit, I am unable to
> get it working when using local mode.  The mechanisms for setting class
> paths during spark-submit are not available in local mode.  My
> understanding is that all one is able to use is:
>
> spark.conf.set(“”)
>
> To set any runtime properties of the local instance.  Note that it is
> possible (and I am more convinced of this as time goes on) that alluxio
> simply does not work in spark local mode as described above.
>
>
> On Apr 13, 2018, at 11:09 AM, Geoff Von Allmen wrote:
>
> I fought with a ClassNotFoundException for quite some time, but it was
> for kafka.
>
> The final configuration that got everything working was running
> spark-submit with the following options:
>
> --jars "/path/to/.ivy2/jars/package.jar" \
> --driver-class-path "/path/to/.ivy2/jars/package.jar" \
> --conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
> --packages org.some.package:package_name:version
>
> While this was needed for me to run in cluster mode, it works equally
> well in client mode.
>
> One other note when you need to supply multiple items to these args:
> --jars and --packages should be comma separated, while --driver-class-path
> and extraClassPath should be colon (:) separated.
>
> HTH
>
> On Fri, Apr 13, 2018 at 4:28 AM, jb44  wrote:
>
>> Haoyuan -
>>
>> As I mentioned below, I've been through the documentation already.  It has
>> not helped me to resolve the issue.
>>
>> Here is what I have tried so far:
>>
>> - setting extraClassPath as explained below
>> - adding fs.alluxio.impl through sparkconf
>> - adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
>> this matters in my case)
>> - compiling the client from source
>>
>> Do you have any other suggestions on how to get this working?
>>
>> Thanks
>>
>>
>>
>> --
>> Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>


Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Jason Boorn
Hi Geoff -

Appreciate the help here - I do understand what you’re saying below.  And I am 
able to get this working when I submit a job to a local cluster.

I think part of the issue here is that there’s ambiguity in the terminology.  
When I say “LOCAL” spark, I mean an instance of spark that is created by my 
driver program, and is not a cluster itself.  It means that my master node is 
“local”, and this mode is primarily used for testing.

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-local.html
 


While I am able to get alluxio working with spark-submit, I am unable to get it 
working when using local mode.  The mechanisms for setting class paths during 
spark-submit are not available in local mode.  My understanding is that all one 
is able to use is:

spark.conf.set(“”)

To set any runtime properties of the local instance.  Note that it is possible 
(and I am more convinced of this as time goes on) that alluxio simply does not 
work in spark local mode as described above.
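
For the embedded case described above, one thing that may be relevant: the Spark configuration docs note that in client mode `spark.driver.extraClassPath` must not be set through `SparkConf` in the application, because the driver JVM has already started at that point. In local mode the executor threads run in that same JVM, so a jar has to be on the classpath when the application itself is launched. A minimal sketch, assuming the application is started with a plain `java` command (the jar paths and main class are hypothetical placeholders, and the command is only printed, not run):

```bash
# Hypothetical locations; substitute the real application and client jars.
APP_JAR="/path/to/my-app.jar"
ALLUXIO_CLIENT_JAR="/path/to/alluxio-client.jar"
MAIN_CLASS="com.example.Main"   # hypothetical main class

# Build the classpath for the JVM that will host the local Spark session.
# Entries are colon separated on Linux and macOS.
CLASSPATH_ARG="${APP_JAR}:${ALLUXIO_CLIENT_JAR}"

# Print the launch command instead of running it.
echo java -cp "$CLASSPATH_ARG" "$MAIN_CLASS"
```

The same idea applies when a build tool launches the application: the client jar would be declared as an ordinary dependency so that it ends up on the runtime classpath.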


> On Apr 13, 2018, at 11:09 AM, Geoff Von Allmen  wrote:
> 
> I fought with a ClassNotFoundException for quite some time, but it was for 
> kafka.
> 
> The final configuration that got everything working was running spark-submit 
> with the following options:
> 
> --jars "/path/to/.ivy2/jars/package.jar" \
> --driver-class-path "/path/to/.ivy2/jars/package.jar" \
> --conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
> --packages org.some.package:package_name:version
> 
> While this was needed for me to run in cluster mode, it works equally well 
> in client mode.
> 
> One other note when you need to supply multiple items to these args:
> --jars and --packages should be comma separated, while --driver-class-path
> and extraClassPath should be colon (:) separated.
> 
> HTH
> 
> 
> On Fri, Apr 13, 2018 at 4:28 AM, jb44 wrote:
> Haoyuan -
> 
> As I mentioned below, I've been through the documentation already.  It has
> not helped me to resolve the issue.
> 
> Here is what I have tried so far:
> 
> - setting extraClassPath as explained below
> - adding fs.alluxio.impl through sparkconf
> - adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
> this matters in my case)
> - compiling the client from source 
> 
> Do you have any other suggestions on how to get this working?  
> 
> Thanks



Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread Geoff Von Allmen
I fought with a ClassNotFoundException for quite some time, but it was for
kafka.

The final configuration that got everything working was running spark-submit
with the following options:

--jars "/path/to/.ivy2/jars/package.jar" \
--driver-class-path "/path/to/.ivy2/jars/package.jar" \
--conf "spark.executor.extraClassPath=/path/to/.ivy2/package.jar" \
--packages org.some.package:package_name:version

While this was needed for me to run in cluster mode, it works equally well
in client mode.

One other note when you need to supply multiple items to these args:
--jars and --packages should be comma separated, while --driver-class-path
and extraClassPath should be colon (:) separated.
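
The separator rule above can be sketched with a small shell helper. This is only an illustration: the `join_by` function and the two jar paths are hypothetical, and the resulting command is printed rather than executed.

```bash
# Join all remaining arguments with the single-character separator in $1.
# The body runs in a subshell, so the IFS change does not leak out.
join_by() (
  IFS="$1"
  shift
  printf '%s\n' "$*"
)

# Hypothetical jar locations, for illustration only.
jar_a="/path/to/a.jar"
jar_b="/path/to/b.jar"

jars_arg="$(join_by , "$jar_a" "$jar_b")"        # --jars takes a comma-separated list
classpath_arg="$(join_by : "$jar_a" "$jar_b")"   # class paths take a colon-separated list

# Print the command instead of running it.
echo spark-submit \
  --jars "$jars_arg" \
  --driver-class-path "$classpath_arg" \
  --conf "spark.executor.extraClassPath=$classpath_arg"
```

The colon applies on Linux and macOS; the class path separator is platform dependent.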

HTH

On Fri, Apr 13, 2018 at 4:28 AM, jb44  wrote:

> Haoyuan -
>
> As I mentioned below, I've been through the documentation already.  It has
> not helped me to resolve the issue.
>
> Here is what I have tried so far:
>
> - setting extraClassPath as explained below
> - adding fs.alluxio.impl through sparkconf
> - adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
> this matters in my case)
> - compiling the client from source
>
> Do you have any other suggestions on how to get this working?
>
> Thanks


Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-13 Thread jb44
Haoyuan -

As I mentioned below, I've been through the documentation already.  It has
not helped me to resolve the issue.

Here is what I have tried so far:

- setting extraClassPath as explained below
- adding fs.alluxio.impl through sparkconf
- adding spark.sql.hive.metastore.sharedPrefixes (though I don't believe
this matters in my case)
- compiling the client from source 

Do you have any other suggestions on how to get this working?  

Thanks






Re: Spark LOCAL mode and external jar (extraClassPath)

2018-04-12 Thread Haoyuan Li
This link should be helpful:
https://alluxio.org/docs/1.7/en/Running-Spark-on-Alluxio.html

Best regards,

Haoyuan (HY)

alluxio.com | alluxio.org | powered by Alluxio


On Thu, Apr 12, 2018 at 6:32 PM, jb44  wrote:

> I'm running spark in LOCAL mode and trying to get it to talk to alluxio. I'm
> getting the error: java.lang.ClassNotFoundException: Class
> alluxio.hadoop.FileSystem not found
> The cause of this error is apparently that Spark cannot find the alluxio
> client jar in its classpath.
>
> I have looked at the page here:
> https://www.alluxio.org/docs/master/en/Debugging-Guide.html#q-why-do-i-see-exceptions-like-javalangruntimeexception-javalangclassnotfoundexception-class-alluxiohadoopfilesystem-not-found
>
> Which details the steps to take in this situation, but I'm not finding
> success.
>
> According to Spark documentation, I can instantiate a local Spark session like so:
>
> SparkSession.builder
>   .appName("App")
>   .getOrCreate
>
> Then I can add the alluxio client library like so:
> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
> sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)
>
> I have verified that the proper jar file exists in the right location on my
> local machine with:
> logger.error(sparkSession.conf.get("spark.driver.extraClassPath"))
> logger.error(sparkSession.conf.get("spark.executor.extraClassPath"))
>
> But I still get the error. Is there anything else I can do to figure out why
> Spark is not picking the library up?
>
> Please note I am not using spark-submit - I am aware of the methods for
> adding the client jar to a spark-submit job. My Spark instance is being
> created as local within my application and this is the use case I want to
> solve.
>
> As an FYI there is another application in the cluster which is connecting to
> my alluxio using the fs client and that all works fine. In that case,
> though, the fs client is being packaged as part of the application through
> standard sbt dependencies.
>


Spark LOCAL mode and external jar (extraClassPath)

2018-04-12 Thread jb44
I'm running spark in LOCAL mode and trying to get it to talk to alluxio. I'm
getting the error: java.lang.ClassNotFoundException: Class
alluxio.hadoop.FileSystem not found
The cause of this error is apparently that Spark cannot find the alluxio
client jar in its classpath.

I have looked at the page here:
https://www.alluxio.org/docs/master/en/Debugging-Guide.html#q-why-do-i-see-exceptions-like-javalangruntimeexception-javalangclassnotfoundexception-class-alluxiohadoopfilesystem-not-found

Which details the steps to take in this situation, but I'm not finding
success.

According to Spark documentation, I can instantiate a local Spark session like so:

SparkSession.builder
  .appName("App")
  .getOrCreate

Then I can add the alluxio client library like so:
sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)

I have verified that the proper jar file exists in the right location on my
local machine with:
logger.error(sparkSession.conf.get("spark.driver.extraClassPath"))
logger.error(sparkSession.conf.get("spark.executor.extraClassPath"))

But I still get the error. Is there anything else I can do to figure out why
Spark is not picking the library up?
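
Reading the property back, as in the two `logger.error` calls above, only shows the value that was stored; it does not confirm that the path points at an existing file. A rough shell sketch for checking that separately follows. The function name `check_classpath` and the example path are made up for illustration, and entries are assumed to be colon separated as on Linux or macOS.

```bash
# Report, for each entry of a colon-separated classpath, whether it exists.
# Returns non-zero if any entry is missing.
check_classpath() {
  status=0
  old_ifs="$IFS"
  IFS=":"
  # $1 is deliberately unquoted so that it is split on IFS (":").
  # Note that glob characters in an entry would also be expanded here.
  for entry in $1; do
    if [ -e "$entry" ]; then
      echo "found:   $entry"
    else
      echo "missing: $entry"
      status=1
    fi
  done
  IFS="$old_ifs"
  return "$status"
}

# Hypothetical path; substitute the value of ALLUXIO_SPARK_CLIENT.
check_classpath "/path/to/alluxio-client.jar" || echo "at least one classpath entry is missing"
```

Even when every entry exists, this says nothing about whether the running JVM actually loaded the jar; it only rules out a wrong path.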

Please note I am not using spark-submit - I am aware of the methods for
adding the client jar to a spark-submit job. My Spark instance is being
created as local within my application and this is the use case I want to
solve.

As an FYI there is another application in the cluster which is connecting to
my alluxio using the fs client and that all works fine. In that case,
though, the fs client is being packaged as part of the application through
standard sbt dependencies.




