As a followup, I am able to upload the jar file by constructing the POST
request that initiates the Livy session, using the following:

*# start session*
*import json*
*import requests*

*host = 'http://localhost:8998'*

*# start livy session as pyspark, with foo jar file included in path*
*data = {'kind':'pyspark','name':'monkey','jars':['file:///path/to/foo.jar']}*
*headers = {'Content-Type': 'application/json'}*
*r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)*


Then, when watching the session initialize, I see in the Livy output that
the jar file is uploaded, and code executes as expected:


*17/09/07 08:40:24 INFO ContextLauncher: 17/09/07 08:40:24 INFO
yarn.Client: Uploading resource file:/path/to/foo.jar ->
hdfs://localhost/user/grahamhukill/.sparkStaging/application_1504727938366_0007/foo.jar*
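
Watching the session come up can also be done from the client side rather
than from the Livy log. The following is only a minimal sketch: it assumes
the session object returned by Livy's REST API carries a 'state' field that
reaches 'idle' once the session is ready, and wait_for_session is a
hypothetical helper name, not part of Livy or its Python client. The HTTP
call is passed in as a function so the polling logic can be run without a
live server.

```python
import time


def wait_for_session(get_json, session_url, timeout=120.0, interval=2.0,
                     sleep=time.sleep):
    """Poll a Livy session URL until the session is idle or has failed.

    get_json is any callable that takes a URL and returns the decoded
    JSON body, for example: lambda url: requests.get(url).json()
    """
    # States treated as terminal failures (an assumption of this sketch).
    failed_states = ('error', 'dead', 'killed', 'shutting_down')
    waited = 0.0
    while True:
        state = get_json(session_url)['state']
        if state == 'idle':
            return state
        if state in failed_states:
            raise RuntimeError('session ended in state: ' + state)
        if waited >= timeout:
            raise TimeoutError('session still in state: ' + state)
        sleep(interval)
        waited += interval
```

With the POST request above, this could be called as
wait_for_session(lambda url: requests.get(url).json(),
host + '/sessions/' + str(r.json()['id'])), assuming the response body
contains the new session's id.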


So, I'm wondering if this is just a problem with how I'm initializing the
python-api HttpClient?

Digging in a bit more, I see that the python HttpClient will look for
default config files at the environment variable LIVY_CLIENT_CONF_DIR,
specifically looking for livy-client.conf and spark-defaults.conf.  If I
set that env variable and create a livy-client.conf file with the following:

*spark.yarn.jars = file:///path/to/foo.jar*

I do see that the jar file is uploaded during session creation, but then I
get the following error:

*Error: Could not find or load main class
org.apache.spark.deploy.yarn.ApplicationMaster*

That suggests this perhaps overrides the same setting somewhere else, and
that the session is now not getting other jar files it needs?
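
One possible reading of that error, offered tentatively: Spark's YARN
documentation describes spark.yarn.jars as the list of libraries containing
Spark's own code to distribute to YARN containers, so pointing it at foo.jar
alone could leave the Spark runtime jars out. The sketch below keeps
application jars in the 'jars' field of the session request instead, as in
the POST request earlier in this message. build_session_payload is a
hypothetical helper written for illustration; it is not part of Livy or its
Python client, and it assumes bare paths are absolute local paths.

```python
import json
from urllib.parse import urlparse


def build_session_payload(jar_paths, kind='pyspark', name=None, conf=None):
    """Build the JSON body for a Livy POST /sessions request."""
    jars = []
    for path in jar_paths:
        # Keep anything that already has a scheme (file:, hdfs:, ...);
        # treat everything else as an absolute local path (assumption).
        if urlparse(path).scheme:
            jars.append(path)
        else:
            jars.append('file://' + path)
    payload = {'kind': kind, 'jars': jars}
    if name is not None:
        payload['name'] = name
    if conf:
        # Extra Spark properties, copied so the caller's dict is untouched.
        payload['conf'] = dict(conf)
    return payload


body = json.dumps(build_session_payload(['/path/to/foo.jar'], name='monkey'))
```

The resulting body can be sent with requests.post(host + '/sessions',
data=body, headers=headers), which leaves spark.yarn.jars at whatever value
the cluster already uses.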

Thanks,
Graham

On Wed, Sep 6, 2017 at 4:29 PM, Graham Hukill <[email protected]> wrote:

> I've got a local jar file that I would like to use for spark jobs, let's
> call it *foo.jar*.
>
> I've successfully used it with pyspark, spark-submit, and through Livy's
> REST API and python HttpClient.  However, I'm trying to get Livy running
> through YARN, but I can't figure out how to include this jar file in a way
> that Spark running in YARN will see it.
>
> Formerly, I would include the path of this jar file in the config that I
> sent with Livy session creation, like:
>
> *LIVY_DEFAULT_SESSION_CONFIG = {*
> *    'kind':'pyspark',*
> *    'jars':['/path/to/foo.jar']*
> *    }*
>
> However, now that I'm trying the Livy Python HttpClient, I don't have that
> option.  The client *does* have *client.add_jar()*, and that works if I'm
> not running behind YARN.  But with YARN, I just keep getting
> *ClassNotFoundException* related to this jar being missing.
>
> I've tried uploading this jar file to the HDFS, and then including that in
> my spark conf file at SPARK_HOME/conf/spark-defaults.conf:
> *# spark yarn*
> *spark.yarn.jars hdfs://localhost/user/USERACCOUNT/foo.jar*
>
> But I get a somewhat confusing message when Livy starts a session:
> *INFO Client: Source and destination file systems are the same. Not
> copying hdfs://localhost/user/USERACCOUNT/foo.jar*
>
> Is there a preferred way to include external jar files for Spark jobs in
> Livy running in YARN?  I'd even be okay just copying this file to a
> directory that Livy uploads with each session.
>
> With spark-submit, it was pretty cut and dried that I'd include *--jars*
> with the command, but I don't feel as though I have a similar option with
> Livy.  The places where I could pass it before, the session configuration
> (see above) or the HttpClient method *client.add_jar()*, no longer work
> with YARN.
>
> It kind of makes sense... that it's running in the YARN context and the jar
> would need to be uploaded when the session is created, and that perhaps
> YARN cannot see outside of the HDFS.
>
> I've seen some say that pointing to the wrong Hadoop conf directory was
> the problem, but as far as I can tell, I'm pointing to the correct place by
> setting this in Livy's livy-env.sh:
> *HADOOP_CONF_DIR=/Users/USERACCOUNT/opt/hadoop-2.7.4/etc/hadoop*
>
> And here's my livy.conf related to deployment:
>
> *# What spark master Livy sessions should use.*
> *livy.spark.master = yarn*
> *# What spark deploy mode Livy sessions should use.*
> *livy.spark.deployMode = client*
> *# If livy should impersonate the requesting users when creating a new session.*
> *livy.impersonation.enabled = true*
>
> I have a feeling this must be somewhat simple, but I'm quite stumped.  Any
> suggestions would be much appreciated.
>
> thanks,
> Graham
>
