As a followup, I am able to upload the jar file by constructing the POST request that initiates the Livy session, using the following:
    import json
    import requests

    # start session
    host = 'http://localhost:8998'

    # start livy session as pyspark, with foo jar file included in path
    data = {'kind':'pyspark','name':'monkey','jars':['file:///path/to/foo.jar']}
    headers = {'Content-Type': 'application/json'}
    r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)

Then, when watching the session initialize, I see the jar file is uploaded in the Livy output, and the code executes as expected:

    17/09/07 08:40:24 INFO ContextLauncher: 17/09/07 08:40:24 INFO yarn.Client: Uploading resource file:/path/to/foo.jar -> hdfs://localhost/user/grahamhukill/.sparkStaging/application_1504727938366_0007/foo.jar

So, I'm wondering if this is just a problem with how I'm initializing the python-api HttpClient?

Digging in a bit more, I see that the python HttpClient will look for default config files in the directory named by the environment variable LIVY_CLIENT_CONF_DIR, specifically looking for livy-client.conf and spark-defaults.conf. If I set that env variable and create a livy-client.conf file with the following:

    spark.yarn.jars = file:///path/to/foo.jar

I do see that the jar file is uploaded during session creation, but then I get the following error:

    Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

Which suggests that perhaps this overrides the same setting somewhere else, and that now it's not getting the other jar files it needs?

Thanks,
Graham

On Wed, Sep 6, 2017 at 4:29 PM, Graham Hukill <[email protected]> wrote:

> I've got a local jar file that I would like to use for spark jobs, let's
> call it *foo.jar*.
>
> I've successfully used it with pyspark, spark-submit, and through Livy's
> REST API and python HttpClient. However, I'm trying to get Livy running
> through YARN, but I can't figure out how to include this jar file in a way
> that Spark running in YARN will see it.
>
> Formerly, I would include the path of this jar file in the config that I
> sent with Livy session creation, like:
>
>     LIVY_DEFAULT_SESSION_CONFIG = {
>         'kind':'pyspark',
>         'jars':['/path/to/foo.jar']
>     }
>
> However, now that I'm trying the Livy Python HttpClient, I don't have that
> option. The client *does* have *client.add_jar()*, and that works if I'm
> not running behind YARN. But with YARN, I just keep getting a
> *ClassNotFoundException* related to this jar being missing.
>
> I've tried uploading this jar file to HDFS, and then including that in
> my spark conf file at SPARK_HOME/conf/spark-defaults.conf:
>
>     # spark yarn
>     spark.yarn.jars hdfs://localhost/user/USERACCOUNT/foo.jar
>
> But I get a somewhat confusing message when Livy starts a session:
>
>     INFO Client: Source and destination file systems are the same. Not copying hdfs://localhost/user/USERACCOUNT/foo.jar
>
> Is there a preferred way to include external jar files for Spark jobs in
> Livy running in YARN? I'd even be okay just copying this file to a
> directory that Livy uploads with each session.
>
> With spark-submit, it was pretty cut and dry that I'd include *--jars*
> with the command, but I don't feel as though I have a similar option with
> Livy. Where I could previously pass it in the session configuration (see
> above), or even through the HttpClient method *client.add_jar()*, those no
> longer work with YARN.
>
> It kind of makes sense... that it's running in the YARN context and the jar
> would need to be uploaded when the session is created, and that perhaps
> YARN cannot see outside of HDFS.
>
> I've seen some say that pointing to the wrong Hadoop conf directory was
> the problem, but as far as I can tell, I'm pointing to the correct place by
> setting this in Livy's livy-env.sh:
>
>     HADOOP_CONF_DIR=/Users/USERACCOUNT/opt/hadoop-2.7.4/etc/hadoop
>
> And here's my livy.conf related to deployment:
>
>     # What spark master Livy sessions should use.
>     livy.spark.master = yarn
>
>     # What spark deploy mode Livy sessions should use.
>     livy.spark.deployMode = client
>
>     # If livy should impersonate the requesting users when creating a new session.
>     livy.impersonation.enabled = true
>
> I have a feeling this must be somewhat simple, but I'm quite stumped. Any
> suggestions would be much appreciated.
>
> thanks,
> Graham
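
For anyone following along, the session-creation request at the top of this message can be wrapped in a small helper so the jar list is easy to vary while testing. This is only a rough sketch: the function names `build_session_payload` and `create_session` are made up for illustration and are not part of Livy or its Python client. The `kind`, `name`, and `jars` fields and the `/sessions` endpoint are the ones used in the request above, and the host and jar path are the same placeholders.

```python
import json

HEADERS = {"Content-Type": "application/json"}


def build_session_payload(kind, jars, name=None):
    """Build the body for a POST to /sessions (hypothetical helper)."""
    payload = {"kind": kind, "jars": list(jars)}
    if name is not None:
        payload["name"] = name
    return payload


def create_session(host, payload):
    """Send the session-creation request and return the decoded JSON reply."""
    import requests  # third-party; imported here so the payload helper works without it

    response = requests.post(host + "/sessions", data=json.dumps(payload), headers=HEADERS)
    response.raise_for_status()
    return response.json()


if __name__ == "__main__":
    # Same placeholder values as in the request above.
    payload = build_session_payload("pyspark", ["file:///path/to/foo.jar"], name="monkey")
    print(json.dumps(payload, indent=2))
    # create_session("http://localhost:8998", payload)  # uncomment with a running Livy server
```

Keeping the payload construction separate from the HTTP call makes it possible to inspect exactly what is being sent before a session is started.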
