As a final update, though perhaps not a good solution, I am able to upload my jar file by simply copying (or symlinking) it to the directory *LIVY_INSTALL/rsc/target/jars*.
I can then access classes from this jar file within Livy sessions, whether I start them through HTTP requests or the Python HttpClient.

-Graham

On Thu, Sep 7, 2017 at 9:17 AM, Graham Hukill <[email protected]> wrote:

> As a followup, I am able to upload the jar file by constructing the POST
> request that initiates the Livy session, using the following:
>
>     import json
>     import requests
>
>     # start session
>     host = 'http://localhost:8998'
>
>     # start livy session as pyspark, with foo jar file included in path
>     data = {'kind': 'pyspark', 'name': 'monkey', 'jars': ['file:///path/to/foo.jar']}
>     headers = {'Content-Type': 'application/json'}
>     r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)
>
> Then, when watching the session initialize, I see in the Livy output that
> the jar file is uploaded, and code executes as expected:
>
>     17/09/07 08:40:24 INFO ContextLauncher: 17/09/07 08:40:24 INFO yarn.Client: Uploading resource file:/path/to/foo.jar -> hdfs://localhost/user/grahamhukill/.sparkStaging/application_1504727938366_0007/foo.jar
>
> So, I'm wondering if this is just a problem with how I'm initializing the
> python-api HttpClient?
>
> Digging in a bit more, I see that the python HttpClient will look for
> default config files in the directory named by the environment variable
> LIVY_CLIENT_CONF_DIR, specifically livy-client.conf and
> spark-defaults.conf. If I set that env variable and create a
> livy-client.conf file with the following:
>
>     spark.yarn.jars = file:///path/to/foo.jar
>
> I do see that the jar file is uploaded during session creation, but then I
> get the following error:
>
>     Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster
>
> Which suggests that perhaps this overrides the same setting somewhere
> else, and that now it's not getting other jar files it needs?
>
> Thanks,
> Graham
>
> On Wed, Sep 6, 2017 at 4:29 PM, Graham Hukill <[email protected]> wrote:
>
>> I've got a local jar file that I would like to use for spark jobs, let's
>> call it *foo.jar*.
>>
>> I've successfully used it with pyspark, spark-submit, and through Livy's
>> REST API and python HttpClient. However, I'm trying to get Livy running
>> through YARN, but I can't figure out how to include this jar file in a way
>> that Spark running in YARN will see it.
>>
>> Formerly, I would include the path of this jar file in the config that I
>> sent with Livy session creation, like:
>>
>>     LIVY_DEFAULT_SESSION_CONFIG = {
>>         'kind': 'pyspark',
>>         'jars': ['/path/to/foo.jar']
>>     }
>>
>> However, now that I'm trying the Livy Python HttpClient, I don't have
>> that option. The client *does* have *client.add_jar()*, and that works
>> if I'm not running behind YARN. But with YARN, I just keep getting
>> *ClassNotFoundException* related to this jar being missing.
>>
>> I've tried uploading this jar file to HDFS, and then including that
>> in my spark conf file at SPARK_HOME/conf/spark-defaults.conf:
>>
>>     # spark yarn
>>     spark.yarn.jars hdfs://localhost/user/USERACCOUNT/foo.jar
>>
>> But I get a somewhat confusing message when Livy starts a session:
>>
>>     INFO Client: Source and destination file systems are the same. Not copying hdfs://localhost/user/USERACCOUNT/foo.jar
>>
>> Is there a preferred way to include external jar files for Spark jobs in
>> Livy running in YARN? I'd even be okay just copying this file to a
>> directory that Livy uploads with each session.
>>
>> With spark-submit, it was pretty cut and dry that I'd include *--jars*
>> with the command, but I don't feel as though I have a similar option with
>> Livy. Where I could previously pass it in the session configuration (see
>> above), or even through the HttpClient method *client.add_jar()*, those
>> no longer work with YARN.
>>
>> It kind of makes sense... that it's running in the YARN context and the
>> jar would need to be uploaded when the session is created, and that
>> perhaps YARN cannot see outside of HDFS.
>>
>> I've seen some say that pointing to the wrong Hadoop conf directory was
>> the problem, but as far as I can tell, I'm pointing to the correct place by
>> setting this in Livy's livy-env.sh:
>>
>>     HADOOP_CONF_DIR=/Users/USERACCOUNT/opt/hadoop-2.7.4/etc/hadoop
>>
>> And here's my livy.conf related to deployment:
>>
>>     # What spark master Livy sessions should use.
>>     livy.spark.master = yarn
>>
>>     # What spark deploy mode Livy sessions should use.
>>     livy.spark.deployMode = client
>>
>>     # If livy should impersonate the requesting users when creating a new session.
>>     livy.impersonation.enabled = true
>>
>> I have a feeling this must be somewhat simple, but I'm quite stumped.
>> Any suggestions would be much appreciated.
>>
>> thanks,
>> Graham
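
For reference, the session-creation approach from the Sep 7 message can be sketched as a small helper. This is only a sketch under assumptions: the function names build_session_payload and create_session are hypothetical, the standard library's urllib stands in for requests, and the host, session name, and jar path are the placeholder values from the thread. It relies on the `kind`, `name`, and `jars` fields of Livy's POST /sessions body, as used above.

```python
import json
import urllib.request


def build_session_payload(jars, kind="pyspark", name=None):
    """Build the JSON body for POST /sessions with extra jars attached.

    Hypothetical helper; the field names mirror the request in the thread.
    """
    payload = {"kind": kind, "jars": list(jars)}
    if name is not None:
        payload["name"] = name
    return payload


def create_session(host, payload):
    """POST the payload to the Livy /sessions endpoint and return the parsed reply."""
    request = urllib.request.Request(
        host + "/sessions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (placeholder values from the thread; needs a running Livy server):
#   body = build_session_payload(["file:///path/to/foo.jar"], name="monkey")
#   session = create_session("http://localhost:8998", body)
```

Passing the jar through `jars` at session creation leaves spark.yarn.jars alone. As far as I understand the Spark-on-YARN documentation, spark.yarn.jars is meant to list Spark's own runtime jars, which would be consistent with the ApplicationMaster error seen when it was pointed at foo.jar only.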
