As a final update, though perhaps not a good solution, I am able to upload
my jar file by simply copying (or symlinking) it to the directory
*LIVY_INSTALL/rsc/target/jars*.

I can then access classes from this jar file within Livy sessions, either by
issuing HTTP requests or through the Python HttpClient.
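
For anyone who wants to script the symlink step, here is a minimal sketch. The
helper name link_jar_into_livy is hypothetical and the paths in the usage note
are placeholders; the only thing taken from my setup is the
rsc/target/jars directory under the Livy install. It assumes a POSIX filesystem
where symlinks are allowed.

```python
from pathlib import Path


def link_jar_into_livy(jar_path, livy_install):
    """Symlink jar_path into <livy_install>/rsc/target/jars; return the link path.

    Hypothetical helper: it only automates the manual copy/symlink step.
    """
    jar = Path(jar_path).resolve()
    jars_dir = Path(livy_install) / "rsc" / "target" / "jars"

    if not jar.is_file():
        raise FileNotFoundError(f"jar not found: {jar}")
    if not jars_dir.is_dir():
        raise NotADirectoryError(f"Livy jars directory not found: {jars_dir}")

    link = jars_dir / jar.name
    # Remove a stale link or an older copy so the call can be repeated safely.
    if link.is_symlink() or link.exists():
        link.unlink()
    link.symlink_to(jar)
    return link
```

Usage would look something like
link_jar_into_livy('/path/to/foo.jar', '/path/to/LIVY_INSTALL'), with both
arguments replaced by real locations.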

-Graham

On Thu, Sep 7, 2017 at 9:17 AM, Graham Hukill <[email protected]> wrote:

> As a followup, I am able to upload the jar file by constructing the POST
> request that initiates the Livy session, using the following:
>
> import json
> import requests
>
> # start session
> host = 'http://localhost:8998'
>
> # start livy session as pyspark, with foo jar file included in path
> data = {'kind': 'pyspark', 'name': 'monkey', 'jars': ['file:///path/to/foo.jar']}
> headers = {'Content-Type': 'application/json'}
> r = requests.post(host + '/sessions', data=json.dumps(data), headers=headers)
>
>
> Then, when watching the session initialize, I see the jar file is uploaded
> in the Livy output, and code executes as expected:
>
>
> *17/09/07 08:40:24 INFO ContextLauncher: 17/09/07 08:40:24 INFO
> yarn.Client: Uploading resource file:/path/to/foo.jar ->
> hdfs://localhost/user/grahamhukill/.sparkStaging/application_1504727938366_0007/foo.jar*
>
>
> So, I'm wondering if this is just a problem with how I'm initializing the
> python-api HttpClient?
>
> Digging in a bit more, I see that the python HttpClient will look for
> default config files at the environment variable LIVY_CLIENT_CONF_DIR,
> specifically looking for livy-client.conf and spark-defaults.conf.  If I
> set that env variable and create a livy-client.conf file with the following:
>
> *spark.yarn.jars = file:///path/to/foo.jar*
>
> I do see that the jar file is uploaded during session creation, but then I
> get the following error:
>
> *Error: Could not find or load main class
> org.apache.spark.deploy.yarn.ApplicationMaster*
>
> Which suggests that perhaps this overrides the same setting somewhere
> else, and that now it's not getting other jar files it needs?
>
> Thanks,
> Graham
>
> On Wed, Sep 6, 2017 at 4:29 PM, Graham Hukill <[email protected]> wrote:
>
>> I've got a local jar file that I would like to use for spark jobs, let's
>> call it *foo.jar*.
>>
>> I've successfully used it with pyspark, spark-submit, and through Livy's
>> REST API and python HttpClient.  However, I'm trying to get Livy running
>> through YARN, but I can't figure out how to include this jar file in a way
>> that Spark running in YARN will see it.
>>
>> Formerly, I would include the path of this jar file in the config that I
>> sent with Livy session creation, like:
>>
>> *LIVY_DEFAULT_SESSION_CONFIG = {*
>> *    'kind':'pyspark',*
>> *    'jars':['/path/to/foo.jar']*
>> *    }*
>>
>> However, now that I'm trying the Livy Python HttpClient, I don't have
>> that option.  The client *does* have *client.add_jar()*, and that works
>> if I'm not running behind YARN.  But with YARN, I just keep getting a
>> *ClassNotFoundException* related to this jar being missing.
>>
>> I've tried uploading this jar file to HDFS, and then including that
>> in my spark conf file at SPARK_HOME/conf/spark-defaults.conf:
>> *# spark yarn*
>> *spark.yarn.jars hdfs://localhost/user/USERACCOUNT/foo.jar*
>>
>> But I get a somewhat confusing message when Livy starts a session:
>> *INFO Client: Source and destination file systems are the same. Not
>> copying hdfs://localhost/user/USERACCOUNT/foo.jar*
>>
>> Is there a preferred way to include external jar files for Spark jobs in
>> Livy running in YARN?  I'd even be okay just copying this file to a
>> directory that Livy uploads with each session.
>>
>> With spark-submit, it was pretty cut and dried that I'd include *--jars*
>> with the command, but I don't feel as though I have a similar option with
>> Livy.  Passing it in the session configuration (see above), or calling
>> the HttpClient method *client.add_jar()*, no longer works with YARN.
>>
>> It kind of makes sense... that it's running in the YARN context, so the
>> jar would need to be uploaded when the session is created, and that
>> perhaps YARN cannot see outside of HDFS.
>>
>> I've seen some say that pointing to the wrong Hadoop conf directory was
>> the problem, but as far as I can tell, I'm pointing to the correct place by
>> setting this in Livy's livy-env.sh:
>> *HADOOP_CONF_DIR=/Users/USERACCOUNT/opt/hadoop-2.7.4/etc/hadoop*
>>
>> And here's my livy.conf related to deployment:
>>
>> # What spark master Livy sessions should use.
>> livy.spark.master = yarn
>> # What spark deploy mode Livy sessions should use.
>> livy.spark.deployMode = client
>> # If livy should impersonate the requesting users when creating a new session.
>> livy.impersonation.enabled = true
>>
>> I have a feeling this must be somewhat simple, but I'm quite stumped.
>> Any suggestions would be much appreciated.
>>
>> thanks,
>> Graham
>>
>
>
