Mithun Radhakrishnan created HIVE-17574:
-------------------------------------------

             Summary: Avoid multiple copies of HDFS-based jars when localizing job-jars
                 Key: HIVE-17574
                 URL: https://issues.apache.org/jira/browse/HIVE-17574
             Project: Hive
          Issue Type: Bug
    Affects Versions: 2.2.0, 3.0.0, 2.4.0
            Reporter: Mithun Radhakrishnan
            Assignee: Mithun Radhakrishnan


Raising this on behalf of [~selinazh]. (For my own reference: YHIVE-1035.)

This has to do with the classpaths of Hive actions run from Oozie, and affects 
scripts that add jars/resources from HDFS locations.

As part of Oozie's "sharelib" deploys, foundation jars (such as Hive jars) tend 
to be stored in HDFS paths, as are any custom user-libraries used in workflows. 
An {{ADD JAR|FILE|ARCHIVE}} statement in a Hive script causes the following 
steps to occur:
# Files are downloaded from HDFS to a local temp dir.
# UDFs are resolved/validated.
# All jars/files, including those just downloaded from HDFS, are shipped right 
back to HDFS-based scratch-directories, for job submission.

This is wasteful and time-consuming. Step 3 above should skip re-uploading 
HDFS-based resources, and instead add them directly to the Tez session.
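
To illustrate the idea only (this is not the patch itself, and not Hive's actual code), here is a minimal sketch of the decision the job-submission step would need to make: split the session's resources into those already on HDFS, which can be referenced in place, and those that still have to be uploaded to the scratch directory. The function name {{partition_resources}}, the example URIs, and the assumption that the {{hdfs}} scheme alone is enough to identify a reusable resource are all hypothetical.

```python
from urllib.parse import urlparse


def partition_resources(resource_uris, remote_schemes=("hdfs",)):
    """Split resource URIs into (already_remote, needs_upload).

    Hypothetical helper for illustration. It assumes that any URI whose
    scheme is in remote_schemes is readable by the job as-is, so it can be
    handed to the session directly instead of being copied again.
    """
    already_remote, needs_upload = [], []
    for uri in resource_uris:
        scheme = urlparse(uri).scheme.lower()
        if scheme in remote_schemes:
            # Already on HDFS: reference it in place, no second copy.
            already_remote.append(uri)
        else:
            # Local file (file:// or a bare path): must still be shipped.
            needs_upload.append(uri)
    return already_remote, needs_upload


# Example resource list; host names and paths are made up.
resources = [
    "hdfs://nn.example.com:8020/user/oozie/share/lib/hive/hive-exec.jar",
    "file:///tmp/hive-resources/custom-udf.jar",
    "/home/user/lookup.txt",
]
remote, local = partition_resources(resources)
```

With this split, only the entries in {{local}} would go through the upload to the scratch directory. A real implementation would presumably also need to check that the HDFS path lives on a filesystem the job can read, which the sketch does not attempt.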

We have a patch that's being used internally at Yahoo.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
