Hi,

A question: why does it need to copy the jar file to the temp folder? Why couldn't it use the file defined in the USING JAR clause ('hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar') directly?
Regards
Arthur

On 4 Jan, 2015, at 7:48 am, arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com> wrote:

> Hi,
>
> A1: Are all of these commands (Step 1-5) from the same Hive CLI prompt?
> Yes
>
> A2: Would you be able to check if such a file exists with the same path, on the local file system?
> The file does not exist on the local file system.
>
> Is there a way to set another "tmp" folder for Hive? Or any suggestions to fix this issue?
>
> Thanks!!
>
> Arthur
>
> On 3 Jan, 2015, at 4:12 am, Jason Dere <jd...@hortonworks.com> wrote:
>
>> The point of USING JAR as part of the CREATE FUNCTION statement is to avoid having to do the ADD JAR/aux path steps to get the UDF to work.
>>
>> Are all of these commands (Step 1-5) from the same Hive CLI prompt?
>>
>>>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
>>>> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>> Added /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar to class path
>>>> Added resource: /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>> OK
>>
>> One note: /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar here should actually be on the local file system, not on HDFS where you were checking in Step 5. During CREATE FUNCTION/query compilation, Hive makes a copy of the source JAR (hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar) in a temp location on the local file system, where it is used by that Hive session.
>>
>> The location mentioned in the FileNotFoundException (hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar) has a different path than the local copy mentioned during CREATE FUNCTION (/tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar).
>> I'm not really sure why it is an HDFS path here either, but I'm not too familiar with what goes on during the job submission process. But the fact that this HDFS path has the same naming convention as the directory used for downloading resources locally (***_resources) looks a little fishy to me. Would you be able to check if such a file exists with the same path, on the local file system?
>>
>> On Dec 31, 2014, at 5:22 AM, Nirmal Kumar <nirmal.ku...@impetus.co.in> wrote:
>>
>>> Important: HiveQL's ADD JAR operation does not work with HiveServer2 and the Beeline client when Beeline runs on a different host. As an alternative to ADD JAR, Hive auxiliary path functionality should be used as described below.
>>>
>>> Refer: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-8-0/Cloudera-Manager-Managing-Clusters/cmmc_hive_udf.html
>>>
>>> Thanks,
>>> -Nirmal
>>>
>>> From: arthur.hk.c...@gmail.com <arthur.hk.c...@gmail.com>
>>> Sent: Tuesday, December 30, 2014 9:54 PM
>>> To: vic0777
>>> Cc: arthur.hk.c...@gmail.com; user@hive.apache.org
>>> Subject: Re: CREATE FUNCTION: How to automatically load extra jar file?
>>>
>>> Thank you.
>>>
>>> Will this work for HiveServer2?
>>>
>>> Arthur
>>>
>>> On 30 Dec, 2014, at 2:24 pm, vic0777 <vic0...@163.com> wrote:
>>>
>>>> You can put it into $HOME/.hiverc like this: ADD JAR full_path_of_the_jar. Then, the file is automatically loaded when Hive is started.
>>>>
>>>> Wantao
>>>>
>>>> At 2014-12-30 11:01:06, "arthur.hk.c...@gmail.com" <arthur.hk.c...@gmail.com> wrote:
>>>> Hi,
>>>>
>>>> I am using Hive 0.13.1 on Hadoop 2.4.1, and I need to automatically load an extra JAR file into Hive for a UDF. Below are my steps to create the UDF function. I have tried the following but still have no luck getting through.
>>>>
>>>> Please help!!
>>>>
>>>> Regards
>>>> Arthur
>>>>
>>>> Step 1: (make sure the jar is in HDFS)
>>>> hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
>>>> -rw-r--r-- 3 hadoop hadoop 57388 2014-12-30 10:02 hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>>
>>>> Step 2: (drop the function if it exists)
>>>> hive> drop function sysdate;
>>>> OK
>>>> Time taken: 0.013 seconds
>>>>
>>>> Step 3: (create the function using the jar in HDFS)
>>>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
>>>> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>> Added /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar to class path
>>>> Added resource: /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>> OK
>>>> Time taken: 0.034 seconds
>>>>
>>>> Step 4: (test)
>>>> hive> select sysdate();
>>>> Automatically selecting local only mode for query
>>>> Total jobs = 1
>>>> Launching Job 1 out of 1
>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>> SLF4J: Class path contains multiple SLF4J bindings.
>>>> SLF4J: Found binding in [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: Found binding in [jar:file:/hadoop/hbase-0.98.5-hadoop2/lib/phoenix-4.1.0-client-hadoop2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
>>>> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
>>>> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
>>>> 14/12/30 10:17:06 WARN conf.Configuration: file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
>>>> 14/12/30 10:17:06 WARN conf.Configuration: file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an attempt to override final parameter: yarn.nodemanager.loacl-dirs; Ignoring.
>>>> 14/12/30 10:17:06 WARN conf.Configuration: file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
>>>> Execution log at: /tmp/hadoop/hadoop_20141230101717_282ec475-8621-40fa-8178-a7927d81540b.log
>>>> java.io.FileNotFoundException: File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
>>>> at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
>>>> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>>> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
>>>> at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>>>> at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>>>> at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
>>>> at org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>>>> at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
>>>> at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
>>>> at
>>>> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
>>>> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>>>> at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>>>> at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
>>>> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
>>>> at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
>>>> at java.security.AccessController.doPrivileged(Native Method)
>>>> at javax.security.auth.Subject.doAs(Subject.java:415)
>>>> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>>>> at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
>>>> at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
>>>> at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
>>>> at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:740)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>> at java.lang.reflect.Method.invoke(Method.java:606)
>>>> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
>>>> Job Submission failed with exception 'java.io.FileNotFoundException(File does not exist: hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar)'
>>>> Execution failed with exit status: 1
>>>> Obtaining error information
>>>> Task failed!
>>>> Task ID:
>>>>   Stage-1
>>>> Logs:
>>>> /tmp/hadoop/hive.log
>>>> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>>>>
>>>> Step 5: (check the file)
>>>> hive> dfs -ls /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar;
>>>> ls: `/tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar': No such file or directory
>>>> Command failed with exit code = 1
>>>> Query returned non-zero code: 1, cause: null
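The two workarounds suggested in the thread can be sketched as follows. The jar path used here is illustrative, not Arthur's actual layout; substitute wherever the jar really lives on the machine running Hive:

```shell
# Option 1 (vic0777's suggestion): commands in $HOME/.hiverc run automatically
# when the Hive CLI starts, so an ADD JAR there loads the UDF jar for every
# CLI session. The local jar path below is an illustrative assumption.
echo "ADD JAR /usr/local/hive/aux/nexr-hive-udf-0.2-SNAPSHOT.jar;" >> "$HOME/.hiverc"

# Option 2 (Nirmal's suggestion): the auxiliary path, which unlike ADD JAR
# also works with HiveServer2/Beeline on a different host. Set before
# starting Hive, e.g. in hive-env.sh:
export HIVE_AUX_JARS_PATH=/usr/local/hive/aux/nexr-hive-udf-0.2-SNAPSHOT.jar
```

As for Arthur's question about redirecting the "tmp" folder itself: later Hive releases expose a `hive.downloaded.resources.dir` property in hive-site.xml that controls where USING JAR/ADD JAR resources are copied locally; whether it is available and settable in Hive 0.13.1 should be verified against that version's HiveConf.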