subscribe

2014-10-11 Thread arthur.hk.c...@gmail.com
subscribe


Smoke Test after 1 days 7 hours 5 minutes 19 seconds 70 msec, Failed with Error: GC overhead limit exceeded

2014-10-11 Thread arthur.hk.c...@gmail.com
Hi,

My Hive version is 0.13.1. I tried a smoke test; after 1 day 7 hours 5 minutes 
19 seconds 70 msec, the job failed with the error: GC overhead limit exceeded


LOG:
2014-10-12 06:16:07,288 Stage-6 map = 100%,  reduce = 50%, Cumulative CPU 
425.35 sec
2014-10-12 06:16:12,431 Stage-6 map = 100%,  reduce = 67%, Cumulative CPU 
433.01 sec
2014-10-12 06:16:15,515 Stage-6 map = 100%,  reduce = 100%, Cumulative CPU 
447.59 sec
…...
Hadoop job information for Stage-19: number of mappers: 3; number of reducers: 0
2014-10-12 06:16:30,643 Stage-19 map = 0%,  reduce = 0%
2014-10-12 06:16:55,494 Stage-19 map = 33%,  reduce = 0%, Cumulative CPU 153.83 
sec
2014-10-12 06:16:56,520 Stage-19 map = 0%,  reduce = 0%
2014-10-12 06:17:57,037 Stage-19 map = 0%,  reduce = 0%
2014-10-12 06:18:27,720 Stage-19 map = 100%,  reduce = 0%
MapReduce Total cumulative CPU time: 2 minutes 33 seconds 830 msec
Ended Job = job_1413024651684_0033 with errors
Error during job, obtaining debugging information...
Examining task ID: task_1413024651684_0033_m_01 (and more) from job 
job_1413024651684_0033

Task with the most failures(4):
-
Task ID:
  task_1413024651684_0033_m_02

URL:
  
http://m1:8088/taskdetails.jsp?jobid=job_1413024651684_0033&tipid=task_1413024651684_0033_m_02
-
Diagnostic Messages for this Task:
Error: GC overhead limit exceeded

FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 5  Reduce: 1   Cumulative CPU: 10705.42 sec   HDFS Read: 829911667 
HDFS Write: 693918010684 SUCCESS
Job 1: Map: 2684  Reduce: 721   Cumulative CPU: 100612.23 sec   HDFS Read: 
720031197955 HDFS Write: 56301916 SUCCESS
Job 2: Map: 25  Reduce: 6   Cumulative CPU: 447.59 sec   HDFS Read: 5785850462 
HDFS Write: 22244710 SUCCESS
Job 3: Map: 3   Cumulative CPU: 153.83 sec   HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 1 days 7 hours 5 minutes 19 seconds 70 msec

my smoke test SQL :
SELECT O_YEAR,
       SUM(CASE
             WHEN NATION = 'BRAZIL' THEN VOLUME
             ELSE 0
           END) / SUM(VOLUME) AS MKT_SHARE
FROM   (SELECT YEAR(cast(O_ORDERDATE as date)) AS O_YEAR,
               L_EXTENDEDPRICE * ( 1 - L_DISCOUNT ) AS VOLUME,
               N2.N_NAME AS NATION
        FROM   PART,
               SUPPLIER,
               LINEITEM,
               ORDERS,
               CUSTOMER,
               NATION N1,
               NATION N2,
               REGION
        WHERE  P_PARTKEY = L_PARTKEY
               AND S_SUPPKEY = L_SUPPKEY
               AND L_ORDERKEY = O_ORDERKEY
               AND O_CUSTKEY = C_CUSTKEY
               AND C_NATIONKEY = N1.N_NATIONKEY
               AND N1.N_REGIONKEY = R_REGIONKEY
               AND R_NAME = 'AMERICA'
               AND S_NATIONKEY = N2.N_NATIONKEY
               AND cast(O_ORDERDATE as date) >= cast('1995-01-01' as date)
               AND cast(O_ORDERDATE as date) <= cast('1996-12-31' as date)
               AND P_TYPE = 'ECONOMY ANODIZED STEEL') AS ALL_NATIONS
GROUP  BY O_YEAR
ORDER  BY O_YEAR;


Please help.
Regards
Arthur





Re: Smoke Test after 1 days 7 hours 5 minutes 19 seconds 70 msec, Failed with Error: GC overhead limit exceeded

2014-10-13 Thread arthur.hk.c...@gmail.com
Hi,

I have managed to resolve the issue by tuning the SQL.
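
The thread does not show the actual rewrite. As a hedged sketch only, one common
direction for tuning a TPC-H Q8-style statement like the one above is to use
explicit JOIN ... ON syntax and push the selective filters into subqueries, so
that less data reaches the later map/reduce stages, for example:

SELECT O_YEAR,
       SUM(CASE WHEN NATION = 'BRAZIL' THEN VOLUME ELSE 0 END) / SUM(VOLUME) AS MKT_SHARE
FROM   (SELECT YEAR(cast(o.O_ORDERDATE as date)) AS O_YEAR,
               l.L_EXTENDEDPRICE * (1 - l.L_DISCOUNT) AS VOLUME,
               n2.N_NAME AS NATION
        FROM   (SELECT * FROM PART WHERE P_TYPE = 'ECONOMY ANODIZED STEEL') p
        JOIN   LINEITEM l  ON p.P_PARTKEY = l.L_PARTKEY
        JOIN   SUPPLIER s  ON s.S_SUPPKEY = l.L_SUPPKEY
        JOIN   (SELECT * FROM ORDERS
                WHERE  cast(O_ORDERDATE as date) >= cast('1995-01-01' as date)
                  AND  cast(O_ORDERDATE as date) <= cast('1996-12-31' as date)) o
                           ON l.L_ORDERKEY = o.O_ORDERKEY
        JOIN   CUSTOMER c  ON o.O_CUSTKEY = c.C_CUSTKEY
        JOIN   NATION n1   ON c.C_NATIONKEY = n1.N_NATIONKEY
        JOIN   (SELECT * FROM REGION WHERE R_NAME = 'AMERICA') r
                           ON n1.N_REGIONKEY = r.R_REGIONKEY
        JOIN   NATION n2   ON s.S_NATIONKEY = n2.N_NATIONKEY) ALL_NATIONS
GROUP  BY O_YEAR
ORDER  BY O_YEAR;

Pre-filtering ORDERS, PART and REGION before the joins shrinks the intermediate
data; if a task still runs out of heap, raising the task JVM size
(mapreduce.map.java.opts / mapred.child.java.opts) is the other usual lever.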

Regards
Arthur
On 12 Oct, 2014, at 6:49 am, arthur.hk.c...@gmail.com 
 wrote:

> Hi,
> 
> My Hive version is 0.13.1, I tried a smoke test, after 1 days 7 hours 5 
> minutes 19 seconds 70 msec, the job failed with error: Error: GC overhead 
> limit exceeded
> 
> 
> LOG:
> 2014-10-12 06:16:07,288 Stage-6 map = 100%,  reduce = 50%, Cumulative CPU 
> 425.35 sec
> 2014-10-12 06:16:12,431 Stage-6 map = 100%,  reduce = 67%, Cumulative CPU 
> 433.01 sec
> 2014-10-12 06:16:15,515 Stage-6 map = 100%,  reduce = 100%, Cumulative CPU 
> 447.59 sec
> …...
> Hadoop job information for Stage-19: number of mappers: 3; number of 
> reducers: 0
> 2014-10-12 06:16:30,643 Stage-19 map = 0%,  reduce = 0%
> 2014-10-12 06:16:55,494 Stage-19 map = 33%,  reduce = 0%, Cumulative CPU 
> 153.83 sec
> 2014-10-12 06:16:56,520 Stage-19 map = 0%,  reduce = 0%
> 2014-10-12 06:17:57,037 Stage-19 map = 0%,  reduce = 0%
> 2014-10-12 06:18:27,720 Stage-19 map = 100%,  reduce = 0%
> MapReduce Total cumulative CPU time: 2 minutes 33 seconds 830 msec
> Ended Job = job_1413024651684_0033 with errors
> Error during job, obtaining debugging information...
> Examining task ID: task_1413024651684_0033_m_01 (and more) from job 
> job_1413024651684_0033
> 
> Task with the most failures(4):
> -
> Task ID:
>   task_1413024651684_0033_m_02
> 
> URL:
>   
> http://m1:8088/taskdetails.jsp?jobid=job_1413024651684_0033&tipid=task_1413024651684_0033_m_02
> -
> Diagnostic Messages for this Task:
> Error: GC overhead limit exceeded
> 
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> MapReduce Jobs Launched:
> Job 0: Map: 5  Reduce: 1   Cumulative CPU: 10705.42 sec   HDFS Read: 
> 829911667 HDFS Write: 693918010684 SUCCESS
> Job 1: Map: 2684  Reduce: 721   Cumulative CPU: 100612.23 sec   HDFS Read: 
> 720031197955 HDFS Write: 56301916 SUCCESS
> Job 2: Map: 25  Reduce: 6   Cumulative CPU: 447.59 sec   HDFS Read: 
> 5785850462 HDFS Write: 22244710 SUCCESS
> Job 3: Map: 3   Cumulative CPU: 153.83 sec   HDFS Read: 0 HDFS Write: 0 FAIL
> Total MapReduce CPU Time Spent: 1 days 7 hours 5 minutes 19 seconds 70 msec
> 
> my smoke test SQL :
> SELECT O_YEAR, 
>SUM(CASE 
>  WHEN NATION = 'BRAZIL' THEN VOLUME 
>  ELSE 0 
>END) / SUM(VOLUME) AS MKT_SHARE 
> FROM   (SELECT  YEAR(cast(O_ORDERDATE as date)) AS O_YEAR, 
>L_EXTENDEDPRICE * ( 1 - L_DISCOUNT ) AS VOLUME, 
>N2.N_NAME AS NATION 
> FROM   PART, 
>SUPPLIER, 
>LINEITEM, 
>ORDERS, 
>CUSTOMER, 
>NATION N1, 
>NATION N2, 
>REGION 
> WHERE  P_PARTKEY = L_PARTKEY 
>AND S_SUPPKEY = L_SUPPKEY 
>AND L_ORDERKEY = O_ORDERKEY 
>AND O_CUSTKEY = C_CUSTKEY 
>AND C_NATIONKEY = N1.N_NATIONKEY 
>AND N1.N_REGIONKEY = R_REGIONKEY 
>AND R_NAME = 'AMERICA' 
>AND S_NATIONKEY = N2.N_NATIONKEY 
>AND cast(O_ORDERDATE as date)  >= cast('1995-01-01' as date) 
>AND cast(O_ORDERDATE as date)  <= cast('1996-12-31' as date) 
>AND P_TYPE = 'ECONOMY ANODIZED STEEL') AS ALL_NATIONS 
> GROUP  BY O_YEAR 
> ORDER  BY O_YEAR;
> 
> 
> Please help.
> Regards
> Arthur
> 
> 
> 



java.io.FileNotFoundException: File does not exist (nexr-hive-udf-0.2-SNAPSHOT.jar)

2014-12-17 Thread arthur.hk.c...@gmail.com
Hi,

Please help!

I am using HiveServer2 on Hive 0.13 on Hadoop 2.4.1, together with 
nexr-hive-udf-0.2-SNAPSHOT.jar.

I can run queries from the Hive CLI, e.g.
hive> SELECT add_months(sysdate(), +12) FROM DUAL;
Execution completed successfully
MapredLocal task succeeded
OK
2015-12-17
Time taken: 7.393 seconds, Fetched: 1 row(s)
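
A side note, not from the thread: DUAL is not a built-in Hive table, so the
query above presumably relies on a user-created one-row helper table. A minimal
sketch of creating one (the local file path is a placeholder and should contain
exactly one line):

hive> CREATE TABLE IF NOT EXISTS DUAL (dummy STRING);
hive> LOAD DATA LOCAL INPATH '/tmp/one_line.txt' OVERWRITE INTO TABLE DUAL;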


hive-site.xml (added)
<property>
  <name>hive.aux.jars.path</name>
  <value>$HIVE_HOME/nexr-hive-udf-0.2-SNAPSHOT.jar,$HIVE_HOME/csv-serde-1.1.2-0.11.0-all.jar</value>
</property>

hive-env.sh (added)
export 
HIVE_AUX_JARS_PATH=$HIVE_HOME/lib/csv-serde-1.1.2-0.11.0-all.jar:$HIVE_HOME/lib/nexr-hive-udf-0.2-SNAPSHOT.jar
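
A note on the hive-site.xml snippet above, offered as an assumption rather than
something confirmed in the thread: the configuration loader does not expand the
shell variable $HIVE_HOME (shell expansion only applies to hive-env.sh), so
hive.aux.jars.path usually needs fully qualified file:// paths. A sketch, with
/usr/local/hive standing in for the real install path:

<property>
  <name>hive.aux.jars.path</name>
  <value>file:///usr/local/hive/lib/nexr-hive-udf-0.2-SNAPSHOT.jar,file:///usr/local/hive/lib/csv-serde-1.1.2-0.11.0-all.jar</value>
</property>

HiveServer2 also has to be restarted for either setting to take effect.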


However, when the same query is run via HiveServer2, I get the following error. 
Please help.

Regards
Arthur




14/12/17 16:47:52 WARN conf.Configuration: 
file:/tmp/hive_2014-12-17_16-47-51_096_5821374687950910377-1/-local-10003/jobconf.xml:an
 attempt to override final parameter: 
mapreduce.job.end-notification.max.attempts;  Ignoring.
Execution log at: 
/tmp/hduser_20141217164747_80b15b85-7820-4e3a-88ea-afffa131ff5a.log
java.io.FileNotFoundException: File does not exist: 
hdfs://mycluster/hadoop_data/hadoop_data/tmp/mapred/staging/hduser1962118853/.staging/job_local1962118853_0001/libjars/nexr-hive-udf-0.2-SNAPSHOT.jar
at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at 
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at 
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at 
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:93)
at 
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:548)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:420)
at 
org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:740)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Job Submission failed with exception 'java.io.FileNotFoundException(File does 
not exist: 
hdfs://mycluster/hadoop_data/hadoop_data/tmp/mapred/staging/hduser1962118853/.staging/job_local1962118853_0001/libjars/nexr-hive-udf-0.2-SNAPSHOT.jar)'
Execution failed with exit status: 1
Obtaining error information



CREATE FUNCTION: How to automatically load extra jar file?

2014-12-29 Thread arthur.hk.c...@gmail.com
Hi,

I am using Hive 0.13.1 on Hadoop 2.4.1 and need to automatically load an extra 
JAR file into Hive for a UDF. Below are my steps to create the UDF function. I 
have tried the following but still have had no luck getting it to work.

Please help!!

Regards
Arthur


Step 1: (make sure the jar is in HDFS)
hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
-rw-r--r--   3 hadoop hadoop  57388 2014-12-30 10:02 
hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar

Step 2: (drop the function if it exists) 
hive> drop function sysdate;  
OK
Time taken: 0.013 seconds

Step 3: (create function using the jar in HDFS)
hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' using 
JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
Added 
/tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
 to class path
Added resource: 
/tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
OK
Time taken: 0.034 seconds

Step 4: (test)
hive> select sysdate(); 
   
Automatically selecting local only mode for query
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in 
[jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in 
[jar:file:/hadoop/hbase-0.98.5-hadoop2/lib/phoenix-4.1.0-client-hadoop2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/12/30 10:17:06 WARN conf.Configuration: 
file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an
 attempt to override final parameter: 
mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/12/30 10:17:06 WARN conf.Configuration: 
file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an
 attempt to override final parameter: yarn.nodemanager.loacl-dirs;  Ignoring.
14/12/30 10:17:06 WARN conf.Configuration: 
file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an
 attempt to override final parameter: 
mapreduce.job.end-notification.max.attempts;  Ignoring.
Execution log at: 
/tmp/hadoop/hadoop_20141230101717_282ec475-8621-40fa-8178-a7927d81540b.log
java.io.FileNotFoundException: File does not exist: 
hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
at 
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
at 
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
at 
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
at 
org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
at 
org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
at 
org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:562)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:557)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at 
org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:557)
at org.apache.hadoop.
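
A side note on Step 3 above, not taken from the thread: once CREATE FUNCTION ...
USING JAR succeeds, the registration can be double-checked from the same
session, for example:

hive> SHOW FUNCTIONS;
hive> DESCRIBE FUNCTION EXTENDED sysdate;

In Hive 0.13 permanent functions are stored per database, so the db-qualified
name (e.g. default.sysdate) may be needed when calling the function from
another database.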

Re: CREATE FUNCTION: How to automatically load extra jar file?

2014-12-30 Thread arthur.hk.c...@gmail.com
Thank you.

Will this work for hiveserver2 ?


Arthur

On 30 Dec, 2014, at 2:24 pm, vic0777  wrote:

> 
> You can put it into $HOME/.hiverc like this: ADD JAR full_path_of_the_jar. 
> Then, the file is automatically loaded when Hive is started.
> 
> Wantao
> 
> 
> 
> 
> At 2014-12-30 11:01:06, "arthur.hk.c...@gmail.com"  
> wrote:
> Hi,
> 
> I am using Hive 0.13.1 on Hadoop 2.4.1, I need to automatically load an extra 
> JAR file to hive for UDF, below are my steps to create the UDF function. I 
> have tried the following but still no luck to get thru.
> 
> Please help!!
> 
> Regards
> Arthur
> 
> 
> Step 1:   (make sure the jar in in HDFS)
> hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
> -rw-r--r--   3 hadoop hadoop  57388 2014-12-30 10:02 
> hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
> 
> Step 2: (drop if function exists) 
> hive> drop function sysdate;  
> OK
> Time taken: 0.013 seconds
> 
> Step 3: (create function using the jar in HDFS)
> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' 
> using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
> Added 
> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>  to class path
> Added resource: 
> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
> OK
> Time taken: 0.034 seconds
> 
> Step 4: (test)
> hive> select sysdate();   
>  
> Automatically selecting local only mode for query
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks is set to 0 since there's no reduce operator
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in 
> [jar:file:/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in 
> [jar:file:/hadoop/hbase-0.98.5-hadoop2/lib/phoenix-4.1.0-client-hadoop2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an 
> explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 14/12/30 10:17:06 WARN conf.Configuration: 
> file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an
>  attempt to override final parameter: 
> mapreduce.job.end-notification.max.retry.interval;  Ignoring.
> 14/12/30 10:17:06 WARN conf.Configuration: 
> file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an
>  attempt to override final parameter: yarn.nodemanager.loacl-dirs;  Ignoring.
> 14/12/30 10:17:06 WARN conf.Configuration: 
> file:/tmp/hadoop/hive_2014-12-30_10-17-04_514_2721050094719255719-1/-local-10003/jobconf.xml:an
>  attempt to override final parameter: 
> mapreduce.job.end-notification.max.attempts;  Ignoring.
> Execution log at: 
> /tmp/hadoop/hadoop_20141230101717_282ec475-8621-40fa-8178-a7927d81540b.log
> java.io.FileNotFoundException: File does not exist: 
> hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1128)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
>   at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>   at 
> org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
>   at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:288)
>   at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.getFileStatus(ClientDistributedCacheManager.java:224)
>   at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestamps(ClientDistributedCacheManager.java:99)
>   at 
> org.apache.hadoop.mapreduce.filecache.ClientDistributedCacheManager.determineTimestampsAndCacheVisibilities(ClientDistributedCacheManager.java:57)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:265)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:301)
>   at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:389)
>   at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
>   at org.

Re: CREATE FUNCTION: How to automatically load extra jar file?

2014-12-30 Thread arthur.hk.c...@gmail.com

Hi,

Thanks.

Below are my steps. I did copy my JAR to HDFS and ran "CREATE FUNCTION ... USING 
JAR" with the JAR in HDFS; however, during my smoke test I got a FileNotFoundException.

>> java.io.FileNotFoundException: File does not exist: 
>> hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar



>> Step 1:   (make sure the jar in in HDFS)
>> hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
>> -rw-r--r--   3 hadoop hadoop  57388 2014-12-30 10:02 
>> hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>> 
>> Step 2: (drop if function exists) 
>> hive> drop function sysdate; 
>>  
>> OK
>> Time taken: 0.013 seconds
>> 
>> Step 3: (create function using the jar in HDFS)
>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' 
>> using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
>> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>> Added 
>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>  to class path
>> Added resource: 
>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>> OK
>> Time taken: 0.034 seconds
>> 
>> Step 4: (test)
>> hive> select sysdate(); 
>> Execution log at: 
>> /tmp/hadoop/hadoop_20141230101717_282ec475-8621-40fa-8178-a7927d81540b.log
>> java.io.FileNotFoundException: File does not exist: 
>> hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar


Please help!

Arthur



On 31 Dec, 2014, at 12:31 am, Nitin Pawar  wrote:

> just copy pasting Jason's reply to other thread 
> 
> If you have a recent version of Hive (0.13+), you could try registering your 
> UDF as a "permanent" UDF which was added in HIVE-6047:
> 
> 1) Copy your JAR somewhere on HDFS, say 
> hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar. 
> 2) In Hive, run CREATE FUNCTION zeroifnull AS 'com.test.udf.ZeroIfNullUDF' 
> USING JAR 'hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar';
> 
> The function definition should be saved in the metastore and Hive should 
> remember to pull the JAR from the location you specified in the CREATE 
> FUNCTION call.
> 
> On Tue, Dec 30, 2014 at 9:54 PM, arthur.hk.c...@gmail.com 
>  wrote:
> Thank you.
> 
> Will this work for hiveserver2 ?
> 
> 
> Arthur
> 
> On 30 Dec, 2014, at 2:24 pm, vic0777  wrote:
> 
>> 
>> You can put it into $HOME/.hiverc like this: ADD JAR full_path_of_the_jar. 
>> Then, the file is automatically loaded when Hive is started.
>> 
>> Wantao
>> 
>> 
>> 
>> 
>> At 2014-12-30 11:01:06, "arthur.hk.c...@gmail.com" 
>>  wrote:
>> Hi,
>> 
>> I am using Hive 0.13.1 on Hadoop 2.4.1, I need to automatically load an 
>> extra JAR file to hive for UDF, below are my steps to create the UDF 
>> function. I have tried the following but still no luck to get thru.
>> 
>> Please help!!
>> 
>> Regards
>> Arthur
>> 
>> 
>> Step 1:   (make sure the jar in in HDFS)
>> hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
>> -rw-r--r--   3 hadoop hadoop  57388 2014-12-30 10:02 
>> hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>> 
>> Step 2: (drop if function exists) 
>> hive> drop function sysdate; 
>>  
>> OK
>> Time taken: 0.013 seconds
>> 
>> Step 3: (create function using the jar in HDFS)
>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' 
>> using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
>> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>> Added 
>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>  to class path
>> Added resource: 
>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>> OK
>> Time taken: 0.034 seconds
>> 
>> Step 4: (test)
>> hive> select sysdate();  
>>   
>> Automatically selecting local only mode for query
>> Total jobs = 1
>> Launching Job 1 out of 1
>> Number of reduce tasks is set to 0 since there's no reduce operator
>> SLF4J: Class path contains multiple SLF4J bindings.
>> SLF4J: Found binding in 
&g

Re: CREATE FUNCTION: How to automatically load extra jar file?

2014-12-30 Thread arthur.hk.c...@gmail.com
Hi

I have already placed it in another folder, not the /tmp/ one:

>>> hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
>>> -rw-r--r--   3 hadoop hadoop  57388 2014-12-30 10:02 
>>> hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar

However, Hive copies it to a /tmp/ folder during "CREATE FUNCTION ... USING JAR":
>>> Step 3: (create function using the jar in HDFS)
>>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' 
>>> using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
>>> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>>> Added 
>>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>  to class path
>>> Added resource: 
>>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>> OK
>>> Time taken: 0.034 seconds


Any ideas on how to stop Hive from using the /tmp/ folder?

Arthur



On 31 Dec, 2014, at 2:27 pm, Nitin Pawar  wrote:

> If you put a file inside tmp then there is no guarantee it will live there 
> forever based on ur cluster configuration. 
> 
> You may want to put it as a place where all users can access it like making a 
> folder and keeping it read permission 
> 
> On Wed, Dec 31, 2014 at 11:40 AM, arthur.hk.c...@gmail.com 
>  wrote:
> 
> Hi,
> 
> Thanks.
> 
> Below are my steps, I did copy my JAR to HDFS and "CREATE FUNCTION  using the 
> JAR in HDFS", however during my smoke test, I got FileNotFoundException.
> 
>>> java.io.FileNotFoundException: File does not exist: 
>>> hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
> 
> 
> 
>>> Step 1:   (make sure the jar in in HDFS)
>>> hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
>>> -rw-r--r--   3 hadoop hadoop  57388 2014-12-30 10:02 
>>> hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>>> 
>>> Step 2: (drop if function exists) 
>>> hive> drop function sysdate;
>>>   
>>> OK
>>> Time taken: 0.013 seconds
>>> 
>>> Step 3: (create function using the jar in HDFS)
>>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' 
>>> using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
>>> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>>> Added 
>>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>  to class path
>>> Added resource: 
>>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>> OK
>>> Time taken: 0.034 seconds
>>> 
>>> Step 4: (test)
>>> hive> select sysdate(); 
>>> Execution log at: 
>>> /tmp/hadoop/hadoop_20141230101717_282ec475-8621-40fa-8178-a7927d81540b.log
>>> java.io.FileNotFoundException: File does not exist: 
>>> hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
> 
> 
> Please help!
> 
> Arthur
> 
> 
> 
> On 31 Dec, 2014, at 12:31 am, Nitin Pawar  wrote:
> 
>> just copy pasting Jason's reply to other thread 
>> 
>> If you have a recent version of Hive (0.13+), you could try registering your 
>> UDF as a "permanent" UDF which was added in HIVE-6047:
>> 
>> 1) Copy your JAR somewhere on HDFS, say 
>> hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar. 
>> 2) In Hive, run CREATE FUNCTION zeroifnull AS 'com.test.udf.ZeroIfNullUDF' 
>> USING JAR 'hdfs:///home/nirmal/udf/hiveUDF-1.0-SNAPSHOT.jar';
>> 
>> The function definition should be saved in the metastore and Hive should 
>> remember to pull the JAR from the location you specified in the CREATE 
>> FUNCTION call.
>> 
>> On Tue, Dec 30, 2014 at 9:54 PM, arthur.hk.c...@gmail.com 
>>  wrote:
>> Thank you.
>> 
>> Will this work for hiveserver2 ?
>> 
>> 
>> Arthur
>> 
>> On 30 Dec, 2014, at 2:24 pm, vic0777  wrote:
>> 
>>> 
>>> You can put it into $HOME/.hiverc like this: ADD JAR full_path_of_the_jar. 
>>> Then, the file is automatically loaded when Hive is started.
>>> 
>>> Wantao
>>> 
>>> 
>>> 
>>> 
>>> At 2014-12-30 11:01:06, "arthur.hk.c...@gmail.com" 
>>>  wrote:
>>> Hi,
>>> 
>>> I am using Hive 0.13.1 on Hadoop

Re: CREATE FUNCTION: How to automatically load extra jar file?

2015-01-03 Thread arthur.hk.c...@gmail.com
Hi,


A1: Are all of these commands (Step 1-5) from the same Hive CLI prompt?
Yes

A2:  Would you be able to check if such a file exists with the same path, on 
the local file system?
The file does not exist on the local file system.  


Is there a way to set another “tmp” folder for Hive? Or any suggestions to 
fix this issue?

Thanks !!

Arthur
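
For reference, and not from the thread itself: the scratch locations quoted
later in this thread (hive.exec.scratchdir and hive.exec.local.scratchdir) can
be overridden in hive-site.xml or per session. A minimal sketch with placeholder
directories:

hive> SET hive.exec.local.scratchdir=/data/hive/local_scratch;
hive> SET hive.exec.scratchdir=/user/hive/scratch;

Changing the directory would not by itself fix a local path being misread as an
HDFS path, which is what the later messages suggest is happening here.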
 


On 3 Jan, 2015, at 4:12 am, Jason Dere  wrote:

> The point of USING JAR as part of the CREATE FUNCTION statement to try to 
> avoid having to do ADD JAR/aux path stuff to get the UDF to work. 
> 
> Are all of these commands (Step 1-5) from the same Hive CLI prompt?
> 
>>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' 
>>> using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
>>> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>>> Added 
>>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>  to class path
>>> Added resource: 
>>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>> OK
> 
> 
> One note, 
> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>  here should actually be on the local file system, not on HDFS where you were 
> checking in Step 5. During CREATE FUNCTION/query compilation, Hive will make 
> a copy of the source JAR (hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar), 
> copied to a temp location on the local file system where it's used by that 
> Hive session.
> 
> The location mentioned in the FileNotFoundException 
> (hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar)
>  has a different path than the local copy mentioned during CREATE FUNCTION 
> (/tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar).
>  I'm not really sure why it is a HDFS path here either, but I'm not too 
> familiar with what goes on during the job submission process. But the fact 
> that this HDFS path has the same naming convention as the directory used for 
> downloading resources locally (***_resources) looks a little fishy to me. 
> Would you be able to check if such a file exists with the same path, on the 
> local file system?
> 
> 
> 
> 
> 
> On Dec 31, 2014, at 5:22 AM, Nirmal Kumar  wrote:
> 
>>   Important: HiveQL's ADD JAR operation does not work with HiveServer2 and 
>> the Beeline client when Beeline runs on a different host. As an alterntive 
>> to ADD JAR, Hive auxiliary path functionality should be used as described 
>> below.
>> 
>> Refer:
>> http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-8-0/Cloudera-Manager-Managing-Clusters/cmmc_hive_udf.html​
>> 
>> 
>> Thanks,
>> -Nirmal
>> 
>> From: arthur.hk.c...@gmail.com 
>> Sent: Tuesday, December 30, 2014 9:54 PM
>> To: vic0777
>> Cc: arthur.hk.c...@gmail.com; user@hive.apache.org
>> Subject: Re: CREATE FUNCTION: How to automatically load extra jar file?
>>  
>> Thank you.
>> 
>> Will this work for hiveserver2 ?
>> 
>> 
>> Arthur
>> 
>> On 30 Dec, 2014, at 2:24 pm, vic0777  wrote:
>> 
>>> 
>>> You can put it into $HOME/.hiverc like this: ADD JAR full_path_of_the_jar. 
>>> Then, the file is automatically loaded when Hive is started.
>>> 
>>> Wantao
>>> 
>>> 
>>> 
>>> 
>>> At 2014-12-30 11:01:06, "arthur.hk.c...@gmail.com" 
>>>  wrote:
>>> Hi,
>>> 
>>> I am using Hive 0.13.1 on Hadoop 2.4.1, I need to automatically load an 
>>> extra JAR file to hive for UDF, below are my steps to create the UDF 
>>> function. I have tried the following but still no luck to get thru.
>>> 
>>> Please help!!
>>> 
>>> Regards
>>> Arthur
>>> 
>>> 
>>> Step 1:   (make sure the jar in in HDFS)
>>> hive> dfs -ls hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar;
>>> -rw-r--r--   3 hadoop hadoop  57388 2014-12-30 
>>> 10:02hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>>> 
>>> Step 2: (drop if function exists) 
>>> hive> drop function sysdate;
>>>   
>>> OK
>>> Time taken: 0.013 seconds
>>> 
>>> Step 3: (create function using the jar in HDFS)
>>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' 
>>> using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-S

Re: CREATE FUNCTION: How to automatically load extra jar file?

2015-01-04 Thread arthur.hk.c...@gmail.com
Hi,

A question: why does Hive need to copy the jar file to the temp folder? Why 
couldn’t it use the file defined in USING JAR 
'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar' directly? 

Regards
Arthur


On 4 Jan, 2015, at 7:48 am, arthur.hk.c...@gmail.com  
wrote:

> Hi,
> 
> 
> A1: Are all of these commands (Step 1-5) from the same Hive CLI prompt?
> Yes
> 
> A2:  Would you be able to check if such a file exists with the same path, on 
> the local file system?
> The file does not exist on the local file system.  
> 
> 
> Is there a way to set the another “tmp" folder for HIVE? or any suggestions 
> to fix this issue?
> 
> Thanks !!
> 
> Arthur
>  
> 
> 
> On 3 Jan, 2015, at 4:12 am, Jason Dere  wrote:
> 
>> The point of USING JAR as part of the CREATE FUNCTION statement to try to 
>> avoid having to do ADD JAR/aux path stuff to get the UDF to work. 
>> 
>> Are all of these commands (Step 1-5) from the same Hive CLI prompt?
>> 
>>>> hive> CREATE FUNCTION sysdate AS 'com.nexr.platform.hive.udf.UDFSysDate' 
>>>> using JAR 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar';
>>>> converting to local hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>> Added 
>>>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>>  to class path
>>>> Added resource: 
>>>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>>> OK
>> 
>> 
>> One note, 
>> /tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar
>>  here should actually be on the local file system, not on HDFS where you 
>> were checking in Step 5. During CREATE FUNCTION/query compilation, Hive will 
>> make a copy of the source JAR 
>> (hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar), copied to a temp 
>> location on the local file system where it's used by that Hive session.
>> 
>> The location mentioned in the FileNotFoundException 
>> (hdfs://tmp/5c658d17-dbeb-4b84-ae8d-ba936404c8bc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar)
>>  has a different path than the local copy mentioned during CREATE FUNCTION 
>> (/tmp/69700312-684c-45d3-b27a-0732bb268ddc_resources/nexr-hive-udf-0.2-SNAPSHOT.jar).
>>  I'm not really sure why it is a HDFS path here either, but I'm not too 
>> familiar with what goes on during the job submission process. But the fact 
>> that this HDFS path has the same naming convention as the directory used for 
>> downloading resources locally (***_resources) looks a little fishy to me. 
>> Would you be able to check if such a file exists with the same path, on the 
>> local file system?
>> 
>> 
>> 
>> 
>> 
>> On Dec 31, 2014, at 5:22 AM, Nirmal Kumar  wrote:
>> 
>>>   Important: HiveQL's ADD JAR operation does not work with HiveServer2 and 
>>> the Beeline client when Beeline runs on a different host. As an alterntive 
>>> to ADD JAR, Hive auxiliary path functionality should be used as described 
>>> below.
>>> 
>>> Refer:
>>> http://www.cloudera.com/content/cloudera/en/documentation/cloudera-manager/v4-8-0/Cloudera-Manager-Managing-Clusters/cmmc_hive_udf.html​
>>> 
>>> 
>>> Thanks,
>>> -Nirmal
>>> 
>>> From: arthur.hk.c...@gmail.com 
>>> Sent: Tuesday, December 30, 2014 9:54 PM
>>> To: vic0777
>>> Cc: arthur.hk.c...@gmail.com; user@hive.apache.org
>>> Subject: Re: CREATE FUNCTION: How to automatically load extra jar file?
>>>  
>>> Thank you.
>>> 
>>> Will this work for hiveserver2 ?
>>> 
>>> 
>>> Arthur
>>> 
>>> On 30 Dec, 2014, at 2:24 pm, vic0777  wrote:
>>> 
>>>> 
>>>> You can put it into $HOME/.hiverc like this: ADD JAR full_path_of_the_jar. 
>>>> Then, the file is automatically loaded when Hive is started.
>>>> 
>>>> Wantao
>>>> 
>>>> 
>>>> 
>>>> 
>>>> At 2014-12-30 11:01:06, "arthur.hk.c...@gmail.com" 
>>>>  wrote:
>>>> Hi,
>>>> 
>>>> I am using Hive 0.13.1 on Hadoop 2.4.1, I need to automatically load an 
>>>> extra JAR file to hive for UDF, below are my steps to create the UDF 
>>>> function. I have tried the following but still no luck to get thru.
>>>> 
>>>> Please help!!
>>>> 
>>>> Regards
>>>> Arthur
>>>> 
&

Re: CREATE FUNCTION: How to automatically load extra jar file?

2015-01-11 Thread arthur.hk.c...@gmail.com
 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89)
at com.sun.proxy.$Proxy11.getAllDatabases(Unknown Source)
at 
org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1098)
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionNames(FunctionRegistry.java:671)
at 
org.apache.hadoop.hive.ql.exec.FunctionRegistry.getFunctionNames(FunctionRegistry.java:662)
at 
org.apache.hadoop.hive.cli.CliDriver.getCommandCompletor(CliDriver.java:540)
at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:758)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:686)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

Regards
Arthur



On 7 Jan, 2015, at 7:22 am, Jason Dere  wrote:

> Does your hive.log contain any lines with "adding libjars:"?
> 
> May also search for any lines containing "_resources", would like to see the 
> result of both searches.
> 
> For example, mine is showing the following line:
> 2015-01-06 14:53:28,115 INFO  mr.ExecDriver (ExecDriver.java:execute(307)) - 
> adding libjars: 
> file:///tmp/d0ed1585-d9e6-4944-b985-225351574de0_resources/spatial-sdk-hive-1.0.3-SNAPSHOT.jar,file:///tmp/d0ed1585-d9e6-4944-b985-225351574de0_resources/esri-geometry-api.jar
> 
> I wonder if your libjars setting for the map/reduce job is somehow getting 
> sent without the "file:///", which might be causing hadoop to interpret the 
> path as a HDFS path rather than a local path.
> 
> On Jan 6, 2015, at 1:11 AM, Arthur.hk.chan  wrote:
> 
>> Hi,
>> 
>> my hadoop’s core-site.xml contains the following about tmp:
>> 
>> <property>
>>   <name>hadoop.tmp.dir</name>
>>   <value>/hadoop_data/hadoop_data/tmp</value>
>> </property>
>> 
>> 
>> my hive-default.xml contains the following about tmp:
>> 
>> <property>
>>   <name>hive.exec.scratchdir</name>
>>   <value>/tmp/hive-${user.name}</value>
>>   <description>Scratch space for Hive jobs</description>
>> </property>
>> 
>> <property>
>>   <name>hive.exec.local.scratchdir</name>
>>   <value>/tmp/${user.name}</value>
>>   <description>Local scratch space for Hive jobs</description>
>> </property>
>> 
>> 
>> Will this be related to a configuration issue or a bug?
>> 
>> Please help!
>> 
>> Regards
>> Arthur
>> 
>> 
>> On 6 Jan, 2015, at 3:45 am, Jason Dere  wrote:
>> 
>>> During query compilation Hive needs to instantiate the UDF class and so the 
>>> JAR needs to be resolvable by the class loader, thus the JAR is copied 
>>> locally to a temp location for use.
>>> During map/reduce jobs the local jar (like all jars added with the ADD JAR 
>>> command) should then be added to the distributed cache. It looks like this 
>>> is where the issue is occurring, but based on path in the error message I 
>>> suspect that either Hive or Hadoop is mistaking what should be a local path 
>>> with an HDFS path.
>>> 
>>> On Jan 4, 2015, at 10:23 AM, arthur.hk.c...@gmail.com 
>>>  wrote:
>>> 
>>>> Hi,
>>>> 
>>>> A question: Why does it need to copy the jar file to the temp folder? Why 
>>>> couldn’t it use the file defined in using JAR 
>>>> 'hdfs://hadoop/hive/nexr-hive-udf-0.2-SNAPSHOT.jar' directly? 
>>>> 
>>>> Regards
>>>> Arthur
>>>> 
>>>> 
>>>> On 4 Jan, 2015, at 7:48 am, arthur.hk.c...@gmail.com 
>>>>  wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> 
>>>>> A1: Are all of these commands (Step 1-5) from the same Hive CLI prompt?
>>>>> Yes
>>>>> 
>>>>> A2:  Would you be able to check if such a file exists with the same path, 
>>>>> on the local file system?
>>>>> The file does not exist on the local file system.  
>>>>> 
>>>>> 
>>>>> Is there a way to set the another “tmp" folder for HIVE? or any 
>>>>> suggestions to fix this issue?
>>>>> 
>>>>

Re: CREATE FUNCTION: How to automatically load extra jar file?

2015-01-11 Thread arthur.hk.c...@gmail.com
ternal(JDOQLQuery.java:370)
at org.datanucleus.store.query.Query.executeQuery(Query.java:1744)
at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
at org.datanucleus.store.query.Query.execute(Query.java:1654)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:221)
at 
org.apache.hadoop.hive.metastore.MetaStoreDirectSql.(MetaStoreDirectSql.java:121)
at 
org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:252)
at 
org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.(RawStoreProxy.java:58)
at 
org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:497)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:475)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.(HiveMetaStore.java:356)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.(RetryingHMSHandler.java:54)
at 
org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.(HiveMetaStoreClient.java:171)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.(RetryingMetaStoreClient.java:62)
at 
org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
at 
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
at 
org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


What would be wrong?
Regards
Arthur


On 11 Jan, 2015, at 5:18 pm, arthur.hk.c...@gmail.com 
 wrote:

> Hi,
> 
> 
> 2015-01-04 08:57:12,154 ERROR [main]: DataNucleus.Datastore 
> (Log4JLogger.java:error(115)) - An exception was thrown while 
> adding/validating class(es) : Specified key was too long; max key length is 
> 767 bytes
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was 
> too long; max key length is 767 bytes
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
>   at com.mysql.jdbc.Util.getInstance(Util.java:383)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1062)
>   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4226)
>   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4158)
>   at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
>   at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
>   at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)
>   at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2783)
>   at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:908)
>   at com.mysql.jdbc.StatementImpl.execute(Statement

Re: CREATE FUNCTION: How to automatically load extra jar file?

2015-01-14 Thread arthur.hk.c...@gmail.com
eflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Job Submission failed with exception 'java.io.FileNotFoundException(File does 
not exist: 
hdfs://mycluster/tmp/abce45b1-6041-40b6-83ed-8c6491216360_resources/nexr.jar)'
Execution failed with exit status: 1
Obtaining error information
Task failed!

5) I cannot find "abce45b1-6041-40b6-83ed-8c6491216360_resources/nexr.jar" at 
"hdfs://mycluster/tmp/abce45b1-6041-40b6-83ed-8c6491216360_resources/nexr.jar", 
and it is not found in the local /tmp/ folder either.


I think this is what is happening here: the "libjars setting for the map/reduce 
job is somehow getting sent without the "file:///", which might be causing hadoop 
to interpret the path as a HDFS path rather than a local path."

Is there a way to verify my libjars setting for the map/reduce job?
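
A hedged way to check, following the "adding libjars:" hint from Jason earlier
in the thread (the log path below is the default /tmp/${user.name}/hive.log and
may differ on this cluster):

grep -n "adding libjars:" /tmp/hduser/hive.log
grep -n "_resources"      /tmp/hduser/hive.log

If the jar paths listed there lack the file:// scheme, that would match the
suspicion above.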

Please help!
Regards
Arthur




On 11 Jan, 2015, at 5:35 pm, arthur.hk.c...@gmail.com 
 wrote:

> Hi,
> 
> 
> 
> mysql> show variables like "character_set_database";
> +-------------------------+--------+
> | Variable_name           | Value  |
> +-------------------------+--------+
> | character_set_database  | latin1 |
> +-------------------------+--------+
> 1 row in set (0.00 sec)
> 
> mysql> show variables like "collation_database";
> +--------------------+-------------------+
> | Variable_name      | Value             |
> +--------------------+-------------------+
> | collation_database | latin1_swedish_ci |
> +--------------------+-------------------+
> 1 row in set (0.00 sec)
> 
> 
> 
> 2015-01-11 17:21:07,835 ERROR [main]: DataNucleus.Datastore 
> (Log4JLogger.java:error(115)) - An exception was thrown while 
> adding/validating class(es) : Specified key was too long; max key length is 
> 767 bytes
> com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: Specified key was 
> too long; max key length is 767 bytes
>   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>   at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>   at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>   at com.mysql.jdbc.Util.handleNewInstance(Util.java:408)
>   at com.mysql.jdbc.Util.getInstance(Util.java:383)
>   at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1062)
>   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4226)
>   at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4158)
>   at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2615)
>   at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2776)
>   at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2834)
>   at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2783)
>   at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:908)
>   at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:788)
>   at com.jolbox.bonecp.StatementHandle.execute(StatementHandle.java:254)
>   at 
> org.datanucleus.store.rdbms.table.AbstractTable.executeDdlStatement(AbstractTable.java:760)
>   at 
> org.datanucleus.store.rdbms.table.TableImpl.createIndices(TableImpl.java:648)
>   at 
> org.datanucleus.store.rdbms.table.TableImpl.validateIndices(TableImpl.java:593)
>   at 
> org.datanucleus.store.rdbms.table.TableImpl.validateConstraints(TableImpl.java:390)
>   at 
> org.datanucleus.store.rdbms.table.ClassTable.validateConstraints(ClassTable.java:3463)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.performTablesValidation(RDBMSStoreManager.java:3464)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.addClassTablesAndValidate(RDBMSStoreManager.java:3190)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager$ClassAdder.run(RDBMSStoreManager.java:2841)
>   at 
> org.datanucleus.store.rdbms.AbstractSchemaTransaction.execute(AbstractSchemaTransaction.java:122)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.addClasses(RDBMSStoreManager.java:1605)
>   at 
> org.datanucleus.store.AbstractStoreManager.addClass(AbstractStoreManager.java:954)
>   at 
> org.datanucleus.store.rdbms.RDBMSStoreManager.getDatastoreClass(RDBMSStoreManager.java:679)
>   at 
> org.datanucleus.store.rdbms.query.RDBMSQueryUtils.getStatementForCandidates(RDBMSQueryUtils.java:408)
>   at