Re: Hive Query

Bejoy Ks Tue, 24 Jul 2012 02:40:50 -0700

Hi Yogesh


I'm not exactly sure of the real root cause of the error. From the error log
and the way it occurs, I suspect it happens when the reduce task cannot reach
the map task node to fetch the map output, something close to fetch failures.
Can you try the following and see whether it makes a difference:
1. Increase the value of tasktracker.http.threads (this has to be done at the
TaskTracker level, not at the job level; restart the TaskTracker afterwards)
2. Adjust mapred.reduce.parallel.copies as well


I just tested the query in my local environment; it works fine and returns the
desired output. The root cause at your end looks like a Hadoop
misconfiguration, since the failures are in the MapReduce jobs rather than in
the query itself.
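
One more thing that may be worth checking, and this is only a guess from the
trace: HTTP 407 means "Proxy Authentication Required", so the Hive client's
request for the task log at http://10.203.33.81:50060/tasklog seems to be going
through an HTTP proxy rather than straight to the TaskTracker. That exception is
raised while JobDebugger reads the task log, so it may be separate from
whatever made the job itself fail. Below is a minimal Python sketch for
fetching the same log with proxies disabled. The helper names tasklog_url and
fetch_without_proxy are hypothetical; the host, port and attempt ID are copied
from the trace.

```python
import urllib.parse
import urllib.request


def tasklog_url(host, attempt_id, start=-8193, port=50060):
    """Build the TaskTracker task-log URL in the form seen in the Hive trace."""
    query = urllib.parse.urlencode({"taskid": attempt_id, "start": start})
    return f"http://{host}:{port}/tasklog?{query}"


def fetch_without_proxy(url, timeout=10):
    """Fetch a URL while ignoring any proxy configured in the environment."""
    # An empty ProxyHandler mapping makes urllib connect directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Host and attempt ID taken from the quoted stack trace.
url = tasklog_url("10.203.33.81", "attempt_201207231123_0011_r_000000_0")
print(url)
# To read the reducer log itself, run on a machine that can reach the node:
#   print(fetch_without_proxy(url))
```

If the direct fetch returns the reducer log, the log should show the actual
failure, and the 407 is then only a client-side proxy issue.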

Regards
Bejoy KS




________________________________
 From: "yogesh.kuma...@wipro.com" <yogesh.kuma...@wipro.com>
To: user@hive.apache.org; bejoy...@yahoo.com 
Sent: Tuesday, July 24, 2012 2:56 PM
Subject: RE: Hive Query
 

 
Thanks Bejoy :-)

I am getting an error with

select count(*) from table;

It throws this error:

2012-07-24 13:39:25,181 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201207231123_0011 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201207231123_0011_m_000002 (and more) from job job_201207231123_0011
Exception in thread "Thread-93" java.lang.RuntimeException: Error while reading from task log url
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.IOException: Server returned HTTP response code: 407 for URL: http://10.203.33.81:50060/tasklog?taskid=attempt_201207231123_0011_r_000000_0&start=-8193
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at java.net.URL.openStream(URL.java:1010)
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
    ... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   HDFS Read: 24 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec



And when I run the query


SELECT count(*),sub.name FROM (Select * FROM sitealias JOIN site ON 
(sitealias.site_id = site.site_id) ) sub GROUP BY sub.name;

it seems to be stuck in a loop, and the MapReduce job is still running:

Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201207231123_0018, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207231123_0018
Kill Command = /HADOOP/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201207231123_0018
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2012-07-24 14:42:03,824 Stage-1 map = 0%,  reduce = 0%
2012-07-24 14:42:09,850 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:43:10,030 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:44:10,177 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:45:10,358 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:46:10,516 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:47:10,672 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:48:10,882 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:49:11,016 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:50:11,152 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:51:11,409 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:52:11,550 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:53:11,679 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:54:11,807 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:55:11,935 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:56:12,060 Stage-1 map = 100%,  reduce = 0%


It has been like this for the past 10 minutes and is still going...


Please suggest and help.

Thanks & Regards
Yogesh Kumar



________________________________
 
From: Bejoy Ks [bejoy...@yahoo.com]
Sent: Tuesday, July 24, 2012 2:33 PM
To: user@hive.apache.org
Subject: Re: Hive Query


Hi Yogesh

Try out this query; it should work, though it is a little expensive:

SELECT count(*),sub.name FROM (Select * FROM sitealias JOIN site ON 
(sitealias.site_id = site.site_id) ) sub GROUP BY sub.name;



Regards
Bejoy KS


________________________________
 From: "yogesh.kuma...@wipro.com" <yogesh.kuma...@wipro.com>
To: user@hive.apache.org; bejoy...@yahoo.com 
Sent: Tuesday, July 24, 2012 1:39 PM
Subject: RE: Hive Query


 
Hi Bejoy,

Even if I perform a count(*) operation on the table, it shows an error:

select count(*) from dummysite;


Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201207231123_0011, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207231123_0011
Kill Command = /HADOOP/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201207231123_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-07-24 13:38:18,928 Stage-1 map = 0%,  reduce = 0%
2012-07-24 13:38:21,938 Stage-1 map = 100%,  reduce = 0%
2012-07-24 13:39:22,170 Stage-1 map = 100%,  reduce = 0%
2012-07-24 13:39:25,181 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201207231123_0011 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201207231123_0011_m_000002 (and more) from job job_201207231123_0011
Exception in thread "Thread-93" java.lang.RuntimeException: Error while reading from task log url
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.IOException: Server returned HTTP response code: 407 for URL: http://10.203.33.81:50060/tasklog?taskid=attempt_201207231123_0011_r_000000_0&start=-8193
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at java.net.URL.openStream(URL.java:1010)
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
    ... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   HDFS Read: 24 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec


Please suggest why this error is coming :-(

Regards
Yogesh Kumar



________________________________
 
From: Bejoy KS [bejoy...@yahoo.com]
Sent: Tuesday, July 24, 2012 12:52 PM
To: user@hive.apache.org
Subject: Re: Hive Query



Hi Yogesh

Can you try out this?

select count(*), site.name from sitealias join site on 
(sitealias.site_id=site.site_id) Group By site.name;
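
As a quick sanity check of the join-and-group logic, independent of Hive and
Hadoop, the sample tables from the original question can be loaded into an
in-memory SQLite database. This is only a sketch under stated assumptions:
SQLite is not Hive, the column types are guessed from the sample data, the join
condition uses the table name sitealias as defined in the question, and an
ORDER BY is added purely to make the output order predictable. The function
name count_sites_by_name is hypothetical.

```python
import sqlite3


def count_sites_by_name():
    """Run the join + GROUP BY on the sample data and return (count, name) rows."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Column types are assumptions based on the sample rows in the question.
    cur.execute("CREATE TABLE sitealias (id INTEGER, site_id INTEGER)")
    cur.execute("CREATE TABLE site (site_id INTEGER, name TEXT)")
    cur.executemany(
        "INSERT INTO sitealias VALUES (?, ?)",
        [(1, 15), (2, 12), (3, 12), (4, 15)],
    )
    cur.executemany(
        "INSERT INTO site VALUES (?, ?)",
        [(12, "google"), (13, "wiki"), (14, "yahoo"), (15, "flipcart")],
    )
    cur.execute(
        """
        SELECT count(*), site.name
        FROM sitealias JOIN site ON (sitealias.site_id = site.site_id)
        GROUP BY site.name
        ORDER BY site.name
        """
    )
    rows = cur.fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for count, name in count_sites_by_name():
        print(count, name)
```

On the sample data this gives a count of 2 for both flipcart and google, which
matches the result asked for. wiki and yahoo drop out because the inner join
finds no matching rows for them in sitealias.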


Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________
 
From: <yogesh.kuma...@wipro.com> 
Date: Tue, 24 Jul 2012 07:14:25 +0000
To: <user@hive.apache.org>
ReplyTo: user@hive.apache.org 
Subject: Hive Query

Hi all,

I have two tables
1) sitealias
2) site


sitealias contains
-------------------------
id                   site_id
----------------------------
1                        15
2                        12
3                        12
4                        15
---------------------------

site contains

-----------------------------
site_id                        name
-------------------------------
12                        google
13                        wiki    
14                        yahoo    
15                        flipcart
---------------------------------



I am running a query to perform an equi-join that returns how many times each 
site_id repeats, along with its name, grouped by site_id.

The result I want from the query:

---------------------------------
count                    name
---------------------------------
2                        google
2                        flipcart
---------------------------------


I ran
select sitealias.count(*), site.name from sitealias join site on 
(site_alias.site_id=site.site_id);

It shows this error: Parse Error: mismatched input '(' expecting FROM near 
'count' in from clause


Please help and suggest a query for this kind of operation.


Thanks & Regards
Yogesh Kumar

Please do not print this email unless it is absolutely necessary. 
The information contained in this electronic message and any attachments to 
this message are intended for the exclusive use of the addressee(s) and may 
contain proprietary, confidential or privileged information. If you are not the 
intended recipient, you should not disseminate, distribute or copy this e-mail. 
Please notify the sender immediately and destroy all copies of this message and 
any attachments. 
WARNING: Computer viruses can be transmitted via email. The recipient should 
check this email and any attachments for the presence of viruses. The company 
accepts no liability for any damage caused by any virus transmitted by this 
email. 
www.wipro.com 