First of , I want that cluster specs :-) I am gong to propose a few theories without proof here
* From the info u have provided, there are no partitions….that might be an issue creating this problem * Is it possible to partition the data based on say days (to start with) and run this Left Outer Join on one day of data * This specific error is saying the JOIN if failing for that one row – tough to say from here which row is problematic * Have u seen the MapReduce / YARN logs to see if they have more information * If u r using YARN then on HDFS there is a kind of dir on HDFS that looks like this /var/log/hadoop-yarn/….and u will have applicationxxxXXXX directory and all logs for that job under this In summary , if u have not already done it , u should run this query on a smaller subset and creating partitions by say date might be beneficial for this volume of data Thanks Warm Regards Sanjay linkedin:http://www.linkedin.com/in/subramaniansanjay From: Siddharth Tiwari <siddharth.tiw...@live.com<mailto:siddharth.tiw...@live.com>> Reply-To: "user@hive.apache.org<mailto:user@hive.apache.org>" <user@hive.apache.org<mailto:user@hive.apache.org>> Date: Sunday, March 2, 2014 at 8:48 PM To: hive user list <user@hive.apache.org<mailto:user@hive.apache.org>>, "sanjaysubraman...@yahoo.com<mailto:sanjaysubraman...@yahoo.com>" <sanjaysubraman...@yahoo.com<mailto:sanjaysubraman...@yahoo.com>> Subject: RE: Query hangs at 99.97 % for one reducer in Hive Hi Sanjay, here is teh detail Even 500 reducers sounds a high number but I don't know the deatils of your cluster. Can u provide some details How many nodes in cluster : 21 Nodes Hive version : Hive-0.10.x Which distribution (Hortonworks, Apache, CDH, Amazon): CDH 4.3 Node specs: Each node 64 cores ( with HT), 128 GB RAM, 3*10 TB disks Partitions in the table: none Number of records: around 17 Billion Also, I see following exception thrown by hanging task, I have no idea what does this mean java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"joinkey0":""},"value":{"_col2":"92","_col11":"-60-01-21,00","_col12":"-03-07-04,00"},"alias":1} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"joinkey0":""},"value":{"_col2":"92","_col11":"-60-01-21,00","_col12":"-03-07-04,00"},"alias":1} at org.apache.hadoop.hive.ql.exec.ExecRedu *------------------------* Cheers !!! SiddharthTiwari Have a refreshing day !!! "Every duty is holy, and devotion to duty is the highest form of worship of God.” "Maybe other people will try to limit me but I don't limit myself" ________________________________ From: siddharth.tiw...@live.com<mailto:siddharth.tiw...@live.com> To: u...@hadoop.apache.org<mailto:u...@hadoop.apache.org> Subject: FW: Query hangs at 99.97 % for one reducer in Hive Date: Sun, 2 Mar 2014 23:12:47 +0000 Forwarding message to hadoop list as well for any help. Appreciate any help *------------------------* Cheers !!! SiddharthTiwari Have a refreshing day !!! "Every duty is holy, and devotion to duty is the highest form of worship of God.” "Maybe other people will try to limit me but I don't limit myself" ________________________________ From: siddharth.tiw...@live.com<mailto:siddharth.tiw...@live.com> To: user@hive.apache.org<mailto:user@hive.apache.org> Subject: Query hangs at 99.97 % for one reducer in Hive Date: Sun, 2 Mar 2014 23:09:25 +0000 Hi team, following query hangs at 99.97% for one reducer, kindly help or point to what can be cause drop table if exists sample.dpi_short_lt; create table sample.dpi_short_lt as select b.msisdn, a.area_erb, a.longitude, a.latitude, substring(b.msisdn,1,2) as country, substring(b.msisdn,3,2) as area_code, substring(b.start_time,1,4) as year, substring(b.start_time,6,2) as month, substring(b.start_time,9,2) as day, substring(b.start_time,12,2) as hour, cast(b.procedure_duration as double) as duracao_ms, cast(b.internet_latency as double) as int_internet_latency, cast(b.ran_latency as double) as int_ran_latency, cast(b.http_latency as double) as int_http_latency, (case when b.internet_latency='' then 1 else 0 end) as internet_latency_missing, (case when b.ran_latency='' then 1 else 0 end) as ran_latency_missing, (case when b.http_latency='' then 1 else 0 end) as http_latency_missing, (cast(b.mean_throughput_ul as int) * cast( procedure_duration as int) / 1000) as total_up_bytes, (cast(b.mean_throughput_dl as int) * cast(procedure_duration as int) / 1000) as total_dl_bytes, cast(b.missing_packets_ul as int) as int_missing_packets_ul, cast(b.missing_packets_dl as int) as int_missing_packets_dl from sample.dpi_large b left outer join sample.science_new a on b.cgi = regexp_replace(a.codigo_cgi_ecgi,'-','') where msisdn!=''; Hive was heuristically selecting 1000 reducers and it was hanging at 99.97 percent on one reduce task. I then changed the above values to 3GB per reducer and 500 reducers and started hitting this error. java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_task_tmp.-ext-10001/_tmp.000003_0 to: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_tmp.-ext-10001/000003_0 at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:313) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447) at org.apache.hadoop.mapred.Child$4.run(Child.java:268) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408) at org.apache.hadoop.mapred.Child.main(Child.java:262) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812 I have 22 node cluster running cdh 4.3. Please try to locate what can be teh issue. *------------------------* Cheers !!! SiddharthTiwari Have a refreshing day !!! "Every duty is holy, and devotion to duty is the highest form of worship of God.” "Maybe other people will try to limit me but I don't limit myself"