Hi Feng,

I've seen exactly the same problem with one of my queries. One reducer hangs forever. I didn't see data skew for that reducer: its REDUCE_INPUT_RECORDS count is similar to that of the other reducers. But that number stopped changing and the reducer just hangs.
Does anybody else know what's happening there?

Daniel

From: "Feng Yuan" <tomson8...@126.com>
Subject: In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long
Date: Mon, 10 Apr 2017 06:51:26 GMT

The log is:

2017-04-10 01:34:22,375 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
2017-04-10 01:36:32,551 INFO [main] ExecReducer: ExecReducer: processing 2000000 rows: used memory = 101789096
2017-04-10 01:37:03,284 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 1000 rows for join key [4092813312923569]
2017-04-10 01:37:03,286 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 2000 rows for join key [4092813312923569]
2017-04-10 01:37:03,291 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 4000 rows for join key [4092813312923569]
2017-04-10 01:37:03,301 INFO [main] org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 8000 rows for join key [4092813312923569]
2017-04-10 01:37:03,379 INFO [main] org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /data9/hadoop/local/usercache/xx/appcache/application_1482905245692_7777364/container_1482905245692_7777364_01_000330/tmp/hive-rowcontainer5366426093735775537/RowContainer3525630608978801813.tmp
2017-04-10 01:37:04,559 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
2017-04-10 07:17:47,584 INFO [main] org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /data9/hadoop/local/usercache/xx/appcache/application_1482905245692_7777364/container_1482905245692_7777364_01_000330/tmp/hive-rowcontainer8292833982081568523/RowContainer734749216866467280.tmp
2017-04-10 07:17:47,775 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
2017-04-10 07:21:57,890 INFO [main] org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /data9/hadoop/local/usercache/xx/appcache/application_1482905245692_7777364/container_1482905245692_7777364_01_000330/tmp/hive-rowcontainer3072958941479299308/RowContainer1838954978169271208.tmp
2017-04-10 07:21:58,119 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
2017-04-10 07:24:07,796 INFO [main] org.apach

=========

What I know is that there is a join operation, but what does "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" mean? Is there some data it needs to read? From HDFS? More critically, why is it so slow, from 2017-04-10 01:37:04 to 2017-04-10 07:17:47?
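To see where the time goes in a task log like the one above, one option is to scan it for large gaps between consecutive timestamps. The following is a minimal Python sketch and not part of the original thread: `find_gaps` and its `threshold_seconds` parameter are hypothetical names, and it assumes every log entry carries a timestamp in the `yyyy-MM-dd HH:mm:ss,SSS` layout shown in the log. The sample entries are copied from the log in the message.

```python
import re
from datetime import datetime

# Matches the timestamp layout used in the log, e.g. "2017-04-10 01:37:04,559".
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def find_gaps(log_text, threshold_seconds=60.0):
    """Return (start, end, seconds) for each pair of consecutive
    timestamps that are more than threshold_seconds apart.

    Hypothetical helper for illustration; it only looks at timestamps
    and does not interpret the log messages themselves.
    """
    stamps = [
        datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")
        for match in TIMESTAMP.finditer(log_text)
    ]
    gaps = []
    for earlier, later in zip(stamps, stamps[1:]):
        seconds = (later - earlier).total_seconds()
        if seconds > threshold_seconds:
            gaps.append((earlier, later, seconds))
    return gaps


# Three consecutive entries copied from the log in the message.
SAMPLE = (
    "2017-04-10 01:37:04,559 INFO [main] org.apache.hadoop.mapred.FileInputFormat: "
    "Total input paths to process : 1\n"
    "2017-04-10 07:17:47,584 INFO [main] org.apache.hadoop.hive.ql.exec.persistence.RowContainer: "
    "RowContainer created temp file /data9/hadoop/local/usercache/xx/appcache/"
    "application_1482905245692_7777364/container_1482905245692_7777364_01_000330/tmp/"
    "hive-rowcontainer8292833982081568523/RowContainer734749216866467280.tmp\n"
    "2017-04-10 07:17:47,775 INFO [main] org.apache.hadoop.mapred.FileInputFormat: "
    "Total input paths to process : 1\n"
)

if __name__ == "__main__":
    for start, end, seconds in find_gaps(SAMPLE):
        print(f"{start} -> {end}: {seconds / 3600:.2f} h")
```

Run against the full task log, this would list every interval longer than the threshold, which makes it easier to tell whether the long waits always follow the same kind of entry.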