Re: In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long

2017-10-20 Thread Daniel Bruce
Moreover, the IP 10.224.174.71 is a different node, not the one executing the reduce task. Why did that happen?

On Fri, Oct 20, 2017 at 3:37 PM, Daniel Bruce wrote:
> OK, more updates. Today I was running the query with Yarn and also turned
> on DEBUG logging. Here's what I

Re: In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long

2017-10-20 Thread Daniel Bruce
OK, more updates. Today I was running the query with Yarn and also turned on DEBUG logging. Here's what I found from the task log for the dangling task. I noticed that after the RowContainer has been created by Hive, there are a lot of IPC/RPC-related logs (still printing), 3 seconds apart.

Re: In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long

2017-10-20 Thread Daniel Bruce
Hi Gopal, Thanks for your input! In my case I'm using MapReduce not Tez. I figured I'd better be more specific so as to provide you more details. For this job there are 298 maps and 74 reduces. All the maps completed real fast within 1 minute, and 73 reduces completed in about 2 minutes. Now

Re: In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long

2017-10-19 Thread Gopal Vijayaraghavan
> . I didn't see data skew for that reducer. It has similar amount of
> REDUCE_INPUT_RECORDS as other reducers.
…
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 8000 rows for
> join key [4092813312923569]

The ratio of REDUCE_INPUT_RECORDS and REDUCE_INPUT_GROUPS is what is
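The message is cut off here, so what follows is only one reading of the ratio it mentions: dividing REDUCE_INPUT_RECORDS by REDUCE_INPUT_GROUPS gives the average number of rows per join key on a reducer, which can differ widely even when the record counts look similar. A minimal sketch, where the reducer names and counter values are made up for illustration and `rows_per_group` is a hypothetical helper, not a Hadoop or Hive API:

```python
def rows_per_group(records, groups):
    """Average rows per distinct key: REDUCE_INPUT_RECORDS / REDUCE_INPUT_GROUPS."""
    if groups == 0:
        return 0.0
    return records / groups


# Hypothetical counter values (records, groups), not taken from the thread.
counters = {
    "reducer_a": (1_000_000, 250_000),
    "reducer_b": (1_000_000, 125),
}

# Same record count on both reducers, very different rows-per-key ratio.
ratios = {name: rows_per_group(r, g) for name, (r, g) in counters.items()}
suspect = max(ratios, key=ratios.get)

for name, ratio in sorted(ratios.items()):
    print(f"{name}: {ratio:.1f} rows per key")
print("highest ratio:", suspect)
```

In this made-up example both reducers report the same REDUCE_INPUT_RECORDS, yet one of them averages far more rows per key, which is the kind of difference a record-count comparison alone would not show.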

Re: In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long

2017-10-19 Thread Daniel Bruce
Hi Feng, I've seen exactly the same problem with one of my queries. There is one reducer hanging forever. I didn't see data skew for that reducer. It has a similar number of REDUCE_INPUT_RECORDS as other reducers. But this number stopped changing and the task is just hanging. Does anybody else know

In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long

2017-04-10 Thread Feng Yuan
The log is:
2017-04-10 01:34:22,375 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
2017-04-10 01:36:32,551 INFO [main] ExecReducer: ExecReducer: processing 200 rows: used memory = 101789096
2017-04-10 01:37:03,284 INFO [main]
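To put a number on how long the task sits after the "Total input paths to process" line, the timestamps in the log above can be diffed. A rough sketch, assuming the standard log4j-style `yyyy-MM-dd HH:mm:ss,SSS` prefix seen in these lines; `parse_ts` and `gaps` are hypothetical helper names, and the third entry is kept truncated exactly as it appears in the message:

```python
import re
from datetime import datetime

# Leading timestamp such as "2017-04-10 01:34:22,375".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def parse_ts(line):
    """Return the leading timestamp of a log line, or None if there is none."""
    m = TS_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")


def gaps(lines):
    """Seconds elapsed between each timestamped line and the next one."""
    stamped = [(parse_ts(l), l) for l in lines]
    stamped = [(t, l) for t, l in stamped if t is not None]
    return [((b[0] - a[0]).total_seconds(), a[1])
            for a, b in zip(stamped, stamped[1:])]


LOG = [
    "2017-04-10 01:34:22,375 INFO [main] org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1",
    "2017-04-10 01:36:32,551 INFO [main] ExecReducer: ExecReducer: processing 200 rows: used memory = 101789096",
    "2017-04-10 01:37:03,284 INFO [main]",
]

for seconds, line in gaps(LOG):
    print(f"{seconds:8.3f}s after: {line[:80]}")
```

On these three lines the first gap comes out at a little over two minutes and the second at about half a minute, so the wait right after the FileInputFormat message is the larger of the two.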