Hi,

Please see the comments in https://issues.apache.org/jira/browse/MAPREDUCE-1932

On Sat, Jun 15, 2013 at 12:09 PM, 小强 <[email protected]> wrote:
> Hi, I found that SkippingRecordReader is no longer supported in the new API,
> and I am curious about the reason. Can anyone tell me?
>
> Besides, when I looked into the old API to figure out what skip mode
> was doing, I was a little confused by the logic there.
> As I understand it, if the Java API is used, we can always locate
> precisely which record is the bad one.
> If streaming is used, as long as the user handles the counter correctly (I
> mean incrementing the counter for each record read), we can also locate the
> exact bad record. (I wonder if I am missing something here.)
> But if the user doesn't care about the counter, it is always a disaster for
> the framework to locate bad records (even using binary search).
>
> To sum up:
> Ques 1: Why was skip mode removed in the new API?
> Ques 2: If the user handles the counter correctly in streaming, can we
> locate the exact bad record?
> Ques 3: In skip mode, why not locate more bad records by restarting the
> user logic, instead of locating one bad record per task attempt?
>
> Thank you!
>
> Dasheng Jiang

--
Harsh J
