Hi, I found that SkippingRecordReader is no longer supported in the new API,
and I am curious about the reason. Can anyone tell me why?


Besides, when I looked into the old API to figure out what skip mode was
doing, I was a little confused about the logic there.
As I understand it, if the Java API is used, we can always pinpoint exactly
which record is the bad one.
If streaming is used, as long as the user handles the counter correctly (I
mean incrementing the counter once for each record processed), we can also
locate the exact bad record. (I wonder if I am missing something here.)
But if the user does not maintain the counter, it is always a disaster for
the framework to locate bad records (even using binary search).
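
To make the streaming case concrete, here is a minimal sketch of what I mean
by handling the counter correctly. It assumes the usual streaming convention
of writing "reporter:counter:<group>,<counter>,<amount>" to stderr, and it
assumes the skip counter is named SkippingTaskCounters / MapProcessedRecords
as in SkipBadRecords in the old API (please correct me if that is wrong).
The process() function is hypothetical user logic, only there for
illustration.

```python
import sys

# Assumed counter names, taken from my reading of SkipBadRecords (old API).
SKIP_GROUP = "SkippingTaskCounters"
MAP_PROCESSED = "MapProcessedRecords"


def counter_line(group, name, amount=1):
    """Format the stderr line that streaming treats as a counter update."""
    return "reporter:counter:%s,%s,%d" % (group, name, amount)


def process(record):
    """Hypothetical user logic: emit the record and its length."""
    return "%s\t%d" % (record, len(record))


def run(lines, out=sys.stdout, err=sys.stderr):
    """Process records one by one, reporting each one after it is handled."""
    for line in lines:
        record = line.rstrip("\n")
        out.write(process(record) + "\n")
        # Increment by exactly one, and only after the record is fully
        # processed, so the reported count matches the records consumed.
        err.write(counter_line(SKIP_GROUP, MAP_PROCESSED) + "\n")
        err.flush()

# A real mapper script would end with: run(sys.stdin)
```

If the mapper crashes inside process(), the last reported count should tell
the framework how many records were completed before the failing one, which
is why I think the exact bad record can be located in this case.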


To sum up:
Ques 1:  Why was skip mode removed in the new API?
Ques 2:  If the user handles the counter correctly in streaming, can we
locate the exact bad record?
Ques 3:  In skip mode, why not locate more bad records by restarting the user
logic, instead of locating only one bad record per task attempt?


Thank you!


Dasheng Jiang
