TableInputFormat should handle as many errors as possible
---------------------------------------------------------

                 Key: HBASE-5757
                 URL: https://issues.apache.org/jira/browse/HBASE-5757
             Project: HBase
          Issue Type: Bug
          Components: mapred, mapreduce
    Affects Versions: 0.90.6
            Reporter: Jan Lukavsky


Prior to HBASE-4196, the mapred and mapreduce APIs handled IOExceptions thrown 
from the scanner differently. The patch for HBASE-4196 unified this handling, so 
that when an exception is caught a reconnect is attempted (without bothering the 
mapred client). After that, HBASE-4269 changed this behavior back, but in both 
the mapred and mapreduce APIs. The question is: is there any reason not to 
handle all errors that the input format can handle? In other words, why not try 
to reissue the request after *any* IOException? I see the following 
disadvantages of the current approach:
 * the client may see exceptions like LeaseException and 
ScannerTimeoutException if it fails to process all fetched data within the 
timeout
 * to avoid ScannerTimeoutException, the client must raise 
hbase.regionserver.lease.period
 * the timeout for tasks is already configured in mapred.task.timeout, so this 
seems a bit redundant to me, because typically one needs to update both of 
these parameters
 * I don't see any way to get rid of LeaseException (this is configured on the 
server side)

I think all of these issues would go away if DoNotRetryIOException were not 
rethrown. On the other hand, handling errors in the InputFormat has the 
disadvantage that it may hide some inefficiency from the user. E.g., if I have 
a very large scanner.caching and manage to process only a few rows within the 
timeout, I will end up with a single row being fetched many times (and will not 
be explicitly notified about this). Could we solve this problem by adding a 
counter to the InputFormat?
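
To make the proposal concrete, here is a minimal, self-contained sketch of the 
idea: restart the scan after the last successfully returned row on any 
IOException, and count the restarts so the inefficiency stays visible. The 
names RetryingScanSketch, RowScanner, ScannerFactory and flakyFactory are 
hypothetical stand-ins invented for this illustration; they are not HBase 
classes, and this is not the code from either patch. In a real job the restart 
count would presumably be reported through a MapReduce counter rather than a 
plain field.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Sketch: a reader that restarts its scan on IOException and counts restarts. */
public class RetryingScanSketch {

    /** Hypothetical stand-in for a scanner; not an HBase type. */
    public interface RowScanner {
        /** Returns the next row key, or null when the scan is exhausted. */
        String next() throws IOException;
    }

    /** Hypothetical factory: opens a scanner positioned strictly after lastRow (null = start). */
    public interface ScannerFactory {
        RowScanner openAfter(String lastRow) throws IOException;
    }

    private final ScannerFactory factory;
    private final int maxRestarts;   // limit on consecutive restarts per next() call
    private RowScanner scanner;
    private String lastRow;          // last row successfully handed to the caller
    private long restarts;           // would be exposed as a job counter

    public RetryingScanSketch(ScannerFactory factory, int maxRestarts) throws IOException {
        this.factory = factory;
        this.maxRestarts = maxRestarts;
        this.scanner = factory.openAfter(null);
    }

    /** Returns the next row, transparently reopening the scanner after an IOException. */
    public String next() throws IOException {
        int attempts = 0;
        while (true) {
            try {
                String row = scanner.next();
                if (row != null) {
                    lastRow = row;
                }
                return row;
            } catch (IOException e) {
                if (attempts++ >= maxRestarts) {
                    throw e;             // give up and let the task fail
                }
                restarts++;              // make the hidden cost visible
                scanner = factory.openAfter(lastRow);
            }
        }
    }

    public long restarts() {
        return restarts;
    }

    /** In-memory factory whose scanners throw once, on the failOnCall-th call overall. */
    public static ScannerFactory flakyFactory(List<String> rows, int failOnCall) {
        int[] calls = {0};
        return lastRow -> {
            int start = lastRow == null ? 0 : rows.indexOf(lastRow) + 1;
            Iterator<String> it = rows.subList(start, rows.size()).iterator();
            return () -> {
                if (++calls[0] == failOnCall) {
                    throw new IOException("simulated lease expiry");
                }
                return it.hasNext() ? it.next() : null;
            };
        };
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = List.of("r1", "r2", "r3", "r4");
        RetryingScanSketch reader = new RetryingScanSketch(flakyFactory(rows, 3), 2);
        List<String> seen = new ArrayList<>();
        for (String r = reader.next(); r != null; r = reader.next()) {
            seen.add(r);
        }
        System.out.println(seen + " restarts=" + reader.restarts());
    }
}
```

The sketch deliberately retries on every IOException, which is the behavior 
being asked about; it does not single out DoNotRetryIOException. The restart 
count is what would tell a user that rows are being refetched because 
scanner.caching is too large for the timeout.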
