I agree it needs some clarification, since that stuff evolved in disparate ways. Historically, UnknownScannerException was fatal and wasn't recovered from. Right now, the client will recover only if the timeout hasn't expired (so you get this only when the region moves or it took more than 60 seconds to call next). On top of that, TableRecordReaderImpl will recover even if there's a timeout, by restarting with a new scanner. The DoNotRetryIOException is only a way for HBase to tell the HBase client that it shouldn't retry in the normal retry code inside HConnectionManager; it's not a way to tell the actual user that they shouldn't create a new scanner and retry.
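To make that distinction concrete, here is a minimal sketch (not HBase code) of a caller that treats UnknownScannerException as a cue to open a new scanner, roughly the way TableRecordReaderImpl does. Everything in it is an assumption for illustration: the two exception classes are local stand-ins that only mirror the class hierarchy, and RowScanner, ScannerFactory, scanAll and maxRestarts are hypothetical names, not part of the HBase API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-ins so the sketch compiles without HBase on the classpath.
// They only mirror the hierarchy: UnknownScannerException extends DoNotRetryIOException.
class DoNotRetryIOException extends IOException {
    DoNotRetryIOException(String msg) { super(msg); }
}

class UnknownScannerException extends DoNotRetryIOException {
    UnknownScannerException(String msg) { super(msg); }
}

// Hypothetical minimal scanner: next() returns null once the scan is exhausted.
interface RowScanner {
    String next() throws IOException;
}

// Hypothetical factory: opens a scanner at startRow (inclusive), or at the
// first row when startRow is null.
interface ScannerFactory {
    RowScanner open(String startRow) throws IOException;
}

final class ScanRestartSketch {
    // Reads every row, opening a new scanner whenever the current one is
    // reported unknown, and giving up after maxRestarts restarts.
    static List<String> scanAll(ScannerFactory factory, int maxRestarts) throws IOException {
        List<String> rows = new ArrayList<>();
        String lastRow = null;
        int restarts = 0;
        RowScanner scanner = factory.open(null);
        while (true) {
            String row;
            try {
                row = scanner.next();
            } catch (UnknownScannerException e) {
                // "Do not retry" is addressed to the client's internal retry
                // loop; the caller is still free to start over with a new scanner.
                if (++restarts > maxRestarts) {
                    throw e;
                }
                scanner = factory.open(lastRow);
                if (lastRow != null) {
                    scanner.next(); // skip the row that was already processed
                }
                continue;
            }
            if (row == null) {
                return rows;
            }
            rows.add(row);
            lastRow = row;
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> data = Arrays.asList("row1", "row2", "row3");
        boolean[] failed = {false};
        // In-memory factory whose first scanner "expires" after returning one row.
        ScannerFactory factory = startRow -> {
            int[] pos = {startRow == null ? 0 : data.indexOf(startRow)};
            return () -> {
                if (!failed[0] && pos[0] == 1) {
                    failed[0] = true;
                    throw new UnknownScannerException("simulated lease expiry");
                }
                return pos[0] < data.size() ? data.get(pos[0]++) : null;
            };
        };
        System.out.println(scanAll(factory, 2));
    }
}
```

Unlike a blanket catch of IOException, this sketch restarts only on the one exception it knows how to handle, skips a row only when one was actually processed, and caps the number of restarts so a persistently slow next() still surfaces as an error.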
Thus, the way I understand it, the fact that TRRI recovers from USE is a design choice, the same way someone using Scan in their code could decide to retry scanning with a new scanner upon getting that error. I like the way it currently works because if USE comes out of the ResultScanner, it means that it took more than 60 seconds to process one next() invocation, so something is wrong (but the user can ignore it like TRRI does). That said, the exception should be printed as a WARN in the region server log, and it probably shouldn't bother printing a stack trace.

J-D

On Fri, Sep 17, 2010 at 10:21 AM, Ted Yu <[email protected]> wrote:
> J-D:
> public class UnknownScannerException extends DoNotRetryIOException {
>
> When e (IOException) below was an UnknownScannerException, the code would
> try to restart.
>
> I have two questions:
> 1. what contract should recipient of DoNotRetryIOException follow ? On the
> surface the way TableRecordReaderImpl handles potential
> UnknownScannerException is a little confusing.
> 2. can the logic below be applied to HRegionServer.next() ?
>
> Regards
>
> On Thu, Sep 16, 2010 at 10:02 AM, Jean-Daniel Cryans <[email protected]> wrote:
>
>> Well that error will be treated as fatal if it really took you more
>> than 60 seconds to do a next() invocation, but if you are using TIF
>> then the TableRecordReaderImpl will do the following:
>>
>> try {
>>   value = this.scanner.next();
>> } catch (IOException e) {
>>   LOG.debug("recovered from " + StringUtils.stringifyException(e));
>>   restart(lastRow);
>>   scanner.next(); // skip presumed already mapped row
>>   value = scanner.next();
>> }
>>
>> So your mapper will recover. Do you see that message in your mappers' logs?
>>
>> J-D
>>
>> On Thu, Sep 16, 2010 at 9:57 AM, Ken Weiner <[email protected]> wrote:
>> > Hi J-D,
>> >
>> > Yes, I do see this INFO message on the same region server just a few seconds
>> > before the UnknownScannerException:
>> >
>> > 2010-09-16 09:20:49,953 INFO
>> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
>> > -8711007779313115048 lease expired
>> >
>> > I'm unclear about whether this is something I should try to address. Is
>> > this condition impacting the performance of the job and I should consider
>> > increasing the scanner lease time? If this is a completely normal behavior
>> > of HBase, is it reasonable to change the log level of that
>> > UnknownScannerException in the code from ERROR to WARN or INFO?
>> >
>> > Thanks.
>> >
>> > -Ken
>> >
>> > On Thu, Sep 16, 2010 at 9:38 AM, Jean-Daniel Cryans <[email protected]> wrote:
>> >
>> >> It's usually because your scanner timed out, or because the region
>> >> moved to a new server. You can see how it's handled in
>> >> HTable.ClientScanner.next(). In any case you should see a message like
>> >> "Scanner -8711007779313115048 lease expired" in some region server,
>> >> then see when you get the exception. Check if it's the same region
>> >> server, and the time between both.
>> >>
>> >> J-D
>> >>
>> >> On Thu, Sep 16, 2010 at 9:29 AM, Ken Weiner <[email protected]> wrote:
>> >> > Every time we run a map reduce job against data in HBase, we see
>> >> > hundreds of UnknownScannerExceptions in the hbase log at ERROR level.
>> >> > The job seems to complete fine and there are no other errors. Should
>> >> > I be concerned with these UnknownScannerExceptions? Is this message
>> >> > really more of a warning than an error?
>> >> >
>> >> > Example:
>> >> >
>> >> > 2010-09-16 09:20:52,398 ERROR
>> >> > org.apache.hadoop.hbase.regionserver.HRegionServer:
>> >> > org.apache.hadoop.hbase.UnknownScannerException: Name: -8711007779313115048
>> >> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
>> >> >     at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>> >> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> >> >     at java.lang.reflect.Method.invoke(Method.java:597)
>> >> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
>> >> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
>> >> >
