Thanks J-D. I filed https://issues.apache.org/jira/browse/HBASE-3014 to suggest changing the log level to WARN.
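For anyone landing on this thread later: the "retry with a new scanner" choice J-D describes in the quoted text can be sketched without HBase at all. This is only an illustration under stated assumptions. `ScannerGoneException`, `RowScanner`, `ScannerFactory`, `scanAll` and `flakyTable` are hypothetical stand-ins invented for this sketch, not HBase classes or methods. It mirrors the restart-then-skip idea in the TableRecordReaderImpl snippet quoted below, with one extra guard for a failure before any row has been read.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins only; none of these types come from HBase.
class ScannerRestartSketch {

    /** Stand-in for an UnknownScannerException-like failure. */
    static class ScannerGoneException extends IOException {
        ScannerGoneException(String msg) { super(msg); }
    }

    /** Minimal scanner: returns the next row key, or null when exhausted. */
    interface RowScanner {
        String next() throws IOException;
    }

    /** Opens a scanner at startRow (inclusive), or at the first row if null. */
    interface ScannerFactory {
        RowScanner open(String startRow);
    }

    /** Scans every row, restarting once per failure from the last row seen. */
    static List<String> scanAll(ScannerFactory factory) throws IOException {
        List<String> seen = new ArrayList<>();
        RowScanner scanner = factory.open(null);
        String lastRow = null;
        while (true) {
            String row;
            try {
                row = scanner.next();
            } catch (ScannerGoneException e) {
                // The caller decides to recover: open a fresh scanner.
                scanner = factory.open(lastRow);
                if (lastRow != null) {
                    scanner.next(); // skip the row that was already processed
                }
                row = scanner.next(); // a second failure here propagates
            }
            if (row == null) {
                break;
            }
            seen.add(row);
            lastRow = row;
        }
        return seen;
    }

    /** In-memory table whose first scanner fails after serving failAfter rows. */
    static ScannerFactory flakyTable(List<String> rows, int failAfter) {
        int[] opened = {0};
        return startRow -> {
            boolean flaky = (opened[0]++ == 0);
            int[] pos = {startRow == null ? 0 : rows.indexOf(startRow)};
            int[] served = {0};
            return () -> {
                if (flaky && served[0] == failAfter) {
                    throw new ScannerGoneException("lease expired (simulated)");
                }
                served[0]++;
                return pos[0] < rows.size() ? rows.get(pos[0]++) : null;
            };
        };
    }

    public static void main(String[] args) throws IOException {
        List<String> rows = Arrays.asList("row1", "row2", "row3", "row4");
        System.out.println(scanAll(flakyTable(rows, 2)));
    }
}
```

The point of the sketch is only that recovery lives in the caller: the exception says "this scanner is gone", and the code holding the last row key is free to open another one.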

On Fri, Sep 17, 2010 at 10:32 AM, Jean-Daniel Cryans <[email protected]> wrote:

> I agree it needs some clarification, since that stuff evolved in
> disparate ways. Historically UnknownScannerException has been fatal
> and wasn't recovered from. Right now, the client will recover only if
> the timeout hasn't expired (so you get this only when the region moves
> or it took more than 60 seconds to call next). On top of that,
> TableRecordReaderImpl will recover even if there's a timeout by
> restarting a new scanner. The DoNotRetryIOException is only a way for
> HBase to tell the HBase client that it shouldn't retry in the normal
> retry code inside HConnectionManager, it's not a way to tell the
> actual user that he shouldn't create a new scanner and retry.
>
> Thus, the way I understand it, the fact that TRRI recovers from USE is
> a design choice the same way someone using Scan in his code could
> decide to retry scanning with a new scanner upon getting that error. I
> like the way it currently works because if USE comes out of the
> ResultScanner, it means that it took more than 60 seconds to process
> one next() invocation so something is wrong (but the user can ignore
> it like TRRI does).
>
> That said, the exception should be printed as a WARN in the region
> server log and probably shouldn't care printing a stack trace.
>
> J-D
>
> On Fri, Sep 17, 2010 at 10:21 AM, Ted Yu <[email protected]> wrote:
> > J-D:
> > public class UnknownScannerException extends DoNotRetryIOException {
> >
> > When e (IOException) below was an UnknownScannerException, the code
> > would try to restart.
> >
> > I have two questions:
> > 1. what contract should recipient of DoNotRetryIOException follow ?
> > On the surface the way TableRecordReaderImpl handles potential
> > UnknownScannerException is a little confusing.
> > 2. can the logic below be applied to HRegionServer.next() ?
> >
> > Regards
> >
> > On Thu, Sep 16, 2010 at 10:02 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> Well that error will be treated as fatal if it really took you more
> >> than 60 seconds to do a next() invocation, but if you are using TIF
> >> then the TableRecordReaderImpl will do the following:
> >>
> >> try {
> >>   value = this.scanner.next();
> >> } catch (IOException e) {
> >>   LOG.debug("recovered from " + StringUtils.stringifyException(e));
> >>   restart(lastRow);
> >>   scanner.next(); // skip presumed already mapped row
> >>   value = scanner.next();
> >> }
> >>
> >> So your mapper will recover. Do you see that message in your
> >> mappers' logs?
> >>
> >> J-D
> >>
> >> On Thu, Sep 16, 2010 at 9:57 AM, Ken Weiner <[email protected]> wrote:
> >> > Hi J-D,
> >> >
> >> > Yes, I do see this INFO message on the same region server just a
> >> > few seconds before the UnknownScannerException:
> >> >
> >> > 2010-09-16 09:20:49,953 INFO
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> >> > -8711007779313115048 lease expired
> >> >
> >> > I'm unclear about whether this is something I should try to
> >> > address. Is this condition impacting the performance of the job
> >> > and I should consider increasing the scanner lease time? If this
> >> > is a completely normal behavior of HBase, is it reasonable to
> >> > change the log level of that UnknownScannerException in the code
> >> > from ERROR to WARN or INFO?
> >> >
> >> > Thanks.
> >> >
> >> > -Ken
> >> >
> >> > On Thu, Sep 16, 2010 at 9:38 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >> >
> >> >> It's usually because your scanner timed out, or because the region
> >> >> moved to a new server. You can see how it's handled in
> >> >> HTable.ClientScanner.next(). In any case you should see a message
> >> >> like "Scanner -8711007779313115048 lease expired" in some region
> >> >> server, then see when you get the exception. Check if it's the
> >> >> same region server, and the time between both.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Thu, Sep 16, 2010 at 9:29 AM, Ken Weiner <[email protected]> wrote:
> >> >> > Every time we run a map reduce job against data in HBase, we see
> >> >> > hundreds of UnknownScannerExceptions in the hbase log at ERROR
> >> >> > level. The job seems to complete fine and there are no other
> >> >> > errors. Should I be concerned with these
> >> >> > UnknownScannerExceptions? Is this message really more of a
> >> >> > warning than an error?
> >> >> >
> >> >> > Example:
> >> >> >
> >> >> > 2010-09-16 09:20:52,398 ERROR
> >> >> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> >> >> > org.apache.hadoop.hbase.UnknownScannerException: Name: -8711007779313115048
> >> >> >     at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
> >> >> >     at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> >> >> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >> >> >     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >> >> >     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
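
J-D's suggested check (find the "lease expired" line, then compare its time with the exception) can be done by eye, but a small helper makes it repeatable across many log lines. The following is a rough sketch, assuming every line starts with the 23-character timestamp format seen in the two log lines quoted in this thread. `LeaseGapSketch`, `stampOf` and `millisBetween` are hypothetical names made up for this example and are not part of HBase.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; assumes lines begin with "yyyy-MM-dd HH:mm:ss,SSS".
class LeaseGapSketch {

    private static final DateTimeFormatter STAMP =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss,SSS");

    /** Parses the leading timestamp (first 23 characters) of a log line. */
    static LocalDateTime stampOf(String logLine) {
        return LocalDateTime.parse(logLine.substring(0, 23), STAMP);
    }

    /** Milliseconds elapsed from the earlier log line to the later one. */
    static long millisBetween(String earlier, String later) {
        return Duration.between(stampOf(earlier), stampOf(later)).toMillis();
    }

    public static void main(String[] args) {
        // The two region server log lines quoted in this thread.
        String expired = "2010-09-16 09:20:49,953 INFO "
            + "org.apache.hadoop.hbase.regionserver.HRegionServer: "
            + "Scanner -8711007779313115048 lease expired";
        String unknown = "2010-09-16 09:20:52,398 ERROR "
            + "org.apache.hadoop.hbase.regionserver.HRegionServer: "
            + "org.apache.hadoop.hbase.UnknownScannerException: "
            + "Name: -8711007779313115048";
        System.out.println(millisBetween(expired, unknown) + " ms");
    }
}
```

For the two lines in this thread the gap is a little under two and a half seconds, and both carry the same scanner id, which matches the "lease expired, then next() on a scanner that no longer exists" sequence described above.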
