Re: UnknownScannerException a problem?

Ken Weiner Fri, 17 Sep 2010 17:33:39 -0700

Thanks J-D.  I filed https://issues.apache.org/jira/browse/HBASE-3014 to
suggest changing the log level to WARN.
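
For anyone using Scan directly rather than TableInputFormat, here is a rough sketch of the restart-on-error pattern J-D describes below. It is not HBase code: RowScanner, ScannerFactory, inMemory and scanAll are hypothetical names invented for this example so that it runs without a cluster. With the real client, the factory would presumably open a new scanner whose start row is the last row seen.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class RestartingScan {
    /** Hypothetical stand-in for a scanner; next() returns null when exhausted. */
    interface RowScanner {
        String next() throws IOException;
    }

    /** Hypothetical stand-in for "open a new scanner at startRow (inclusive)". */
    interface ScannerFactory {
        RowScanner open(String startRow) throws IOException;
    }

    /** Reads every row, opening a fresh scanner once per failed next() call. */
    static List<String> scanAll(ScannerFactory factory, String firstRow) throws IOException {
        List<String> rows = new ArrayList<>();
        RowScanner scanner = factory.open(firstRow);
        String lastRow = null;
        while (true) {
            String row;
            try {
                row = scanner.next();
            } catch (IOException e) {
                if (lastRow == null) {
                    // Nothing processed yet: start over from the beginning.
                    scanner = factory.open(firstRow);
                } else {
                    // Resume at the last row seen and skip it, since it was
                    // already processed before the failure.
                    scanner = factory.open(lastRow);
                    scanner.next();
                }
                row = scanner.next(); // a second failure propagates to the caller
            }
            if (row == null) {
                return rows;
            }
            rows.add(row);
            lastRow = row;
        }
    }

    /** Test double over sorted rows; the failOnCall-th next() overall throws once. */
    static ScannerFactory inMemory(String[] sortedRows, int failOnCall) {
        int[] calls = {0}; // shared by all scanners from this factory
        return startRow -> {
            int[] pos = {0};
            while (pos[0] < sortedRows.length
                    && sortedRows[pos[0]].compareTo(startRow) < 0) {
                pos[0]++;
            }
            return () -> {
                if (++calls[0] == failOnCall) {
                    throw new IOException("simulated scanner failure");
                }
                return pos[0] < sortedRows.length ? sortedRows[pos[0]++] : null;
            };
        };
    }

    public static void main(String[] args) throws IOException {
        String[] data = {"a", "b", "c", "d"};
        // The third next() call fails; the scan resumes after row "b".
        System.out.println(scanAll(inMemory(data, 3), "a"));
    }
}
```

As in the TableRecordReaderImpl snippet quoted below, only one restart is attempted per failed call, so a scanner that keeps failing still surfaces the error to the caller.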


On Fri, Sep 17, 2010 at 10:32 AM, Jean-Daniel Cryans <[email protected]> wrote:

> I agree it needs some clarification, since that stuff evolved in
> disparate ways. Historically UnknownScannerException has been fatal
> and wasn't recovered from. Right now, the client will recover only if
> the timeout hasn't expired (so you get this only when the region moves
> or it took more than 60 seconds to call next). On top of that,
> TableRecordReaderImpl will recover even if there's a timeout by
> restarting with a new scanner. The DoNotRetryIOException is only a way for
> HBase to tell the HBase client that it shouldn't retry in the normal
> retry code inside HConnectionManager; it's not a way to tell the
> actual user that he shouldn't create a new scanner and retry.
>
> Thus, the way I understand it, the fact that TRRI recovers from USE is
> a design choice the same way someone using Scan in his code could
> decide to retry scanning with a new scanner upon getting that error. I
> like the way it currently works because if USE comes out of the
> ResultScanner, it means that it took more than 60 seconds to process
> one next() invocation so something is wrong (but the user can ignore
> it like TRRI does).
>
> That said, the exception should be printed as a WARN in the region
> server log, and it probably shouldn't bother printing a stack trace.
>
> J-D
>
> On Fri, Sep 17, 2010 at 10:21 AM, Ted Yu <[email protected]> wrote:
> > J-D:
> > public class UnknownScannerException extends DoNotRetryIOException {
> >
> > When e (IOException) below was an UnknownScannerException, the code would
> > try to restart.
> >
> > I have two questions:
> > 1. What contract should the recipient of a DoNotRetryIOException follow?
> > On the surface, the way TableRecordReaderImpl handles a potential
> > UnknownScannerException is a little confusing.
> > 2. Can the logic below be applied to HRegionServer.next()?
> >
> > Regards
> >
> > On Thu, Sep 16, 2010 at 10:02 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >
> >> Well that error will be treated as fatal if it really took you more
> >> than 60 seconds to do a next() invocation, but if you are using TIF
> >> then the TableRecordReaderImpl will do the following:
> >>
> >> try {
> >>   value = this.scanner.next();
> >> } catch (IOException e) {
> >>   LOG.debug("recovered from " + StringUtils.stringifyException(e));
> >>   restart(lastRow);
> >>   scanner.next();    // skip presumed already mapped row
> >>   value = scanner.next();
> >> }
> >>
> >> So your mapper will recover. Do you see that message in your mappers' logs?
> >>
> >> J-D
> >>
> >> On Thu, Sep 16, 2010 at 9:57 AM, Ken Weiner <[email protected]> wrote:
> >> > Hi J-D,
> >> >
> >> > Yes, I do see this INFO message on the same region server just a few
> >> > seconds before the UnknownScannerException:
> >> >
> >> >
> >> > 2010-09-16 09:20:49,953 INFO
> >> > org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner
> >> > -8711007779313115048 lease expired
> >> >
> >> >
> >> > I'm unclear about whether this is something I should try to address. Is
> >> > this condition hurting the performance of the job, and should I consider
> >> > increasing the scanner lease time?  If this is completely normal behavior
> >> > for HBase, is it reasonable to change the log level of that
> >> > UnknownScannerException in the code from ERROR to WARN or INFO?
> >> >
> >> > Thanks.
> >> >
> >> > -Ken
> >> >
> >> > On Thu, Sep 16, 2010 at 9:38 AM, Jean-Daniel Cryans <[email protected]> wrote:
> >> >
> >> >> It's usually because your scanner timed out, or because the region
> >> >> moved to a new server. You can see how it's handled in
> >> >> HTable.ClientScanner.next(). In any case you should see a message like
> >> >> "Scanner -8711007779313115048 lease expired" in some region server,
> >> >> then see when you get the exception. Check if it's the same region
> >> >> server, and the time between both.
> >> >>
> >> >> J-D
> >> >>
> >> >> On Thu, Sep 16, 2010 at 9:29 AM, Ken Weiner <[email protected]> wrote:
> >> >> > Every time we run a map reduce job against data in HBase, we see
> >> >> > hundreds of UnknownScannerExceptions in the hbase log at ERROR level.
> >> >> > The job seems to complete fine and there are no other errors. Should
> >> >> > I be concerned with these UnknownScannerExceptions?  Is this message
> >> >> > really more of a warning than an error?
> >> >> >
> >> >> >
> >> >> >
> >> >> > Example:
> >> >> >
> >> >> >
> >> >> >
> >> >> > 2010-09-16 09:20:52,398 ERROR
> >> >> > org.apache.hadoop.hbase.regionserver.HRegionServer:
> >> >> > org.apache.hadoop.hbase.UnknownScannerException: Name: -8711007779313115048
> >> >> >        at org.apache.hadoop.hbase.regionserver.HRegionServer.next(HRegionServer.java:1880)
> >> >> >        at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
> >> >> >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >> >> >        at java.lang.reflect.Method.invoke(Method.java:597)
> >> >> >        at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:657)
> >> >> >        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
> >> >> >
> >> >>
> >> >
> >>
> >
>
