Re: ScannerTimeoutException during MapReduce

Jan Lukavský Fri, 12 Aug 2011 02:57:10 -0700

Hi J-D,

thanks for reply.


On 11.8.2011 23:37, Jean-Daniel Cryans wrote:

Usual reasons would be like a mix of taking a long time to process
rows in the mapper and scanners that grab a lot of rows (using scanner
caching and maybe filters).

Do you enable DEBUG for HBase in your mapreduce context? This would
give relevant information like if the client was doing lots of retries
trying to start scanning.


I'll enable it and try to get more info. Thanks.

About the bug, can you open a jira?

Sure.
https://issues.apache.org/jira/browse/HBASE-4196

J-D

On Thu, Aug 11, 2011 at 6:58 AM, Jan Lukavský
<[email protected]>  wrote:

Hi,

we've recently moved to HBase 0.90.3 (cdh3u1) from 0.20.6, which resolved
most of our previous issues, but we are now having much more
ScannerTimeoutExceptions than before. All these exceptions come from trace
like this

org.apache.hadoop.hbase.client.ScannerTimeoutException: 307127ms passed
since the last invocation, timeout is currently set to 60000
        at
org.apache.hadoop.hbase.client.HTable$ClientScanner.next(HTable.java:1133)
        at
org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:143)
        at
org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:142)
        at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
        at
org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)


After a bit of investigation, I suppose that cause of this is that the first
call to scanner.next() after HTable.getScanner() times out. What could be
the cause of this? I see neither any region moving around in the cluster nor
any compation on the side of the regionserver. As long as I can tell
everything looks just fine. This would suggest, that it took too long to
locate the regionserver in call to HTable.getScanner(), but I cannot see any
reason. Could this issue be resolved on the side of TableRecordReader? Eg.
at TableRecordReaderImpl.java:143 the ScannerTimeoutException could be
caught and the scanner restarted a couple more times (say configurable?).

After looking at the code it also seems to me, that there may be a bug
causing the reader to skip the first row of region. The scenario is as
follows:
  - the reader is initialized with TableRecordReader.init()
  - then nextKeyValue is called, causing call to scanner.next() - here
ScannerTimeoutException occurs
  - the scanner is restarted by call to restart() and then *two* calls to
scanner.next() occur, causing we have lost the first row

Can anyone confirm this?

Thanks,
  Jan



--

Jan Lukavský
programátor
Seznam.cz, a.s.
Radlická 608/2
15000, Praha 5

[email protected]
http://www.seznam.cz

Re: ScannerTimeoutException during MapReduce

Reply via email to