Thanks, Vladimir. I am using option 2 as a short-term fix for now. I will definitely look into the row key design.
Regards,
Arun.

> On Jun 6, 2015, at 3:18 PM, Vladimir Rodionov <[email protected]> wrote:
>
> The scanner fails at the very beginning. The reason is because they need a
> very few rows from a large file and HBase needs to fill RPC buffer (which
> is 100 rows, yes?) before it can return first batch. This takes more than
> 60 sec and scanner fails (do not ask me why its not the timeout exception)
>
> 1. HBASE-13090 will help (can be back ported I presume to 1.0 and 0.98.x)
> 2. Smaller region size will help
> 3. Smaller hbase.client.scanner.caching will help
> 4. Larger hbase.client.scanner.timeout.period will help
> 5. Better data store design (rowkeys) is preferred.
>
> Too many options to choose from.
>
> -Vlad
>
>> On Sat, Jun 6, 2015 at 3:04 PM, Arun Mishra <[email protected]> wrote:
>>
>> Thanks TED.
>>
>> Regards,
>> Arun.
>>
>>> On Jun 6, 2015, at 2:34 PM, Ted Yu <[email protected]> wrote:
>>>
>>> HBASE-13090 'Progress heartbeats for long running scanners' solves the
>>> problem you faced.
>>>
>>> It is in the 1.1.0 release.
>>>
>>> FYI
>>>
>>>> On Sat, Jun 6, 2015 at 12:54 PM, Arun Mishra <[email protected]> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have a query on OutOfOrderScannerNextException. I am using hbase 0.98.6
>>>> with 45 nodes.
>>>>
>>>> I have a mapreduce job which scan 1 table for last 1 day worth data using
>>>> timerange. It has been running fine for months without any failure. But
>>>> last couple of days it has been failing with below exception. I have traced
>>>> the failure to a single region. This region has 1 store and 1 hfile of
>>>> 5+GB. What we realized was that, we were writing some bulk data, which used
>>>> to land on this region. After we stopped writing this data, this region has
>>>> been receiving very few writes per day.
>>>>
>>>> When mapreduce job runs, it creates a map task for this region and that
>>>> task fails with OutOfOrderScannerNextException. I was able to reproduce
>>>> this error by running a scan command with same start/stop row and timerange
>>>> option. Finally, we split this region to be small enough for scan command
>>>> to work.
>>>>
>>>> My query is if there is any option, apart from increasing the timeout,
>>>> which can solve this use case? I am thinking of a use case where data comes
>>>> in for 3 days a week in bulk and then nothing for next 3 days. Kind of
>>>> creating a data hole in region.
>>>> My understanding is that I am hit with this error because I have big store
>>>> files and timerange scan is reading entire file even though it contains
>>>> very few rowkeys for that timerange.
>>>>
>>>> hbase.client.scanner.caching = 100
>>>> hbase.client.scanner.timeout.period = 60s
>>>>
>>>> scan 'dummytable',{ STARTROW=>'dummyrowkey-start',
>>>> STOPROW=>'dummyrowkey-end', LIMIT=>1000,
>>>> TIMERANGE=>[1433462400000,1433548800000]}
>>>> ROW COLUMN+CELL
>>>>
>>>> ERROR: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
>>>> Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
>>>> request=scanner_id: 33648 number_of_rows: 100 close_scanner: false
>>>> next_call_seq: 0
>>>> at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3193)
>>>> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:29587)
>>>> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2031)
>>>> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
>>>> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:114)
>>>> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:94)
>>>> at java.lang.Thread.run(Thread.java:745)
>>>>
>>>>
>>>> Regards,
>>>> Arun
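
For reference, options 3 and 4 from Vladimir's list could be expressed as an hbase-site.xml fragment along the following lines. This is only a sketch: the two property names come from the thread, but the values (10 rows and 300000 ms) are illustrative assumptions, not recommendations from anyone in the discussion, and the timeout is given in milliseconds (the 60s quoted above corresponds to 60000). The timeout may need to be visible to the region servers as well as the client.

```xml
<!-- Sketch only: values below are assumed for illustration, not tuned. -->
<configuration>
  <!-- Option 3: fetch fewer rows per scanner RPC so each next() call
       has less to fill before it can return (thread used 100). -->
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>10</value>
  </property>
  <!-- Option 4: allow a scanner call more time before it is considered
       expired, in milliseconds (thread used 60000). -->
  <property>
    <name>hbase.client.scanner.timeout.period</name>
    <value>300000</value>
  </property>
</configuration>
```

Both settings only work around the symptom; a sparse time range inside a large HFile still has to be read through, which is why options 1, 2 and 5 address the cause more directly.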
