Hmm...
Does something like the below help?
diff --git a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
index f9627ed..0cee8e3 100644
--- a/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
+++ b/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
@@ -2137,11 +2137,7 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
       }
       throw e;
     }
-    Leases.Lease lease = null;
     try {
-      // Remove lease while its being processed in server; protects against case
-      // where processing of request takes > lease expiration time.
-      lease = this.leases.removeLease(scannerName);
       List<Result> results = new ArrayList<Result>(nbRows);
       long currentScanResultSize = 0;
       List<KeyValue> values = new ArrayList<KeyValue>();
@@ -2197,10 +2193,9 @@ public class HRegionServer implements HRegionInterface, HBaseRPCErrorHandler,
       }
       throw convertThrowableToIOE(cleanup(t));
     } finally {
-      // We're done. On way out readd the above removed lease. Adding resets
-      // expiration time on lease.
+      // We're done. On way out reset expiration time on lease.
       if (this.scanners.containsKey(scannerName)) {
-        if (lease != null) this.leases.addLease(lease);
+        this.leases.renewLease(scannerName);
       }
     }
   }
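
For clarity, here is a toy sketch (not the actual Leases class; the names and
fields below are made up for illustration) of what the change above is going
for: the lease stays registered for the whole next() call, and on the way out
we only reset its expiration time instead of removing and re-adding it.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy illustration only, not HBase code.
public class LeaseSketch {

  // Analogous to hbase.regionserver.lease.period (default 60000 ms).
  static final long LEASE_PERIOD_MS = 60000L;

  static final class Lease {
    volatile long expireAtMs;
    Lease() { renew(); }
    void renew() { expireAtMs = System.currentTimeMillis() + LEASE_PERIOD_MS; }
    boolean isExpired() { return System.currentTimeMillis() > expireAtMs; }
  }

  private final Map<String, Lease> leases = new ConcurrentHashMap<String, Lease>();

  // Called when the scanner is opened.
  void addLease(String name) {
    leases.put(name, new Lease());
  }

  // What the patched finally-block does conceptually: look up the
  // still-registered lease and push its expiration time forward.
  void renewLease(String name) {
    Lease lease = leases.get(name);
    if (lease == null) {
      throw new IllegalStateException("lease '" + name + "' does not exist");
    }
    lease.renew();
  }
}
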
Best regards,
- Andy
Problems worthy of attack prove their worth by hitting back. - Piet Hein (via
Tom White)
----- Original Message -----
> From: Jean-Daniel Cryans <[email protected]>
> To: [email protected]
> Cc:
> Sent: Wednesday, February 15, 2012 10:17 AM
> Subject: Re: LeaseException while extracting data via pig/hbase integration
>
> You would have to grep the lease's id, in your first email it was
> "-7220618182832784549".
>
> About the time it takes to process each row, I meant client (pig) side
> not in the RS.
>
> J-D
>
> On Tue, Feb 14, 2012 at 1:33 PM, Mikael Sitruk <[email protected]>
> wrote:
>> Please see answer inline
>> Thanks
>> Mikael.S
>>
>> On Tue, Feb 14, 2012 at 8:30 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>
>>> On Tue, Feb 14, 2012 at 2:01 AM, Mikael Sitruk <[email protected]> wrote:
>>> > hi,
>>> > Well no, I can't figure out what the problem is, but I saw that someone
>>> > else had the same problem (see email: "LeaseException despite high
>>> > hbase.regionserver.lease.period")
>>> > What I can tell is the following:
>>> > Last week the problem was consistent
>>> > 1. I updated hbase.regionserver.lease.period=300000 (5 mins), restarted the
>>> > cluster and still got the problem, the map got this exception even before
>>> > the 5 mins (some after 1 min and 20 sec)
>>>
>>> That's extremely suspicious. Are you sure the setting is getting picked
>>> up? :)
>> I hope so :-)
>>>
>>> You should be able to tell when the lease really expires by simply
>>> grepping for the number in the region server log, it should give you a
>>> good idea of what your lease period is.
>> Grepping on which value? The lease configured here: 300000? It does not
>> return anything; also tried in a current execution where some were ok and
>> some were not.
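
A quick way to sanity-check which lease period a JVM actually picks up,
assuming the hbase-site.xml you edited is on that process's classpath; run it
with the same configuration directory the region servers use. Sketch only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Prints the lease period this process sees. If it prints 60000 (the
// default), the 300000 override is not being picked up by that process.
public class LeasePeriodCheck {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    System.out.println("hbase.regionserver.lease.period = "
        + conf.getLong("hbase.regionserver.lease.period", 60000L));
  }
}
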
>>>
>>> > 2. The problem occurs only on jobs that extract a large number of
>>> > columns (>150 cols per row)
>>>
>>> What's your scanner caching set to? Are you spending a lot of time
>>> processing each row?
>> From the job configuration generated by pig I can see caching set to 1.
>> Regarding the processing time of each row, I have no clue how much time it
>> spent. The data for each row is 150 columns of 2k each. This is approx 5
>> blocks to bring.
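
On the caching question, a minimal sketch of raising scanner caching on the
HBase client side. Scan.setCaching() and the hbase.client.scanner.caching
property are standard client knobs; how to plumb the value through Pig's
HBaseStorage depends on the Pig version, so treat the class below as
illustrative only.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

// Sketch: with ~150 columns of 2k each (~300KB per row), caching around 10
// rows per next() RPC keeps each round trip to a few MB while cutting the
// number of RPCs by an order of magnitude.
public class ScanCachingExample {

  public static Scan buildScan() {
    Scan scan = new Scan();
    scan.setCaching(10);        // rows returned per next() RPC, default is 1
    scan.setCacheBlocks(false); // usual choice for full-table MapReduce scans
    return scan;
  }

  public static Configuration buildConf() {
    Configuration conf = HBaseConfiguration.create();
    // Client-side default for scans that do not set caching explicitly.
    conf.setInt("hbase.client.scanner.caching", 10);
    return conf;
  }
}
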
>>>
>>> > 3. The problem never occurred when only 1 map per server is running (I
>>> > have 8 CPUs with hyper-threading enabled = 16, so using only 1 map per
>>> > machine is just a waste), (at this stage I was thinking perhaps there is
>>> > a multi-threading problem)
>>>
>>> More mappers would pull more data from the region servers so more
>>> concurrency from the disks, using more mappers might just slow you
>>> down enough that you hit the issue.
>>>
>> Today I ran with 8 mappers and some failed and some didn't (2 of 4); they
>> got the lease exception after 5 mins. I will try to check the
>> logs/sar/metric files for additional info.
>>
>>>
>>> >
>>> > This week I got a slightly different behavior, after having restarted the
>>> > servers. The extract was able to run ok in most of the runs even with 4
>>> > maps running (per server); I got the exception only once, but the job was
>>> > not killed as in other runs last week
>>>
>>> If the client got an UnknownScannerException before the timeout
>>> expires (the client also keeps track of it, although it may have a
>>> different configuration), it will recreate the scanner.
>>>
>> No this is not the case.
>>
>>>
>>> Which reminds me, are your regions moving around? If so, and your
>>> clients don't know about the high timeout, then they might let the
>>> exception pass on to your own code.
>>>
>> Regions are pre-split ahead of time, I do not have any region splits during
>> the run, region size is set to 8GB, storefile is around 3.5G
>>
>> The test was run after major compaction, so the number of store files is 1
>> per RS/family
>>
>>
>>>
>>> J-D
>>>
>>
>>
>>
>> --
>> Mikael.S
>