Re: How Long Will HBase Hold A Row Write Lock?

2018-03-11 Thread Saad Mufti
Thanks. I left a comment on that ticket.


Saad


On Sat, Mar 10, 2018 at 11:57 PM, Anoop John  wrote:

> Hi Saad
>    In your initial mail you mentioned that there are lots of
> checkAndPut ops but on different rows. A failure to obtain the lock (a
> write lock, since this is checkAndPut) means there is contention on the
> same row key. If that is the case, then yes, acquiring the row lock is
> the first step before the bucket cache (BC) reads, and that makes sense.
>
> On the question of why the compacted file's content is not cached: yes,
> it works that way even if cache-on-write is set to true. This is because
> the compacted result file can sometimes be very large (think of a major
> compaction), and writing it into the BC would exhaust it. It may also
> contain data that is very old. There is a recently raised JIRA that
> discusses this; please see HBASE-20045.
>
>
> -Anoop-
>
> On Sun, Mar 11, 2018 at 7:57 AM, Saad Mufti  wrote:
> > Although now that I think about this a bit more, all the failures we saw
> > were failure to obtain a row lock, and in the thread stack traces we
> always
> > saw it somewhere inside getRowLockInternal and similar. Never saw any
> > contention on bucket cache lock that I could see.
> >
> > Cheers.
> >
> > 
> > Saad
> >
> >
> > On Sat, Mar 10, 2018 at 8:04 PM, Saad Mufti 
> wrote:
> >
> >> Also, for now we have mitigated this problem by using the new setting in
> >> HBase 1.4.0 that prevents one slow region server from blocking all
> client
> >> requests. Of course it causes some timeouts but our overall ecosystem
> >> contains Kafka queues for retries, so we can live with that. From what I
> >> can see, it looks like this setting also has the good effect of
> preventing
> >> clients from hammering a region server that is slow because its IPC
> queues
> >> are backed up, allowing it to recover faster.
> >>
> >> Does that make sense?
> >>
> >> Cheers.
> >>
> >> 
> >> Saad
> >>
> >>
> >> On Sat, Mar 10, 2018 at 7:04 PM, Saad Mufti 
> wrote:
> >>
> >>> So if I understand correctly, we would mitigate the problem by not
> >>> evicting blocks for archived files immediately? Wouldn't this
> potentially
> >>> lead to problems later if the LRU algo chooses to evict blocks for
> active
> >>> files and leave blocks for archived files in there?
> >>>
> >>> I would definitely love to test this!!! Unfortunately we are running on
> >>> EMR and the details of how to patch HBase under EMR are not clear to
> me :-(
> >>>
> >>> What we would really love would be a setting for actually immediately
> >>> caching blocks for a new compacted file. I have seen in the code that
> even
> >>> is we have the cache on write setting set to true, it will refuse to
> cache
> >>> blocks for a file that is a newly compacted one. In our case we have
> sized
> >>> the bucket cache to be big enough to hold all our data, and really
> want to
> >>> avoid having to go to S3 until the last possible moment. A config
> setting
> >>> to test this would be great.
> >>>
> >>> But thanks everyone for your feedback. Any more would also be welcome
> on
> >>> the idea to let a user cache all newly compacted files.
> >>>
> >>> 
> >>> Saad
> >>>
> >>>
> >>> On Wed, Mar 7, 2018 at 12:00 AM, Anoop John 
> >>> wrote:
> >>>
>  >>a) it was indeed one of the regions that was being compacted, major
>  compaction in one case, minor compaction in another, the issue started
>  just
>  after compaction completed blowing away bucket cached blocks for the
>  older
>  HFile's
> 
>  About this part.Ya after the compaction, there is a step where the
>  compacted away HFile's blocks getting removed from cache. This op
> takes a
>  write lock for this region (In Bucket Cache layer)..  Every read op
> which
>  is part of checkAndPut will try read from BC and that in turn need a
> read
>  lock for this region.  So there is chances that the read locks starve
>  because of so many frequent write locks .  Each block evict will
> attain
>  the
>  write lock one after other.  Will it be possible for you to patch this
>  evict and test once? We can avoid the immediate evict from BC after
>  compaction. I can help you with a patch if you wish
> 
>  Anoop
> 
> 
> 
>  On Mon, Mar 5, 2018 at 11:07 AM, ramkrishna vasudevan <
>  ramkrishna.s.vasude...@gmail.com> wrote:
>  > Hi Saad
>  >
>  > Your argument here
>  >>> The
>  >>>theory is that since prefetch is an async operation, a lot of the
>  reads
>  in
>  >>>the checkAndPut for the region in question start reading from S3
>  which is
>  >>>slow. So the write lock obtained for the checkAndPut is held for a
>  longer
>  >>>duration than normal. This has cascading upstream effects. Does
> that
>  sound
>  >>>plausible?
>  >
>  > Seems very much plausible. So before even 

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Anoop John
Hi Saad
   In your initial mail you mentioned that there are lots of
checkAndPut ops but on different rows. A failure to obtain the lock (a
write lock, since this is checkAndPut) means there is contention on the
same row key. If that is the case, then yes, acquiring the row lock is the
first step before the bucket cache (BC) reads, and that makes sense.
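
For concreteness, here is a minimal sketch of the checkAndPut pattern under
discussion, using the HBase 1.x client API. The table, family, qualifier,
and values are made up; only callers that hit the same row key contend for
the same row write lock.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class CheckAndPutSketch {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("my_table"))) {
      byte[] row = Bytes.toBytes("4a1f:some-row-key");  // hypothetical salted row key
      byte[] cf  = Bytes.toBytes("d");
      byte[] col = Bytes.toBytes("state");

      Put put = new Put(row);
      put.addColumn(cf, col, Bytes.toBytes("NEW"));

      // The region server takes the row's write lock, reads the current value
      // (from the bucket cache if present, otherwise from the underlying store,
      // e.g. S3), compares it to "OLD", and applies the Put atomically; the row
      // lock is held for the whole check-and-mutate.
      boolean applied = table.checkAndPut(row, cf, col, Bytes.toBytes("OLD"), put);
      System.out.println("applied = " + applied);
    }
  }
}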

On the question of why the compacted file's content is not cached: yes, it
works that way even if cache-on-write is set to true. This is because the
compacted result file can sometimes be very large (think of a major
compaction), and writing it into the BC would exhaust it. It may also
contain data that is very old. There is a recently raised JIRA that
discusses this; please see HBASE-20045.


-Anoop-

On Sun, Mar 11, 2018 at 7:57 AM, Saad Mufti  wrote:
> Although now that I think about this a bit more, all the failures we saw
> were failure to obtain a row lock, and in the thread stack traces we always
> saw it somewhere inside getRowLockInternal and similar. Never saw any
> contention on bucket cache lock that I could see.
>
> Cheers.
>
> 
> Saad
>
>
> On Sat, Mar 10, 2018 at 8:04 PM, Saad Mufti  wrote:
>
>> Also, for now we have mitigated this problem by using the new setting in
>> HBase 1.4.0 that prevents one slow region server from blocking all client
>> requests. Of course it causes some timeouts but our overall ecosystem
>> contains Kafka queues for retries, so we can live with that. From what I
>> can see, it looks like this setting also has the good effect of preventing
>> clients from hammering a region server that is slow because its IPC queues
>> are backed up, allowing it to recover faster.
>>
>> Does that make sense?
>>
>> Cheers.
>>
>> 
>> Saad
>>
>>
>> On Sat, Mar 10, 2018 at 7:04 PM, Saad Mufti  wrote:
>>
>>> So if I understand correctly, we would mitigate the problem by not
>>> evicting blocks for archived files immediately? Wouldn't this potentially
>>> lead to problems later if the LRU algo chooses to evict blocks for active
>>> files and leave blocks for archived files in there?
>>>
>>> I would definitely love to test this!!! Unfortunately we are running on
>>> EMR and the details of how to patch HBase under EMR are not clear to me :-(
>>>
>>> What we would really love would be a setting for actually immediately
>>> caching blocks for a new compacted file. I have seen in the code that even
>>> is we have the cache on write setting set to true, it will refuse to cache
>>> blocks for a file that is a newly compacted one. In our case we have sized
>>> the bucket cache to be big enough to hold all our data, and really want to
>>> avoid having to go to S3 until the last possible moment. A config setting
>>> to test this would be great.
>>>
>>> But thanks everyone for your feedback. Any more would also be welcome on
>>> the idea to let a user cache all newly compacted files.
>>>
>>> 
>>> Saad
>>>
>>>
>>> On Wed, Mar 7, 2018 at 12:00 AM, Anoop John 
>>> wrote:
>>>
 >>a) it was indeed one of the regions that was being compacted, major
 compaction in one case, minor compaction in another, the issue started
 just
 after compaction completed blowing away bucket cached blocks for the
 older
 HFile's

 About this part.Ya after the compaction, there is a step where the
 compacted away HFile's blocks getting removed from cache. This op takes a
 write lock for this region (In Bucket Cache layer)..  Every read op which
 is part of checkAndPut will try read from BC and that in turn need a read
 lock for this region.  So there is chances that the read locks starve
 because of so many frequent write locks .  Each block evict will attain
 the
 write lock one after other.  Will it be possible for you to patch this
 evict and test once? We can avoid the immediate evict from BC after
 compaction. I can help you with a patch if you wish

 Anoop



 On Mon, Mar 5, 2018 at 11:07 AM, ramkrishna vasudevan <
 ramkrishna.s.vasude...@gmail.com> wrote:
 > Hi Saad
 >
 > Your argument here
 >>> The
 >>>theory is that since prefetch is an async operation, a lot of the
 reads
 in
 >>>the checkAndPut for the region in question start reading from S3
 which is
 >>>slow. So the write lock obtained for the checkAndPut is held for a
 longer
 >>>duration than normal. This has cascading upstream effects. Does that
 sound
 >>>plausible?
 >
 > Seems very much plausible. So before even the prefetch happens say for
 > 'block 1' - and you have already issues N checkAndPut calls for the
 rows
 in
 > that 'block 1' -  all those checkAndPut will have to read that block
 from
 > S3 to perform the get() and then apply the mutation.
 >
 > This may happen for multiple threads at the same time because we are
 not
 > sure when the prefetch would 

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
Although now that I think about this a bit more, all the failures we saw
were failures to obtain a row lock, and in the thread stack traces we always
saw them somewhere inside getRowLockInternal or similar. I never saw any
contention on the bucket cache lock.

Cheers.


Saad


On Sat, Mar 10, 2018 at 8:04 PM, Saad Mufti  wrote:

> Also, for now we have mitigated this problem by using the new setting in
> HBase 1.4.0 that prevents one slow region server from blocking all client
> requests. Of course it causes some timeouts but our overall ecosystem
> contains Kafka queues for retries, so we can live with that. From what I
> can see, it looks like this setting also has the good effect of preventing
> clients from hammering a region server that is slow because its IPC queues
> are backed up, allowing it to recover faster.
>
> Does that make sense?
>
> Cheers.
>
> 
> Saad
>
>
> On Sat, Mar 10, 2018 at 7:04 PM, Saad Mufti  wrote:
>
>> So if I understand correctly, we would mitigate the problem by not
>> evicting blocks for archived files immediately? Wouldn't this potentially
>> lead to problems later if the LRU algo chooses to evict blocks for active
>> files and leave blocks for archived files in there?
>>
>> I would definitely love to test this!!! Unfortunately we are running on
>> EMR and the details of how to patch HBase under EMR are not clear to me :-(
>>
>> What we would really love would be a setting for actually immediately
>> caching blocks for a new compacted file. I have seen in the code that even
>> is we have the cache on write setting set to true, it will refuse to cache
>> blocks for a file that is a newly compacted one. In our case we have sized
>> the bucket cache to be big enough to hold all our data, and really want to
>> avoid having to go to S3 until the last possible moment. A config setting
>> to test this would be great.
>>
>> But thanks everyone for your feedback. Any more would also be welcome on
>> the idea to let a user cache all newly compacted files.
>>
>> 
>> Saad
>>
>>
>> On Wed, Mar 7, 2018 at 12:00 AM, Anoop John 
>> wrote:
>>
>>> >>a) it was indeed one of the regions that was being compacted, major
>>> compaction in one case, minor compaction in another, the issue started
>>> just
>>> after compaction completed blowing away bucket cached blocks for the
>>> older
>>> HFile's
>>>
>>> About this part.Ya after the compaction, there is a step where the
>>> compacted away HFile's blocks getting removed from cache. This op takes a
>>> write lock for this region (In Bucket Cache layer)..  Every read op which
>>> is part of checkAndPut will try read from BC and that in turn need a read
>>> lock for this region.  So there is chances that the read locks starve
>>> because of so many frequent write locks .  Each block evict will attain
>>> the
>>> write lock one after other.  Will it be possible for you to patch this
>>> evict and test once? We can avoid the immediate evict from BC after
>>> compaction. I can help you with a patch if you wish
>>>
>>> Anoop
>>>
>>>
>>>
>>> On Mon, Mar 5, 2018 at 11:07 AM, ramkrishna vasudevan <
>>> ramkrishna.s.vasude...@gmail.com> wrote:
>>> > Hi Saad
>>> >
>>> > Your argument here
>>> >>> The
>>> >>>theory is that since prefetch is an async operation, a lot of the
>>> reads
>>> in
>>> >>>the checkAndPut for the region in question start reading from S3
>>> which is
>>> >>>slow. So the write lock obtained for the checkAndPut is held for a
>>> longer
>>> >>>duration than normal. This has cascading upstream effects. Does that
>>> sound
>>> >>>plausible?
>>> >
>>> > Seems very much plausible. So before even the prefetch happens say for
>>> > 'block 1' - and you have already issues N checkAndPut calls for the
>>> rows
>>> in
>>> > that 'block 1' -  all those checkAndPut will have to read that block
>>> from
>>> > S3 to perform the get() and then apply the mutation.
>>> >
>>> > This may happen for multiple threads at the same time because we are
>>> not
>>> > sure when the prefetch would have actually been completed. I don know
>>> what
>>> > are the general read characteristics when a read happens from S3 but
>>> you
>>> > could try to see how things work when a read happens from S3 and after
>>> the
>>> > prefetch completes ensure the same checkandPut() is done (from cache
>>> this
>>> > time) to really know the difference what S3 does there.
>>> >
>>> > Regards
>>> > Ram
>>> >
>>> > On Fri, Mar 2, 2018 at 2:57 AM, Saad Mufti 
>>> wrote:
>>> >
>>> >> So after much investigation I can confirm:
>>> >>
>>> >> a) it was indeed one of the regions that was being compacted, major
>>> >> compaction in one case, minor compaction in another, the issue started
>>> just
>>> >> after compaction completed blowing away bucket cached blocks for the
>>> older
>>> >> HFile's
>>> >> b) in another case there was no compaction just a newly opened region
>>> in
>>> 

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
Also, for now we have mitigated this problem by using the new setting in
HBase 1.4.0 that prevents one slow region server from blocking all client
requests. Of course it causes some timeouts but our overall ecosystem
contains Kafka queues for retries, so we can live with that. From what I
can see, it looks like this setting also has the good effect of preventing
clients from hammering a region server that is slow because its IPC queues
are backed up, allowing it to recover faster.
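
For reference, a minimal sketch of how that HBase 1.4.0 knob (the
hbase.client.perserver.requests.threshold setting from HBASE-16388) can be
set on the client side; the threshold value here is only an example, not
what we actually run with.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class PerServerLimitSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Cap how many concurrent requests this client keeps outstanding against any
    // single region server; 50 is an arbitrary example value.
    conf.setInt("hbase.client.perserver.requests.threshold", 50);
    try (Connection conn = ConnectionFactory.createConnection(conf)) {
      // use the connection as usual; requests beyond the cap fail fast instead of
      // piling up behind one slow region server
    }
  }
}

As far as I understand it, once a client hits that limit against a single
region server, further requests to it fail fast rather than queueing behind
the slow server, which is the insulating effect described above.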

Does that make sense?

Cheers.


Saad


On Sat, Mar 10, 2018 at 7:04 PM, Saad Mufti  wrote:

> So if I understand correctly, we would mitigate the problem by not
> evicting blocks for archived files immediately? Wouldn't this potentially
> lead to problems later if the LRU algo chooses to evict blocks for active
> files and leave blocks for archived files in there?
>
> I would definitely love to test this!!! Unfortunately we are running on
> EMR and the details of how to patch HBase under EMR are not clear to me :-(
>
> What we would really love would be a setting for actually immediately
> caching blocks for a new compacted file. I have seen in the code that even
> is we have the cache on write setting set to true, it will refuse to cache
> blocks for a file that is a newly compacted one. In our case we have sized
> the bucket cache to be big enough to hold all our data, and really want to
> avoid having to go to S3 until the last possible moment. A config setting
> to test this would be great.
>
> But thanks everyone for your feedback. Any more would also be welcome on
> the idea to let a user cache all newly compacted files.
>
> 
> Saad
>
>
> On Wed, Mar 7, 2018 at 12:00 AM, Anoop John  wrote:
>
>> >>a) it was indeed one of the regions that was being compacted, major
>> compaction in one case, minor compaction in another, the issue started
>> just
>> after compaction completed blowing away bucket cached blocks for the older
>> HFile's
>>
>> About this part.Ya after the compaction, there is a step where the
>> compacted away HFile's blocks getting removed from cache. This op takes a
>> write lock for this region (In Bucket Cache layer)..  Every read op which
>> is part of checkAndPut will try read from BC and that in turn need a read
>> lock for this region.  So there is chances that the read locks starve
>> because of so many frequent write locks .  Each block evict will attain
>> the
>> write lock one after other.  Will it be possible for you to patch this
>> evict and test once? We can avoid the immediate evict from BC after
>> compaction. I can help you with a patch if you wish
>>
>> Anoop
>>
>>
>>
>> On Mon, Mar 5, 2018 at 11:07 AM, ramkrishna vasudevan <
>> ramkrishna.s.vasude...@gmail.com> wrote:
>> > Hi Saad
>> >
>> > Your argument here
>> >>> The
>> >>>theory is that since prefetch is an async operation, a lot of the reads
>> in
>> >>>the checkAndPut for the region in question start reading from S3 which
>> is
>> >>>slow. So the write lock obtained for the checkAndPut is held for a
>> longer
>> >>>duration than normal. This has cascading upstream effects. Does that
>> sound
>> >>>plausible?
>> >
>> > Seems very much plausible. So before even the prefetch happens say for
>> > 'block 1' - and you have already issues N checkAndPut calls for the rows
>> in
>> > that 'block 1' -  all those checkAndPut will have to read that block
>> from
>> > S3 to perform the get() and then apply the mutation.
>> >
>> > This may happen for multiple threads at the same time because we are not
>> > sure when the prefetch would have actually been completed. I don know
>> what
>> > are the general read characteristics when a read happens from S3 but you
>> > could try to see how things work when a read happens from S3 and after
>> the
>> > prefetch completes ensure the same checkandPut() is done (from cache
>> this
>> > time) to really know the difference what S3 does there.
>> >
>> > Regards
>> > Ram
>> >
>> > On Fri, Mar 2, 2018 at 2:57 AM, Saad Mufti 
>> wrote:
>> >
>> >> So after much investigation I can confirm:
>> >>
>> >> a) it was indeed one of the regions that was being compacted, major
>> >> compaction in one case, minor compaction in another, the issue started
>> just
>> >> after compaction completed blowing away bucket cached blocks for the
>> older
>> >> HFile's
>> >> b) in another case there was no compaction just a newly opened region
>> in
>> a
>> >> region server that hadn't finished perfetching its pages from S3
>> >>
>> >> We have prefetch on open set to true. Our load is heavy on checkAndPut
>> .The
>> >> theory is that since prefetch is an async operation, a lot of the reads
>> in
>> >> the checkAndPut for the region in question start reading from S3 which
>> is
>> >> slow. So the write lock obtained for the checkAndPut is held for a
>> longer
>> >> duration than normal. This has cascading upstream effects. Does that
>> sound
>> >> plausible?

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
So if I understand correctly, we would mitigate the problem by not evicting
blocks for archived files immediately? Wouldn't this potentially lead to
problems later if the LRU algo chooses to evict blocks for active files and
leave blocks for archived files in there?

I would definitely love to test this!!! Unfortunately we are running on EMR
and the details of how to patch HBase under EMR are not clear to me :-(

What we would really love is a setting to immediately cache blocks for a
newly compacted file. I have seen in the code that even if we have the
cache-on-write setting set to true, it will refuse to cache blocks for a
file that is a newly compacted one. In our case we have sized the bucket
cache to be big enough to hold all our data, and we really want to avoid
going to S3 until the last possible moment. A config setting to test this
would be great.
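
For reference, these are the cache-related knobs involved, expressed through
the Configuration API as a hedged sketch (the values are examples only). As
noted above, in 1.x the compaction output path skips cache-on-write no matter
what these are set to, which is what the HBASE-20045 discussion Anoop
mentioned is about.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CacheConfigSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.bucketcache.ioengine", "offheap");      // or e.g. "file:/mnt/bucketcache"
    conf.set("hbase.bucketcache.size", "40960");            // capacity in MB (example value)
    conf.setBoolean("hbase.rs.cacheblocksonwrite", true);   // cache data blocks as they are written
    conf.setBoolean("hbase.rs.evictblocksonclose", false);  // don't evict a file's blocks on close
    // Note: in 1.x the compaction writer ignores cacheblocksonwrite, so the new
    // HFile's blocks still are not cached until they are read (or prefetched).
  }
}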

But thanks everyone for your feedback. Any more feedback on the idea of
letting a user cache all newly compacted files would also be welcome.


Saad


On Wed, Mar 7, 2018 at 12:00 AM, Anoop John  wrote:

> >>a) it was indeed one of the regions that was being compacted, major
> compaction in one case, minor compaction in another, the issue started just
> after compaction completed blowing away bucket cached blocks for the older
> HFile's
>
> About this part.Ya after the compaction, there is a step where the
> compacted away HFile's blocks getting removed from cache. This op takes a
> write lock for this region (In Bucket Cache layer)..  Every read op which
> is part of checkAndPut will try read from BC and that in turn need a read
> lock for this region.  So there is chances that the read locks starve
> because of so many frequent write locks .  Each block evict will attain the
> write lock one after other.  Will it be possible for you to patch this
> evict and test once? We can avoid the immediate evict from BC after
> compaction. I can help you with a patch if you wish
>
> Anoop
>
>
>
> On Mon, Mar 5, 2018 at 11:07 AM, ramkrishna vasudevan <
> ramkrishna.s.vasude...@gmail.com> wrote:
> > Hi Saad
> >
> > Your argument here
> >>> The
> >>>theory is that since prefetch is an async operation, a lot of the reads
> in
> >>>the checkAndPut for the region in question start reading from S3 which
> is
> >>>slow. So the write lock obtained for the checkAndPut is held for a
> longer
> >>>duration than normal. This has cascading upstream effects. Does that
> sound
> >>>plausible?
> >
> > Seems very much plausible. So before even the prefetch happens say for
> > 'block 1' - and you have already issues N checkAndPut calls for the rows
> in
> > that 'block 1' -  all those checkAndPut will have to read that block from
> > S3 to perform the get() and then apply the mutation.
> >
> > This may happen for multiple threads at the same time because we are not
> > sure when the prefetch would have actually been completed. I don know
> what
> > are the general read characteristics when a read happens from S3 but you
> > could try to see how things work when a read happens from S3 and after
> the
> > prefetch completes ensure the same checkandPut() is done (from cache this
> > time) to really know the difference what S3 does there.
> >
> > Regards
> > Ram
> >
> > On Fri, Mar 2, 2018 at 2:57 AM, Saad Mufti  wrote:
> >
> >> So after much investigation I can confirm:
> >>
> >> a) it was indeed one of the regions that was being compacted, major
> >> compaction in one case, minor compaction in another, the issue started
> just
> >> after compaction completed blowing away bucket cached blocks for the
> older
> >> HFile's
> >> b) in another case there was no compaction just a newly opened region in
> a
> >> region server that hadn't finished perfetching its pages from S3
> >>
> >> We have prefetch on open set to true. Our load is heavy on checkAndPut
> .The
> >> theory is that since prefetch is an async operation, a lot of the reads
> in
> >> the checkAndPut for the region in question start reading from S3 which
> is
> >> slow. So the write lock obtained for the checkAndPut is held for a
> longer
> >> duration than normal. This has cascading upstream effects. Does that
> sound
> >> plausible?
> >>
> >> The part I don't understand still is all the locks held are for the same
> >> region but are all for different rows. So once the prefetch is
> completed,
> >> shouldn't the problem clear up quickly? Or does the slow region slow
> down
> >> anyone trying to do checkAndPut on any row in the same region even after
> >> the prefetch has completed. That is, do the long held row locks prevent
> >> others from getting a row lock on a different row in the same region?
> >>
> >> In any case, we trying to use
> >> https://issues.apache.org/jira/browse/HBASE-16388 support in HBase
> 1.4.0
> >> to
> >> both insulate the app a bit from this situation and hoping that it will
> >> reduce pressure on the region server in question, allowing 

How Long Will HBase Hold A Row Write Lock?

2018-03-06 Thread Anoop John
>>a) it was indeed one of the regions that was being compacted, major
compaction in one case, minor compaction in another, the issue started just
after compaction completed blowing away bucket cached blocks for the older
HFile's

About this part: yes, after the compaction there is a step where the
compacted-away HFile's blocks get removed from the cache. This operation
takes a write lock for this region (in the bucket cache layer). Every read
op that is part of a checkAndPut will try to read from the BC, and that in
turn needs a read lock for this region. So there is a chance that the read
locks starve because of the many frequent write locks: each block eviction
acquires the write lock one after another. Would it be possible for you to
patch this eviction and test it once? We could avoid the immediate evict
from the BC after compaction. I can help you with a patch if you wish.
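
To make the locking pattern concrete, here is a deliberately simplified
model using a plain ReentrantReadWriteLock. This is not the actual
BucketCache code, only an illustration of how a burst of short write-lock
acquisitions, one per evicted block, can delay readers.

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class EvictionLockModel {
  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

  void evictBlocksOfCompactedFile(int blockCount) {
    for (int i = 0; i < blockCount; i++) {
      lock.writeLock().lock();        // one write-lock acquisition per evicted block
      try {
        // remove block i from the cache
      } finally {
        lock.writeLock().unlock();
      }
    }
  }

  byte[] readBlockForCheckAndPut() {
    lock.readLock().lock();           // readers queue behind the stream of writers
    try {
      return new byte[0];             // return the cached block (placeholder)
    } finally {
      lock.readLock().unlock();
    }
  }
}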

Anoop



On Mon, Mar 5, 2018 at 11:07 AM, ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com> wrote:
> Hi Saad
>
> Your argument here
>>> The
>>>theory is that since prefetch is an async operation, a lot of the reads
in
>>>the checkAndPut for the region in question start reading from S3 which is
>>>slow. So the write lock obtained for the checkAndPut is held for a longer
>>>duration than normal. This has cascading upstream effects. Does that
sound
>>>plausible?
>
> Seems very much plausible. So before even the prefetch happens say for
> 'block 1' - and you have already issues N checkAndPut calls for the rows
in
> that 'block 1' -  all those checkAndPut will have to read that block from
> S3 to perform the get() and then apply the mutation.
>
> This may happen for multiple threads at the same time because we are not
> sure when the prefetch would have actually been completed. I don know what
> are the general read characteristics when a read happens from S3 but you
> could try to see how things work when a read happens from S3 and after the
> prefetch completes ensure the same checkandPut() is done (from cache this
> time) to really know the difference what S3 does there.
>
> Regards
> Ram
>
> On Fri, Mar 2, 2018 at 2:57 AM, Saad Mufti  wrote:
>
>> So after much investigation I can confirm:
>>
>> a) it was indeed one of the regions that was being compacted, major
>> compaction in one case, minor compaction in another, the issue started
just
>> after compaction completed blowing away bucket cached blocks for the
older
>> HFile's
>> b) in another case there was no compaction just a newly opened region in
a
>> region server that hadn't finished perfetching its pages from S3
>>
>> We have prefetch on open set to true. Our load is heavy on checkAndPut
.The
>> theory is that since prefetch is an async operation, a lot of the reads
in
>> the checkAndPut for the region in question start reading from S3 which is
>> slow. So the write lock obtained for the checkAndPut is held for a longer
>> duration than normal. This has cascading upstream effects. Does that
sound
>> plausible?
>>
>> The part I don't understand still is all the locks held are for the same
>> region but are all for different rows. So once the prefetch is completed,
>> shouldn't the problem clear up quickly? Or does the slow region slow down
>> anyone trying to do checkAndPut on any row in the same region even after
>> the prefetch has completed. That is, do the long held row locks prevent
>> others from getting a row lock on a different row in the same region?
>>
>> In any case, we trying to use
>> https://issues.apache.org/jira/browse/HBASE-16388 support in HBase 1.4.0
>> to
>> both insulate the app a bit from this situation and hoping that it will
>> reduce pressure on the region server in question, allowing it to recover
>> faster. I haven't quite tested that yet, any advice in the meantime would
>> be appreciated.
>>
>> Cheers.
>>
>> 
>> Saad
>>
>>
>>
>> On Thu, Mar 1, 2018 at 9:21 AM, Saad Mufti  wrote:
>>
>> > Actually it happened again while some minior compactions were running,
so
>> > don't think it related to our major compaction tool, which isn't even
>> > running right now. I will try to capture a debug dump of threads and
>> > everything while the event is ongoing. Seems to last at least half an
>> hour
>> > or so and sometimes longer.
>> >
>> > 
>> > Saad
>> >
>> >
>> > On Thu, Mar 1, 2018 at 7:54 AM, Saad Mufti 
wrote:
>> >
>> >> Unfortunately I lost the stack trace overnight. But it does seem
related
>> >> to compaction, because now that the compaction tool is done, I don't
see
>> >> the issue anymore. I will run our incremental major compaction tool
>> again
>> >> and see if I can reproduce the issue.
>> >>
>> >> On the plus side the system stayed stable and eventually recovered,
>> >> although it did suffer all those timeouts.
>> >>
>> >> 
>> >> Saad
>> >>
>> >>
>> >> On Wed, Feb 28, 2018 at 10:18 PM, Saad Mufti 
>> >> wrote:
>> >>
>> >>> I'll paste a thread dump later, writing this 

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-04 Thread ramkrishna vasudevan
Hi Saad

Your argument here
>> The
>>theory is that since prefetch is an async operation, a lot of the reads in
>>the checkAndPut for the region in question start reading from S3 which is
>>slow. So the write lock obtained for the checkAndPut is held for a longer
>>duration than normal. This has cascading upstream effects. Does that sound
>>plausible?

Seems very plausible. So before the prefetch has happened for, say,
'block 1', and you have already issued N checkAndPut calls for rows in
that 'block 1', all of those checkAndPut calls will have to read that
block from S3 to perform the get() and then apply the mutation.

This may happen on multiple threads at the same time, because we are not
sure when the prefetch has actually completed. I don't know what the
general read characteristics are when a read comes from S3, but you could
compare how things behave when a read goes to S3 versus, after the
prefetch completes, when the same checkAndPut() is served from the cache,
to really see what difference S3 makes there.
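
For reference, a sketch of where the prefetch-on-open flag Saad mentions
typically lives: either per column family in the table schema, or globally
via hbase.rs.prefetchblocksonopen. The table and family names here are made
up. The prefetch itself runs asynchronously after the region opens, which is
why early checkAndPut calls can still fall through to S3.

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;

public class PrefetchOnOpenSketch {
  public static HTableDescriptor describe() {
    HColumnDescriptor cf = new HColumnDescriptor("d");  // hypothetical family name
    cf.setPrefetchBlocksOnOpen(true);  // warm this family's blocks into the cache on region open
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("my_table"));
    table.addFamily(cf);
    return table;
  }
}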

Regards
Ram

On Fri, Mar 2, 2018 at 2:57 AM, Saad Mufti  wrote:

> So after much investigation I can confirm:
>
> a) it was indeed one of the regions that was being compacted, major
> compaction in one case, minor compaction in another, the issue started just
> after compaction completed blowing away bucket cached blocks for the older
> HFile's
> b) in another case there was no compaction just a newly opened region in a
> region server that hadn't finished perfetching its pages from S3
>
> We have prefetch on open set to true. Our load is heavy on checkAndPut .The
> theory is that since prefetch is an async operation, a lot of the reads in
> the checkAndPut for the region in question start reading from S3 which is
> slow. So the write lock obtained for the checkAndPut is held for a longer
> duration than normal. This has cascading upstream effects. Does that sound
> plausible?
>
> The part I don't understand still is all the locks held are for the same
> region but are all for different rows. So once the prefetch is completed,
> shouldn't the problem clear up quickly? Or does the slow region slow down
> anyone trying to do checkAndPut on any row in the same region even after
> the prefetch has completed. That is, do the long held row locks prevent
> others from getting a row lock on a different row in the same region?
>
> In any case, we trying to use
> https://issues.apache.org/jira/browse/HBASE-16388 support in HBase 1.4.0
> to
> both insulate the app a bit from this situation and hoping that it will
> reduce pressure on the region server in question, allowing it to recover
> faster. I haven't quite tested that yet, any advice in the meantime would
> be appreciated.
>
> Cheers.
>
> 
> Saad
>
>
>
> On Thu, Mar 1, 2018 at 9:21 AM, Saad Mufti  wrote:
>
> > Actually it happened again while some minior compactions were running, so
> > don't think it related to our major compaction tool, which isn't even
> > running right now. I will try to capture a debug dump of threads and
> > everything while the event is ongoing. Seems to last at least half an
> hour
> > or so and sometimes longer.
> >
> > 
> > Saad
> >
> >
> > On Thu, Mar 1, 2018 at 7:54 AM, Saad Mufti  wrote:
> >
> >> Unfortunately I lost the stack trace overnight. But it does seem related
> >> to compaction, because now that the compaction tool is done, I don't see
> >> the issue anymore. I will run our incremental major compaction tool
> again
> >> and see if I can reproduce the issue.
> >>
> >> On the plus side the system stayed stable and eventually recovered,
> >> although it did suffer all those timeouts.
> >>
> >> 
> >> Saad
> >>
> >>
> >> On Wed, Feb 28, 2018 at 10:18 PM, Saad Mufti 
> >> wrote:
> >>
> >>> I'll paste a thread dump later, writing this from my phone  :-)
> >>>
> >>> So the same issue has happened at different times for different
> regions,
> >>> but I couldn't see that the region in question was the one being
> compacted,
> >>> either this time or earlier. Although I might have missed an earlier
> >>> correlation in the logs where the issue started just after the
> compaction
> >>> completed.
> >>>
> >>> Usually a compaction for this table's regions take around 5-10 minutes,
> >>> much less for its smaller column family which is block cache enabled,
> >>> around a minute or less, and 5-10 minutes for the much larger one for
> which
> >>> we have block cache disabled in the schema, because we don't ever read
> it
> >>> in the primary cluster. So the only impact on reads would be from that
> >>> smaller column family which takes less than a minute to compact.
> >>>
> >>> But the issue once started doesn't seem to recover for a long time,
> long
> >>> past when any compaction on the region itself could impact anything.
> The
> >>> compaction tool which is our own code has long since moved to other

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-01 Thread Saad Mufti
So after much investigation I can confirm:

a) it was indeed one of the regions that was being compacted, major
compaction in one case, minor compaction in another; the issue started just
after the compaction completed, blowing away bucket-cached blocks for the
older HFiles
b) in another case there was no compaction, just a newly opened region in a
region server that hadn't finished prefetching its pages from S3

We have prefetch-on-open set to true. Our load is heavy on checkAndPut. The
theory is that since prefetch is an async operation, a lot of the reads in
the checkAndPut path for the region in question start reading from S3, which
is slow. So the write lock obtained for the checkAndPut is held for a longer
duration than normal. This has cascading upstream effects. Does that sound
plausible?

The part I still don't understand is that all the locks held are for the
same region but are all for different rows. So once the prefetch is
completed, shouldn't the problem clear up quickly? Or does the slow region
slow down anyone trying to do a checkAndPut on any row in the same region
even after the prefetch has completed? That is, do the long-held row locks
prevent others from getting a row lock on a different row in the same region?

In any case, we are trying to use the
https://issues.apache.org/jira/browse/HBASE-16388 support in HBase 1.4.0 to
insulate the app a bit from this situation, hoping that it will also reduce
pressure on the region server in question and allow it to recover faster. I
haven't quite tested that yet; any advice in the meantime would be
appreciated.

Cheers.


Saad



On Thu, Mar 1, 2018 at 9:21 AM, Saad Mufti  wrote:

> Actually it happened again while some minior compactions were running, so
> don't think it related to our major compaction tool, which isn't even
> running right now. I will try to capture a debug dump of threads and
> everything while the event is ongoing. Seems to last at least half an hour
> or so and sometimes longer.
>
> 
> Saad
>
>
> On Thu, Mar 1, 2018 at 7:54 AM, Saad Mufti  wrote:
>
>> Unfortunately I lost the stack trace overnight. But it does seem related
>> to compaction, because now that the compaction tool is done, I don't see
>> the issue anymore. I will run our incremental major compaction tool again
>> and see if I can reproduce the issue.
>>
>> On the plus side the system stayed stable and eventually recovered,
>> although it did suffer all those timeouts.
>>
>> 
>> Saad
>>
>>
>> On Wed, Feb 28, 2018 at 10:18 PM, Saad Mufti 
>> wrote:
>>
>>> I'll paste a thread dump later, writing this from my phone  :-)
>>>
>>> So the same issue has happened at different times for different regions,
>>> but I couldn't see that the region in question was the one being compacted,
>>> either this time or earlier. Although I might have missed an earlier
>>> correlation in the logs where the issue started just after the compaction
>>> completed.
>>>
>>> Usually a compaction for this table's regions take around 5-10 minutes,
>>> much less for its smaller column family which is block cache enabled,
>>> around a minute or less, and 5-10 minutes for the much larger one for which
>>> we have block cache disabled in the schema, because we don't ever read it
>>> in the primary cluster. So the only impact on reads would be from that
>>> smaller column family which takes less than a minute to compact.
>>>
>>> But the issue once started doesn't seem to recover for a long time, long
>>> past when any compaction on the region itself could impact anything. The
>>> compaction tool which is our own code has long since moved to other
>>> regions.
>>>
>>> Cheers.
>>>
>>> 
>>> Saad
>>>
>>>
>>> On Wed, Feb 28, 2018 at 9:39 PM Ted Yu  wrote:
>>>
 bq. timing out trying to obtain write locks on rows in that region.

 Can you confirm that the region under contention was the one being major
 compacted ?

 Can you pastebin thread dump so that we can have better idea of the
 scenario ?

 For the region being compacted, how long would the compaction take (just
 want to see if there was correlation between this duration and timeout)
 ?

 Cheers

 On Wed, Feb 28, 2018 at 6:31 PM, Saad Mufti 
 wrote:

 > Hi,
 >
 > We are running on Amazon EMR based HBase 1.4.0 . We are currently
 seeing a
 > situation where sometimes a particular region gets into a situation
 where a
 > lot of write requests to any row in that region timeout saying they
 failed
 > to obtain a lock on a row in a region and eventually they experience
 an IPC
 > timeout. This causes the IPC queue to blow up in size as requests get
 > backed up, and that region server experiences a much higher than
 normal
 > timeout rate for all requests, not just those timing out for failing
 to
 > obtain the row lock.

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-01 Thread Saad Mufti
Actually it happened again while some minor compactions were running, so I
don't think it's related to our major compaction tool, which isn't even
running right now. I will try to capture a debug dump of threads and
everything while the event is ongoing. Seems to last at least half an hour
or so and sometimes longer.


Saad


On Thu, Mar 1, 2018 at 7:54 AM, Saad Mufti  wrote:

> Unfortunately I lost the stack trace overnight. But it does seem related
> to compaction, because now that the compaction tool is done, I don't see
> the issue anymore. I will run our incremental major compaction tool again
> and see if I can reproduce the issue.
>
> On the plus side the system stayed stable and eventually recovered,
> although it did suffer all those timeouts.
>
> 
> Saad
>
>
> On Wed, Feb 28, 2018 at 10:18 PM, Saad Mufti  wrote:
>
>> I'll paste a thread dump later, writing this from my phone  :-)
>>
>> So the same issue has happened at different times for different regions,
>> but I couldn't see that the region in question was the one being compacted,
>> either this time or earlier. Although I might have missed an earlier
>> correlation in the logs where the issue started just after the compaction
>> completed.
>>
>> Usually a compaction for this table's regions take around 5-10 minutes,
>> much less for its smaller column family which is block cache enabled,
>> around a minute or less, and 5-10 minutes for the much larger one for which
>> we have block cache disabled in the schema, because we don't ever read it
>> in the primary cluster. So the only impact on reads would be from that
>> smaller column family which takes less than a minute to compact.
>>
>> But the issue once started doesn't seem to recover for a long time, long
>> past when any compaction on the region itself could impact anything. The
>> compaction tool which is our own code has long since moved to other
>> regions.
>>
>> Cheers.
>>
>> 
>> Saad
>>
>>
>> On Wed, Feb 28, 2018 at 9:39 PM Ted Yu  wrote:
>>
>>> bq. timing out trying to obtain write locks on rows in that region.
>>>
>>> Can you confirm that the region under contention was the one being major
>>> compacted ?
>>>
>>> Can you pastebin thread dump so that we can have better idea of the
>>> scenario ?
>>>
>>> For the region being compacted, how long would the compaction take (just
>>> want to see if there was correlation between this duration and timeout) ?
>>>
>>> Cheers
>>>
>>> On Wed, Feb 28, 2018 at 6:31 PM, Saad Mufti 
>>> wrote:
>>>
>>> > Hi,
>>> >
>>> > We are running on Amazon EMR based HBase 1.4.0 . We are currently
>>> seeing a
>>> > situation where sometimes a particular region gets into a situation
>>> where a
>>> > lot of write requests to any row in that region timeout saying they
>>> failed
>>> > to obtain a lock on a row in a region and eventually they experience
>>> an IPC
>>> > timeout. This causes the IPC queue to blow up in size as requests get
>>> > backed up, and that region server experiences a much higher than normal
>>> > timeout rate for all requests, not just those timing out for failing to
>>> > obtain the row lock.
>>> >
>>> > The strange thing is the rows are always different but the region is
>>> always
>>> > the same. So the question is, is there a region component to how long
>>> a row
>>> > write lock would be held? I looked at the debug dump and the RowLocks
>>> > section shows a long list of write row locks held, all of them are
>>> from the
>>> > same region but different rows.
>>> >
>>> > Will trying to obtain a write row lock experience delays if no one else
>>> > holds a lock on the same row but the region itself is experiencing read
>>> > delays? We do have an incremental compaction tool running that major
>>> > compacts one region per region server at a time, so that will drive out
>>> > pages from the bucket cache. But for most regions the impact is
>>> > transitional until the bucket cache gets populated by pages from the
>>> new
>>> > HFile. But for this one region we start timing out trying to obtain
>>> write
>>> > locks on rows in that region.
>>> >
>>> > Any insight anyone can provide would be most welcome.
>>> >
>>> > Cheers.
>>> >
>>> > 
>>> > Saad
>>> >
>>>
>>
>


Re: How Long Will HBase Hold A Row Write Lock?

2018-03-01 Thread Saad Mufti
Unfortunately I lost the stack trace overnight. But it does seem related to
compaction, because now that the compaction tool is done, I don't see the
issue anymore. I will run our incremental major compaction tool again and
see if I can reproduce the issue.

On the plus side the system stayed stable and eventually recovered,
although it did suffer all those timeouts.


Saad


On Wed, Feb 28, 2018 at 10:18 PM, Saad Mufti  wrote:

> I'll paste a thread dump later, writing this from my phone  :-)
>
> So the same issue has happened at different times for different regions,
> but I couldn't see that the region in question was the one being compacted,
> either this time or earlier. Although I might have missed an earlier
> correlation in the logs where the issue started just after the compaction
> completed.
>
> Usually a compaction for this table's regions take around 5-10 minutes,
> much less for its smaller column family which is block cache enabled,
> around a minute or less, and 5-10 minutes for the much larger one for which
> we have block cache disabled in the schema, because we don't ever read it
> in the primary cluster. So the only impact on reads would be from that
> smaller column family which takes less than a minute to compact.
>
> But the issue once started doesn't seem to recover for a long time, long
> past when any compaction on the region itself could impact anything. The
> compaction tool which is our own code has long since moved to other
> regions.
>
> Cheers.
>
> 
> Saad
>
>
> On Wed, Feb 28, 2018 at 9:39 PM Ted Yu  wrote:
>
>> bq. timing out trying to obtain write locks on rows in that region.
>>
>> Can you confirm that the region under contention was the one being major
>> compacted ?
>>
>> Can you pastebin thread dump so that we can have better idea of the
>> scenario ?
>>
>> For the region being compacted, how long would the compaction take (just
>> want to see if there was correlation between this duration and timeout) ?
>>
>> Cheers
>>
>> On Wed, Feb 28, 2018 at 6:31 PM, Saad Mufti  wrote:
>>
>> > Hi,
>> >
>> > We are running on Amazon EMR based HBase 1.4.0 . We are currently
>> seeing a
>> > situation where sometimes a particular region gets into a situation
>> where a
>> > lot of write requests to any row in that region timeout saying they
>> failed
>> > to obtain a lock on a row in a region and eventually they experience an
>> IPC
>> > timeout. This causes the IPC queue to blow up in size as requests get
>> > backed up, and that region server experiences a much higher than normal
>> > timeout rate for all requests, not just those timing out for failing to
>> > obtain the row lock.
>> >
>> > The strange thing is the rows are always different but the region is
>> always
>> > the same. So the question is, is there a region component to how long a
>> row
>> > write lock would be held? I looked at the debug dump and the RowLocks
>> > section shows a long list of write row locks held, all of them are from
>> the
>> > same region but different rows.
>> >
>> > Will trying to obtain a write row lock experience delays if no one else
>> > holds a lock on the same row but the region itself is experiencing read
>> > delays? We do have an incremental compaction tool running that major
>> > compacts one region per region server at a time, so that will drive out
>> > pages from the bucket cache. But for most regions the impact is
>> > transitional until the bucket cache gets populated by pages from the new
>> > HFile. But for this one region we start timing out trying to obtain
>> write
>> > locks on rows in that region.
>> >
>> > Any insight anyone can provide would be most welcome.
>> >
>> > Cheers.
>> >
>> > 
>> > Saad
>> >
>>
>


Re: How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Saad Mufti
I'll paste a thread dump later, writing this from my phone  :-)

So the same issue has happened at different times for different regions,
but I couldn't see that the region in question was the one being compacted,
either this time or earlier. Although I might have missed an earlier
correlation in the logs where the issue started just after the compaction
completed.

Usually a compaction for this table's regions takes around 5-10 minutes,
much less for its smaller column family which is block cache enabled,
around a minute or less, and 5-10 minutes for the much larger one for which
we have block cache disabled in the schema, because we don't ever read it
in the primary cluster. So the only impact on reads would be from that
smaller column family which takes less than a minute to compact.

But the issue, once started, doesn't seem to recover for a long time, long
past the point when any compaction on the region itself could impact
anything. The compaction tool, which is our own code, has long since moved
on to other regions.

Cheers.


Saad


On Wed, Feb 28, 2018 at 9:39 PM Ted Yu  wrote:

> bq. timing out trying to obtain write locks on rows in that region.
>
> Can you confirm that the region under contention was the one being major
> compacted ?
>
> Can you pastebin thread dump so that we can have better idea of the
> scenario ?
>
> For the region being compacted, how long would the compaction take (just
> want to see if there was correlation between this duration and timeout) ?
>
> Cheers
>
> On Wed, Feb 28, 2018 at 6:31 PM, Saad Mufti  wrote:
>
> > Hi,
> >
> > We are running on Amazon EMR based HBase 1.4.0 . We are currently seeing
> a
> > situation where sometimes a particular region gets into a situation
> where a
> > lot of write requests to any row in that region timeout saying they
> failed
> > to obtain a lock on a row in a region and eventually they experience an
> IPC
> > timeout. This causes the IPC queue to blow up in size as requests get
> > backed up, and that region server experiences a much higher than normal
> > timeout rate for all requests, not just those timing out for failing to
> > obtain the row lock.
> >
> > The strange thing is the rows are always different but the region is
> always
> > the same. So the question is, is there a region component to how long a
> row
> > write lock would be held? I looked at the debug dump and the RowLocks
> > section shows a long list of write row locks held, all of them are from
> the
> > same region but different rows.
> >
> > Will trying to obtain a write row lock experience delays if no one else
> > holds a lock on the same row but the region itself is experiencing read
> > delays? We do have an incremental compaction tool running that major
> > compacts one region per region server at a time, so that will drive out
> > pages from the bucket cache. But for most regions the impact is
> > transitional until the bucket cache gets populated by pages from the new
> > HFile. But for this one region we start timing out trying to obtain write
> > locks on rows in that region.
> >
> > Any insight anyone can provide would be most welcome.
> >
> > Cheers.
> >
> > 
> > Saad
> >
>


Re: How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Saad Mufti
One additional data point, I tried to manually re-assign the region in
question from the shell, that for some reason caused the region server to
restart and the region did get assigned to another region server. But then
the problem moved to that region server almost immediately.

Does that just mean our write load is disproportionately hitting that one
region? That would be surprising: we have a prefix scheme in place where we
prepend an MD5-hash-based 4-digit prefix to every key to make sure we get
good randomization.
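
For illustration only, a hypothetical sketch of that kind of salting scheme
(the real one we use isn't shown here): prepend the first 4 hex characters
of the MD5 of the natural key.

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class SaltedKey {
  public static String salt(String naturalKey) throws Exception {
    byte[] md5 = MessageDigest.getInstance("MD5")
        .digest(naturalKey.getBytes(StandardCharsets.UTF_8));
    // first 2 bytes -> 4 hex characters, giving 65536 evenly spread prefixes
    String prefix = String.format("%02x%02x", md5[0] & 0xff, md5[1] & 0xff);
    return prefix + ":" + naturalKey;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(salt("user-12345"));  // prints "<4 hex chars>:user-12345"
  }
}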

As usual any feedback would be appreciated.

Cheers.


Saad



On Wed, Feb 28, 2018 at 9:31 PM, Saad Mufti  wrote:

> Hi,
>
> We are running on Amazon EMR based HBase 1.4.0 . We are currently seeing a
> situation where sometimes a particular region gets into a situation where a
> lot of write requests to any row in that region timeout saying they failed
> to obtain a lock on a row in a region and eventually they experience an IPC
> timeout. This causes the IPC queue to blow up in size as requests get
> backed up, and that region server experiences a much higher than normal
> timeout rate for all requests, not just those timing out for failing to
> obtain the row lock.
>
> The strange thing is the rows are always different but the region is
> always the same. So the question is, is there a region component to how
> long a row write lock would be held? I looked at the debug dump and the
> RowLocks section shows a long list of write row locks held, all of them are
> from the same region but different rows.
>
> Will trying to obtain a write row lock experience delays if no one else
> holds a lock on the same row but the region itself is experiencing read
> delays? We do have an incremental compaction tool running that major
> compacts one region per region server at a time, so that will drive out
> pages from the bucket cache. But for most regions the impact is
> transitional until the bucket cache gets populated by pages from the new
> HFile. But for this one region we start timing out trying to obtain write
> locks on rows in that region.
>
> Any insight anyone can provide would be most welcome.
>
> Cheers.
>
> 
> Saad
>
>


Re: How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Ted Yu
bq. timing out trying to obtain write locks on rows in that region.

Can you confirm that the region under contention was the one being major
compacted?

Can you pastebin a thread dump so that we can get a better idea of the
scenario?

For the region being compacted, how long would the compaction take? (Just
want to see if there was a correlation between that duration and the timeout.)
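
If jstack is awkward to run on the EMR nodes, a thread dump can also be
captured from inside the JVM; a minimal sketch follows (the region server
debug dump page shows similar information).

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;

public class ThreadDumper {
  public static String dump() {
    StringBuilder sb = new StringBuilder();
    for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
      sb.append('"').append(info.getThreadName()).append("\" ")
        .append(info.getThreadState()).append('\n');
      for (StackTraceElement frame : info.getStackTrace()) {
        sb.append("    at ").append(frame).append('\n');
      }
      sb.append('\n');
    }
    return sb.toString();
  }
}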

Cheers

On Wed, Feb 28, 2018 at 6:31 PM, Saad Mufti  wrote:

> Hi,
>
> We are running on Amazon EMR based HBase 1.4.0 . We are currently seeing a
> situation where sometimes a particular region gets into a situation where a
> lot of write requests to any row in that region timeout saying they failed
> to obtain a lock on a row in a region and eventually they experience an IPC
> timeout. This causes the IPC queue to blow up in size as requests get
> backed up, and that region server experiences a much higher than normal
> timeout rate for all requests, not just those timing out for failing to
> obtain the row lock.
>
> The strange thing is the rows are always different but the region is always
> the same. So the question is, is there a region component to how long a row
> write lock would be held? I looked at the debug dump and the RowLocks
> section shows a long list of write row locks held, all of them are from the
> same region but different rows.
>
> Will trying to obtain a write row lock experience delays if no one else
> holds a lock on the same row but the region itself is experiencing read
> delays? We do have an incremental compaction tool running that major
> compacts one region per region server at a time, so that will drive out
> pages from the bucket cache. But for most regions the impact is
> transitional until the bucket cache gets populated by pages from the new
> HFile. But for this one region we start timing out trying to obtain write
> locks on rows in that region.
>
> Any insight anyone can provide would be most welcome.
>
> Cheers.
>
> 
> Saad
>


How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Saad Mufti
Hi,

We are running HBase 1.4.0 on Amazon EMR. We are currently seeing a
situation where sometimes a particular region gets into a state where a
lot of write requests to any row in that region time out, saying they failed
to obtain a lock on a row in the region, and eventually they hit an IPC
timeout. This causes the IPC queue to blow up in size as requests get
backed up, and that region server experiences a much higher than normal
timeout rate for all requests, not just those timing out for failing to
obtain the row lock.

The strange thing is that the rows are always different but the region is
always the same. So the question is: is there a region-level component to
how long a row write lock is held? I looked at the debug dump, and the
RowLocks section shows a long list of write row locks held, all of them
from the same region but for different rows.
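
For what it's worth, these are the knobs that bound how long such waits can
last, as I understand them; the values shown are what I believe the defaults
to be, not a recommendation.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class LockAndTimeoutKnobs {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Server side: how long a mutation (e.g. checkAndPut) waits to acquire the row
    // lock before failing with a "failed to obtain a lock on a row" error.
    conf.setInt("hbase.rowlock.wait.duration", 30000);        // ms
    // Client side: per-RPC and whole-operation timeouts, which is where the IPC
    // timeouts described above come from once requests back up.
    conf.setInt("hbase.rpc.timeout", 60000);                  // ms
    conf.setInt("hbase.client.operation.timeout", 1200000);   // ms
  }
}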

Will an attempt to obtain a row write lock experience delays if no one else
holds a lock on the same row but the region itself is experiencing read
delays? We do have an incremental compaction tool running that major
compacts one region per region server at a time, so that will drive out
pages from the bucket cache. But for most regions the impact is transient,
lasting only until the bucket cache gets repopulated with pages from the
new HFile. For this one region, though, we start timing out trying to
obtain write locks on rows in that region.

Any insight anyone can provide would be most welcome.

Cheers.


Saad