bq. timing out trying to obtain write locks on rows in that region.

Can you confirm that the region under contention was the one being
major-compacted?

Can you pastebin a thread dump so that we can get a better idea of the
scenario?

For the region being compacted, how long did the compaction take? (I just
want to see whether that duration correlates with the timeouts.)
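
To make the question concrete, here is a toy model of per-row write locks with a wait timeout. It is only a sketch built on java.util.concurrent and is not HBase's actual row-lock code; the class name RowLockSketch, its methods, and the row keys are all made up for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of per-row write locks with a wait timeout; NOT HBase's implementation. */
public class RowLockSketch {
    // One lock per row key, created lazily (hypothetical structure).
    private final ConcurrentHashMap<String, ReentrantReadWriteLock> locks =
            new ConcurrentHashMap<>();

    /** Tries to take the write lock for a row, giving up after timeoutMs. */
    public boolean tryWriteLock(String row, long timeoutMs) throws InterruptedException {
        ReentrantReadWriteLock lock =
                locks.computeIfAbsent(row, k -> new ReentrantReadWriteLock());
        return lock.writeLock().tryLock(timeoutMs, TimeUnit.MILLISECONDS);
    }

    /** Releases the write lock for a row; must be called by the thread that holds it. */
    public void unlockWrite(String row) {
        locks.get(row).writeLock().unlock();
    }

    public static void main(String[] args) throws Exception {
        RowLockSketch region = new RowLockSketch();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // A slow writer takes row "r1" and stalls while holding it.
        Thread slow = new Thread(() -> {
            try {
                region.tryWriteLock("r1", 100);
                held.countDown();
                release.await();
                region.unlockWrite("r1");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        slow.start();
        held.await();

        // Another writer on the same row times out; a writer on a different row does not.
        System.out.println("same row:      " + region.tryWriteLock("r1", 50));
        System.out.println("different row: " + region.tryWriteLock("r2", 50));

        release.countDown();
        slow.join();
    }
}
```

In this model a stalled holder only blocks writers to the same row. So if many different rows in one region are timing out, the interesting part is what the lock holders themselves are waiting on, which is what the thread dump should show.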

Cheers

On Wed, Feb 28, 2018 at 6:31 PM, Saad Mufti <saad.mu...@gmail.com> wrote:

> Hi,
>
> We are running HBase 1.4.0 on Amazon EMR. We are currently seeing a
> situation where a particular region sometimes gets into a state where a
> lot of write requests to any row in that region time out, saying they failed
> to obtain a lock on a row in the region, and eventually they experience an IPC
> timeout. This causes the IPC queue to blow up in size as requests get
> backed up, and that region server experiences a much higher than normal
> timeout rate for all requests, not just those timing out for failing to
> obtain the row lock.
>
> The strange thing is the rows are always different but the region is always
> the same. So the question is, is there a region component to how long a row
> write lock would be held? I looked at the debug dump and the RowLocks
> section shows a long list of write row locks held, all of them are from the
> same region but different rows.
>
> Will trying to obtain a write row lock experience delays if no one else
> holds a lock on the same row but the region itself is experiencing read
> delays? We do have an incremental compaction tool running that major
> compacts one region per region server at a time, so that will drive out
> pages from the bucket cache. For most regions the impact is
> transient, lasting only until the bucket cache gets populated by pages from the new
> HFile. But for this one region we start timing out trying to obtain write
> locks on rows in that region.
>
> Any insight anyone can provide would be most welcome.
>
> Cheers.
>
> ----
> Saad
>
