Re: How Long Will HBase Hold A Row Write Lock?

2018-03-11 Thread Saad Mufti
Thanks. I left a comment on that ticket. Saad

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Anoop John
Hi Saad In your initial mail you mentioned that there are lots of checkAndPut ops but on different rows. The failure in obtaining locks (write lock as it is checkAndPut) means there is contention on the same row key. If that is the case, ya, that is the 1st step before BC reads and

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
Although now that I think about this a bit more, all the failures we saw were failures to obtain a row lock, and in the thread stack traces we always saw it somewhere inside getRowLockInternal and similar. I never saw any contention on the bucket cache lock. Cheers. Saad

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
Also, for now we have mitigated this problem by using the new setting in HBase 1.4.0 that prevents one slow region server from blocking all client requests. Of course it causes some timeouts but our overall ecosystem contains Kafka queues for retries, so we can live with that. From what I can see,

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
So if I understand correctly, we would mitigate the problem by not evicting blocks for archived files immediately? Wouldn't this potentially lead to problems later if the LRU algo chooses to evict blocks for active files and leave blocks for archived files in there? I would definitely love to

How Long Will HBase Hold A Row Write Lock?

2018-03-06 Thread Anoop John
>>a) it was indeed one of the regions that was being compacted, major compaction in one case, minor compaction in another, the issue started just after compaction completed, blowing away bucket cached blocks for the older HFiles About this part: ya, after the compaction, there is a step where

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-04 Thread ramkrishna vasudevan
Hi Saad Your argument here >> The theory is that since prefetch is an async operation, a lot of the reads in the checkAndPut for the region in question start reading from S3 which is slow. So the write lock obtained for the checkAndPut is held for a longer duration than normal. This has
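
The theory quoted above (a slow read inside checkAndPut keeps the row write lock held longer than normal) can be illustrated with a minimal Java sketch. It uses only java.util.concurrent and does not call any HBase API; the class name RowLockSketch, the method secondWriterGetsLock, and the sleep that stands in for a slow S3 read are all assumptions made for illustration, not HBase's actual locking code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RowLockSketch {

    // Hypothetical helper: one writer holds a row's write lock while it "reads"
    // for readMillis; a second writer waits at most lockTimeoutMillis for the
    // same lock. Returns true if the second writer got the lock in time.
    static boolean secondWriterGetsLock(long readMillis, long lockTimeoutMillis)
            throws InterruptedException {
        // Stand-in for a single row's lock (assumption, not HBase's implementation).
        ReentrantReadWriteLock rowLock = new ReentrantReadWriteLock();
        CountDownLatch held = new CountDownLatch(1);

        Thread first = new Thread(() -> {
            rowLock.writeLock().lock();
            try {
                held.countDown();          // signal that the lock is now held
                Thread.sleep(readMillis);  // simulated block read (cache hit vs. slow S3)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                rowLock.writeLock().unlock();
            }
        });
        first.start();
        held.await(); // make sure the first writer really holds the lock

        // Second writer on the same row: bounded wait, like a lock timeout.
        boolean acquired = rowLock.writeLock().tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS);
        if (acquired) {
            rowLock.writeLock().unlock();
        }
        first.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        // Short read under the lock: the second writer is expected to succeed.
        System.out.println("fast read, lock acquired: " + secondWriterGetsLock(10, 500));
        // Long read under the lock: the second writer is expected to time out.
        System.out.println("slow read, lock acquired: " + secondWriterGetsLock(1000, 100));
    }
}
```

The point of the sketch is only that the wait seen by the second writer is bounded below by how long the first writer's read takes while the lock is held, so anything that slows the read (such as a cache miss) shows up as lock timeouts for other writers on the same row.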

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-01 Thread Saad Mufti
So after much investigation I can confirm: a) it was indeed one of the regions that was being compacted, major compaction in one case, minor compaction in another; the issue started just after compaction completed, blowing away bucket cached blocks for the older HFiles b) in another case there

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-01 Thread Saad Mufti
Actually it happened again while some minor compactions were running, so I don't think it is related to our major compaction tool, which isn't even running right now. I will try to capture a debug dump of threads and everything while the event is ongoing. Seems to last at least half an hour or so and

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-01 Thread Saad Mufti
Unfortunately I lost the stack trace overnight. But it does seem related to compaction, because now that the compaction tool is done, I don't see the issue anymore. I will run our incremental major compaction tool again and see if I can reproduce the issue. On the plus side the system stayed

Re: How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Saad Mufti
I'll paste a thread dump later, writing this from my phone :-) So the same issue has happened at different times for different regions, but I couldn't see that the region in question was the one being compacted, either this time or earlier. Although I might have missed an earlier correlation in

Re: How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Saad Mufti
One additional data point: I tried to manually re-assign the region in question from the shell, which for some reason caused the region server to restart, and the region did get assigned to another region server. But then the problem moved to that region server almost immediately. Does that just

Re: How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Ted Yu
bq. timing out trying to obtain write locks on rows in that region. Can you confirm that the region under contention was the one being major compacted? Can you pastebin a thread dump so that we can have a better idea of the scenario? For the region being compacted, how long would the compaction

How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Saad Mufti
Hi, We are running on Amazon EMR based HBase 1.4.0. We are currently seeing a situation where sometimes a particular region gets into a state where a lot of write requests to any row in that region time out saying they failed to obtain a lock on a row in a region and eventually they