Re: Region state is PENDING_CLOSE persists.

2023-08-06 Thread Reid Chan
If you cannot upgrade to a newer version, please apply this patch:
https://issues.apache.org/jira/browse/HBASE-24099

After that, you can tune *hbase.regionserver.executor.closeregion.threads*
and *hbase.regionserver.executor.openregion.threads* to speed up closing
and opening regions.

---

Best Regards,
R.C


On Sun, Aug 6, 2023 at 12:11 PM Manimekalai Kunjithapatham <
k.manimeka...@gmail.com> wrote:

> Dear Team,
>
> In one of our HBase clusters, some regions occasionally get stuck in the
> PENDING_CLOSE state for a long time. The only way I have found to resolve
> it is to restart the region server that holds the region.
>
> The cluster is write-heavy because it receives replication from another
> cluster.
>
> The HBase version is 1.2.6.
>
> Please help me solve this issue.
>
> The thread dump is below.
>
> "regionserver//10.x.x.x:16020-shortCompactions-1691219096144" daemon prio=10 tid=0x7f07b06a8000 nid=0x38ff9 runnable [0x7f0741f21000]
>    java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
> at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
> at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
> - locked <0x0005418009f8> (a sun.nio.ch.Util$2)
> - locked <0x000541800a08> (a java.util.Collections$UnmodifiableSet)
> - locked <0x0005418009b0> (a sun.nio.ch.EPollSelectorImpl)
> at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
> at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
> at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
> at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:258)
> at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:209)
> at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:171)
> at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
> at org.apache.hadoop.hdfs.RemoteBlockReader2.readNextPacket(RemoteBlockReader2.java:186)
> at org.apache.hadoop.hdfs.RemoteBlockReader2.read(RemoteBlockReader2.java:146)
> - locked <0x0004c986b3c0> (a org.apache.hadoop.hdfs.RemoteBlockReader2)
> at org.apache.hadoop.hdfs.DFSInputStream$ByteArrayStrategy.doRead(DFSInputStream.java:686)
> at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:742)
> - eliminated <0x0004c986b358> (a org.apache.hadoop.hdfs.DFSInputStream)
> at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:799)
> at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:840)
> - locked <0x0004c986b358> (a org.apache.hadoop.hdfs.DFSInputStream)
> at java.io.DataInputStream.read(DataInputStream.java:149)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:709)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1440)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1648)
> at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1532)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:452)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.readNextDataBlock(HFileReaderV2.java:729)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.isNextBlock(HFileReaderV2.java:854)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.positionForNextBlock(HFileReaderV2.java:849)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2._next(HFileReaderV2.java:866)
> at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.next(HFileReaderV2.java:886)
> at org.apache.hadoop.hbase.regionserver.StoreFileScanner.next(StoreFileScanner.java:154)
> at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:111)
> at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:588)
> at org.apache.hadoop.hbase.regionserver.compactions.Compactor.performCompaction(Compactor.java:318)
> at org.apache.hadoop.hbase.regionserver.compactions.DefaultCompactor.compact(DefaultCompactor.java:111)
>
> at
>
> 

Re: Region state is PENDING_CLOSE persists.

2018-07-13 Thread Kang Minwoo
Thank you for your reply.

I found one handler thread whose state is RUNNABLE.
The other handler threads are in the TIMED_WAITING state.

I think the RUNNABLE handler thread is the issue.



[Thread dump]
"RpcServer"
   java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:2020)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:208)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:184)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap$KVScannerComparator.compare(KeyValueHeap.java:174)
at java.util.PriorityQueue.siftDownUsingComparator(PriorityQueue.java:721)
at java.util.PriorityQueue.siftDown(PriorityQueue.java:687)
at java.util.PriorityQueue.poll(PriorityQueue.java:595)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.pollRealKV(KeyValueHeap.java:355)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:123)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:588)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:150)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5733)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5896)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5670)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2580)
- locked <> (a org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33648)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
at java.lang.Thread.run(Thread.java:748)
   Locked ownable synchronizers:
- <> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)



I don't know why that handler has been running for so long.
I will do more research on how to avoid this problem.

Best regards,
Minwoo Kang


From: Allan Yang
Sent: Wednesday, July 11, 2018, 18:54
To: user@hbase.apache.org
Subject: Re: Region state is PENDING_CLOSE persists.

There must be a handler thread that is running (or stuck) somewhere, so the
close-region thread can't obtain the write lock. Look closely at your
thread dump.
The handler thread you pasted above is just a thread that can't obtain the
read lock because the close thread is waiting for the write lock.

Best Regards
Allan Yang


On Wed, Jul 11, 2018 at 2:25 PM Kang Minwoo wrote:

> Hello.
>
> Occasionally, when closing a region, the RS_CLOSE_REGION thread is unable
> to acquire a lock and stays in the WAITING state.
> (The cluster load has increased recently.)
> As a result, the region state PENDING_CLOSE persists.
> The thread holding the lock is the RPC handler.
>
> If you have any good tips on moving regions, please share them.
> It would be nice if the timeout could be set.
>
> The HBase version is 1.2.6.
>
> Best regards,
> Minwoo Kang
>
> 
>
> [thread dump]
> "RS_CLOSE_REGION" waiting on condition [abc]
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for   (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1426)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1372)
> - locked  (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>

Re: Region state is PENDING_CLOSE persists.

2018-07-11 Thread Allan Yang
There must be a handler thread that is running (or stuck) somewhere, so the
close-region thread can't obtain the write lock. Look closely at your
thread dump.
The handler thread you pasted above is just a thread that can't obtain the
read lock because the close thread is waiting for the write lock.
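
To illustrate the lock interaction described above, here is a minimal,
self-contained Java sketch. It is not HBase code: the class name
CloseLockDemo, the thread names, and the timeouts are made up for
illustration, and it assumes the region close lock behaves like a plain
nonfair ReentrantReadWriteLock, as the NonfairSync frames in the dump
suggest. One long-running reader stands in for the stuck handler, a closer
waits for the write lock, and a newly arriving reader then times out on the
read lock because a writer is queued ahead of it. Unlike HRegion.doClose in
the dump, which calls WriteLock.lock() with no timeout, the closer here uses
a timed tryLock only so that the demo terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class CloseLockDemo {

    /** Returns {closerGotWriteLock, newReaderGotReadLock}. */
    static boolean[] simulate() throws InterruptedException {
        // Default constructor gives a nonfair lock (assumption: same mode as in the dump).
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch scanStarted = new CountDownLatch(1);
        CountDownLatch releaseScan = new CountDownLatch(1);
        boolean[] result = new boolean[2];

        // Stand-in for the RUNNABLE handler: holds the read lock for a long time.
        Thread longScan = new Thread(() -> {
            lock.readLock().lock();
            try {
                scanStarted.countDown();
                releaseScan.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        }, "long-scan-handler");

        // Stand-in for RS_CLOSE_REGION: wants the write lock (timed here so the demo ends).
        Thread closer = new Thread(() -> {
            try {
                boolean got = lock.writeLock().tryLock(1, TimeUnit.SECONDS);
                result[0] = got;
                if (got) {
                    lock.writeLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "close-region");

        longScan.start();
        scanStarted.await();
        closer.start();

        // Wait until the closer is queued on the lock, then give it a moment to park.
        while (!lock.hasQueuedThread(closer) && closer.isAlive()) {
            Thread.sleep(5);
        }
        Thread.sleep(50);

        // Stand-in for the TIMED_WAITING handlers: a new reader arriving behind the writer.
        boolean gotRead = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        result[1] = gotRead;
        if (gotRead) {
            lock.readLock().unlock();
        }

        closer.join();            // closer gives up after its timeout
        releaseScan.countDown();  // only now does the long scan finish
        longScan.join();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = simulate();
        System.out.println("closer got write lock: " + r[0]);
        System.out.println("new reader got read lock: " + r[1]);
    }
}
```

On a typical JDK both values should come out false: the closer cannot get
the write lock while the long scan holds the read lock, and the new reader
queues behind the waiting writer. That matches the dump, where the only
thread worth chasing is the one that already holds the read lock.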

Best Regards
Allan Yang


On Wed, Jul 11, 2018 at 2:25 PM Kang Minwoo wrote:

> Hello.
>
> Occasionally, when closing a region, the RS_CLOSE_REGION thread is unable
> to acquire a lock and stays in the WAITING state.
> (The cluster load has increased recently.)
> As a result, the region state PENDING_CLOSE persists.
> The thread holding the lock is the RPC handler.
>
> If you have any good tips on moving regions, please share them.
> It would be nice if the timeout could be set.
>
> The HBase version is 1.2.6.
>
> Best regards,
> Minwoo Kang
>
> 
>
> [thread dump]
> "RS_CLOSE_REGION" waiting on condition [abc]
>    java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for   (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
> at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
> at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1426)
> at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1372)
> - locked  (a java.lang.Object)
> at org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
> at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
> -  (a java.util.concurrent.ThreadPoolExecutor$Worker)
>
> "RpcServer.handler" waiting on condition [bcd]
>    java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for   (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
> at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1037)
> at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1328)
> at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.tryLock(ReentrantReadWriteLock.java:871)
> at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8177)
> at org.apache.hadoop.hbase.regionserver.HRegion.lock(HRegion.java:8164)
> at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:8073)
> at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2547)
> at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:2541)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6830)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:6809)
> at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2049)
> at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:33644)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2196)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:112)
> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:133)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:108)
> at java.lang.Thread.run(Thread.java:748)
>    Locked ownable synchronizers:
> - None
>
>
>