Thanks Sergey for your response, it helped a lot indeed. It looks like the problem is located in the RS that keeps SYSTEM.CATALOG. In that RS, all RPC handlers but one are in a waiting state, blocked on a row lock (see stack trace below).
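For reference, this is a minimal sketch of how the parked handlers could be counted from a saved jstack dump of that RS (the dump file name, the "defaultRpcServer.handler" thread-name pattern and the matched frame are just illustrative assumptions, not something from our actual tooling):

    // Sketch: count RPC handler threads that are parked inside
    // HRegion.getRowLockInternal in a saved jstack dump of the region server.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    public class CountParkedHandlers {
        public static void main(String[] args) throws IOException {
            // Hypothetical dump file, e.g. produced with jstack on the RS pid.
            String dump = args.length > 0 ? args[0] : "rs-threaddump.txt";
            int handlers = 0;          // RPC handler threads seen in the dump
            int onRowLock = 0;         // handlers currently in getRowLockInternal
            boolean isHandler = false; // current thread section is an RPC handler
            boolean parked = false;    // current handler is waiting on the row lock
            for (String line : Files.readAllLines(Paths.get(dump))) {
                if (line.startsWith("\"")) {   // jstack starts each thread with "name"
                    if (isHandler && parked) onRowLock++;
                    isHandler = line.contains("defaultRpcServer.handler");
                    parked = false;
                    if (isHandler) handlers++;
                } else if (isHandler && line.contains("HRegion.getRowLockInternal")) {
                    parked = true;
                }
            }
            if (isHandler && parked) onRowLock++;   // flush the last thread section
            System.out.println(onRowLock + " of " + handlers
                    + " RPC handlers waiting in getRowLockInternal");
        }
    }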
So I think that basically all deletes are being processed in sequence. I also noticed around 400 threads named htable-poolXXXXX-t1 waiting on condition in the same RS (all other RSs are normal). These threads disappear when the delete process finishes. This problem started happening after an HBase cluster reboot, but... why?

"B.defaultRpcServer.handler=0,queue=0,port=60020" daemon prio=10 tid=0x00007f142e96c000 nid=0x68bb waiting on condition [0x00007f0ba7289000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007f102be9b3a0> (a java.util.concurrent.CountDownLatch$Sync)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(AbstractQueuedSynchronizer.java:1033)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:282)
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:4830)
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:4800)
        at org.apache.hadoop.hbase.regionserver.HRegion.getRowLock(HRegion.java:4853)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2386)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.doGetTable(MetaDataEndpointImpl.java:2354)
        at org.apache.phoenix.coprocessor.MetaDataEndpointImpl.getTable(MetaDataEndpointImpl.java:436)
        at org.apache.phoenix.coprocessor.generated.MetaDataProtos$MetaDataService.callMethod(MetaDataProtos.java:11609)
        at org.apache.hadoop.hbase.regionserver.HRegion.execService(HRegion.java:7395)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.execServiceOnRegion(RSRpcServices.java:1776)
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.execService(RSRpcServices.java:1758)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32209)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2034)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
        at java.lang.Thread.run(Thread.java:745)

Many thanks!!

Pedro

On 18 August 2017 at 21:15, Sergey Soldatov <sergeysolda...@gmail.com> wrote:

> Hi Pedro,
>
> Usually that kind of behavior should be reflected in the region server
> logs. Try turning on DEBUG level and check what exactly the RS is doing
> during that time. You can also take a thread dump of the RS during the
> execution and see what the RPC handlers are doing. One thing that should
> be checked first is the RPC handlers: if they are all busy, you may
> consider increasing the number of handlers. If you have an RPC scheduler
> and controller configured, double check that the regular handlers are
> used, not the IndexRPC ones (there was a bug where the client was sending
> all RPCs with index priority). If you see that, remove the controller
> factory property on the client side.
>
> Thanks,
> Sergey
>
> On Fri, Aug 18, 2017 at 4:46 AM, Pedro Boado <pedro.bo...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> We have two HBase 1.0 clusters running the same process in parallel,
>> effectively keeping the same data in both Phoenix tables.
>>
>> This process feeds data into Phoenix 4.5 via HFile, and once the data
>> is loaded a Spark process deletes a few thousand rows from the tables
>> (secondary indexing is disabled in our installation).
>>
>> After an HBase restart (no config changes involved), one of the
>> clusters has started running these deletes too slowly (the fast run
>> takes 5 min and the slow one around 1 hour). More worryingly, while the
>> process is running, Phoenix queries take hundreds of seconds instead of
>> being sub-second (even opening sqlline is very slow).
>>
>> We've almost run out of ideas trying to find the cause of this
>> behaviour. There are no evident GC pauses, CPU usage is normal, HDFS IO
>> is normal, memory usage is normal, etc.
>>
>> As soon as the delete process finishes, Phoenix goes back to normal
>> behaviour.
>>
>> Does anybody have any ideas about potential causes of this behaviour?
>>
>> Many thanks!!
>>
>> Pedro.

--
Un saludo.
Pedro Boado.