These znodes seem to be related to YARN, not HBase. Maybe ask on the yarn-user mailing list?
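For context: entries under RMAppRoot in the ZK-based RMStateStore are the ResourceManager's records of (mostly completed) applications, not work waiting to be re-run, and the RM prunes them up to a configured cap. If hourly snapshot jobs keep piling up there, lowering the retention in yarn-site.xml should shrink that znode count over time. A sketch, assuming Hadoop 2.x property names and illustrative values (verify against your version's yarn-default.xml):

```xml
<!-- yarn-site.xml: cap how many completed applications the RM remembers -->
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <value>1000</value>
</property>
<property>
  <!-- Completed apps retained in the state store (e.g. the ZK RMAppRoot znodes);
       should be <= yarn.resourcemanager.max-completed-applications. -->
  <name>yarn.resourcemanager.state-store.max-completed-applications</name>
  <value>500</value>
</property>
```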
Cheers

On Jul 7, 2015, at 12:05 AM, Akmal Abbasov <[email protected]> wrote:

>> Have you run the following command in hbase shell ?
>> balance_switch true
> I’ve tried, and this did the trick. Thank you.
>
> One more thing that is not clear to me: what can I do with the ~4000 znodes in
> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot ?
> What will happen to them if I do nothing? Will the system try to
> complete all of these applications?
>
> Thank you.
>
>> On 07 Jul 2015, at 00:16, Ted Yu <[email protected]> wrote:
>>
>> Have you run the following command in hbase shell ?
>> balance_switch true
>>
>> Cheers
>>
>> On Mon, Jul 6, 2015 at 12:16 PM, Akmal Abbasov <[email protected]> wrote:
>>
>>>> Do you see in the master log something similar to the following ?
>>>>
>>>> master.HMaster: Not running balancer because 1 region(s) in transition
>>> Yes, I have several of them, but all of them are from 3 days ago.
>>>
>>> I checked the ‘ritCount’ metric, and it is 0; I also checked the
>>> /hbase/region-in-transition znode, which is empty.
>>> But I can’t start the balancer manually.
>>>
>>> I take a snapshot of the tables each hour.
>>> I’ve checked the path
>>> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot in ZooKeeper, and there
>>> are ~4000 applications. It looks like all of them are create-snapshot
>>> operations. I’ve also observed that the CPU usage of the master is much
>>> higher than it was in the past.
>>> Is it possible that all of these applications are causing the problem?
>>>
>>> Can I delete all of these applications?
>>>
>>>> On 06 Jul 2015, at 18:45, Ted Yu <[email protected]> wrote:
>>>>
>>>> Do you see in the master log something similar to the following ?
>>>>
>>>> master.HMaster: Not running balancer because 1 region(s) in transition
>>>>
>>>> You can search backwards for balancer / assignment related logs.
>>>>
>>>> Cheers
>>>>
>>>> On Mon, Jul 6, 2015 at 8:49 AM, Akmal Abbasov <[email protected]> wrote:
>>>>
>>>>>> What error(s) did you get when trying to restart the region server ? Have
>>>>>> you checked its log files ?
>>>>> It was a VM, and I was not able to access it any more; I couldn’t log in to
>>>>> it. Restarting it several times didn’t help.
>>>>>
>>>>>> Can you check the master log around this time ? If there was a region in
>>>>>> transition, the balancer wouldn't balance.
>>>>> I have a lot of these:
>>>>> 2015-07-06 15:15:39,918 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_18.14/WALs
>>>>> 2015-07-06 15:15:39,918 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_19.14/WALs
>>>>> 2015-07-06 15:15:39,921 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_20.13/WALs
>>>>> 2015-07-06 15:15:39,925 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_21.14/WALs
>>>>> 2015-07-06 15:15:39,926 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_22.14/WALs
>>>>> 2015-07-06 15:15:39,927 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_23.14/WALs
>>>>> 2015-07-06 15:15:39,928 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/testsnap/WALs
>>>>> 2015-07-06 15:15:47,324 INFO [FifoRpcScheduler.handler1-thread-18] master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
>>>>> 2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436190023718
>>>>> 2015-07-06 15:23:31,504 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436193624562
>>>>> 2015-07-06 15:32:49,382 INFO [FifoRpcScheduler.handler1-thread-14] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>>> 2015-07-06 15:32:56,936 INFO [FifoRpcScheduler.handler1-thread-1] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>>>
>>>>> Thank you.
>>>>>
>>>>>> On 06 Jul 2015, at 17:37, Ted Yu <[email protected]> wrote:
>>>>>>
>>>>>> bq. I had to delete and recreate it
>>>>>>
>>>>>> What error(s) did you get when trying to restart the region server ? Have
>>>>>> you checked its log files ?
>>>>>>
>>>>>> bq. start balancer manually, but it returned false
>>>>>>
>>>>>> Can you check the master log around this time ? If there was a region in
>>>>>> transition, the balancer wouldn't balance.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On Mon, Jul 6, 2015 at 8:29 AM, Akmal Abbasov <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>> I am seeing strange behaviour in my HBase cluster. I have 5 region servers
>>>>>>> and 2 masters.
>>>>>>> One of the region servers stopped working; restarting it didn’t work, and
>>>>>>> I had to delete and recreate it.
>>>>>>> But when this region server stopped, the cluster also stopped functioning.
>>>>>>> There were a lot of inconsistencies. When I recreated the region server
>>>>>>> with the disks of the previous one, the cluster started working again.
>>>>>>> But now, only 3 region servers host regions; the other 2 have 0 regions.
>>>>>>> I’ve tried to start the balancer manually, but it returned false.
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> I am using hbase-0.98.7-hadoop2.
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Akmal Abbasov
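A footnote for anyone hitting the same symptom: the master log excerpts in this thread show clients repeatedly setting balanceSwitch=false, and snapshot or backup scripts often disable the balancer and fail to re-enable it. A quick way to spot who disabled it is to grep the master log for those events. A minimal sketch, using sample lines mirroring the excerpts above (file path and log content are illustrative):

```shell
# Write a few sample master log lines (illustrative, copied from the thread).
cat > /tmp/master.log <<'EOF'
2015-07-06 15:15:47,324 INFO [FifoRpcScheduler.handler1-thread-18] master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436190023718
2015-07-06 15:32:49,382 INFO [FifoRpcScheduler.handler1-thread-14] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
EOF

# Show every balancer toggle, with the client address that issued it.
grep -o 'Client=[^ ]* set balanceSwitch=[a-z]*' /tmp/master.log
```

If a stray script left the switch off, `balance_switch true` in hbase shell re-enables it, as suggested earlier in the thread.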
