These znodes seem to be related to YARN, not HBase. Maybe ask on the yarn-user mailing list?
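For context: entries under RMAppRoot in the ZK-based RMStateStore are the ResourceManager's records of (mostly completed) applications, not work waiting to be re-run, and the RM prunes them up to a configured cap. If hourly snapshot jobs keep piling up there, lowering the retention in yarn-site.xml should shrink that znode count over time. A sketch, assuming Hadoop 2.x property names and illustrative values (verify against your version's yarn-default.xml):

```xml
<!-- yarn-site.xml: cap how many completed applications the RM remembers -->
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <value>1000</value>
</property>
<property>
  <!-- Completed apps retained in the state store (e.g. the ZK RMAppRoot znodes);
       should be <= yarn.resourcemanager.max-completed-applications. -->
  <name>yarn.resourcemanager.state-store.max-completed-applications</name>
  <value>500</value>
</property>
```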
Cheers

On Jul 7, 2015, at 12:05 AM, Akmal Abbasov <[email protected]> wrote:

>> Have you run the following command in hbase shell ?
>> balance_switch true
> I’ve tried, and this did the trick. Thank you.
>
> One more thing that is not clear to me: what can I do with the ~4000 znodes in
> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot ?
> What will happen to them if I do nothing? Will the system try to
> complete all of these applications?
>
> Thank you.
>
>> On 07 Jul 2015, at 00:16, Ted Yu <[email protected]> wrote:
>>
>> Have you run the following command in hbase shell ?
>> balance_switch true
>>
>> Cheers
>>
>> On Mon, Jul 6, 2015 at 12:16 PM, Akmal Abbasov <[email protected]> wrote:
>>
>>>> Do you see in the master log something similar to the following ?
>>>>
>>>> master.HMaster: Not running balancer because 1 region(s) in transition
>>> Yes, I have several of them, but all of them are from 3 days ago.
>>>
>>> I checked the ‘ritCount’ metric, and it is 0; I also checked the
>>> /hbase/region-in-transition znode, which is empty.
>>> But I can’t start the balancer manually.
>>>
>>> I take a snapshot of the tables each hour.
>>> I’ve checked the path
>>> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot in ZooKeeper, and there
>>> are ~4000 applications. It looks like all of them are create-snapshot
>>> operations. I’ve also observed that the CPU usage of the master is much
>>> higher than it was in the past.
>>> Is it possible that all of these applications are causing the problem?
>>>
>>> Can I delete all of these applications?
>>>
>>>> On 06 Jul 2015, at 18:45, Ted Yu <[email protected]> wrote:
>>>>
>>>> Do you see in the master log something similar to the following ?
>>>>
>>>> master.HMaster: Not running balancer because 1 region(s) in transition
>>>>
>>>> You can search backwards for balancer / assignment related logs.
>>>>
>>>> Cheers
>>>>
>>>> On Mon, Jul 6, 2015 at 8:49 AM, Akmal Abbasov <[email protected]> wrote:
>>>>
>>>>>> What error(s) did you get when trying to restart the region server ? Have
>>>>>> you checked its log files ?
>>>>> It was a VM, and I was not able to access it any more; I couldn’t log in to
>>>>> it. Restarting it several times didn’t help.
>>>>>
>>>>>> Can you check the master log around this time ? If there was a region in
>>>>>> transition, the balancer wouldn't balance.
>>>>> I have a lot of these:
>>>>> 2015-07-06 15:15:39,918 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_18.14/WALs
>>>>> 2015-07-06 15:15:39,918 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_19.14/WALs
>>>>> 2015-07-06 15:15:39,921 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_20.13/WALs
>>>>> 2015-07-06 15:15:39,925 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_21.14/WALs
>>>>> 2015-07-06 15:15:39,926 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_22.14/WALs
>>>>> 2015-07-06 15:15:39,927 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_23.14/WALs
>>>>> 2015-07-06 15:15:39,928 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/testsnap/WALs
>>>>> 2015-07-06 15:15:47,324 INFO [FifoRpcScheduler.handler1-thread-18] master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
>>>>> 2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436190023718
>>>>> 2015-07-06 15:23:31,504 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436193624562
>>>>> 2015-07-06 15:32:49,382 INFO [FifoRpcScheduler.handler1-thread-14] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>>> 2015-07-06 15:32:56,936 INFO [FifoRpcScheduler.handler1-thread-1] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>>>
>>>>> Thank you.
>>>>>
>>>>>> On 06 Jul 2015, at 17:37, Ted Yu <[email protected]> wrote:
>>>>>>
>>>>>> bq. I had to delete and recreate it
>>>>>>
>>>>>> What error(s) did you get when trying to restart the region server ? Have
>>>>>> you checked its log files ?
>>>>>>
>>>>>> bq. start balancer manually, but it returned false
>>>>>>
>>>>>> Can you check the master log around this time ? If there was a region in
>>>>>> transition, the balancer wouldn't balance.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On Mon, Jul 6, 2015 at 8:29 AM, Akmal Abbasov <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>> I am seeing strange behaviour in my HBase cluster. I have 5 region servers
>>>>>>> and 2 masters.
>>>>>>> One of the region servers stopped working; restarting it didn’t work, and
>>>>>>> I had to delete and recreate it.
>>>>>>> But when this region server stopped, the cluster also stopped functioning.
>>>>>>> There were a lot of inconsistencies. When I recreated the region server
>>>>>>> with the disks of the previous one, the cluster started working again.
>>>>>>> But now, only 3 region servers host regions; the other 2 have 0 regions.
>>>>>>> I’ve tried to start the balancer manually, but it returned false.
>>>>>>> Any ideas?
>>>>>>>
>>>>>>> I am using hbase-0.98.7-hadoop2.
>>>>>>> Thank you.
>>>>>>>
>>>>>>> Kind regards,
>>>>>>> Akmal Abbasov
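A footnote for anyone hitting the same symptom: the master log excerpts in this thread show clients repeatedly setting balanceSwitch=false, and snapshot or backup scripts often disable the balancer and fail to re-enable it. A quick way to spot who disabled it is to grep the master log for those events. A minimal sketch, using sample lines mirroring the excerpts above (file path and log content are illustrative):

```shell
# Write a few sample master log lines (illustrative, copied from the thread).
cat > /tmp/master.log <<'EOF'
2015-07-06 15:15:47,324 INFO [FifoRpcScheduler.handler1-thread-18] master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436190023718
2015-07-06 15:32:49,382 INFO [FifoRpcScheduler.handler1-thread-14] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
EOF

# Show every balancer toggle, with the client address that issued it.
grep -o 'Client=[^ ]* set balanceSwitch=[a-z]*' /tmp/master.log
```

If a stray script left the switch off, `balance_switch true` in hbase shell re-enables it, as suggested earlier in the thread.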
