> Have you run the following command in hbase shell ?
> balance_switch true

I’ve tried, and this did the trick. Thank you.
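For later readers of this thread: the balancer on/off switch and a manual balancer run are two separate hbase shell commands. A sketch of the exchange (behaviour as in the 0.98 line; return values not verified against every release, and the non-interactive invocation is just one convenient way to drive the shell):

```shell
# Run in the hbase shell (fed via stdin here).
# balance_switch prints the balancer's PREVIOUS state,
# balancer prints true if a balancing run was actually started.
hbase shell <<'EOF'
balance_switch true
balancer
EOF
```

If `balancer` still returns false after the switch is on, the master log is the place to look for the reason (e.g. regions in transition), as discussed below in the thread.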
One more thing that is not clear to me: what can I do with the ~4000 znodes in
/hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot?
What will happen to them if I do nothing? Will the system try to complete all of these applications?
Thank you.

> On 07 Jul 2015, at 00:16, Ted Yu <[email protected]> wrote:
>
> Have you run the following command in hbase shell ?
> balance_switch true
>
> Cheers
>
> On Mon, Jul 6, 2015 at 12:16 PM, Akmal Abbasov <[email protected]> wrote:
>
>>> Do you see in the master log something similar to the following ?
>>>
>>> master.HMaster: Not running balancer because 1 region(s) in transition
>> Yes, I have several of them, but all of them were 3 days ago.
>>
>> I checked the ‘ritCount’ metric, and it is 0; I also checked the
>> /hbase/region-in-transition znode, which is empty.
>> But I can’t start the balancer manually.
>>
>> I take a snapshot of the tables each hour.
>> I’ve checked the path /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot
>> in ZooKeeper, and there are ~4000 applications. It looks like all of them
>> are create-snapshot operations. I’ve also observed that the CPU usage of
>> the master is much higher than it was in the past.
>> Is it possible that all of these applications are causing the problem?
>>
>> Can I delete all of these applications?
>>
>>
>>> On 06 Jul 2015, at 18:45, Ted Yu <[email protected]> wrote:
>>>
>>> Do you see in the master log something similar to the following ?
>>>
>>> master.HMaster: Not running balancer because 1 region(s) in transition
>>>
>>> You can search backwards for balancer / assignment related logs.
>>>
>>> Cheers
>>>
>>> On Mon, Jul 6, 2015 at 8:49 AM, Akmal Abbasov <[email protected]> wrote:
>>>
>>>>> What error(s) did you get when trying to restart the region server ?
>>>>> Have you checked its log files ?
>>>> It was a VM, and I was not able to access it any more; I can’t log in
>>>> to it. Restarting it several times didn’t help.
>>>>
>>>>> Can you check master log around this time ? If there was region in
>>>>> transition, balancer wouldn't balance.
>>>> I have a lot of these:
>>>> 2015-07-06 15:15:39,918 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_18.14/WALs
>>>> 2015-07-06 15:15:39,918 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_19.14/WALs
>>>> 2015-07-06 15:15:39,921 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_20.13/WALs
>>>> 2015-07-06 15:15:39,925 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_21.14/WALs
>>>> 2015-07-06 15:15:39,926 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_22.14/WALs
>>>> 2015-07-06 15:15:39,927 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_23.14/WALs
>>>> 2015-07-06 15:15:39,928 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/testsnap/WALs
>>>> 2015-07-06 15:15:47,324 INFO [FifoRpcScheduler.handler1-thread-18] master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
>>>> 2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436190023718
>>>> 2015-07-06 15:23:31,504 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436193624562
>>>> 2015-07-06 15:32:49,382 INFO [FifoRpcScheduler.handler1-thread-14] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>> 2015-07-06 15:32:56,936 INFO [FifoRpcScheduler.handler1-thread-1] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>>
>>>> Thank you.
>>>>
>>>>> On 06 Jul 2015, at 17:37, Ted Yu <[email protected]> wrote:
>>>>>
>>>>> bq. I had to delete and recreate it
>>>>>
>>>>> What error(s) did you get when trying to restart the region server ?
>>>>> Have you checked its log files ?
>>>>>
>>>>> bq. start balancer manually, but it returned false
>>>>>
>>>>> Can you check master log around this time ? If there was region in
>>>>> transition, balancer wouldn't balance.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Mon, Jul 6, 2015 at 8:29 AM, Akmal Abbasov <[email protected]> wrote:
>>>>>
>>>>>> Hi all,
>>>>>> I am seeing strange behaviour in my HBase cluster. I have 5 region
>>>>>> servers and 2 masters.
>>>>>> One of the region servers stopped working; restarting it didn’t work,
>>>>>> and I had to delete and recreate it.
>>>>>> But when this region server stopped, the cluster also stopped
>>>>>> functioning. There were a lot of inconsistencies. When I recreated
>>>>>> the region server with the disks of the previous one, the cluster
>>>>>> started working again.
>>>>>> But now only 3 region servers host regions; the other 2 have 0 regions.
>>>>>> I’ve tried to start the balancer manually, but it returned false.
>>>>>> Any ideas?
>>>>>>
>>>>>> I am using hbase-0.98.7-hadoop2.
>>>>>> Thank you.
>>>>>>
>>>>>> Kind regards,
>>>>>> Akmal Abbasov
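A note on the open question about the ~4000 znodes: the path /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot is the YARN ResourceManager's ZooKeeper state store (one child znode per application it may recover), not HBase state. A hedged sketch of how the subtree can be inspected with the stock ZooKeeper 3.4.x CLI; the server address is an assumption, the paths come from the thread, and hand-deleting state-store entries is only safe with the ResourceManager stopped:

```shell
# Count/inspect the RM state-store entries (path taken from the thread;
# localhost:2181 is a placeholder for one of the cluster's ZK quorum members).
zkCli.sh -server localhost:2181 <<'EOF'
ls /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot
EOF

# With the ResourceManager stopped, the whole state-store subtree can be
# removed recursively ('rmr' in the 3.4.x CLI), so the RM starts with an
# empty store instead of replaying thousands of old applications:
# zkCli.sh -server localhost:2181 rmr /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot
```

Leaving the znodes in place means the RM will attempt to recover those applications on restart; whether that is harmful depends on how many are already completed, so inspecting before deleting is the safer order.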
