Have you run the following command in hbase shell ?

balance_switch true

Cheers

On Mon, Jul 6, 2015 at 12:16 PM, Akmal Abbasov <[email protected]> wrote:

> > Do you see in the master log something similar to the following ?
> >
> > master.HMaster: Not running balancer because 1 region(s) in transition
> yes, I have several of them, but all of them were 3 days ago.
>
> I checked the ‘ritCount’ metric, and it is 0, also I checked the
> /hbase/region-in-transition znode, which is also empty.
> But I can’t start balancer manually.
>
> I took a snapshot of the tables each hour.
> I’ve checked the path
> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot in zookeeper,
> and there are ~4000 applications. It looks like all of them are create
> snapshot operations. Also I’ve observed that the CPU usage of the master
> is much higher than it was in the past.
> Is it possible that all of these applications are causing the problem?
>
> Can I delete all of these applications?
>
>
> > On 06 Jul 2015, at 18:45, Ted Yu <[email protected]> wrote:
> >
> > Do you see in the master log something similar to the following ?
> >
> > master.HMaster: Not running balancer because 1 region(s) in transition
> >
> > You can search backwards for balancer / assignment related logs.
> >
> > Cheers
> >
> > On Mon, Jul 6, 2015 at 8:49 AM, Akmal Abbasov <[email protected]> wrote:
> >
> >>> What error(s) did you get when trying to restart the region server ? Have
> >>> you checked its log files ?
> >> it was a VM, and I was not able to access it any more, I can’t log in to
> >> it. Restarting several times didn’t help.
> >>
> >>
> >>> Can you check master log around this time ? If there was region in
> >>> transition, balancer wouldn't balance.
> >> I have a lot of this
> >> 2015-07-06 15:15:39,918 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_18.14/WALs
> >> 2015-07-06 15:15:39,918 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_19.14/WALs
> >> 2015-07-06 15:15:39,921 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_20.13/WALs
> >> 2015-07-06 15:15:39,925 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_21.14/WALs
> >> 2015-07-06 15:15:39,926 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_22.14/WALs
> >> 2015-07-06 15:15:39,927 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_23.14/WALs
> >> 2015-07-06 15:15:39,928 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/testsnap/WALs
> >> 2015-07-06 15:15:47,324 INFO [FifoRpcScheduler.handler1-thread-18] master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
> >> 2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436190023718
> >> 2015-07-06 15:23:31,504 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436193624562
> >> 2015-07-06 15:32:49,382 INFO [FifoRpcScheduler.handler1-thread-14] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
> >> 2015-07-06 15:32:56,936 INFO [FifoRpcScheduler.handler1-thread-1] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
> >>
> >> Thank you.
> >>
> >>> On 06 Jul 2015, at 17:37, Ted Yu <[email protected]> wrote:
> >>>
> >>> bq. I had to delete and recreate it
> >>>
> >>> What error(s) did you get when trying to restart the region server ? Have
> >>> you checked its log files ?
> >>>
> >>> bq. start balancer manually, but it returned false
> >>>
> >>> Can you check master log around this time ? If there was region in
> >>> transition, balancer wouldn't balance.
> >>>
> >>> Cheers
> >>>
> >>> On Mon, Jul 6, 2015 at 8:29 AM, Akmal Abbasov <[email protected]> wrote:
> >>>
> >>>> Hi all,
> >>>> I have a strange behaviour in my HBase cluster. I have 5 rs and 2 masters.
> >>>> One of the rs stopped working, a restart didn’t work, and I had to delete
> >>>> and recreate it.
> >>>> But when this rs stopped, the cluster also stopped functioning.
> >>>> There were a lot of inconsistencies. When I recreated the rs with the disks
> >>>> of the previous one, the cluster started working.
> >>>> But now, only 3 rs host the regions, the other 2 have 0 regions.
> >>>> I’ve tried to start the balancer manually, but it returned false.
> >>>> Any idea?
> >>>>
> >>>> I am using hbase hbase-0.98.7-hadoop2.
> >>>> Thank you.
> >>>>
> >>>> Kind regards,
> >>>> Akmal Abbasov
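
For anyone following along: the quoted master log shows several "set balanceSwitch=false" entries and none setting it back to true. Below is a minimal sketch, not part of HBase, of how a master log could be scanned for the last balance switch change and for "Not running balancer" messages. The function name `summarize_balancer_log` is hypothetical, and the `balanceSwitch=true` form is an assumption, since only the `false` form appears in this thread.

```python
import re

# Patterns are modelled on the master log lines quoted in this thread.
# Assumption: a "balanceSwitch=true" entry has the same shape as the
# "balanceSwitch=false" entries (only the latter appear in the thread).
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
SWITCH_RE = re.compile(
    r"master\.HMaster: Client=(\S+) set balanceSwitch=(true|false)"
)
RIT_RE = re.compile(
    r"master\.HMaster: Not running balancer because (\d+) region\(s\) in transition"
)


def _timestamp(line):
    """Return the leading log timestamp as a string, or None if absent."""
    m = TS_RE.match(line)
    return m.group(1) if m else None


def summarize_balancer_log(lines):
    """Scan master log lines (one log entry per line).

    Returns a tuple (last_switch, rit_skips):
      last_switch: (timestamp, client, enabled) for the most recent
                   balanceSwitch change, or None if there was none.
      rit_skips:   list of (timestamp, region_count) for every
                   "Not running balancer" message found.
    """
    last_switch = None
    rit_skips = []
    for line in lines:
        m = SWITCH_RE.search(line)
        if m:
            last_switch = (_timestamp(line), m.group(1), m.group(2) == "true")
            continue
        m = RIT_RE.search(line)
        if m:
            rit_skips.append((_timestamp(line), int(m.group(1))))
    return last_switch, rit_skips


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py /path/to/hbase-master.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        switch, skips = summarize_balancer_log(fh)
    print("last balanceSwitch change:", switch)
    print("balancer skipped because of regions in transition:", skips)
```

If the last switch change reports `enabled` as False, the balancer has been turned off from a client and will not run until it is turned back on from hbase shell.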
