> These znodes seemed to be related to YARN, not HBase.
>
> Maybe ask on yarn user mailing list ?

Right. Thank you.

> On 07 Jul 2015, at 09:50, Ted Yu <[email protected]> wrote:
>
> These znodes seemed to be related to YARN, not HBase.
>
> Maybe ask on yarn user mailing list ?
>
> Cheers
>
> On Jul 7, 2015, at 12:05 AM, Akmal Abbasov <[email protected]> wrote:
>
>>> Have you run the following command in hbase shell ?
>>> balance_switch true
>> I’ve tried, and this did the trick. Thank you.
>>
>> One more thing that is not clear to me is what I can do with ~4000 znodes in
>> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot
>> What will happen with them if I do nothing, will the system try to
>> complete all of these applications?
>>
>> Thank you.
>>
>>> On 07 Jul 2015, at 00:16, Ted Yu <[email protected]> wrote:
>>>
>>> Have you run the following command in hbase shell ?
>>> balance_switch true
>>>
>>> Cheers
>>>
>>> On Mon, Jul 6, 2015 at 12:16 PM, Akmal Abbasov <[email protected]> wrote:
>>>
>>>>> Do you see in the master log something similar to the following ?
>>>>>
>>>>> master.HMaster: Not running balancer because 1 region(s) in transition
>>>> yes, I have several of them, but all of them were 3 days ago.
>>>>
>>>> I checked the ‘ritCount’ metric, and it is 0, also I checked the
>>>> /hbase/region-in-transition znode, which is also empty.
>>>> But I can’t start balancer manually.
>>>>
>>>> I took a snapshot of tables each hour.
>>>> I’ve checked the path
>>>> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot in zookeeper,
>>>> and there are ~4000 applications. It looks like all of them are create snapshot
>>>> operations. Also I’ve observed that the CPU
>>>> usage of the master is much higher than it was in the past.
>>>> Is it possible that all of these applications are causing the problem?
>>>>
>>>> Can I delete all of these applications?
>>>>
>>>>> On 06 Jul 2015, at 18:45, Ted Yu <[email protected]> wrote:
>>>>>
>>>>> Do you see in the master log something similar to the following ?
>>>>>
>>>>> master.HMaster: Not running balancer because 1 region(s) in transition
>>>>>
>>>>> You can search backwards for balancer / assignment related logs.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Mon, Jul 6, 2015 at 8:49 AM, Akmal Abbasov <[email protected]> wrote:
>>>>>
>>>>>>> What error(s) did you get when trying to restart the region server ? Have
>>>>>>> you checked its log files ?
>>>>>> it was a VM, and I was not able to access it any more, I can’t login to
>>>>>> it. Restarting several times didn’t help.
>>>>>>
>>>>>>> Can you check master log around this time ? If there was region in
>>>>>>> transition, balancer wouldn't balance.
>>>>>> I have a lot of this
>>>>>> 2015-07-06 15:15:39,918 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_18.14/WALs
>>>>>> 2015-07-06 15:15:39,918 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_19.14/WALs
>>>>>> 2015-07-06 15:15:39,921 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_20.13/WALs
>>>>>> 2015-07-06 15:15:39,925 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_21.14/WALs
>>>>>> 2015-07-06 15:15:39,926 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_22.14/WALs
>>>>>> 2015-07-06 15:15:39,927 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_23.14/WALs
>>>>>> 2015-07-06 15:15:39,928 INFO [snapshot-log-cleaner-cache-refresher] util.FSVisitor: No logs under directory:hdfs://test/hbase/.hbase-snapshot/testsnap/WALs
>>>>>> 2015-07-06 15:15:47,324 INFO [FifoRpcScheduler.handler1-thread-18] master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
>>>>>> 2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436190023718
>>>>>> 2015-07-06 15:23:31,504 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436193624562
>>>>>> 2015-07-06 15:32:49,382 INFO [FifoRpcScheduler.handler1-thread-14] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>>>> 2015-07-06 15:32:56,936 INFO [FifoRpcScheduler.handler1-thread-1] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>>> On 06 Jul 2015, at 17:37, Ted Yu <[email protected]> wrote:
>>>>>>>
>>>>>>> bq. I had to delete and recreate it
>>>>>>>
>>>>>>> What error(s) did you get when trying to restart the region server ? Have
>>>>>>> you checked its log files ?
>>>>>>>
>>>>>>> bq. start balancer manually, but it returned false
>>>>>>>
>>>>>>> Can you check master log around this time ? If there was region in
>>>>>>> transition, balancer wouldn't balance.
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>> On Mon, Jul 6, 2015 at 8:29 AM, Akmal Abbasov <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>> I have a strange behaviour in my HBase cluster. I have 5 rs and 2 masters.
>>>>>>>> One of the rs stopped working, restart didn’t work, and I had to delete
>>>>>>>> and recreate it.
>>>>>>>> But when this rs stopped, the cluster also stopped functioning.
>>>>>>>> There were a lot of inconsistencies. When I recreated the rs with disks of
>>>>>>>> the previous one, cluster started working.
>>>>>>>> But now, only 3 rs host the regions, other 2 have 0 regions.
>>>>>>>> I’ve tried to start balancer manually, but it returned false?
>>>>>>>> Any idea?
>>>>>>>>
>>>>>>>> I am using hbase hbase-0.98.7-hadoop2.
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Kind regards,
>>>>>>>> Akmal Abbasov
>>
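
The master log quoted in this thread contains several "set balanceSwitch=false" entries and no later "true", which fits the balancer returning false until balance_switch true was run in the shell. As a rough sketch only (not part of HBase), the following Python scans master log lines for the two messages discussed in the thread. The function name summarize_balancer_events and the regular expressions are assumptions based on the 0.98.x wording quoted here; other versions may log these events differently.

```python
import re

# Timestamp prefix as it appears in the quoted master log lines.
TS_RE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")
# Message wording copied from this thread (HBase 0.98.x); treat as an assumption elsewhere.
SWITCH_RE = re.compile(r"master\.HMaster: Client=(\S+) set balanceSwitch=(true|false)")
RIT_RE = re.compile(
    r"master\.HMaster: Not running balancer because (\d+) region\(s\) in transition"
)


def summarize_balancer_events(lines):
    """Return (last_switch, rit_skips) found in the given log lines.

    last_switch is (timestamp, client, enabled) for the most recent
    balanceSwitch change, or None if there was none.
    rit_skips is a list of (timestamp, region_count) for every
    "Not running balancer ... in transition" message.
    Timestamps are None when a line has no recognizable prefix.
    """
    last_switch = None
    rit_skips = []
    for raw in lines:
        line = raw.strip()
        ts_match = TS_RE.match(line)
        ts = ts_match.group(0) if ts_match else None

        switch = SWITCH_RE.search(line)
        if switch:
            last_switch = (ts, switch.group(1), switch.group(2) == "true")
            continue

        rit = RIT_RE.search(line)
        if rit:
            rit_skips.append((ts, int(rit.group(1))))
    return last_switch, rit_skips


# Lines taken from the log excerpt quoted in this thread (mail wrapping removed).
SAMPLE_LOG = """\
2015-07-06 15:15:47,324 INFO [FifoRpcScheduler.handler1-thread-18] master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner] master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: hbase-rs1%2C60020%2C1436189457794.1436190023718
2015-07-06 15:32:49,382 INFO [FifoRpcScheduler.handler1-thread-14] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
2015-07-06 15:32:56,936 INFO [FifoRpcScheduler.handler1-thread-1] master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
"""

if __name__ == "__main__":
    switch, skips = summarize_balancer_events(SAMPLE_LOG.splitlines())
    print("last balanceSwitch change:", switch)
    print("balancer runs skipped for regions in transition:", skips)
```

If the last recorded switch is false and no regions are in transition, turning the switch back on is the first thing to try, as suggested earlier in the thread.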
