Re: HBase strange behaviour

Ted Yu Mon, 06 Jul 2015 15:18:01 -0700

Have you run the following command in hbase shell ?
balance_switch true
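
The master log quoted below shows clients setting balanceSwitch=false, most recently at 15:32:56, which suggests the balancer may still be switched off. A minimal Python sketch for finding the latest such entry in a master log, assuming one entry per line (the quoted excerpt is wrapped by the mail client) and the message format seen in this thread; the function name and the sample list are illustrative only:

```python
import re

# Matches master log entries of the form seen in this thread, e.g.
#   <timestamp> INFO  [...] master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
# Assumption: each entry sits on a single line in the real log file.
SWITCH_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"master\.HMaster: Client=(?P<client>\S+) "
    r"set balanceSwitch=(?P<value>true|false)"
)


def last_balance_switch(lines):
    """Return (timestamp, client, enabled) for the latest switch change, or None."""
    last = None
    for line in lines:
        m = SWITCH_RE.match(line)
        if m:
            last = (m.group("ts"), m.group("client"), m.group("value") == "true")
    return last


# Sample entries copied from the quoted log, re-joined onto single lines.
sample = [
    "2015-07-06 15:15:47,324 INFO  [FifoRpcScheduler.handler1-thread-18] "
    "master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false",
    "2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner] "
    "master.ReplicationLogCleaner: Didn't find this log in ZK, deleting: "
    "hbase-rs1%2C60020%2C1436189457794.1436190023718",
    "2015-07-06 15:32:56,936 INFO  [FifoRpcScheduler.handler1-thread-1] "
    "master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false",
]

print(last_balance_switch(sample))
```

If the last entry reports false, running balance_switch true in the hbase shell should turn the balancer back on.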

Cheers


On Mon, Jul 6, 2015 at 12:16 PM, Akmal Abbasov <[email protected]>
wrote:

> > Do you see in the master log something similar to the following ?
> >
> > master.HMaster: Not running balancer because 1 region(s) in transition
> Yes, I have several of them, but all of them are from 3 days ago.
>
> I checked the ‘ritCount’ metric, and it is 0. I also checked the
> /hbase/region-in-transition znode, which is also empty.
> But I can’t start the balancer manually.
>
> I take a snapshot of the tables every hour.
> I’ve checked the path
> /hadoop-ha/testhbase1/rmstore/ZKRMStateRoot/RMAppRoot in ZooKeeper,
> and there
> are ~4000 applications. It looks like all of them are create-snapshot
> operations. I’ve also observed that the CPU
> usage of the master is much higher than it was in the past.
> Is it possible that all of these applications are causing the problem?
>
> Can I delete all of these applications?
>
>
> > On 06 Jul 2015, at 18:45, Ted Yu <[email protected]> wrote:
> >
> > Do you see in the master log something similar to the following ?
> >
> > master.HMaster: Not running balancer because 1 region(s) in transition
> >
> > You can search backwards for balancer / assignment related logs.
> >
> > Cheers
> >
> > On Mon, Jul 6, 2015 at 8:49 AM, Akmal Abbasov <[email protected]>
> > wrote:
> >
> >>> What error(s) did you get when trying to restart the region server ? Have
> >>> you checked its log files ?
> >> It was a VM, and I was not able to access it any more; I can’t log in to
> >> it. Restarting several times didn’t help.
> >>
> >>
> >>> Can you check master log around this time ? If there was region in
> >>> transition, balancer wouldn't balance.
> >> I have a lot of these:
> >> 2015-07-06 15:15:39,918 INFO  [snapshot-log-cleaner-cache-refresher]
> >> util.FSVisitor: No logs under
> >> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_18.14/WALs
> >> 2015-07-06 15:15:39,918 INFO  [snapshot-log-cleaner-cache-refresher]
> >> util.FSVisitor: No logs under
> >> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_19.14/WALs
> >> 2015-07-06 15:15:39,921 INFO  [snapshot-log-cleaner-cache-refresher]
> >> util.FSVisitor: No logs under
> >> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_20.13/WALs
> >> 2015-07-06 15:15:39,925 INFO  [snapshot-log-cleaner-cache-refresher]
> >> util.FSVisitor: No logs under
> >> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_21.14/WALs
> >> 2015-07-06 15:15:39,926 INFO  [snapshot-log-cleaner-cache-refresher]
> >> util.FSVisitor: No logs under
> >> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_22.14/WALs
> >> 2015-07-06 15:15:39,927 INFO  [snapshot-log-cleaner-cache-refresher]
> >> util.FSVisitor: No logs under
> >> directory:hdfs://test/hbase/.hbase-snapshot/table1-snapshot-31.05.2015_23.14/WALs
> >> 2015-07-06 15:15:39,928 INFO  [snapshot-log-cleaner-cache-refresher]
> >> util.FSVisitor: No logs under
> >> directory:hdfs://test/hbase/.hbase-snapshot/testsnap/WALs
> >> 2015-07-06 15:15:47,324 INFO  [FifoRpcScheduler.handler1-thread-18]
> >> master.HMaster: Client=hadoop//10.32.0.140 set balanceSwitch=false
> >> 2015-07-06 15:23:31,265 DEBUG [master:hbase-m2:60000.oldLogCleaner]
> >> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
> >> hbase-rs1%2C60020%2C1436189457794.1436190023718
> >> 2015-07-06 15:23:31,504 DEBUG [master:hbase-m2:60000.oldLogCleaner]
> >> master.ReplicationLogCleaner: Didn't find this log in ZK, deleting:
> >> hbase-rs1%2C60020%2C1436189457794.1436193624562
> >> 2015-07-06 15:32:49,382 INFO  [FifoRpcScheduler.handler1-thread-14]
> >> master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
> >> 2015-07-06 15:32:56,936 INFO  [FifoRpcScheduler.handler1-thread-1]
> >> master.HMaster: Client=hadoop//10.32.0.156 set balanceSwitch=false
> >>
> >> Thank you.
> >>
> >>> On 06 Jul 2015, at 17:37, Ted Yu <[email protected]> wrote:
> >>>
> >>> bq. I had to delete and recreate it
> >>>
> >>> What error(s) did you get when trying to restart the region server ? Have
> >>> you checked its log files ?
> >>>
> >>> bq. start balancer manually, but it returned false
> >>>
> >>> Can you check master log around this time ? If there was region in
> >>> transition, balancer wouldn't balance.
> >>>
> >>> Cheers
> >>>
> >>> On Mon, Jul 6, 2015 at 8:29 AM, Akmal Abbasov <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi all,
> >>>> I have a strange behaviour in my HBase cluster. I have 5 rs and 2 masters.
> >>>> One of the rs stopped working, a restart didn’t work, and I had to delete
> >>>> and recreate it.
> >>>> But when this rs stopped, the cluster also stopped functioning.
> >>>> There were a lot of inconsistencies. When I recreated the rs with the disks of
> >>>> the previous one, the cluster started working.
> >>>> But now, only 3 rs host the regions; the other 2 have 0 regions.
> >>>> I’ve tried to start balancer manually, but it returned false.
> >>>> Any idea?
> >>>>
> >>>> I am using hbase hbase-0.98.7-hadoop2.
> >>>> Thank you.
> >>>>
> >>>> Kind regards,
> >>>> Akmal Abbasov
> >>>>
> >>>>
> >>
> >>
>
>
