Re: Some problems in one accident on my production cluster

2016-02-24 Thread Heng Chen

Thanks Stack and Ted for your help. After checking the code, I think the cause is this: the RS sent a split request with the parent region and two daughter regions, then the RS crashed. The master updated the two daughter regions to the SPLITTING_NEW state and put them in regionsInTransition, which is stored in the master's memory. And
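A toy Java model of the sequence described above may make it more concrete. It is only a sketch under stated assumptions, not HBase code: the class `RitModel`, its methods, and the rule that the balancer stays idle while any region is in transition are hypothetical simplifications. Only the `SPLITTING_NEW` state name (from the master log) and the idea of an in-memory regionsInTransition map come from this thread.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical toy model of the master's in-memory regions-in-transition map.
// Not the HBase API; names and behavior are simplified for illustration.
public class RitModel {
    enum State { SPLITTING_NEW }

    // Keyed by encoded region name; held only in master memory in this model.
    private final Map<String, State> regionsInTransition = new ConcurrentHashMap<>();

    // The RS reports a split: the master marks both daughters SPLITTING_NEW.
    // (The parent region is left out of this model.)
    void onSplitRequest(String daughterA, String daughterB) {
        regionsInTransition.put(daughterA, State.SPLITTING_NEW);
        regionsInTransition.put(daughterB, State.SPLITTING_NEW);
    }

    // Happy path: the RS completes the split and the daughters leave transition.
    void onSplitCompleted(String daughterA, String daughterB) {
        regionsInTransition.remove(daughterA);
        regionsInTransition.remove(daughterB);
    }

    // Modeling assumption: the balancer refuses to run while anything is in transition.
    boolean balancerMayRun() {
        return regionsInTransition.isEmpty();
    }

    int ritCount() {
        return regionsInTransition.size();
    }

    public static void main(String[] args) {
        RitModel master = new RitModel();
        master.onSplitRequest("daughterA", "daughterB");
        // The RS crashes here, so onSplitCompleted is never called.
        System.out.println("regions in transition: " + master.ritCount());
        System.out.println("balancer may run: " + master.balancerMayRun());
    }
}
```

In this model, a crash between the two calls leaves two regions stuck in transition, which would keep the (assumed) balancer rule from ever letting the balancer run.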

Re: Some problems in one accident on my production cluster

2016-02-24 Thread Stack

On Wed, Feb 24, 2016 at 3:31 PM, Heng Chen wrote:
> The story is I run one MR job on my production cluster (0.98.6), it needs
> to scan one table during map procedure.
>
> Because of the heavy load from the job, all my RS crashed due to OOM.
>

Really big rows? If

Re: Some problems in one accident on my production cluster

2016-02-24 Thread Ted Yu

bq. RegionStates: THIS SHOULD NOT HAPPEN: unexpected { ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW

Looks like the above wouldn't have happened if you were using 0.98.11+. See HBASE-12958.

On Wed, Feb 24, 2016 at 6:39 PM, Heng Chen wrote:
> I pick up some logs

Re: Some problems in one accident on my production cluster

2016-02-24 Thread Heng Chen

Thanks @ted, your suggestions about #2 and #3 are what I need!

2016-02-25 10:39 GMT+08:00 Heng Chen :
> I pick up some logs in master.log about one region
> "ad283942aff2bba6c0b94ff98a904d1a"
>
>
> 2016-02-24 16:24:35,610 INFO [AM.ZK.Worker-pool2-t3491]
>

Re: Some problems in one accident on my production cluster

2016-02-24 Thread Heng Chen

I picked up some logs in master.log about one region, "ad283942aff2bba6c0b94ff98a904d1a":

2016-02-24 16:24:35,610 INFO [AM.ZK.Worker-pool2-t3491] master.RegionStates: Transition null to {ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610,

Re: Some problems in one accident on my production cluster

2016-02-24 Thread Ted Yu

bq. two regions were in transition

Can you pastebin the related server logs w.r.t. these two regions so that we have more clues?

For #2, please see http://hbase.apache.org/book.html#big.cluster.config

For #3, please see

Some problems in one accident on my production cluster

2016-02-24 Thread Heng Chen

The story is: I ran an MR job on my production cluster (0.98.6) that needs to scan a table during the map phase. Because of the heavy load from the job, all my RSs crashed due to OOM. After I restarted all the RSs, I found a problem: all regions were reopened on one RS, and the balancer could not
