Thanks Stack and Ted for your help.
After checking the code, I think the cause is this: the RS sent a split request
with the parent region and the two daughter regions, and then the RS crashed.
The master updated the two daughter regions to the SPLITTING_NEW state and put
them in regionsInTransition, which is kept only in the master's memory.
And
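A toy model may help make the reasoning above concrete. This is a purely illustrative sketch, not HBase code: the class `ToyMaster`, its methods, and the rule that the balancer refuses to run while any region is in transition are assumptions modeled loosely on this thread (daughters held in an in-memory regionsInTransition map, and two regions stuck in transition). It shows why a crash between the split request and the split completion leaves the daughters stranded.

```python
from enum import Enum


class State(Enum):
    # Only the states this toy needs; real HBase has many more.
    SPLITTING_NEW = "SPLITTING_NEW"
    OPEN = "OPEN"


class ToyMaster:
    """Hypothetical, simplified model of a master's in-memory region bookkeeping."""

    def __init__(self):
        # Held only in memory, like regionsInTransition in the thread.
        self.regions_in_transition = {}

    def on_split_request(self, parent, daughter_a, daughter_b):
        # The RS reports the split; both daughters are marked SPLITTING_NEW.
        # (The parent is not tracked in this toy.)
        for daughter in (daughter_a, daughter_b):
            self.regions_in_transition[daughter] = State.SPLITTING_NEW

    def on_split_completed(self, daughter_a, daughter_b):
        # Only a completed split removes the daughters from the map.
        for daughter in (daughter_a, daughter_b):
            self.regions_in_transition.pop(daughter, None)

    def balance(self):
        # Assumed rule: skip balancing while any region is in transition.
        return not self.regions_in_transition


if __name__ == "__main__":
    master = ToyMaster()
    master.on_split_request("parent", "daughterA", "daughterB")
    # The RS crashes here, so on_split_completed() is never called.
    print(master.balance())
    print(len(master.regions_in_transition))
```

In this model nothing ever clears the two SPLITTING_NEW entries once the RS is gone, so `balance()` keeps returning `False`.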
On Wed, Feb 24, 2016 at 3:31 PM, Heng Chen wrote:
> The story is: I ran an MR job on my production cluster (0.98.6) that needed
> to scan one table during the map phase.
>
> Because of the heavy load from the job, all my RSs crashed due to OOM.
Really big rows? If
bq. RegionStates: THIS SHOULD NOT HAPPEN: unexpected {ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW
It looks like the above wouldn't have happened if you were using 0.98.11+.
See HBASE-12958.
On Wed, Feb 24, 2016 at 6:39 PM, Heng Chen wrote:
> I picked out some logs
Thanks @Ted, your suggestions about #2 and #3 are what I need!
2016-02-25 10:39 GMT+08:00 Heng Chen:
> I picked out some logs from master.log about one region,
> "ad283942aff2bba6c0b94ff98a904d1a":
>
>
> 2016-02-24 16:24:35,610 INFO [AM.ZK.Worker-pool2-t3491]
>
I picked out some logs from master.log about one region,
"ad283942aff2bba6c0b94ff98a904d1a":
2016-02-24 16:24:35,610 INFO [AM.ZK.Worker-pool2-t3491] master.RegionStates: Transition null to {ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610,
bq. two regions were in transition
Can you pastebin the related server logs for these two regions so that we
have more clues?
For #2, please see http://hbase.apache.org/book.html#big.cluster.config
For #3, please see
The story is: I ran an MR job on my production cluster (0.98.6) that needed
to scan one table during the map phase.
Because of the heavy load from the job, all my RSs crashed due to OOM.
After I restarted all the RSs, I found a problem.
All regions were reopened on one RS, and the balancer could not