Re: Regions in Transition: FAILED_CLOSE status

jeff saremi Tue, 23 May 2017 14:55:53 -0700

Vladimir, thanks a lot for helping us out

So I checked the number of region servers in the master console. It was more
than what we had allotted.


Then I went to the list of FAILED_CLOSE regions, copied the server names, and
issued a delete against those nodes in ZK.

I restarted the masters (I don't think I needed to do this step) and now all
regions show as fine.

Happy now!

________________________________
From: Vladimir Rodionov <[email protected]>
Sent: Tuesday, May 23, 2017 2:41:30 PM
To: [email protected]
Subject: Re: Regions in Transition: FAILED_CLOSE status

My bad, that is FAIL_CLOSE

Anyway, start with the Master log, find the name of a region in FAIL_CLOSE,
then check the log of the RS that hosts this region.
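
The log walk described above can be partly scripted. What follows is only a
rough sketch in Python, not anything shipped with HBase. It assumes the Master
log prints region-state entries shaped like
{<32-hex encoded region name> state=..., ts=..., server=<host,port,startcode>},
which may differ between versions. The function name regions_by_server and the
regular expression are made up for illustration; the state defaults to the
FAILED_CLOSE named in the subject line.

```python
import re
from collections import defaultdict

# ASSUMPTION: region-state entries in the Master log look like
#   {<32-hex encoded region name> state=FAILED_CLOSE, ts=..., server=<host,port,startcode>}
# Adjust the pattern if your HBase version formats them differently.
REGION_STATE = re.compile(
    r"\{(?P<region>[0-9a-f]{32}) state=(?P<state>[A-Z_]+),"
    r"[^}]*?server=(?P<server>[^}\s]+)\}"
)


def regions_by_server(log_lines, state="FAILED_CLOSE"):
    """Map each server name to the set of encoded region names seen in `state`."""
    found = defaultdict(set)
    for line in log_lines:
        # A single log line may carry more than one region-state entry.
        for match in REGION_STATE.finditer(line):
            if match.group("state") == state:
                found[match.group("server")].add(match.group("region"))
    return dict(found)
```

Feeding it the lines of the Master log (for example, an open file object)
gives a per-server list of affected regions, which tells you which RS logs to
read first.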

On Tue, May 23, 2017 at 2:35 PM, James Moore <[email protected]> wrote:

> How many region servers are dead? And were they colocated with DataNodes?
>
> On Tue, May 23, 2017 at 5:20 PM, Vladimir Rodionov <[email protected]>
> wrote:
>
> > When the Master attempts to assign a region to an RS and the assignment
> > fails, there should be something in the RS log file (check for errors)
> > that explains the reason for the failure.
> >
> > How many unassigned regions do you have? You can try to assign them
> > manually in the hbase shell.
> >
> > On Tue, May 23, 2017 at 1:25 PM, jeff saremi <[email protected]>
> > wrote:
> >
> > > Are dead region servers to blame? Is this possibly stale information in
> > > the ZK?
> > >
> > > ________________________________
> > > From: Vladimir Rodionov <[email protected]>
> > > Sent: Tuesday, May 23, 2017 12:20:16 PM
> > > To: [email protected]
> > > Subject: Re: Regions in Transition: FAILED_CLOSE status
> > >
> > > You should check the RS logs to see why the regions cannot be assigned.
> > > Get the RS name from the master log and check that RS's log.
> > >
> > > -Vlad
> > >
> > > On Tue, May 23, 2017 at 11:47 AM, jeff saremi <[email protected]>
> > > wrote:
> > >
> > > > Our write code throws exceptions like the following:
> > > >
> > > > org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 10331 actions: NotServingRegionException: 10331 times,
> > > >   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:258)
> > > >   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$2000(AsyncProcess.java:238)
> > > >   at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1817)
> > > >   at org.apache.hadoop.hbase.client.BufferedMutatorImpl.backgroundFlushCommits(BufferedMutatorImpl.java:240)
> > > >   at org.apache.hadoop.hbase.client.BufferedMutatorImpl.mutate(BufferedMutatorImpl.java:146)
> > > >   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:1028)
> > > >   at com.microsoft.bing.malta.hbaseClient11$$anon$2.run(ImageFeaturesHdfsToHbaseInjector.scala:115)
> > > >   at java.lang.Thread.run(Thread.java:745)
> > > >
> > > >
> > > > ________________________________
> > > > From: jeff saremi <[email protected]>
> > > > Sent: Tuesday, May 23, 2017 11:36:11 AM
> > > > To: [email protected]
> > > > Subject: Regions in Transition: FAILED_CLOSE status
> > > >
> > > > Why are a few hundred of our regions in this state? And what can we
> > > > do to fix this?
> > > > I have run hbck a few times (is running it once enough?) to no
> > > > avail.
> > > >
> > > > Internet search does not come up with anything useful either.
> > > >
> > > > I have restarted all masters and all region servers with no luck.
> > > >
> > > > Jeff
> > > >
> > >
> >
>
