Re: Zookeeper apparently going down

Lucas Nazário dos Santos Fri, 28 May 2010 10:08:34 -0700

I restarted the process and the logs are gone. I'll keep monitoring HBase
and if this error happens again I'll post the logs here.


Thanks a lot.

Lucas



On Fri, May 28, 2010 at 1:50 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:

> Yeah, this is very suspicious. The fact that the error the master
> tripped over happened just after the region server stopped logging to
> that file makes it even more suspicious. Usually when there's an error
> in the regionserver's main thread it goes to sysout, so it ends up in
> the .out file instead of the .log file. But the .out file is
> overwritten every time you restart a process, so unless you didn't
> restart the region server we probably lost the info that was in there.
> And if the process did die, then that explains why the master wasn't
> able to connect to it.
>
> J-D
>
> On Fri, May 28, 2010 at 8:37 AM, Lucas Nazário dos Santos
> <nazario.lu...@gmail.com> wrote:
> > Here are the complete logs:
> >
> > http://www.ninvest.com.br/docs/logs_hbase/hbase-root-master-ip-10-251-158-224.log
> >
> > http://www.ninvest.com.br/docs/logs_hbase/hbase-root-zookeeper-ip-10-251-158-224.log
> >
> > http://www.ninvest.com.br/docs/logs_hbase/hbase-root-regionserver-ip-10-251-158-224.log
> >
> > The regionserver stopped logging at 8:31am. Strange...
> >
> > I hope this helps.
> >
> > Lucas
> >
> >
> > On Thu, May 27, 2010 at 8:09 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >
> >> On Thu, May 27, 2010 at 4:01 PM, Lucas Nazário dos Santos
> >> <nazario.lu...@gmail.com> wrote:
> >> > Thanks a lot for the responses. I'll be monitoring HBase and get back in
> >> > touch if it happens again.
> >> >
> >> > Maybe HBase could employ a mechanism to automatically recover from
> >> > connectivity issues like the one I went through. Then others and I
> >> > wouldn't need to manually restart it.
> >>
> >> Well, usually if one machine is not reachable, it's not a big deal
> >> since there are other machines to connect to and HBase redistributes
> >> the regions to them. Also, why is the connection refused? Can we see
> >> the region server log?
> >>
> >> >
> >> > I still don't get why the master kept failing even after its recovery, and
> >> > why I had to stop/start the cluster in order to get rid of the
> >> > "Connection refused" error.
> >>
> >> I'd also like to understand why the region server isn't responding;
> >> the master can only know so much.
> >>
> >> >
> >> > I'm assuming it's not a big deal and my solution can live with it.
> >> >
> >> > More logs below.
> >> >
> >>
> >> Consider pastebin or a web server next time ;)
> >>
> >
>
