I restarted the process and the logs are gone. I'll keep monitoring HBase, and if this error happens again I'll post the logs here.
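Since a restart overwrites the region server's .out file (as J-D points out in the quoted message), one way to avoid losing it next time is to copy it aside before restarting. The following is only a minimal sketch under assumptions: it is not an HBase feature, the function name `preserve_out_files` and both directory arguments are hypothetical, and it only assumes the .out files sit in the same directory as the .log files linked in the quoted message.

```python
import shutil
import time
from pathlib import Path


def preserve_out_files(log_dir, backup_dir):
    """Copy every *.out file in log_dir into backup_dir with a timestamp suffix.

    Hypothetical helper, not part of HBase. Returns the list of backup paths.
    """
    log_dir = Path(log_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # One timestamp per run, so all copies from the same run share a suffix.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    saved = []
    for out_file in sorted(log_dir.glob("*.out")):
        if not out_file.is_file():
            continue  # skip directories or anything else that is not a regular file
        # e.g. hbase-root-regionserver-<host>.out -> hbase-root-regionserver-<host>.out.<stamp>
        target = backup_dir / f"{out_file.name}.{stamp}"
        shutil.copy2(out_file, target)  # copy2 also keeps the modification time
        saved.append(target)
    return saved
```

Calling `preserve_out_files(log_dir, backup_dir)` with the actual log directory just before restarting the region server would leave a timestamped copy of each .out file that the restart cannot overwrite.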
Thanks a lot.

Lucas

On Fri, May 28, 2010 at 1:50 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:

> Yeah this is very suspicious. Also, the fact that the error the master
> tripped over happened just after the region server stopped logging in
> that file seems even more suspicious. Usually when there's an error in
> the regionserver's main thread it will go to sysout, so that's the .out
> file instead of the .log file, but every time you restart a process it
> overwrites it, so unless you didn't restart the region server we
> probably lost the info that was in there. And if the process did die,
> then it really explains why the master wasn't able to connect to it.
>
> J-D
>
> On Fri, May 28, 2010 at 8:37 AM, Lucas Nazário dos Santos
> <nazario.lu...@gmail.com> wrote:
> > Here are the complete logs:
> >
> > http://www.ninvest.com.br/docs/logs_hbase/hbase-root-master-ip-10-251-158-224.log
> > http://www.ninvest.com.br/docs/logs_hbase/hbase-root-zookeeper-ip-10-251-158-224.log
> > http://www.ninvest.com.br/docs/logs_hbase/hbase-root-regionserver-ip-10-251-158-224.log
> >
> > The regionserver stopped logging at 8:31am. Strange...
> >
> > I hope this helps.
> >
> > Lucas
> >
> > On Thu, May 27, 2010 at 8:09 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
> >
> >> On Thu, May 27, 2010 at 4:01 PM, Lucas Nazário dos Santos
> >> <nazario.lu...@gmail.com> wrote:
> >> > Thanks a lot for the responses. I'll be monitoring HBase and get back in
> >> > touch if it happens again.
> >> >
> >> > Maybe HBase could employ a mechanism to automatically recover from
> >> > connectivity issues like the one I went through. Then others and I
> >> > wouldn't need to manually restart it.
> >>
> >> Well usually if one machine is not reachable, it's not a big deal
> >> since there are other machines to connect to and HBase redistributes
> >> the regions to them. Also, why is it refused? Can we see the region
> >> server log?
> >>
> >> > I still didn't get why the master kept failing even after its recovery,
> >> > and why I had to stop/start the cluster in order to get rid of the
> >> > "Connection refused" error.
> >>
> >> I'd also like to understand why the region server isn't responding;
> >> the master can only know so much.
> >>
> >> > I'm assuming it's not a big deal and my solution can live with it.
> >> >
> >> > More logs below.
> >>
> >> Consider pastebin or a web server next time ;)