On Wed, Jul 21, 2010 at 1:40 PM, Ted Yu <[email protected]> wrote:
> J-D:
> Can you elaborate why ssh isn't preferred ?
>
> One solution is to create a light weight Java process that monitors region
> server log and perform this duty.
>
> On Wed, Jul 21, 2010 at 10:31 AM, Jean-Daniel Cryans
> <[email protected]> wrote:
>
>> The stance in the hbase dev community on that issue is to let users'
>> cluster management tools handle it.
>>
>> Also, how would you start a java process on a remote machine (if still
>> alive) from another java process (the master)? If you can find an
>> elegant way that doesn't rely on SSH, this would be a nice
>> contribution that users could enable.
>>
>> J-D
>>
>> On Wed, Jul 21, 2010 at 10:25 AM, Ted Yu <[email protected]> wrote:
>> > Thanks for the answer.
>> >
>> > GC pause seems to be a major cause for region server to come down:
>> > 2010-07-21 09:07:14,138 WARN org.apache.hadoop.hbase.util.Sleeper: We slept 291505ms, ten times longer than scheduled: 10000
>> >
>> > Is it possible for HBase Master to restart dead region server in this case ?
>> >
>> > On Wed, Jul 21, 2010 at 10:02 AM, Jean-Daniel Cryans <[email protected]> wrote:
>> >
>> >> HBaseAdmin.getClusterStatus().getServers()
>> >>
>> >> J-D
>> >>
>> >> On Wed, Jul 21, 2010 at 9:56 AM, Ted Yu <[email protected]> wrote:
>> >> > Hi,
>> >> > Is there API to query the number of live region servers ?
>> >> >
>> >> > Thanks

Ted,

I just wrote two very relevant articles on my blog:

Using func instead of SSH keys
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/func_hadoop_the_end_of
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/hadoop_secondarynamenode_puppet

In which I use the puppet 'service resource' to do automatic restarts of a secondary namenode. You could use puppet to accomplish auto restarts for a region server.

I usually tweet my new recipes when I come out with them: http://twitter.com/@edwardcapriolo
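
For the lightweight Java log monitor suggested earlier in the thread, the detection half could look roughly like the sketch below. It only recognizes the Sleeper warning quoted above and extracts the reported pause. The class name, method names, and the 60-second threshold are made up for illustration and are not part of HBase. What to do once a long pause is seen (puppet, func, or another management tool) is deliberately left out.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an HBase class: spots the Sleeper warning
// in a region server log line and reports how long the process stalled.
public class SleeperLogCheck {

    // Pattern built from the single log line quoted in this thread;
    // other HBase versions may word the message differently.
    private static final Pattern SLEPT = Pattern.compile(
        "We slept (\\d+)ms, ten times longer than scheduled: (\\d+)");

    // Returns the reported pause in milliseconds, or empty if the
    // line is not a Sleeper warning.
    public static OptionalLong parseSleptMillis(String logLine) {
        Matcher m = SLEPT.matcher(logLine);
        return m.find()
            ? OptionalLong.of(Long.parseLong(m.group(1)))
            : OptionalLong.empty();
    }

    // True when the line reports a pause of at least thresholdMillis.
    // The threshold is an assumption the operator has to choose.
    public static boolean isLongPause(String logLine, long thresholdMillis) {
        OptionalLong slept = parseSleptMillis(logLine);
        return slept.isPresent() && slept.getAsLong() >= thresholdMillis;
    }

    public static void main(String[] args) {
        String line = "2010-07-21 09:07:14,138 WARN "
            + "org.apache.hadoop.hbase.util.Sleeper: We slept 291505ms, "
            + "ten times longer than scheduled: 10000";
        // 60000 ms is an arbitrary example threshold.
        System.out.println("long pause: " + isLongPause(line, 60000L));
    }
}
```

A real monitor would tail the log file and hand the result to whatever restarts the service; that part depends on the cluster management tool in use.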
