Silly question...

If you lose the name node (see your thread below),
why don't you restart your data nodes as well?

Your NN is a 'single point of failure': when you lose the name node, you
pretty much have a DOA system.
Since HBase sits on top of HDFS, you're bound to see inconsistencies and
other issues. If you lose your NN, you should bring down the entire cluster
and restart *everything*.
Yes, this is a pain, but it would solve a lot of your problems...


JMHO

-Mike
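Gokul's observation in the quoted thread (lease recovery stalls after an NN-only restart, but succeeds once the DN is restarted) can be pictured with a deliberately tiny model. This is a hypothetical sketch of his stated hypothesis, not HDFS code: `ToyCluster` and all of its methods are invented for illustration, and real NameNode/DataNode state is far richer.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical toy model, NOT HDFS code. It only illustrates the poster's
// hypothesis: after a NameNode-only restart, the DataNode never re-reports
// its blocksBeingWritten, so lease recovery has nothing to work with.
public class ToyCluster {

    // Partial (unclosed) blocks held by the single simulated DataNode.
    private final Set<String> dataNodeBlocksBeingWritten = new HashSet<>();

    // Under-construction blocks the simulated NameNode has replica info for.
    private final Set<String> nameNodeReportedBlocks = new HashSet<>();

    // Step 1: write and sync without closing; both sides know the block.
    public void writeAndSync(String blockId) {
        dataNodeBlocksBeingWritten.add(blockId);
        nameNodeReportedBlocks.add(blockId);
    }

    // Step 2: restarting the NameNode drops its in-memory replica info.
    public void restartNameNode() {
        nameNodeReportedBlocks.clear();
    }

    // Restarting the DataNode makes it re-report its partial blocks.
    public void restartDataNode() {
        nameNodeReportedBlocks.addAll(dataNodeBlocksBeingWritten);
    }

    // Step 3: in this model, recovery only succeeds for a block the
    // NameNode currently has replica info for.
    public boolean canRecoverLease(String blockId) {
        return nameNodeReportedBlocks.contains(blockId);
    }

    public static void main(String[] args) {
        ToyCluster cluster = new ToyCluster();
        cluster.writeAndSync("blk_1");
        cluster.restartNameNode();
        System.out.println("after NN restart: " + cluster.canRecoverLease("blk_1"));
        cluster.restartDataNode();
        System.out.println("after DN restart: " + cluster.canRecoverLease("blk_1"));
    }
}
```

In this model, restarting the DataNode is what makes recovery possible again, which is the same effect as restarting the whole cluster.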


> Date: Mon, 7 Mar 2011 15:48:27 +0530
> From: [email protected]
> Subject: RE: will HBase detect NN failure?
> To: [email protected]
> 
> I got this issue even with a build taken from the latest append trunk.
> 
> These are the steps to reproduce:
> 
> 1. Write a file and call sync a few times, but do not close it.
> 
> 2. Restart the NN (but not the DN).
> 
> 3. Run the lease-recovery while loop quoted below against that file.
> 
>   _____  
> 
> From: Ryan Rawson [mailto:[email protected]] 
> Sent: Monday, March 07, 2011 3:26 PM
> To: [email protected]; [email protected]
> Subject: Re: will HBase detect NN failure?
> 
> There is a series of patches that address this; check the recent commit
> history of the append branch.
> 
> On Mar 7, 2011 1:52 AM, "Gokulakannan M" <[email protected]> wrote:
> > Hi All,
> > 
> > In HBase 0.90, I have seen that it tolerates a writer dying in the
> > middle of a write by triggering lease recovery and closing the file.
> > Does HBase have any workaround or recovery when the NameNode is
> > restarted in the middle of a file write (possibly the HLog file, after
> > some syncs)?
> > 
> > I faced a problem in this scenario. When the NN is restarted (but not
> > the DN), the following code goes into an infinite loop because lease
> > recovery never happens. Once the DN is restarted, the file can be
> > recovered successfully (I think the DN does not send its partial blocks
> > in blocksBeingWritten to the NN when only the NN is restarted).
> > 
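> > // Recover the file's lease if necessary.
> > // Note: there is no upper bound on retries, so this spins forever
> > // if lease recovery never completes.
> > boolean recovered = false;
> > while (!recovered) {
> >   try {
> >     // append() throws IOException until the old writer's lease has
> >     // been recovered; a successful append + close means it is done.
> >     FSDataOutputStream out = fs.append(logfiles[i].getPath());
> >     out.close();
> >     recovered = true;
> >   } catch (IOException e) {
> >     if (LOG.isDebugEnabled()) {
> >       LOG.debug("Triggering lease recovery.");
> >     }
> >     try {
> >       Thread.sleep(leaseRecoveryPeriod);
> >     } catch (InterruptedException ex) {
> >       // ignore it and try again
> >     }
> >   }
> > }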
> > 
> > Thanks,
> > 
> > Gokul
> 
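For comparison, here is a minimal sketch of a bounded version of the quoted loop, in Java like the original. `BoundedLeaseRecovery`, `RecoveryAttempt` and `recoverWithDeadline` are hypothetical names; `RecoveryAttempt.run()` stands in for the `fs.append(...)` plus `out.close()` pair, and the retry period and timeout are illustrative values, not HBase or HDFS settings. Instead of spinning forever, it gives up after a deadline so the caller can alert or escalate (for example, by restarting the DataNodes as suggested at the top of this message).

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: retry a recovery attempt until it succeeds or a
// deadline passes, rather than looping forever like the quoted code.
public class BoundedLeaseRecovery {

    // Stand-in for "fs.append(path)" followed by "out.close()".
    @FunctionalInterface
    interface RecoveryAttempt {
        void run() throws IOException;
    }

    // Returns true if an attempt succeeded before the deadline, else false.
    static boolean recoverWithDeadline(RecoveryAttempt attempt,
                                       long retryPeriodMs,
                                       long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (true) {
            try {
                attempt.run();
                return true; // append + close succeeded: lease recovered
            } catch (IOException e) {
                // Recovery not finished yet; fall through and maybe retry.
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up and let the caller escalate
            }
            try {
                Thread.sleep(retryPeriodMs);
            } catch (InterruptedException ie) {
                // Unlike the quoted loop, preserve the interrupt and stop.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Succeeds on the third attempt.
        boolean ok = recoverWithDeadline(() -> {
            if (++calls[0] < 3) {
                throw new IOException("lease not yet recovered");
            }
        }, 10, 1_000);
        System.out.println("recovered=" + ok + " after " + calls[0] + " attempts");

        // Never succeeds: returns false after roughly 50 ms.
        boolean stuck = recoverWithDeadline(() -> {
            throw new IOException("lease not yet recovered");
        }, 10, 50);
        System.out.println("recovered=" + stuck);
    }
}
```

Returning false rather than throwing keeps the caller's control flow close to the original loop, and restoring the interrupt flag lets a shutdown request stop the retries.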
                                          
