"Prior to the Hadoop 2.x series, the NameNode was a single point of failure in an HDFS cluster — in other words, if the machine on which the single NameNode was configured became unavailable, the entire cluster would be unavailable until the NameNode could be restarted. This was bad news, especially in the case of unplanned outages, which could result in significant downtime if the cluster administrator weren't available to restart the NameNode.

The solution to this problem is addressed by the HDFS High Availability feature. The idea is to run two NameNodes in the same cluster — one active NameNode and one hot standby NameNode. If the active NameNode crashes or needs to be stopped for planned maintenance, it can be quickly failed over to the hot standby NameNode, which now becomes the active NameNode. The key is to keep the standby node synchronized with the active node; this action is now accomplished by having both nodes access a shared NFS directory. All namespace changes on the active node are logged in the shared directory. The standby node picks up those changes from the directory and applies them to its own namespace. In this way, the standby NameNode acts as a current backup of the active NameNode. The standby node also has current block location information, because DataNode heartbeats are routinely sent to both active and standby NameNodes.

To ensure that only one NameNode is the "active" node at any given time, configure a fencing process for the shared storage directory; then, during a failover, if it appears that the failed NameNode still carries the active state, the configured fencing process prevents that node from accessing the shared directory and permits the newly active node (the former standby node) to complete the failover.

The machines that will serve as the active and standby NameNodes in your High Availability cluster should have equivalent hardware. The shared NFS storage directory, which must be accessible to both active and standby NameNodes, is usually located on a separate machine and can be mounted on each NameNode machine. To prevent this directory from becoming a single point of failure, configure multiple network paths to the storage directory, and ensure that there's redundancy in the storage itself. Use a dedicated network-attached storage (NAS) appliance to contain the shared storage directory."

Courtesy of Dirk deRoos, Paul C. Zikopoulos, Bruce Brown, Rafael Coss, and Roman B. Melnyk.
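
For reference, the shared-NFS HA setup the excerpt describes maps onto a handful of hdfs-site.xml properties. A minimal sketch follows; the nameservice name (mycluster), NameNode IDs (nn1, nn2), hostnames, key path, and the NFS mount point (/mnt/nfs/hdfs-shared) are illustrative placeholders, not values from the excerpt:

```xml
<!-- Sketch only: nameservice, host names, and paths are placeholders -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<!-- The shared edits directory on the NFS mount; both NameNodes must see it -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/nfs/hdfs-shared</value>
</property>
<!-- Fencing: cut off the old active node before the standby takes over -->
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>
```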

P.S. I am looking for work as a Hadoop Admin/Developer (I am an Electrical Engineer with an MSEE). A few months ago I successfully implemented a 6-node cluster at work for productivity purposes (that's my claim to fame). I was laid off shortly afterwards. No correlation, I suspect. But I am in FL and willing to go anywhere to find contract/permanent work. If anyone knows of a position for a tenacious Hadoop engineer, I am interested.

Thank you.

Mark Charts
 

     On Sunday, December 14, 2014 5:30 PM, daemeon reiydelle 
<[email protected]> wrote:
   

I found the terminology of primary and secondary to be a bit confusing for describing operation after a failure scenario. Perhaps it is helpful to think that the Hadoop instance is guided to select a node as primary for normal operation. If that node fails, then the backup becomes the new primary. In analyzing traffic, it appears that the restored node does not become primary again until the whole instance restarts. I myself would welcome clarification on this observed behavior.
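
For what it's worth, this matches documented HDFS HA behavior: a NameNode that comes back after a failover rejoins as standby and stays there; it does not retake the active role on its own, so a restart is not required, only an administrative failback. A sketch with the stock `hdfs haadmin` tool (the service IDs nn1/nn2 are placeholders from a typical HA config, not values observed in this thread):

```shell
# Check which NameNode currently holds the active role
hdfs haadmin -getServiceState nn1   # likely reports "standby" after nn1 recovered
hdfs haadmin -getServiceState nn2   # likely reports "active"

# Manually fail back so the restored node becomes active again
hdfs haadmin -failover nn2 nn1
```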


.......
“Life should not be a journey to the grave with the intention of arriving 
safely in a
pretty and well preserved body, but rather to skid in broadside in a cloud of 
smoke,
thoroughly used up, totally worn out, and loudly proclaiming “Wow! What a 
Ride!” 
- Hunter Thompson

Daemeon C.M. Reiydelle
USA (+1) 415.501.0198
London (+44) (0) 20 8144 9872

On Fri, Dec 12, 2014 at 7:56 AM, Rich Haase <[email protected]> wrote:

The remaining cluster services will continue to run.  That way, when the NameNode (or other failed process) is restored, the cluster will resume healthy operation.  This is part of Hadoop's ability to handle network partition events.

Rich Haase | Sr. Software Engineer | Pandora | 303.887.1146 | [email protected]
From: Chandrashekhar Kotekar <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Friday, December 12, 2014 at 3:57 AM
To: "[email protected]" <[email protected]>
Subject: What happens to data nodes when name node has failed for long time?

Hi,
What happens if the NameNode has been down for more than one hour but the secondary NameNode, all the DataNodes, the JobTracker, and the TaskTrackers are running fine? Do those daemon services also automatically shut down after some time? Or do those services keep running, hoping for the NameNode to come back?

Regards,
Chandrash3khar Kotekar
Mobile - +91 8600011455




   

Attachment: cover letter Hadoop 13Dec2014.doc
Description: MS-Word document

Attachment: Mark Charts 12Dec2014 Hadoop CV.doc
Description: MS-Word document
