Hi Artem,

Yes, letting the NameNode write to two directories, one of them on an NFS share, is what most people do, and it works fine in production environments.

If you're worried about NFS going up and down often, then a release that includes both of the following will help further:

https://issues.apache.org/jira/browse/HADOOP-4885 (the dfs.name.dir.restore feature, toggled to true at the NN)
https://issues.apache.org/jira/browse/HDFS-3652 (a possible edge case your config may be exposing when it comes to ejecting bad name-dirs at the NN)
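As a rough illustration of the setup described above, here is a minimal sketch that checks an hdfs-site.xml for redundant name directories. The directory paths, the sample file contents, and the helper names read_properties and check_name_dirs are all hypothetical, and treating an unset dfs.name.dir.restore as "false" is an assumption about the default in your release.

```python
import xml.etree.ElementTree as ET

# Hypothetical hdfs-site.xml: one local directory plus one NFS mount.
# Both paths are placeholders, not values taken from this thread.
SAMPLE_HDFS_SITE = """\
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/data/dfs/name,/mnt/nfs/dfs/name</value>
  </property>
  <property>
    <name>dfs.name.dir.restore</name>
    <value>true</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop config document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_name_dirs(props):
    """Return a list of warnings about NameNode metadata redundancy."""
    warnings = []
    # dfs.name.dir is a comma-separated list of directories.
    dirs = [d.strip() for d in props.get("dfs.name.dir", "").split(",") if d.strip()]
    if len(dirs) < 2:
        warnings.append("dfs.name.dir lists fewer than two directories")
    # Assumption: an unset dfs.name.dir.restore behaves as "false".
    if props.get("dfs.name.dir.restore", "false").lower() != "true":
        warnings.append("dfs.name.dir.restore is not enabled")
    return warnings


if __name__ == "__main__":
    problems = check_name_dirs(read_properties(SAMPLE_HDFS_SITE))
    for line in problems or ["name directories look redundant"]:
        print(line)
```

The check only inspects the configuration text. It does not verify that the NFS mount is actually reachable or that the NN has picked up the settings.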
On Tue, Sep 18, 2012 at 10:03 PM, Artem Ervits <[email protected]> wrote:
> Thanks Harsh,
>
> I'm aware of the implications of copying periodically. This is just a test
> until I get an NFS share to play with. Do you just let Hadoop write to two
> directories where one is an NFS share or is there another way?
>
> -----Original Message-----
> From: Harsh J [mailto:[email protected]]
> Sent: Monday, September 17, 2012 10:44 PM
> To: [email protected]
> Subject: Re: Hadoop recovery test
>
> Hi Artem,
>
> You are running 1 DN in this cluster from what I see, and hence you can
> ignore the reports that go: Under replicated blk_7701720691642589882_1086.
> Target Replicas is 3 but found 1 replica(s).
>
> The two truly missing blocks are:
>
> /hdfs/hadoop/tmp/mapred/system/jobtracker.info: MISSING 1 blocks
> /user/hduser/teragen-out/part-00000: MISSING 1 blocks
>
> These may be missing because those files were being written at the time of
> your copy of the fsimage and edits. (That's the wrong way to go about it,
> btw. You should configure for redundant writes so that you also sustain
> failures, not copy the files periodically. That's not a consistent way to
> keep a backup, and you can rather go for dfsadmin methods to fetchImage
> instead.) Does that sound likely?
>
> On Tue, Sep 18, 2012 at 3:08 AM, Artem Ervits <[email protected]> wrote:
>> Hello all,
>>
>> I am testing the Hadoop recovery as per
>> http://wiki.apache.org/hadoop/NameNode document. But instead of using
>> an NFS share, I am copying to another directory. Then when I shut down
>> the cluster, I scp that directory to another server and start Hadoop
>> cluster using that machine as the namenode. I see in the log that some
>> blocks are corrupt and/or missing. Do I have to wait for replication
>> to recover all blocks or am I doing something else altogether? I am
>> using Hadoop 1.0.3. Can someone point me to a more detailed document
>> than the wiki in case I'm doing something wrong?
>>
>> p.s. if I restart the cluster using the original namenode, filesystem
>> reports as healthy.
>>
>> Thank you.
>>
>> .
>> /hdfs/hadoop/tmp/mapred/system/jobtracker.info: CORRUPT block blk_9043419219670949307
>>
>> /hdfs/hadoop/tmp/mapred/system/jobtracker.info: MISSING 1 blocks of total size 4 B...
>> /user/hduser/teragen/_logs/history/job_201209120941_0002_1347458152167_hduser_TeraGen: Under replicated blk_-976282286234272458_1079. Target Replicas is 3 but found 1 replica(s).
>> .
>> /user/hduser/teragen/_logs/history/job_201209120941_0002_conf.xml: Under replicated blk_137658109390447967_1075. Target Replicas is 3 but found 1 replica(s).
>> .
>> /user/hduser/teragen/_partition.lst: Under replicated blk_-3005280481530403302_1080. Target Replicas is 3 but found 1 replica(s).
>> .
>> /user/hduser/teragen/part-00000: Under replicated blk_-7008813028808832816_1077. Target Replicas is 3 but found 1 replica(s).
>> .
>> /user/hduser/teragen/part-00001: Under replicated blk_-5256967771026054061_1078. Target Replicas is 3 but found 1 replica(s).
>> ..
>> /user/hduser/teragen-out/_logs/history/job_201209120941_0003_1347458249920_hduser_TeraSort: Under replicated blk_1137779303840586677_1089. Target Replicas is 3 but found 1 replica(s).
>> .
>> /user/hduser/teragen-out/_logs/history/job_201209120941_0003_conf.xml: Under replicated blk_7701720691642589882_1086. Target Replicas is 3 but found 1 replica(s).
>> .
>> /user/hduser/teragen-out/part-00000: CORRUPT block blk_8059469267617478950
>>
>> /user/hduser/teragen-out/part-00000: MISSING 1 blocks of total size 1000000 B...
>> /user/hduser/teragen-validate/_logs/history/job_201209120941_0004_1347458495941_hduser_TeraValidate: Under replicated blk_5680565744062298575_1098. Target Replicas is 3 but found 1 replica(s).
>> .
>> /user/hduser/teragen-validate/_logs/history/job_201209120941_0004_conf.xml: Under replicated blk_1566253937037013126_1095. Target Replicas is 3 but found 1 replica(s).
>> .Status: CORRUPT
>>  Total size: 1050720258 B
>>  Total dirs: 39
>>  Total files: 32
>>  Total blocks (validated): 42 (avg. block size 25017149 B)
>>  ********************************
>>  CORRUPT FILES: 2
>>  MISSING BLOCKS: 2
>>  MISSING SIZE: 1000004 B
>>  CORRUPT BLOCKS: 2
>>  ********************************
>>  Minimally replicated blocks: 40 (95.2381 %)
>>  Over-replicated blocks: 0 (0.0 %)
>>  Under-replicated blocks: 40 (95.2381 %)
>>  Mis-replicated blocks: 0 (0.0 %)
>>  Default replication factor: 3
>>  Average block replication: 0.95238096
>>  Corrupt blocks: 2
>>  Missing replicas: 80 (200.0 %)
>>  Number of data-nodes: 1
>>  Number of racks: 1
>> FSCK ended at Mon Sep 17 17:29:08 EDT 2012 in 21 milliseconds
>>
>> The filesystem under path '/' is CORRUPT
>>
>> Artem Ervits
>> Data Analyst
>> New York Presbyterian Hospital
>>
>> ________________________________
>> This electronic message is intended to be for the use only of the
>> named recipient, and may contain information that is confidential or
>> privileged.
>> If you are not the intended recipient, you are hereby notified that
>> any disclosure, copying, distribution or use of the contents of this
>> message is strictly prohibited. If you have received this message in
>> error or are not the named recipient, please notify us immediately by
>> contacting the sender at the electronic mail address noted above, and
>> delete and destroy all copies of this message. Thank you.
>>
>
> --
> Harsh J

--
Harsh J
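
For triaging an fsck report like the one quoted above on a single-DN cluster, the following is a minimal sketch that separates files with MISSING blocks from the under-replication notices that can be ignored. It assumes the raw, unwrapped fsck output (one entry per line, unlike the mail-wrapped quote), and the line patterns are inferred only from the report in this thread. The function name triage is hypothetical.

```python
import re

# Line shapes inferred from the fsck report quoted in this thread.
MISSING = re.compile(r"^(?P<path>/\S+): MISSING (?P<count>\d+) blocks")
UNDER = re.compile(r"^(?P<path>/\S+):\s+Under replicated (?P<block>blk_-?\d+_\d+)")


def triage(report):
    """Split an fsck report into missing-block files and under-replicated blocks."""
    missing = {}           # path -> number of missing blocks
    under_replicated = []  # (path, block id) pairs
    for raw in report.splitlines():
        # Tolerate mail quote markers in front of each line.
        line = raw.lstrip("> ").strip()
        match = MISSING.match(line)
        if match:
            missing[match.group("path")] = int(match.group("count"))
            continue
        match = UNDER.match(line)
        if match:
            under_replicated.append((match.group("path"), match.group("block")))
    return missing, under_replicated


# Three lines taken from the quoted report, joined back onto single lines.
SAMPLE = """\
/hdfs/hadoop/tmp/mapred/system/jobtracker.info: MISSING 1 blocks of total size 4 B...
/user/hduser/teragen/part-00000:  Under replicated blk_-7008813028808832816_1077. Target Replicas is 3 but found 1 replica(s).
/user/hduser/teragen-out/part-00000: MISSING 1 blocks of total size 1000000 B...
"""

if __name__ == "__main__":
    missing, under = triage(SAMPLE)
    for path, count in missing.items():
        print(f"MISSING {count}: {path}")
    print(f"{len(under)} under-replicated block(s), expected with 1 DN and replication 3")
```

With one DN and a default replication factor of 3, every block shows up as under-replicated, so only the entries in the first result need attention.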
