Your procedure is right:

1. Copy edit.tmp from the secondary to edit on the primary
2. Copy srcimage from the secondary to fsimage on the primary
3. Remove edits.new on the primary
4. Restart the cluster, put it in safemode, and run fsck /
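As a rough sketch, steps 1-3 amount to three file operations in the name directories. The file names come from the procedure above; the directory layout is an assumption (substitute your configured dfs.name.dir and the secondary's checkpoint directory), so this is written as a function taking both paths rather than hard-coding them:

```shell
# Sketch of recovery steps 1-3. Directory paths are placeholders --
# pass your secondary checkpoint dir and primary dfs.name.dir.
restore_from_secondary() {
    sec=$1   # secondary namenode's checkpoint directory
    pri=$2   # primary namenode's name directory
    cp "$sec/edit.tmp" "$pri/edit"      # 1. restore the edit log
    cp "$sec/srcimage" "$pri/fsimage"   # 2. restore the fs image
    rm -f "$pri/edits.new"              # 3. drop the partial edit log
}
```

Run it with both namenodes stopped, then restart the cluster for step 4.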

However, the above steps are not foolproof: any transactions that
occurred between the time the secondary took its last checkpoint and
the time the disk filled up are lost. This could also leave some blocks
missing, because the last checkpoint might refer to blocks that are no
longer present. If fsck does not report any missing blocks, then you
are good to go.
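The verification in step 4 can be sketched with the standard hadoop CLI of that era (dfsadmin -safemode and fsck); the HADOOP_HOME variable is a placeholder for your install directory:

```shell
# Sketch: keep the namespace read-only, check it, then re-enable writes.
# HADOOP_HOME is assumed to point at the Hadoop 0.16 install.
verify_namespace() {
    "$HADOOP_HOME/bin/hadoop" dfsadmin -safemode enter  # block writes
    "$HADOOP_HOME/bin/hadoop" fsck /                    # report missing/corrupt blocks
    "$HADOOP_HOME/bin/hadoop" dfsadmin -safemode leave  # resume normal operation
}
```

Only leave safemode once fsck comes back without missing blocks.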

Thanks,
dhruba

-----Original Message-----
From: Jason Venner [mailto:[EMAIL PROTECTED] 
Sent: Thursday, March 13, 2008 1:37 PM
To: core-user@hadoop.apache.org
Subject: Question about recovering from a corrupted namenode 0.16.0

The namenode ran out of disk space and on restart was throwing the
error at the end of this message.

We copied edit.tmp from the secondary in as edit, copied srcimage in as
fsimage, and removed edit.new; our file system started up and
/appears/ to be intact.

What is the proper procedure? We didn't find any details on the wiki.

Namenode error:
2008-03-13 13:19:32,493 ERROR org.apache.hadoop.dfs.NameNode: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:180)
    at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)
    at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)
    at org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:507)
    at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:744)
    at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:624)
    at org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:222)
    at org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:79)
    at org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:254)
    at org.apache.hadoop.dfs.FSNamesystem.<init>(FSNamesystem.java:235)
    at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:130)
    at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:175)
    at org.apache.hadoop.dfs.NameNode.<init>(NameNode.java:161)
    at org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:843)
    at org.apache.hadoop.dfs.NameNode.main(NameNode.java:852)



-- 
Jason Venner
Attributor - Publish with Confidence <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers, contact if interested
