Hi Stack,

The NPE is this:

10/12/18 15:39:07 WARN hdfs.StateChange: DIR* FSDirectory.unprotectedSetTimes: failed to setTimes /hbase/inrdb_ris_update_rrc00/fe5090c366e326cf2b123502e2d4bcce/data/1350525083587292896 because source does not exist
10/12/18 15:39:07 WARN hdfs.StateChange: DIR* FSDirectory.unprotectedSetTimes: failed to setTimes /hbase/inrdb_ris_update_rrc00/fe5090c366e326cf2b123502e2d4bcce/meta/4413022065008239343 because source does not exist
10/12/18 15:39:07 DEBUG namenode.FSNamesystem: 0: /hbase/.logs/w2r1.inrdb.ripe.net,60020,1292333234919/w2r1.inrdb.ripe.net%3A60020.1292336839737 numblocks : 0 clientHolder DFSClient_131715208 clientMachine 193.0.23.32
10/12/18 15:39:07 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /hbase/.logs/w2r1.inrdb.ripe.net,60020,1292333234919/w2r1.inrdb.ripe.net%3A60020.1292336839737 because it does not exist
10/12/18 15:39:07 ERROR namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1088)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1100)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1003)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:206)
        at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:637)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1039)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:845)
        at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:379)
        at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:343)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:317)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:214)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:394)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1148)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1157)
10/12/18 15:39:07 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at m1r1.inrdb.ripe.net/193.0.23.51
************************************************************/

It looks like the array populated by INodeDirectoryWithQuota#getExistingPathINodes(...) has a null somewhere, which FSDirectory#addChild(...) does not expect.

I marked the entries for the broken file as invalid in the edit log with a hex editor. After that, the NN does come back online. It then reports that only about 88% of all blocks are being reported and stays in safe mode. Of course I could lower the threshold and make it work, but I am wondering whether it simply stopped persisting edits at some point, so that the only correct version was in memory.

I am running a secondary NN. Restoring a checkpoint doesn't solve the problem. We store the NN data on a filer that takes hourly, daily and weekly snapshots, so I could probably go back to a working version, but I don't think HBase would work afterwards. We do a lot of updates on the data, so splits and compactions are quite common, and I expect an older version of the NN data would point to blocks that no longer exist.

We have secured storage with all our source data, so re-importing everything is an option, but it would take about two weeks and, above all, would probably be quite bad for Hadoop's reputation within the organization.

My main concern is this happening again.

(Sorry for being a bit off topic on this list, but hdfs-user and cdh-user didn't come up with responses on this.)
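To make the suspicion about the null entry concrete, here is a minimal Java sketch of that failure mode. It is not the Hadoop source: the names NullParentSketch, Dir, resolve, addChildUnchecked and addChildChecked are all hypothetical, and the sketch only assumes that path resolution leaves null for components that do not exist and that the add step dereferences the parent slot without checking it.

```java
import java.util.HashMap;
import java.util.Map;

public class NullParentSketch {
    /** Hypothetical stand-in for a directory inode; not the Hadoop class. */
    static class Dir {
        final Map<String, Dir> children = new HashMap<>();
    }

    /**
     * Resolves path components against the tree. nodes[0] is the root and
     * nodes[i + 1] is the node for components[i]; components that do not
     * exist (and everything below them) are left as null.
     */
    static Dir[] resolve(Dir root, String[] components) {
        Dir[] nodes = new Dir[components.length + 1];
        nodes[0] = root;
        Dir cur = root;
        for (int i = 0; i < components.length && cur != null; i++) {
            cur = cur.children.get(components[i]);
            nodes[i + 1] = cur;
        }
        return nodes;
    }

    /** Adds a child under nodes[pos - 1] without a null check; throws an NPE if the parent is missing. */
    static void addChildUnchecked(Dir[] nodes, int pos, String name) {
        nodes[pos - 1].children.put(name, new Dir());
    }

    /** Defensive variant: returns false instead of throwing when the parent slot is null or out of range. */
    static boolean addChildChecked(Dir[] nodes, int pos, String name) {
        if (pos < 1 || pos > nodes.length || nodes[pos - 1] == null) {
            return false;
        }
        nodes[pos - 1].children.put(name, new Dir());
        return true;
    }

    public static void main(String[] args) {
        Dir root = new Dir();
        root.children.put("hbase", new Dir());
        // "region" does not exist in the tree, so its slot in the array is null.
        Dir[] nodes = resolve(root, new String[] {"hbase", "region"});
        System.out.println("checked add succeeded: " + addChildChecked(nodes, 3, "data"));
        try {
            addChildUnchecked(nodes, 3, "data");
        } catch (NullPointerException e) {
            System.out.println("unchecked add threw NPE: parent missing");
        }
    }
}
```

In this sketch an add operation replayed for a file whose parent directory is gone fails exactly at the parent dereference, which would be consistent with the trace dying in addChild while loading edits. Whether that is what actually happened here is still a guess.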
Cheers,
Friso

On 19 dec 2010, at 19:52, Stack wrote:

> On Sun, Dec 19, 2010 at 1:23 AM, Friso van Vollenhoven <[email protected]> wrote:
>> Right now, however, I am in the unpleasant situation that my NN won't come up anymore after a restart (throws NPE), so I need to get that fixed first (without formatting, because I am not very keen on running the 6 day job again). I did a restart of everything to make sure that anything that was swapped out before got back to memory, but I guess restarting the NN could have better been left for another time...
>
> You running secondary namenode? What kinda NPE you seeing?
>
> St.Ack
