Hi Stack,

The NPE is this:
10/12/18 15:39:07 WARN hdfs.StateChange: DIR* FSDirectory.unprotectedSetTimes: failed to setTimes /hbase/inrdb_ris_update_rrc00/fe5090c366e326cf2b123502e2d4bcce/data/1350525083587292896 because source does not exist
10/12/18 15:39:07 WARN hdfs.StateChange: DIR* FSDirectory.unprotectedSetTimes: failed to setTimes /hbase/inrdb_ris_update_rrc00/fe5090c366e326cf2b123502e2d4bcce/meta/4413022065008239343 because source does not exist
10/12/18 15:39:07 DEBUG namenode.FSNamesystem: 0: /hbase/.logs/w2r1.inrdb.ripe.net,60020,1292333234919/w2r1.inrdb.ripe.net%3A60020.1292336839737 numblocks : 0 clientHolder DFSClient_131715208 clientMachine 193.0.23.32
10/12/18 15:39:07 DEBUG hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /hbase/.logs/w2r1.inrdb.ripe.net,60020,1292333234919/w2r1.inrdb.ripe.net%3A60020.1292336839737 because it does not exist
10/12/18 15:39:07 ERROR namenode.NameNode: java.lang.NullPointerException
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1088)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addChild(FSDirectory.java:1100)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.addNode(FSDirectory.java:1003)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.unprotectedAddFile(FSDirectory.java:206)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadFSEdits(FSEditLog.java:637)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSEdits(FSImage.java:1039)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:845)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:379)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:99)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:343)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:317)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:214)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:394)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1148)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1157)

10/12/18 15:39:07 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at m1r1.inrdb.ripe.net/193.0.23.51
************************************************************/

It looks like the array populated by 
INodeDirectoryWithQuota#getExistingPathINodes(...) contains a null somewhere, 
which FSDirectory#addChild(...) does not expect. I marked the entries for the 
broken file as invalid in the edit log with a hex editor, after which the NN 
does come back online. It then reports that only about 88% of all blocks have 
been reported and stays in safe mode. Of course I could lower the safe mode 
threshold to make it work, but I am wondering whether the NN simply stopped 
persisting edits at some point, so that the only correct version of the 
namespace was in memory.
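
To make the suspected failure mode concrete, here is a minimal Java sketch. It 
is not the Hadoop source: the class PathResolveSketch, the Node type and the 
methods resolveExisting, addFileUnchecked and addFile are all made up for 
illustration. The only assumption is that path resolution fills an array with 
the nodes that exist and leaves null in the slots for components that do not, 
so that replaying an "add file" edit under a missing parent dereferences a 
null.

```java
import java.util.HashMap;
import java.util.Map;

public class PathResolveSketch {
    // Hypothetical stand-in for a namespace entry; not an HDFS class.
    static final class Node {
        final String name;
        final Map<String, Node> children = new HashMap<>();
        Node(String name) { this.name = name; }
    }

    // Resolves as many path components as exist. Slot 0 is the root,
    // slot i + 1 is components[i]; unresolved slots stay null.
    static Node[] resolveExisting(Node root, String[] components) {
        Node[] nodes = new Node[components.length + 1];
        nodes[0] = root;
        Node current = root;
        for (int i = 0; i < components.length && current != null; i++) {
            current = current.children.get(components[i]);
            nodes[i + 1] = current;
        }
        return nodes;
    }

    // Assumes the parent exists. Throws NullPointerException when it does not.
    static void addFileUnchecked(Node root, String[] components) {
        Node[] nodes = resolveExisting(root, components);
        Node parent = nodes[nodes.length - 2];
        String name = components[components.length - 1];
        parent.children.put(name, new Node(name));
    }

    // Same operation with an explicit check for a missing parent.
    static boolean addFile(Node root, String[] components) {
        Node[] nodes = resolveExisting(root, components);
        Node parent = nodes[nodes.length - 2];
        if (parent == null) {
            return false; // parent directory is gone, refuse instead of crashing
        }
        String name = components[components.length - 1];
        parent.children.put(name, new Node(name));
        return true;
    }

    public static void main(String[] args) {
        Node root = new Node("");
        String[] path = {"hbase", "table", "region", "file"};
        System.out.println("guarded add succeeded: " + addFile(root, path));
        try {
            addFileUnchecked(root, path);
        } catch (NullPointerException e) {
            System.out.println("unchecked add threw NullPointerException");
        }
    }
}
```

If this is roughly what happens, the edit log contains an add for a file whose 
parent directory is no longer in the namespace at replay time, which would fit 
the "does not exist" warnings logged just before the NPE.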

I am running a secondary NN, but restoring a checkpoint doesn't solve the 
problem. We store the NN data on a filer that takes hourly, daily and weekly 
snapshots, so I could probably go back to a working version, but I don't think 
HBase would work afterwards. We do a lot of updates on our data, so splits and 
compactions are quite common, and an older version of the NN data will almost 
certainly point to blocks that no longer exist.

We have secured storage with all our source data, so re-importing everything 
is an option, but it takes about two weeks and, above all, is probably quite 
bad for Hadoop's reputation within the organization. My main concern is this 
happening again.

(Sorry for being a bit off topic on this list, but the hdfs-user and cdh-user 
lists didn't come up with responses on this.)


Cheers,
Friso


On 19 dec 2010, at 19:52, Stack wrote:

On Sun, Dec 19, 2010 at 1:23 AM, Friso van Vollenhoven
<[email protected]> wrote:
Right now, however, I am in the unpleasant situation that my NN won't come up 
anymore after a restart (throws NPE), so I need to get that fixed first 
(without formatting, because I am not very keen on running the 6 day job 
again). I did a restart of everything to make sure that anything that was 
swapped out before got back to memory, but I guess restarting the NN could have 
better been left for another time...


You running secondary namenode?

What kinda NPE you seeing?

St.Ack
