Aaron T. Myers created HDFS-6289:
------------------------------------

             Summary: HA failover can fail if there are pending DN messages for DNs which no longer exist
                 Key: HDFS-6289
                 URL: https://issues.apache.org/jira/browse/HDFS-6289
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.4.0
            Reporter: Aaron T. Myers
            Assignee: Aaron T. Myers
            Priority: Critical


In an HA setup, the standby NN may receive messages from DNs about blocks it is 
not yet aware of. It queues these messages and replays them the next time it 
reads from the edit log, or when it fails over. On failover, every pending DN 
message must be processed successfully for the failover to succeed. If one of 
these pending messages refers to a DN storageId that no longer exists (because 
the DN with that transfer address was reformatted and re-registered with the 
same transfer address but a new storageId), then on transition to active the NN 
cannot process the message and shuts itself down with an error like the 
following:

{noformat}
2014-04-25 14:23:17,922 FATAL namenode.NameNode (NameNode.java:doImmediateShutdown(1525)) - Error encountered requiring NN shutdown. Shutting down immediately.
java.io.IOException: Cannot mark blk_1073741825_900(stored=blk_1073741825_1001) as corrupt because datanode 127.0.0.1:33324 does not exist
{noformat}
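A minimal Java sketch of the failure mode, to make the sequence concrete. All class, method, and storage ID names here (PendingDnMessagesSketch, register, processAllOnFailover, "DS-old", "DS-new") are hypothetical and do not reflect the actual NameNode code. The model assumes that a DN re-registering with an existing transfer address replaces its old storageId, and that a pending message naming the old storageId is then replayed on failover. With skipUnknown set to false, one stale message aborts the whole transition, as in the log above. Setting it to true drops such messages, which is shown only as one conceivable mitigation, not necessarily the fix adopted for this issue.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Hypothetical toy model of a standby NN that queues DN messages and replays them on failover. */
public class PendingDnMessagesSketch {

    /** A queued DN report (e.g. "mark this replica corrupt"), keyed by the reporting storageId. */
    record PendingMessage(String storageId, String block) {}

    // storageId -> transfer address, for storages that currently exist
    private final Map<String, String> storageToAddr = new HashMap<>();
    // transfer address -> current storageId
    private final Map<String, String> addrToStorage = new HashMap<>();
    // messages the standby could not yet process
    private final Queue<PendingMessage> pending = new ArrayDeque<>();

    /** Registers a DN; a reformatted DN re-registers with the same address but a new storageId. */
    void register(String transferAddr, String storageId) {
        String old = addrToStorage.put(transferAddr, storageId);
        if (old != null) {
            storageToAddr.remove(old); // the old storageId no longer exists
        }
        storageToAddr.put(storageId, transferAddr);
    }

    /** Queues a message about a block the standby does not know about yet. */
    void queue(PendingMessage m) {
        pending.add(m);
    }

    /**
     * Replays every queued message on transition to active and returns those processed.
     * With skipUnknown == false, a message for a missing storage aborts the whole replay.
     */
    List<PendingMessage> processAllOnFailover(boolean skipUnknown) throws IOException {
        List<PendingMessage> processed = new ArrayList<>();
        while (!pending.isEmpty()) {
            PendingMessage m = pending.poll();
            if (!storageToAddr.containsKey(m.storageId())) {
                if (skipUnknown) {
                    continue; // drop messages that refer to storages which no longer exist
                }
                throw new IOException("Cannot process message for " + m.block()
                        + " because storage " + m.storageId() + " does not exist");
            }
            processed.add(m);
        }
        return processed;
    }
}
```

In the strict mode the model fails for the same reason as the real NN: the replay treats every queued message as mandatory, so a single message for a reformatted DN is enough to block the failover.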



--
This message was sent by Atlassian JIRA
(v6.2#6252)
