[ https://issues.apache.org/jira/browse/HDFS-14500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HDFS-14500 started by Erik Krogen.
------------------------------------------
> NameNode StartupProgress continues to report edit log segments after the 
> LOADING_EDITS phase is finished
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14500
>                 URL: https://issues.apache.org/jira/browse/HDFS-14500
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 3.2.0, 2.9.2, 3.0.3, 2.8.5, 3.1.2
>            Reporter: Erik Krogen
>            Assignee: Erik Krogen
>            Priority: Major
>
> When testing out a cluster with the edit log tailing fast path feature 
> enabled (HDFS-13150), an unrelated issue caused the NameNode to remain in 
> safe mode for an extended period of time, preventing the NameNode from fully 
> completing its startup sequence. We noticed that the Startup Progress web UI 
> displayed many edit log segments (millions of them).
> I traced this problem back to {{StartupProgress}}. Within 
> {{FSEditLogLoader}}, the loader continually tries to update the startup 
> progress with a new {{Step}} any time that it loads edits. Per the Javadoc 
> for {{StartupProgress}}, this should be a no-op once startup is completed:
> {code:title=StartupProgress.java}
>  * After startup completes, the tracked data is frozen.  Any subsequent updates
>  * or counter increments are no-ops.
> {code}
> However, {{StartupProgress}} only applies that logic once the _entire_ 
> startup sequence has been completed. When {{FSEditLogLoader}} calls 
> {{beginStep()}}, the step is added to the {{LOADING_EDITS}} phase:
> {code:title=FSEditLogLoader.java}
>     StartupProgress prog = NameNode.getStartupProgress();
>     Step step = createStartupProgressStep(edits);
>     prog.beginStep(Phase.LOADING_EDITS, step);
> {code}
> This phase, in our case, ended long before, so it is nonsensical to continue 
> to add steps to it. I believe it is a bug that {{StartupProgress}} accepts 
> such steps instead of ignoring them; once a phase is complete, it should no 
> longer change.
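The following is a minimal sketch of the per-phase freeze proposed above. It uses a simplified, hypothetical tracker rather than Hadoop's actual {{StartupProgress}} class, and the real patch may look different. {{ProgressModel}}, its {{freezeCompletedPhases}} switch, the {{stepsAfterTailing}} helper, and the use of plain strings as steps are illustrative inventions. Only {{LOADING_EDITS}} comes from the report; the other phase names are assumed.

With the switch off, steps are ignored only after every phase has completed, which matches the behavior described above. With the switch on, a phase that has already ended also rejects new steps. The simulation ends {{LOADING_EDITS}}, leaves {{SAFEMODE}} running, and then keeps calling {{beginStep}} as edit tailing would.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of a startup-progress tracker. This is NOT
// Hadoop's StartupProgress; it only illustrates freezing a completed phase.
final class ProgressModel {
  // Only LOADING_EDITS comes from the report; the others are assumed names.
  enum Phase { LOADING_FSIMAGE, LOADING_EDITS, SAVING_CHECKPOINT, SAFEMODE }

  enum Status { PENDING, RUNNING, COMPLETE }

  private final boolean freezeCompletedPhases;
  private final Map<Phase, Status> status = new EnumMap<>(Phase.class);
  private final Map<Phase, List<String>> steps = new EnumMap<>(Phase.class);

  ProgressModel(boolean freezeCompletedPhases) {
    this.freezeCompletedPhases = freezeCompletedPhases;
    for (Phase p : Phase.values()) {
      status.put(p, Status.PENDING);
      steps.put(p, new ArrayList<String>());
    }
  }

  // True only when every phase has completed (whole-startup freeze).
  boolean isComplete() {
    for (Status s : status.values()) {
      if (s != Status.COMPLETE) {
        return false;
      }
    }
    return true;
  }

  // Updates to `phase` are ignored once startup is complete or, when the
  // per-phase check is enabled, once that phase itself has completed.
  private boolean isFrozen(Phase phase) {
    return isComplete()
        || (freezeCompletedPhases && status.get(phase) == Status.COMPLETE);
  }

  void beginPhase(Phase phase) {
    if (!isFrozen(phase)) {
      status.put(phase, Status.RUNNING);
    }
  }

  void endPhase(Phase phase) {
    if (!isFrozen(phase)) {
      status.put(phase, Status.COMPLETE);
    }
  }

  void beginStep(Phase phase, String step) {
    if (!isFrozen(phase)) {
      steps.get(phase).add(step);
    }
  }

  int stepCount(Phase phase) {
    return steps.get(phase).size();
  }

  // Ends LOADING_EDITS, leaves SAFEMODE running (startup never completes),
  // then adds one step per tailed batch, as the edit log loader would.
  static int stepsAfterTailing(boolean freezeCompletedPhases, int batches) {
    ProgressModel prog = new ProgressModel(freezeCompletedPhases);
    prog.beginPhase(Phase.LOADING_EDITS);
    prog.beginStep(Phase.LOADING_EDITS, "segment-0");
    prog.endPhase(Phase.LOADING_EDITS);
    prog.beginPhase(Phase.SAFEMODE);
    for (int i = 1; i <= batches; i++) {
      prog.beginStep(Phase.LOADING_EDITS, "segment-" + i);
    }
    return prog.stepCount(Phase.LOADING_EDITS);
  }

  public static void main(String[] args) {
    System.out.println("whole-startup freeze only: " + stepsAfterTailing(false, 1000)); // 1001
    System.out.println("per-phase freeze:          " + stepsAfterTailing(true, 1000));  // 1
  }
}
```

Running {{main}} prints 1001 for the whole-startup freeze and 1 for the per-phase freeze. Without the per-phase check, every tailed batch adds another step to the long-finished {{LOADING_EDITS}} phase. This mirrors the unbounded growth seen in the Startup Progress web UI.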



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
