[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
[ https://issues.apache.org/jira/browse/HDFS-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0-alpha2
                   2.9.0
           Status: Resolved  (was: Patch Available)

Thanks for the reviews, [~xiaochen] and [~jingzhao]. I committed it to both branch-2 and trunk.

Thanks also to [~Surendra Singh Lilhore] for the suggestions. I filed HDFS-10734 for changing the key names.

> EditLogTailer should call rollEdits with a timeout
> --------------------------------------------------
>
>                 Key: HDFS-4176
>                 URL: https://issues.apache.org/jira/browse/HDFS-4176
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode
>    Affects Versions: 2.0.2-alpha, 3.0.0-alpha1
>            Reporter: Todd Lipcon
>            Assignee: Lei (Eddy) Xu
>             Fix For: 2.9.0, 3.0.0-alpha2
>
>         Attachments: HDFS-4176-branch-2.0.patch,
> HDFS-4176-branch-2.003.patch, HDFS-4176-branch-2.1.patch,
> HDFS-4176-branch-2.2.patch, HDFS-4176.00.patch, HDFS-4176.01.patch,
> HDFS-4176.02.patch, HDFS-4176.03.patch, HDFS-4176.04.patch, namenode.jstack4
>
>
> When the EditLogTailer thread calls rollEdits() on the active NN via RPC, it
> currently does so without a timeout. So, if the active NN has frozen (but not
> actually crashed), this call can hang forever. This can then potentially
> prevent the standby from becoming active.
> This may actually be considered a side effect of HADOOP-6762 -- if the RPC were
> interruptible, that would also fix the issue.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment: HDFS-4176-branch-2.003.patch
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment:     (was: HDFS-4176-branch-2.3.patch)
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment: HDFS-4176-branch-2.3.patch

Rebased and uploaded.
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment: HDFS-4176-branch-2.2.patch

Updated the patch to fix {{TestHdfsConfigFields}}.

[~jingzhao], another review would be much appreciated.
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment: HDFS-4176-branch-2.1.patch

Hi [~jingzhao],

I changed {{getNameNodeProxy}} to {{getRollEditsTask}} in branch-2. Could you take a look? Thanks.
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment: HDFS-4176-branch-2.0.patch

Hi [~jingzhao],

I made a patch for {{branch-2}}. The conflicts between trunk and branch-2 arise because {{MultipleNameNodeProxy}} does not exist on branch-2, so I put {{getActiveNodeProxy.rollEditLog()}} into a simple {{Callable#call()}}.

Would you mind taking another look? Thanks.
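The idea of putting the roll request into a simple {{Callable#call()}} can be illustrated roughly as below. This is only a sketch, not the code from the branch-2 patch: the {{ActiveNodeProxy}} interface and the {{RollTaskSketch}} class are hypothetical stand-ins, and only the {{rollEditLog()}} method name is taken from the comment above.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Illustrative only; the class and interface names here are hypothetical. */
final class RollTaskSketch {
  private RollTaskSketch() {}

  /** Hypothetical stand-in for the RPC proxy to the active NameNode. */
  interface ActiveNodeProxy {
    void rollEditLog() throws IOException;
  }

  /**
   * Wraps the roll request in a Callable so that it can later be handed to
   * an executor instead of being invoked directly on the calling thread.
   */
  static Callable<Void> rollEditsTask(ActiveNodeProxy proxy) {
    return () -> {
      proxy.rollEditLog();
      return null; // Callable<Void> has nothing meaningful to return
    };
  }
}
```

Because the task is an ordinary {{Callable}}, the caller decides how long it is willing to wait for the result, which is what makes a timeout possible.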
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment: HDFS-4176.04.patch

Sure, [~jingzhao]. Updated the patch to fix the conflicts from HDFS-10519.
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Jing Zhao updated HDFS-4176:
----------------------------
    Hadoop Flags: Reviewed
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment: HDFS-4176.03.patch

Thanks, [~jingzhao]. Updated the patch to use {{ThreadFactoryBuilder}}.
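Guava's {{ThreadFactoryBuilder}} is commonly used to give executor threads a recognizable name and to mark them as daemon threads. The settings the patch actually uses are not shown in this thread, so the following is only a plain-JDK sketch of that idea. The class name {{NamedDaemonThreadFactory}} and the name format are assumptions made for illustration.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustrative JDK-only stand-in for what a Guava ThreadFactoryBuilder
 * configured with a name format and daemon=true would provide.
 */
final class NamedDaemonThreadFactory implements ThreadFactory {
  private final String nameFormat;
  private final AtomicInteger counter = new AtomicInteger(0);

  /** @param nameFormat a format string with one %d, e.g. "roll-edits-%d" */
  NamedDaemonThreadFactory(String nameFormat) {
    this.nameFormat = nameFormat;
  }

  @Override
  public Thread newThread(Runnable r) {
    // Number the threads so that they are easy to tell apart in a jstack dump.
    Thread t = new Thread(r, String.format(nameFormat, counter.getAndIncrement()));
    // Daemon threads do not keep the JVM alive if a remote call never returns.
    t.setDaemon(true);
    return t;
  }
}
```

A factory like this can be passed to {{Executors.newSingleThreadExecutor(ThreadFactory)}}.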
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment: HDFS-4176.02.patch

Thanks a lot for the great input, [~jingzhao]. I have updated the patch to address all your comments.

The test failures are unrelated: {{TestDFSCLI}} fails on trunk as well.
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment: HDFS-4176.01.patch

Fixed checkstyle errors and test failures.
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Attachment: HDFS-4176.00.patch

Uploaded a patch that uses a {{Future}} to wrap the {{rollEdits}} call with a configurable timeout.
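The general technique named in this update, running a potentially blocking call through a {{Future}} and bounding the wait, can be sketched as follows. This is a minimal illustration and not the code from HDFS-4176.00.patch: the class name {{TimedRollSketch}}, the method {{rollWithTimeout}}, and the thread name are all hypothetical, and the real patch reads its timeout from configuration rather than taking it as a parameter.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only; not the code from the HDFS-4176 patch. */
final class TimedRollSketch {
  private TimedRollSketch() {}

  /**
   * Runs rollTask on a helper thread and waits at most timeoutMs for it.
   *
   * @return true if the task completed in time, false if it timed out,
   *         failed, or the waiting thread was interrupted
   */
  static boolean rollWithTimeout(Callable<Void> rollTask, long timeoutMs) {
    // A daemon helper thread, so a hung remote call cannot block JVM exit.
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "roll-edits-sketch");
      t.setDaemon(true);
      return t;
    });
    Future<Void> future = executor.submit(rollTask);
    try {
      // The caller blocks here for at most timeoutMs, not forever.
      future.get(timeoutMs, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      // Give up on this attempt; the caller can continue or retry later.
      future.cancel(true);
      return false;
    } catch (ExecutionException e) {
      // The task itself threw an exception.
      return false;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt status
      future.cancel(true);
      return false;
    } finally {
      executor.shutdownNow();
    }
  }
}
```

With this shape, the calling thread regains control after the timeout even if the remote side never answers. Note that {{future.cancel(true)}} only interrupts the helper thread; if the underlying call does not respond to interruption (the concern raised via HADOOP-6762 in the issue description), the helper thread may stay blocked while the caller moves on.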
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
       Fix Version/s:     (was: 3.0.0-alpha1)
    Target Version/s: 2.9.0, 3.0.0-alpha2  (was: 2.0.3-alpha)
              Status: Patch Available  (was: Reopened)
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Lei (Eddy) Xu updated HDFS-4176:
--------------------------------
    Fix Version/s: 3.0.0-alpha1
[jira] [Updated] (HDFS-4176) EditLogTailer should call rollEdits with a timeout
Marc Heide updated HDFS-4176:
-----------------------------
    Attachment: namenode.jstack4

jstack of the standby NameNode while it was trying to become active.