[jira] [Updated] (HDFS-15414) java.net.SocketException: Original Exception : java.io.IOException: Broken pipe

2020-06-15 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated HDFS-15414: Description: We observed this exception in a DataNode's log while we are not shutting down any nodes in the

[jira] [Updated] (HDFS-15414) java.net.SocketException: Original Exception : java.io.IOException: Broken pipe

2020-06-15 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated HDFS-15414: Description: We observed this exception in a DataNode's log while we are not shutting down any nodes in the

[jira] [Created] (HDFS-15414) java.net.SocketException: Original Exception : java.io.IOException: Broken pipe

2020-06-15 Thread YCozy (Jira)
YCozy created HDFS-15414: Summary: java.net.SocketException: Original Exception : java.io.IOException: Broken pipe Key: HDFS-15414 URL: https://issues.apache.org/jira/browse/HDFS-15414 Project: Hadoop HDFS

[jira] [Created] (HDFS-15367) Fail to get file checksum even if there's an available replica.

2020-05-20 Thread YCozy (Jira)
YCozy created HDFS-15367: Summary: Fail to get file checksum even if there's an available replica. Key: HDFS-15367 URL: https://issues.apache.org/jira/browse/HDFS-15367 Project: Hadoop HDFS Issue

[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover kills the NameNode

2020-04-01 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072805#comment-17072805 ] YCozy commented on HDFS-15235: Hello [~weichiu], thanks for looking into this! For triggering this bug,

[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover kills the NameNode

2020-03-31 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071727#comment-17071727 ] YCozy commented on HDFS-15235: Hello [~weichiu], would you please help review the patch? Thanks!

[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover kills the NameNode

2020-03-30 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071006#comment-17071006 ] YCozy commented on HDFS-15235: Hello [~ayushtkn], I've made the title more accurate. Could you please help

[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover kills the NameNode

2020-03-28 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069462#comment-17069462 ] YCozy commented on HDFS-15235: Also, NN2 shouldn't be killed because the fencing should be invoked only

[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover kills the NameNode

2020-03-28 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069461#comment-17069461 ] YCozy commented on HDFS-15235: Thank you [~ayushtkn]! Upon further analysis we found that NN1 did become

[jira] [Updated] (HDFS-15235) Transient network failure during NameNode failover kills the NameNode

2020-03-28 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated HDFS-15235: Summary: Transient network failure during NameNode failover kills the NameNode (was: Transient network failure

[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable

2020-03-28 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17069442#comment-17069442 ] YCozy commented on HDFS-15235: [~hemanthboyina], [~elgoiri], would you be so kind to help review the patch?

[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable

2020-03-26 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067649#comment-17067649 ] YCozy commented on HDFS-15235: [~ayushtkn], could you please take a look at the patch? Thanks! > Transient

[jira] [Updated] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable

2020-03-25 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated HDFS-15235: Attachment: HDFS-15235.001.patch Status: Patch Available (was: Open) Attaching a patch with both the UT

[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable

2020-03-24 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066043#comment-17066043 ] YCozy commented on HDFS-15235: [~ayushtkn] Thanks for looking at this! I'll try to upload a UT and a fix.

[jira] [Commented] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable

2020-03-24 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17066042#comment-17066042 ] YCozy commented on HDFS-15235: A bit more info: After NN2 fails to send back a response, haadmin first

[jira] [Updated] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable

2020-03-24 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated HDFS-15235: Description: We have an HA cluster with two NameNodes: an active NN1 and a standby NN2. At some point, NN1

[jira] [Updated] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable

2020-03-23 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated HDFS-15235: Description: We have an HA cluster with two NameNodes: an active NN1 and a standby NN2. At some point, NN1

[jira] [Updated] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable

2020-03-23 Thread YCozy (Jira)
[ https://issues.apache.org/jira/browse/HDFS-15235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] YCozy updated HDFS-15235: Description: We have an HA cluster with two NameNodes: an active NN1 and a standby NN2. At some point, NN1

[jira] [Created] (HDFS-15235) Transient network failure during NameNode failover makes cluster unavailable

2020-03-23 Thread YCozy (Jira)
YCozy created HDFS-15235: Summary: Transient network failure during NameNode failover makes cluster unavailable Key: HDFS-15235 URL: https://issues.apache.org/jira/browse/HDFS-15235 Project: Hadoop HDFS