[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan updated HDFS-8893:
-----------------------------
    Target Version/s: 3.5.0  (was: 3.4.0)

> DNs with failed volumes stop serving during rolling upgrade
> -----------------------------------------------------------
>
>                 Key: HDFS-8893
>                 URL: https://issues.apache.org/jira/browse/HDFS-8893
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Rushabh Shah
>            Priority: Critical
>
> When a rolling upgrade starts, all DNs try to write a rolling_upgrade marker
> to each of their volumes. If one of the volumes is bad, this will fail. When
> this failure happens, the DN does not update the key it received from the NN.
> Unfortunately, we had one failed volume on all three of the datanodes that
> held the replica.
> Keys expire after 20 hours, so at about 20 hours into the rolling upgrade the
> DNs with failed volumes will stop serving clients.
> Here is the stack trace on the datanode side:
> {noformat}
> 2015-08-11 07:32:28,827 [DataNode: heartbeating to 8020] WARN datanode.DataNode: IOException in offerService
> java.io.IOException: Read-only file system
>         at java.io.UnixFileSystem.createFileExclusively(Native Method)
>         at java.io.File.createNewFile(File.java:947)
>         at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setRollingUpgradeMarkers(BlockPoolSliceStorage.java:721)
>         at org.apache.hadoop.hdfs.server.datanode.DataStorage.setRollingUpgradeMarker(DataStorage.java:173)
>         at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.setRollingUpgradeMarker(FsDatasetImpl.java:2357)
>         at org.apache.hadoop.hdfs.server.datanode.BPOfferService.signalRollingUpgrade(BPOfferService.java:480)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.handleRollingUpgradeStatus(BPServiceActor.java:626)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:677)
>         at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:833)
>         at java.lang.Thread.run(Thread.java:722)
> {noformat}

--
This message was sent by Atlassian Jira (v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
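The failure mode described above (a single bad volume aborts the per-volume marker write, which in turn prevents the DN from ever refreshing its block keys) can be sketched roughly as follows. This is a hypothetical illustration with made-up names, not the actual Hadoop code; the real logic lives in BlockPoolSliceStorage.setRollingUpgradeMarkers and BPOfferService.signalRollingUpgrade.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical sketch of the HDFS-8893 failure mode: one read-only volume
// makes the marker write throw, so the key refresh that follows it never runs.
public class RollingUpgradeMarkerSketch {

    // Simulated storage volume; "bad" mimics a read-only or failed disk.
    record Volume(String path, boolean bad) {}

    static boolean keyRefreshed = false;

    static void setRollingUpgradeMarkers(List<Volume> volumes) throws IOException {
        for (Volume v : volumes) {
            if (v.bad()) {
                // Mirrors java.io.File.createNewFile failing on a read-only FS.
                throw new IOException("Read-only file system: " + v.path());
            }
            // Real code would do roughly:
            //   new File(v.path(), "rolling_upgrade").createNewFile();
        }
        // Only reached when *every* volume succeeded. With one failed volume
        // the DN never gets here, its keys are not refreshed, and they expire
        // ~20 hours into the rolling upgrade.
        keyRefreshed = true;
    }

    public static void main(String[] args) {
        List<Volume> volumes = List.of(
                new Volume("/data1", false),
                new Volume("/data2", true),   // one failed volume out of three
                new Volume("/data3", false));
        try {
            setRollingUpgradeMarkers(volumes);
        } catch (IOException e) {
            System.out.println("marker write failed: " + e.getMessage());
        }
        System.out.println("keyRefreshed=" + keyRefreshed);
        // prints: marker write failed: Read-only file system: /data2
        //         keyRefreshed=false
    }
}
```

The point of the sketch is that the marker write and the key update are coupled in one all-or-nothing sequence, so a disk failure that the DN otherwise tolerates becomes fatal only during a rolling upgrade.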
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-8893:
--------------------------------
    Target Version/s: 3.4.0  (was: 3.3.0)

Moved to 3.4.0.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-8893:
---------------------------------
    Target Version/s: 3.3.0  (was: 3.2.0)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wangda Tan updated HDFS-8893:
-----------------------------
    Target Version/s: 3.2.0  (was: 3.1.0)

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-8893:
--------------------------------------
    Target Version/s: 3.0.0-alpha3  (was: 2.7.4, 3.0.0-alpha3)

Removing from the scope of 2.7.4. Feel free to add back if you plan to work on it.

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-8893:
------------------------------
    Target Version/s: 2.7.4, 3.0.0-alpha3  (was: 2.7.4, 3.0.0-alpha2)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang updated HDFS-8893:
------------------------------
    Target Version/s: 2.7.4, 3.0.0-alpha2  (was: 2.7.4)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-8893:
---------------------------------
    Target Version/s: 2.7.4  (was: 2.7.3)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated HDFS-8893:
------------------------------------------
    Target Version/s: 2.7.3  (was: 2.7.2)

Moving this out of 2.7.2 as there's been no update in a while.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HDFS-8893) DNs with failed volumes stop serving during rolling upgrade
[ https://issues.apache.org/jira/browse/HDFS-8893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rushabh S Shah updated HDFS-8893:
---------------------------------
    Assignee: Daryn Sharp

--
This message was sent by Atlassian JIRA (v6.3.4#6332)