[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takanobu Asanuma updated HDFS-15963:
    Priority: Critical (was: Major)

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Critical
> Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.4
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, HDFS-15963.003.patch
>
> Time Spent: 5h 10m
> Remaining Estimate: 0h
>
> When BlockSender throws an exception because the meta-data cannot be found,
> the volume reference obtained by the thread is not released. The thread
> trying to remove the volume then waits for the reference count to reach
> zero, which never happens, so it falls into an infinite loop.
> {code:java}
> boolean checkVolumesRemoved() {
>   Iterator<FsVolumeImpl> it = volumesBeingRemoved.iterator();
>   while (it.hasNext()) {
>     FsVolumeImpl volume = it.next();
>     if (!volume.checkClosed()) {
>       return false;
>     }
>     it.remove();
>   }
>   return true;
> }
>
> boolean checkClosed() {
>   // Always true here, because the leaked reference is never released.
>   if (this.reference.getReferenceCount() > 0) {
>     FsDatasetImpl.LOG.debug("The reference count for {} is {}, wait to be 0.",
>         this, reference.getReferenceCount());
>     return false;
>   }
>   return true;
> }
> {code}
> At the same time, because the thread holds checkDirsLock while removing
> the volume, other threads trying to acquire the same lock are blocked
> permanently.
> Similar problems also occur in RamDiskAsyncLazyPersistService and
> FsDatasetAsyncDiskService.
> This patch releases the three previously unreleased volume references.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
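The leak described in the issue can be illustrated with a minimal, self-contained sketch. Everything in it is a hypothetical stand-in written for this note (`Volume`, `obtainReference`, `openBlock`, `sendLeaky`, `sendSafe`, `closedAfterFailure`); none of it is the real HDFS code, and the actual patch may release the references in a different way. It only shows why an exception thrown between obtaining and releasing a reference leaves the count above zero, and how try-with-resources avoids that.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for illustration only; these are not the real HDFS classes.
class VolumeReferenceDemo {

  /** Counts outstanding references, loosely modelled on the quoted checkClosed(). */
  static final class Volume {
    private final AtomicInteger referenceCount = new AtomicInteger();

    /** Hands out a reference; closing it decrements the count again. */
    Closeable obtainReference() {
      referenceCount.incrementAndGet();
      return () -> referenceCount.decrementAndGet();
    }

    /** The volume counts as closed only when no reference is outstanding. */
    boolean checkClosed() {
      return referenceCount.get() == 0;
    }
  }

  /** Simulates sender setup that fails because the block meta-data is missing. */
  static void openBlock(boolean metaFound) throws IOException {
    if (!metaFound) {
      throw new IOException("Meta-data not found");
    }
  }

  /** Leaky pattern: the exception skips close(), so the count never drops. */
  static void sendLeaky(Volume volume, boolean metaFound) throws IOException {
    Closeable ref = volume.obtainReference();
    openBlock(metaFound); // throws before the release below is reached
    ref.close();
  }

  /** Safe pattern: try-with-resources releases the reference on every exit path. */
  static void sendSafe(Volume volume, boolean metaFound) throws IOException {
    try (Closeable ref = volume.obtainReference()) {
      openBlock(metaFound);
    }
  }

  /** Runs one variant with missing meta-data and reports whether the volume is closed. */
  static boolean closedAfterFailure(boolean useSafeVariant) {
    Volume volume = new Volume();
    try {
      if (useSafeVariant) {
        sendSafe(volume, false);
      } else {
        sendLeaky(volume, false);
      }
    } catch (IOException expected) {
      // The failure itself is expected; only the reference count matters here.
    }
    return volume.checkClosed();
  }

  public static void main(String[] args) {
    System.out.println("leaky variant closed: " + closedAfterFailure(false)); // false
    System.out.println("safe variant closed: " + closedAfterFailure(true));   // true
  }
}
```

With the leaky variant, a loop that polls `checkClosed()` until it returns true would never terminate, which mirrors the symptom reported above.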
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Takanobu Asanuma updated HDFS-15963:
    Fix Version/s: 3.2.4

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2, 3.2.4
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, HDFS-15963.003.patch
>
> Time Spent: 5h 10m
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HDFS-15963:
    Fix Version/s: 2.10.2

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0, 2.10.2
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, HDFS-15963.003.patch
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He updated HDFS-15963:
    Fix Version/s: 3.3.1

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, HDFS-15963.003.patch
>
> Time Spent: 4.5h
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He updated HDFS-15963:
    Fix Version/s: 3.4.0
    Hadoop Flags: Reviewed
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

Committed to trunk. Thanks [~zhangshuyan] for your report and contribution! Thanks [~weichiu] for your reviews.

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, HDFS-15963.003.patch
>
> Time Spent: 4h
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-15963:
    Attachment: HDFS-15963.003.patch

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch, HDFS-15963.003.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-15963:
    Attachment: HDFS-15963.002.patch

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
>
> Attachments: HDFS-15963.001.patch, HDFS-15963.002.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He updated HDFS-15963:
    Status: Patch Available (was: Open)

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Shuyan Zhang
> Assignee: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
>
> Attachments: HDFS-15963.001.patch
>
> Time Spent: 20m
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-15963:
    Labels: pull-request-available (was: )

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Shuyan Zhang
> Priority: Major
> Labels: pull-request-available
>
> Attachments: HDFS-15963.001.patch
>
> Time Spent: 10m
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-15963) Unreleased volume references cause an infinite loop
[ https://issues.apache.org/jira/browse/HDFS-15963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shuyan Zhang updated HDFS-15963:
    Attachment: HDFS-15963.001.patch

> Unreleased volume references cause an infinite loop
> ---
>
> Key: HDFS-15963
> URL: https://issues.apache.org/jira/browse/HDFS-15963
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Reporter: Shuyan Zhang
> Priority: Major
>
> Attachments: HDFS-15963.001.patch