[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 3.0.0-alpha2
                   2.8.0
           Status: Resolved  (was: Patch Available)

Thanks [~xiaochen] and [~manojg]. I committed this for 2.8.0. Manoj, I also credited you for the patch.

> FsDatasetImpl#removeVolumes crashes with IllegalMonitorStateException when vol being removed is in use
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>             Fix For: 2.8.0, 3.0.0-alpha2
>
>         Attachments: HDFS-10830.01.patch, HDFS-10830.02.patch, HDFS-10830.05.patch, HDFS-10830.06.patch
>
>
> {{FsDatasetImpl#removeVolumes()}} operation crashes abruptly with IllegalMonitorStateException whenever the volume being removed is in use concurrently.
> Looks like {{removeVolumes()}} is waiting on a monitor object "this" (that is FsDatasetImpl) which it has never locked, leading to IllegalMonitorStateException. This monitor wait happens only when the volume being removed is in use (referencecount > 0). The thread performing this remove volume operation thus crashes abruptly and block invalidations for the removed volumes are totally skipped.
> {code:title=FsDatasetImpl.java|borderStyle=solid}
>   @Override
>   public void removeVolumes(Set volumesToRemove, boolean clearFailure) {
>     ..
>     ..
>     try (AutoCloseableLock lock = datasetLock.acquire()) {  // <== LOCK acquire datasetLock
>       for (int idx = 0; idx < dataStorage.getNumStorageDirs(); idx++) {
>         .. .. ..
>         asyncDiskService.removeVolume(sd.getCurrentDir());  // <== volume SD1 remove
>         volumes.removeVolume(absRoot, clearFailure);
>         volumes.waitVolumeRemoved(5000, this);  // <== WAIT on "this" ?? But, we haven't locked it yet.
>                                                 //     This will cause IllegalMonitorStateException
>                                                 //     and crash getBlockReports()/FBR thread!
>         for (String bpid : volumeMap.getBlockPoolList()) {
>           List<ReplicaInfo> blocks = new ArrayList<>();
>           for (Iterator<ReplicaInfo> it = volumeMap.replicas(bpid).iterator();
>                it.hasNext(); ) {
>             .. .. ..
>             it.remove();  // <== volumeMap removal
>           }
>           blkToInvalidate.put(bpid, blocks);
>         }
>         .. ..
>     }  // <== LOCK release datasetLock
>     // Call this outside the lock.
>     for (Map.Entry<String, List<ReplicaInfo>> entry :
>         blkToInvalidate.entrySet()) {
>       ..
>       for (ReplicaInfo block : blocks) {
>         invalidate(bpid, block);  // <== Notify NN of Block removal
>       }
>     }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
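The failure mode described in the issue can be reproduced outside Hadoop with plain Java. The sketch below is illustrative only: `MonitorWaitDemo` and its method names are hypothetical, and it assumes (as the quoted snippet suggests) that `datasetLock` is an explicit lock object rather than the intrinsic monitor of `this`. A `java.util.concurrent.locks.ReentrantLock` stands in for it here. It is not the committed patch.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical demo class, not part of Hadoop. */
final class MonitorWaitDemo {

  /** Returns true if a timed wait() on the monitor throws IllegalMonitorStateException. */
  static boolean waitThrows(Object monitor) {
    try {
      // Object.wait is legal only while the calling thread owns
      // the intrinsic lock (monitor) of this exact object.
      monitor.wait(10);
      return false;
    } catch (IllegalMonitorStateException e) {
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  /** Case 1: no lock of any kind is held. */
  static boolean withoutAnyLock(Object monitor) {
    return waitThrows(monitor);
  }

  /**
   * Case 2: an explicit lock is held (a stand-in for datasetLock),
   * but that does not make the thread the owner of the monitor.
   */
  static boolean holdingExplicitLockOnly(Object monitor) {
    ReentrantLock explicitLock = new ReentrantLock();
    explicitLock.lock();
    try {
      return waitThrows(monitor);
    } finally {
      explicitLock.unlock();
    }
  }

  /** Case 3: the monitor itself is owned via synchronized, so wait() is legal. */
  static boolean holdingMonitor(Object monitor) {
    synchronized (monitor) {
      return waitThrows(monitor);
    }
  }

  public static void main(String[] args) {
    Object monitor = new Object();
    System.out.println("no lock held:       threw=" + withoutAnyLock(monitor));
    System.out.println("explicit lock only: threw=" + holdingExplicitLockOnly(monitor));
    System.out.println("monitor owned:      threw=" + holdingMonitor(monitor));
  }
}
```

The first two cases report that the exception was thrown and the third does not, which matches the report: holding an explicit lock while calling `wait()` on an object whose monitor the thread does not own is enough to trigger IllegalMonitorStateException.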
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Summary: FsDatasetImpl#removeVolumes crashes with IllegalMonitorStateException when vol being removed is in use  (was: FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use)

> FsDatasetImpl#removeVolumes crashes with IllegalMonitorStateException when vol being removed is in use
> -------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch, HDFS-10830.02.patch, HDFS-10830.05.patch, HDFS-10830.06.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Attachment: HDFS-10830.06.patch

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch, HDFS-10830.02.patch, HDFS-10830.05.patch, HDFS-10830.06.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Attachment: HDFS-10830.05.patch

v05: Let's try this once more. I tried to keep the test changes close to what you have in your latest branch-2 patch for HDFS-9781.

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch, HDFS-10830.02.patch, HDFS-10830.05.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Attachment:     (was: HDFS-10830.04.patch)

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch, HDFS-10830.02.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Attachment:     (was: HDFS-10830.03.patch)

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch, HDFS-10830.02.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Attachment: HDFS-10830.04.patch

Thanks [~manojg]. Updated patch attached (although it still times out for me, pending HDFS-9781 I assume).

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch, HDFS-10830.02.patch, HDFS-10830.03.patch, HDFS-10830.04.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Attachment: HDFS-10830.03.patch

Attaching the patch with the correct name.

Hi [~manojg], can you please point me to the exact location of the test workaround so I know what to remove?

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch, HDFS-10830.02.patch, HDFS-10830.03.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Attachment:     (was: HDFS-10830.01.patch)

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch, HDFS-10830.02.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Attachment: HDFS-10830.01.patch

v03 patch: rebased to trunk.

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch, HDFS-10830.01.patch, HDFS-10830.02.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Attachment: HDFS-10830.02.patch

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch, HDFS-10830.02.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Status: Patch Available  (was: Open)

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-10830.01.patch
[jira] [Updated] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
[ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-10830:
---------------------------------
    Attachment: HDFS-10830.01.patch

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-10830.01.patch