Manoj Govindassamy created HDFS-10830: -----------------------------------------
             Summary: FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
                 Key: HDFS-10830
                 URL: https://issues.apache.org/jira/browse/HDFS-10830
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.0.0-alpha1
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy

{{FsDatasetImpl#removeVolumes()}} crashes abruptly with IllegalMonitorStateException whenever the volume being removed is in use concurrently.

It looks like {{removeVolumes()}} waits on the monitor of "this" (the FsDatasetImpl object), which it has never locked, leading to IllegalMonitorStateException. This monitor wait happens only when the volume being removed is in use (reference count > 0). The thread performing the remove-volume operation thus crashes abruptly, and block invalidations for the removed volumes are skipped entirely.

{code:title=FsDatasetImpl.java|borderStyle=solid}
  @Override
  public void removeVolumes(Set<File> volumesToRemove, boolean clearFailure) {
    ..
    ..
    try (AutoCloseableLock lock = datasetLock.acquire()) {  // <== LOCK acquire datasetLock
      for (int idx = 0; idx < dataStorage.getNumStorageDirs(); idx++) {
        ..
        ..
        ..
        asyncDiskService.removeVolume(sd.getCurrentDir());  // <== volume SD1 remove
        volumes.removeVolume(absRoot, clearFailure);
        volumes.waitVolumeRemoved(5000, this);  // <== WAIT on "this" ?? But, we haven't locked it yet.
                                                //     This will cause IllegalMonitorStateException
                                                //     and crash getBlockReports()/FBR thread!

        for (String bpid : volumeMap.getBlockPoolList()) {
          List<ReplicaInfo> blocks = new ArrayList<>();
          for (Iterator<ReplicaInfo> it = volumeMap.replicas(bpid).iterator();
               it.hasNext(); ) {
            ..
            ..
            ..
            it.remove();  // <== volumeMap removal
          }
          blkToInvalidate.put(bpid, blocks);
        }
        ..
        ..
    }  // <== LOCK release datasetLock

    // Call this outside the lock.
    for (Map.Entry<String, List<ReplicaInfo>> entry :
        blkToInvalidate.entrySet()) {
      ..
      for (ReplicaInfo block : blocks) {
        invalidate(bpid, block);  // <== Notify NN of Block removal
      }
    }
{code}
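
The failure mode described in this report can be reproduced outside HDFS. The following is only a minimal sketch, not the FsDatasetImpl code: the class MonitorWaitDemo, its method names, and the volumeRemoved condition are hypothetical, and a plain ReentrantLock stands in for datasetLock. It shows that holding an explicit lock does not grant ownership of the object's intrinsic monitor, so Object#wait() on "this" throws IllegalMonitorStateException. It also shows one possible remedy, which is an assumption here and not necessarily the fix adopted for this issue: waiting on a Condition created from the lock that is actually held.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class MonitorWaitDemo {
    // Hypothetical stand-in for datasetLock: an explicit lock, unrelated to
    // the intrinsic monitor of "this".
    private final ReentrantLock datasetLock = new ReentrantLock();
    // Hypothetical condition bound to the explicit lock.
    private final Condition volumeRemoved = datasetLock.newCondition();

    /** Mirrors the reported pattern: hold the explicit lock, then wait on "this". */
    public String waitOnThisWhileHoldingExplicitLock() {
        datasetLock.lock();
        try {
            // The current thread does not own the monitor of "this",
            // so Object#wait throws IllegalMonitorStateException.
            this.wait(10);
            return "no exception";
        } catch (IllegalMonitorStateException e) {
            return "IllegalMonitorStateException";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            datasetLock.unlock();
        }
    }

    /** One possible remedy: wait on a Condition of the lock that is held. */
    public String waitOnConditionOfHeldLock() {
        datasetLock.lock();
        try {
            // Legal: the current thread holds datasetLock. The call returns
            // after the timeout (or earlier on a signal or spurious wakeup).
            volumeRemoved.await(10, TimeUnit.MILLISECONDS);
            return "no exception";
        } catch (IllegalMonitorStateException e) {
            return "IllegalMonitorStateException";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            datasetLock.unlock();
        }
    }

    public static void main(String[] args) {
        MonitorWaitDemo demo = new MonitorWaitDemo();
        System.out.println(demo.waitOnThisWhileHoldingExplicitLock());
        System.out.println(demo.waitOnConditionOfHeldLock());
    }
}
```

Running main prints "IllegalMonitorStateException" for the first call and "no exception" for the second. Wrapping the wait in a synchronized (this) block would also avoid the exception, but it would introduce a second lock alongside the explicit one, which is why the sketch prefers a Condition tied to the lock already held.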