Manoj Govindassamy created HDFS-10830:
-----------------------------------------

             Summary: FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
                 Key: HDFS-10830
                 URL: https://issues.apache.org/jira/browse/HDFS-10830
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs
    Affects Versions: 3.0.0-alpha1
            Reporter: Manoj Govindassamy
            Assignee: Manoj Govindassamy



{{FsDatasetImpl#removeVolumes()}} operation crashes abruptly with 
IllegalMonitorStateException whenever the volume being removed is in use 
concurrently.

Looks like {{removeVolumes()}} is waiting on the monitor of "this" (that is,
FsDatasetImpl), which it has never locked. The method holds {{datasetLock}},
which is a separate lock and not the intrinsic monitor of "this". This leads to
IllegalMonitorStateException. The monitor wait happens only when the volume
being removed is in use (reference count > 0). The thread performing the
volume removal therefore crashes abruptly, and block invalidations for the
removed volumes are skipped entirely.
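The failure mode can be reproduced outside HDFS. {{Object.wait(long)}} requires the calling thread to own the intrinsic monitor of the object it waits on, and holding a different lock does not count. The sketch below is a simplified, hypothetical model rather than Hadoop code: {{MonitorMismatchDemo}}, {{DatasetModel}}, {{refCount}}, {{remove()}} and the {{invalidated}} list are invented names, and a plain {{ReentrantLock}} stands in for {{datasetLock}}. It also shows why invalidation is skipped: the exception escapes the locked section, so the step that would run afterwards never executes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class MonitorMismatchDemo {
  // Hypothetical stand-in for FsDatasetImpl: an explicit lock plays the
  // role of datasetLock, and refCount models the volume's reference count.
  static class DatasetModel {
    final ReentrantLock datasetLock = new ReentrantLock();
    final AtomicInteger refCount = new AtomicInteger();
    final List<String> invalidated = new ArrayList<>();

    void remove() throws InterruptedException {
      datasetLock.lock();
      try {
        // Mirrors volumes.waitVolumeRemoved(5000, this): it waits on the
        // intrinsic monitor of 'this', which this thread does NOT own,
        // because only datasetLock is held here.
        while (refCount.get() > 0) {
          this.wait(5000); // throws IllegalMonitorStateException
        }
      } finally {
        datasetLock.unlock();
      }
      // Skipped whenever the wait above throws, just like the
      // invalidate() loop that runs after datasetLock is released.
      invalidated.add("blk_1");
    }
  }

  public static void main(String[] args) throws InterruptedException {
    DatasetModel ds = new DatasetModel();
    ds.refCount.incrementAndGet(); // the volume is still in use
    try {
      ds.remove();
      System.out.println("removed");
    } catch (IllegalMonitorStateException e) {
      System.out.println("IllegalMonitorStateException; invalidated=" + ds.invalidated);
    }
  }
}
```

Running {{main}} prints {{IllegalMonitorStateException; invalidated=[]}}. Without the {{incrementAndGet()}} call the loop never waits, so the model prints {{removed}}, which matches the report that the crash occurs only while the volume is in use.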


{code:title=FsDatasetImpl.java|borderStyle=solid}
@Override
public void removeVolumes(Set<File> volumesToRemove, boolean clearFailure) {
  ..
  ..
  try (AutoCloseableLock lock = datasetLock.acquire()) { // <== LOCK acquire datasetLock
    for (int idx = 0; idx < dataStorage.getNumStorageDirs(); idx++) {
      .. .. ..
      asyncDiskService.removeVolume(sd.getCurrentDir()); // <== volume SD1 remove
      volumes.removeVolume(absRoot, clearFailure);
      volumes.waitVolumeRemoved(5000, this);             // <== WAIT on "this"?? But we haven't
                                                         //     locked it yet. This will cause
                                                         //     IllegalMonitorStateException and
                                                         //     crash getBlockReports()/FBR thread!

      for (String bpid : volumeMap.getBlockPoolList()) {
        List<ReplicaInfo> blocks = new ArrayList<>();
        for (Iterator<ReplicaInfo> it = volumeMap.replicas(bpid).iterator();
             it.hasNext(); ) {
          .. .. ..
          it.remove();                                   // <== volumeMap removal
        }
        blkToInvalidate.put(bpid, blocks);
      }
      .. ..
  }                                                      // <== LOCK release datasetLock

  // Call this outside the lock.
  for (Map.Entry<String, List<ReplicaInfo>> entry :
      blkToInvalidate.entrySet()) {
    ..
    for (ReplicaInfo block : blocks) {
      invalidate(bpid, block);                           // <== Notify NN of block removal
    }
  }
{code}
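One way to make the wait legal while {{datasetLock}} is held, sketched here as an assumption and not as the committed fix, is to wait on a {{java.util.concurrent.locks.Condition}} created from the same lock. {{Condition.await}} atomically releases that lock while blocked and reacquires it before returning, so a thread dropping its volume reference can take the lock and signal. The names {{ConditionWaitSketch}}, {{VolumeRemovalModel}}, {{acquireReference()}}, {{releaseReference()}} and {{remove()}} are hypothetical, a plain {{ReentrantLock}} replaces Hadoop's {{AutoCloseableLock}}, and the timed await mirrors the 5000 ms timeout so the count is re-checked even if no signal arrives.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ConditionWaitSketch {
  // Hypothetical model: datasetLock guards refCount, and the Condition is
  // bound to datasetLock, so awaiting it while holding that lock is legal.
  static class VolumeRemovalModel {
    final ReentrantLock datasetLock = new ReentrantLock();
    final Condition referencesReleased = datasetLock.newCondition();
    int refCount; // guarded by datasetLock

    void acquireReference() {
      datasetLock.lock();
      try {
        refCount++;
      } finally {
        datasetLock.unlock();
      }
    }

    void releaseReference() {
      datasetLock.lock();
      try {
        refCount--;
        referencesReleased.signalAll(); // wake a thread waiting in remove()
      } finally {
        datasetLock.unlock();
      }
    }

    // Returns once no references remain. await() releases datasetLock
    // while blocked and reacquires it before returning.
    void remove() throws InterruptedException {
      datasetLock.lock();
      try {
        while (refCount > 0) {
          referencesReleased.await(5000, TimeUnit.MILLISECONDS);
        }
        // ... remove the volume's replicas from the volume map here ...
      } finally {
        datasetLock.unlock();
      }
    }
  }

  public static void main(String[] args) throws Exception {
    VolumeRemovalModel vol = new VolumeRemovalModel();
    vol.acquireReference(); // the volume is in use
    Thread user = new Thread(() -> {
      try {
        Thread.sleep(100); // simulate in-flight I/O on the volume
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      vol.releaseReference();
    });
    user.start();
    vol.remove(); // blocks until the reference is released
    user.join();
    System.out.println("removed, refCount=" + vol.refCount);
  }
}
```

Running {{main}} prints {{removed, refCount=0}}. Wrapping the existing call in {{synchronized (this)}} would also avoid the exception, but {{Object.wait}} releases only the monitor of "this", so {{datasetLock}} would stay held for the entire wait; a {{Condition}} releases the lock it was created from.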




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
