Shashikant Banerjee created HDDS-935:
----------------------------------------

             Summary: Avoid creating an already created container on a datanode 
in case of disk removal followed by datanode restart
                 Key: HDDS-935
                 URL: https://issues.apache.org/jira/browse/HDDS-935
             Project: Hadoop Distributed Data Store
          Issue Type: Improvement
          Components: Ozone Datanode
    Affects Versions: 0.4.0
            Reporter: Rakesh R
            Assignee: Shashikant Banerjee


Currently, a container is created when a writeChunk request reaches 
HddsDispatcher and the container does not already exist. If a disk holding a 
container is removed and the datanode restarts, a subsequent writeChunk request 
may end up creating the same container again with an updated BCSID, because the 
datanode does not detect that the disk was removed. SCM will not detect this 
either, as the recreated container will report the latest BCSID. This Jira aims 
to address this issue.

The proposed fix is to persist all the containerIds existing in the 
containerSet in the snapshot file whenever a Ratis snapshot is taken. If a disk 
is removed and the datanode is restarted, the container set will be rebuilt by 
scanning all the available disks, while the container list stored in the 
snapshot file will give all the containers created on the datanode. The diff 
between these two gives the exact list of containers that were created but 
were not detected after the restart. Any writeChunk request should then validate 
its container ID against this list of missing containers. Also, we need to 
ensure that container creation does not happen as part of applyTransaction of a 
writeChunk request in Ratis.
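
The diff and the writeChunk check described above could look roughly like the 
following. This is only a minimal sketch of the idea, not the actual patch: the 
class MissingContainerTracker and its methods are hypothetical names, and 
reading the IDs from the snapshot file and from the disk scan is assumed to 
happen elsewhere.

```java
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper, not part of the Ozone code base. It computes which
 * containers were created earlier but were not found after a restart.
 */
final class MissingContainerTracker {
    private final Set<Long> missing;

    /**
     * @param persistedIds  container IDs read back from the Ratis snapshot file
     *                      (assumed to be loaded by the caller)
     * @param discoveredIds container IDs found by scanning the available disks
     */
    MissingContainerTracker(Set<Long> persistedIds, Set<Long> discoveredIds) {
        // missing = persisted minus discovered
        Set<Long> diff = new TreeSet<>(persistedIds);
        diff.removeAll(discoveredIds);
        this.missing = Collections.unmodifiableSet(diff);
    }

    /** Containers that existed at snapshot time but were not found on disk. */
    Set<Long> missingContainers() {
        return missing;
    }

    /**
     * Rejects a writeChunk aimed at a missing container, so that the
     * container is not silently created a second time.
     */
    void validateWriteChunk(long containerId) {
        if (missing.contains(containerId)) {
            throw new IllegalStateException(
                "Container " + containerId + " is missing; refusing to recreate it");
        }
    }
}
```

For example, with persisted IDs {1, 2, 3} and discovered IDs {1, 3, 4}, the 
missing set contains only container 2, so a writeChunk for container 2 is 
rejected while requests for containers 1 and 4 pass validation.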




