[jira] [Commented] (HDFS-11293) FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation
[ https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15803261#comment-15803261 ]

Yuanbo Liu commented on HDFS-11293:
---

[~umamaheswararao] Thanks for your response. I'll attach a test case for this issue.

> FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation
> ---
>
> Key: HDFS-11293
> URL: https://issues.apache.org/jira/browse/HDFS-11293
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Yuanbo Liu
> Assignee: Yuanbo Liu
> Priority: Critical
>
> In {{FsDatasetImpl#createTemporary}}, we use {{volumeMap}} to get replica
> info by block pool id. But in this situation:
> {code}
> datanode A => {DISK, SSD}, datanode B => {DISK, ARCHIVE}.
> 1. the same block replica exists in A[DISK] and B[DISK].
> 2. the block pool id of datanode A and datanode B are the same.
> {code}
> Then we start to change the file's storage policy and move the block replica
> in the cluster. Very likely we have to move block from B[DISK] to A[SSD], at
> this time, datanode A throws ReplicaAlreadyExistsException and it's not a
> correct behavior.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
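The scenario in the issue description can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual FsDatasetImpl code: ToyDataset, its nested map, and the locally defined exception class are hypothetical stand-ins. It assumes, as the description suggests, that the lookup is keyed by block pool id and block id only, so an existing replica on A[DISK] causes a write to A[SSD] to be rejected.

```java
import java.util.HashMap;
import java.util.Map;

public class ReplicaMapSketch {
    enum StorageType { DISK, SSD, ARCHIVE }

    /** Hypothetical stand-in for the exception named in the issue title. */
    static class ReplicaAlreadyExistsException extends RuntimeException {
        ReplicaAlreadyExistsException(String msg) { super(msg); }
    }

    /** Toy replica map for ONE datanode: block pool id -> block id -> storage type. */
    static class ToyDataset {
        private final Map<String, Map<Long, StorageType>> volumeMap = new HashMap<>();

        void addReplica(String bpid, long blockId, StorageType type) {
            volumeMap.computeIfAbsent(bpid, k -> new HashMap<>()).put(blockId, type);
        }

        /**
         * Assumed behaviour: the check looks only at (bpid, blockId), so the
         * requested target storage type plays no part in the decision.
         */
        void createTemporary(String bpid, long blockId, StorageType target) {
            Map<Long, StorageType> pool = volumeMap.get(bpid);
            StorageType existing = (pool == null) ? null : pool.get(blockId);
            if (existing != null) {
                throw new ReplicaAlreadyExistsException("Block " + blockId
                    + " already exists on " + existing + ", requested " + target);
            }
            addReplica(bpid, blockId, target);
        }
    }

    public static void main(String[] args) {
        ToyDataset datanodeA = new ToyDataset();
        datanodeA.addReplica("BP-1", 42L, StorageType.DISK); // replica on A[DISK]
        try {
            // Incoming copy from B[DISK] that targets A[SSD]
            datanodeA.createTemporary("BP-1", 42L, StorageType.SSD);
        } catch (ReplicaAlreadyExistsException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

In this model the request is rejected even though no replica exists on the SSD volume, which is the behaviour the issue reports as incorrect.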
[jira] [Commented] (HDFS-11293) FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation
[ https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802413#comment-15802413 ]

Uma Maheswara Rao G commented on HDFS-11293:

[~yuanbo],
{quote}
A[DISK], not A[SSD].
{quote}
This should have been selected as part of chooseTargetInSameNode. If the target is on the same node, the move is handled a little differently. The related code to be executed in this case is in DataXceiver#replaceBlock:
{code}
// Move the block to different storage in the same datanode
if (proxySource.equals(datanode.getDatanodeId())) {
  ReplicaInfo oldReplica = datanode.data.moveBlockAcrossStorage(block,
      storageType);
  if (oldReplica != null) {
    LOG.info("Moved " + block + " from StorageType "
        + oldReplica.getVolume().getStorageType() + " to " + storageType);
  }
} else {
{code}
Can you confirm that the code flow goes this way? It would be great if you could attach a test case here. Also, does this reproduce consistently?
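The branch condition in the quoted DataXceiver#replaceBlock snippet can be sketched in isolation. This is a simplified illustration, not Hadoop code: plain strings stand in for datanode ids, and choosePath and the Path enum are hypothetical names. The else branch is not shown in the snippet, so the sketch only labels it rather than modelling what it does.

```java
public class ReplaceBlockDispatchSketch {
    /** Which branch of the quoted if/else a request would take. */
    enum Path { SAME_NODE_MOVE, ELSE_BRANCH }

    /**
     * Mirrors only the condition proxySource.equals(datanode.getDatanodeId()).
     * The string ids are stand-ins for the real datanode id objects.
     */
    static Path choosePath(String proxySourceId, String localDatanodeId) {
        return proxySourceId.equals(localDatanodeId)
            ? Path.SAME_NODE_MOVE   // handled by moveBlockAcrossStorage
            : Path.ELSE_BRANCH;     // branch elided in the quoted snippet
    }

    public static void main(String[] args) {
        // A[DISK] -> A[SSD]: proxy source and local node are both A
        System.out.println(choosePath("A", "A"));
        // B[DISK] -> A[SSD]: proxy source is B, local node is A
        System.out.println(choosePath("B", "A"));
    }
}
```

Under these assumptions, the move described in the issue (from B[DISK] to A[SSD]) would not take the same-node path, because the proxy source is B while the receiving datanode is A.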
[jira] [Commented] (HDFS-11293) FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation
[ https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800939#comment-15800939 ]

Yuanbo Liu commented on HDFS-11293:
---

[~umamaheswararao] Thanks for your response.
{code}
the scheduling is wrong if that happening right?
{code}
The current answer is yes, and I've encountered it while testing SPS. But generally speaking, choosing A[SSD] as a target seems reasonable because the block replica exists in A[DISK], not A[SSD]. Are there any considerations against placing a replica on the same node with a different storage type/dir?
[jira] [Commented] (HDFS-11293) FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation
[ https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800742#comment-15800742 ]

Uma Maheswara Rao G commented on HDFS-11293:

[~yuanbo], I am wondering how 'A' was chosen as the target when a replica is already there on that node. The scheduling is wrong if that is happening, right? Can you explain your scenario a little more?
[jira] [Commented] (HDFS-11293) FsDatasetImpl throws ReplicaAlreadyExistsException in a wrong situation
[ https://issues.apache.org/jira/browse/HDFS-11293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800624#comment-15800624 ]

Yuanbo Liu commented on HDFS-11293:
---

[~umamaheswararao] / [~rakeshr] I'm tagging you here because this situation always makes SPS unstable, even without my persistence code. I don't think this issue is caused by SPS; it's a common issue. If you have any thoughts about this JIRA, please let me know. Thanks in advance!