[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated HDFS-9516:
-----------------------------
    Fix Version/s: 2.8.0

> truncate file fails with data dirs on multiple disks
> ----------------------------------------------------
>
>                 Key: HDFS-9516
>                 URL: https://issues.apache.org/jira/browse/HDFS-9516
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Bogdan Raducanu
>            Assignee: Plamen Jeliazkov
>             Fix For: 2.8.0, 2.7.3, 3.0.0-alpha1
>
>         Attachments: HDFS-9516_1.patch, HDFS-9516_2.patch, HDFS-9516_3.patch, HDFS-9516_testFailures.patch, Main.java, truncate.dn.log
>
> FileSystem.truncate returns false (no exception), but the file is never closed and is not writable after this.
> This appears to be caused by copy-on-truncate, which is used because the system is in an upgrade state. In that case a rename between devices is attempted.
> See the attached log and repro code.
> This probably also affects truncating a snapshotted file, where copy-on-truncate is likewise used.
> It may affect not only truncate but any block recovery.
> I think the problem is in updateReplicaUnderRecovery:
> {code}
> ReplicaBeingWritten newReplicaInfo = new ReplicaBeingWritten(
>     newBlockId, recoveryId, rur.getVolume(),
>     blockFile.getParentFile(), newlength);
> {code}
> blockFile is created with copyReplicaWithNewBlockIdAndGS, which is allowed to choose any volume, so rur.getVolume() is not necessarily the volume where the block is actually located.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

--
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
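The volume mismatch described in the report can be sketched with a minimal, self-contained model. `Volume`, `buggyVolumeChoice`, and `safeVolumeChoice` below are hypothetical stand-ins for illustration only, not Hadoop's actual `FsVolumeSpi` or the patch's code: the point is that the replica's recorded volume and the volume actually holding the copied block file can differ, and the safe choice is to derive the volume from the block file's path.

```java
import java.io.File;

public class VolumeMismatchModel {
    // Hypothetical stand-in for a DataNode storage volume (FsVolumeSpi).
    static class Volume {
        final File basePath;
        Volume(File basePath) { this.basePath = basePath; }
    }

    /** The buggy choice: trust the volume recorded on the replica under
     *  recovery, even though the copied block file may live elsewhere. */
    static Volume buggyVolumeChoice(Volume rurVolume, File blockFile) {
        return rurVolume;
    }

    /** The safe choice: pick the volume that actually contains the block
     *  file, so finalizing never needs a cross-device rename. */
    static Volume safeVolumeChoice(Volume[] volumes, File blockFile) {
        for (Volume v : volumes) {
            if (blockFile.getAbsolutePath()
                    .startsWith(v.basePath.getAbsolutePath() + File.separator)) {
                return v;
            }
        }
        throw new IllegalStateException("block file is on no known volume");
    }

    public static void main(String[] args) {
        Volume vol1 = new Volume(new File("/data1/dfs"));
        Volume vol2 = new Volume(new File("/data2/dfs"));
        // copyReplicaWithNewBlockIdAndGS was free to place the copy on vol2
        // even though the replica under recovery recorded vol1.
        File copiedBlock = new File("/data2/dfs/current/blk_1001");
        System.out.println(buggyVolumeChoice(vol1, copiedBlock) == vol1);
        System.out.println(safeVolumeChoice(new Volume[]{vol1, vol2}, copiedBlock) == vol2);
    }
}
```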
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-9516:
--------------------------------------
    Fix Version/s:     (was: 2.9.0)
                   2.7.3

Committed to branch-2.8 and branch-2.7.
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinod Kumar Vavilapalli updated HDFS-9516:
------------------------------------------
    Target Version/s: 2.8.0, 2.7.3

[~shv], unfortunately this came in too late for 2.7.2. That said, I don't see any reason why this shouldn't be in 2.8.0 and 2.7.3, so I'm setting the target versions accordingly on JIRA. If you agree, I'd appreciate help backporting to those branches (branch-2.8.0, branch-2.7).
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HDFS-9516:
--------------------------------------
       Resolution: Fixed
     Hadoop Flags: Reviewed
    Fix Version/s: 2.9.0
           Status: Resolved  (was: Patch Available)

I just committed this to trunk and branch-2. Thank you, Plamen.
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Plamen Jeliazkov updated HDFS-9516:
-----------------------------------
    Attachment: HDFS-9516_3.patch

Attaching the "_3" patch, which contains the assert [~shv] asked for. This time the assert checks that the volume base path is a prefix of the block file's path, ensuring that the block file will not be moved across volumes when it is finalized.
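The check described above can be sketched as a simple path-prefix test; `isOnVolume` is a hypothetical helper for illustration, not the actual assert from the patch:

```java
import java.io.File;

public class BlockFileVolumeCheck {
    /** Hypothetical sketch of the "_3" patch's assert: the block file must
     *  live under the volume's base path, otherwise finalizing it would
     *  require a cross-volume move. */
    static boolean isOnVolume(File blockFile, File volumeBase) {
        return blockFile.getAbsolutePath()
                .startsWith(volumeBase.getAbsolutePath() + File.separator);
    }

    public static void main(String[] args) {
        File volume = new File("/data1/dfs");
        // Block on the same volume passes; block on another volume fails.
        System.out.println(isOnVolume(new File("/data1/dfs/current/blk_1001"), volume));
        System.out.println(isOnVolume(new File("/data2/dfs/current/blk_1001"), volume));
    }
}
```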
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Plamen Jeliazkov updated HDFS-9516:
-----------------------------------
    Attachment: HDFS-9516_1.patch

Attaching the first patch (_1) with the proposed fix. I've left the assert in place. All unit tests pass locally.
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Plamen Jeliazkov updated HDFS-9516:
-----------------------------------
    Attachment: HDFS-9516_testFailures.patch
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Plamen Jeliazkov updated HDFS-9516:
-----------------------------------
    Attachment:     (was: HDFS-9516_testFailures.patch)
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Plamen Jeliazkov updated HDFS-9516:
-----------------------------------
    Attachment: HDFS-9516_testFailures.patch
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Plamen Jeliazkov updated HDFS-9516:
-----------------------------------
    Status: Patch Available  (was: In Progress)
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Plamen Jeliazkov updated HDFS-9516:
-----------------------------------
    Attachment: HDFS-9516_2.patch

Attaching the second patch (_2) with the same fix but the assert statement removed. Note that I have also removed the try block; taking another look, it no longer seems to be needed, but please let me know if I should put it back.
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bogdan Raducanu updated HDFS-9516:
----------------------------------
    Attachment: truncate.dn.log
[jira] [Updated] (HDFS-9516) truncate file fails with data dirs on multiple disks
[ https://issues.apache.org/jira/browse/HDFS-9516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bogdan Raducanu updated HDFS-9516:
----------------------------------
    Attachment: Main.java