[jira] [Updated] (HDFS-15480) Ordered snapshot deletion: record snapshot deletion in XAttr
[ https://issues.apache.org/jira/browse/HDFS-15480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15480:
---------------------------------------
    Attachment: HDFS-15480.000.patch

> Ordered snapshot deletion: record snapshot deletion in XAttr
> ------------------------------------------------------------
>
> Key: HDFS-15480
> URL: https://issues.apache.org/jira/browse/HDFS-15480
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: snapshots
> Reporter: Tsz-wo Sze
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-15480.000.patch
>
>
> In this JIRA, the behavior of deleting the non-earliest snapshots will be
> changed to marking them as deleted in XAttr but not actually deleting them.
> Note that
> # The marked-for-deletion snapshots will be garbage collected later on; see
> HDFS-15481.
> # The marked-for-deletion snapshots will be hidden from users; see HDFS-15482.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-15480) Ordered snapshot deletion: record snapshot deletion in XAttr
[ https://issues.apache.org/jira/browse/HDFS-15480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee reassigned HDFS-15480:
------------------------------------------
    Assignee: Shashikant Banerjee

> Ordered snapshot deletion: record snapshot deletion in XAttr
> ------------------------------------------------------------
>
> Key: HDFS-15480
> URL: https://issues.apache.org/jira/browse/HDFS-15480
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: snapshots
> Reporter: Tsz-wo Sze
> Assignee: Shashikant Banerjee
> Priority: Major
>
> In this JIRA, the behavior of deleting the non-earliest snapshots will be
> changed to marking them as deleted in XAttr but not actually deleting them.
> Note that
> # The marked-for-deletion snapshots will be garbage collected later on; see
> HDFS-15481.
> # The marked-for-deletion snapshots will be hidden from users; see HDFS-15482.
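The behavior described in HDFS-15480 can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: the class, the method names, and the xattr name `snapshot.deleted` are hypothetical and are not taken from the HDFS code base or the attached patch. It shows the earliest snapshot being removed outright, while a non-earliest snapshot is only marked in its xattrs; the visible listing is included only to hint at what HDFS-15482 describes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of ordered snapshot deletion; all names here are hypothetical. */
class OrderedSnapshotDeletionSketch {
  // Hypothetical xattr name, chosen for illustration only.
  static final String DELETED_XATTR = "snapshot.deleted";

  // Snapshot name -> its xattrs, kept in creation order (earliest first).
  private final Map<String, Map<String, String>> snapshots = new LinkedHashMap<>();

  void createSnapshot(String name) {
    snapshots.put(name, new LinkedHashMap<>());
  }

  /** Returns true if the snapshot was physically removed, false if only marked. */
  boolean deleteSnapshot(String name) {
    if (!snapshots.containsKey(name)) {
      throw new IllegalArgumentException("No such snapshot: " + name);
    }
    String earliest = snapshots.keySet().iterator().next();
    if (name.equals(earliest)) {
      snapshots.remove(name); // the earliest snapshot is really deleted
      return true;
    }
    // Non-earliest: record the deletion in an xattr and keep the snapshot.
    snapshots.get(name).put(DELETED_XATTR, "true");
    return false;
  }

  /** Snapshots as a user would see them: marked ones are left out. */
  List<String> listVisibleSnapshots() {
    List<String> visible = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> e : snapshots.entrySet()) {
      if (!e.getValue().containsKey(DELETED_XATTR)) {
        visible.add(e.getKey());
      }
    }
    return visible;
  }

  /** Everything still stored, including marked-for-deletion snapshots. */
  List<String> listStoredSnapshots() {
    return new ArrayList<>(snapshots.keySet());
  }

  public static void main(String[] args) {
    OrderedSnapshotDeletionSketch dir = new OrderedSnapshotDeletionSketch();
    dir.createSnapshot("s0");
    dir.createSnapshot("s1");
    dir.createSnapshot("s2");
    dir.deleteSnapshot("s1"); // s1 is not the earliest, so it is only marked
    System.out.println("visible: " + dir.listVisibleSnapshots());
    System.out.println("stored:  " + dir.listStoredSnapshots());
  }
}
```

Garbage collection of the marked snapshots (HDFS-15481) is deliberately left out of the sketch.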
[jira] [Commented] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161036#comment-17161036 ]

Shashikant Banerjee commented on HDFS-15313:
--------------------------------------------

[^HDFS-15313.branch-2.10.001.patch] -> Patch for 2.10 branch

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch,
> HDFS-15313.001.patch, HDFS-15313.branch-2.10.001.patch,
> HDFS-15313.branch-2.10.patch, HDFS-15313.branch-2.8.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> deleting a snapshot can end up cleaning up inodes from the active fs that can
> be referenced from only one snapshot, because the isLastReference() check for
> the parent dir introduced in HDFS-13101 may return true in certain cases. The
> aim of this Jira is to add a check to ensure that inodes still referenced in
> the active fs are not deleted while a snapshot is being deleted.
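The kind of extra check that the HDFS-15313 description asks for can be sketched as a simple filter. This is a hedged illustration, not the code in the patch: the class name, the method, and the use of plain inode ids in a set are assumptions made to keep the example self-contained. The idea shown is only that an inode is reclaimed during snapshot deletion when it is no longer reachable from the active file system.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical guard; the real fix works on INode objects inside the NameNode. */
class SnapshotDeleteGuardSketch {
  /**
   * Filters the inodes that snapshot deletion wants to clean up.
   *
   * @param candidates     ids the snapshot-delete logic believes are unreferenced
   * @param activeFsInodes ids still reachable from the active file system
   * @return the ids that are safe to reclaim
   */
  static List<Long> inodesSafeToReclaim(List<Long> candidates, Set<Long> activeFsInodes) {
    List<Long> safe = new ArrayList<>();
    for (Long id : candidates) {
      if (activeFsInodes.contains(id)) {
        continue; // still live in the active fs: must not be deleted
      }
      safe.add(id);
    }
    return safe;
  }

  public static void main(String[] args) {
    List<Long> candidates = Arrays.asList(1L, 2L, 3L);
    Set<Long> active = new HashSet<>(Arrays.asList(2L));
    // Inode 2 is kept because the active fs still refers to it.
    System.out.println(inodesSafeToReclaim(candidates, active));
  }
}
```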
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Attachment: HDFS-15313.branch-2.10.001.patch

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch,
> HDFS-15313.001.patch, HDFS-15313.branch-2.10.001.patch,
> HDFS-15313.branch-2.10.patch, HDFS-15313.branch-2.8.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> deleting a snapshot can end up cleaning up inodes from the active fs that can
> be referenced from only one snapshot, because the isLastReference() check for
> the parent dir introduced in HDFS-13101 may return true in certain cases. The
> aim of this Jira is to add a check to ensure that inodes still referenced in
> the active fs are not deleted while a snapshot is being deleted.
[jira] [Resolved] (HDFS-15463) Add a tool to validate FsImage
[ https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee resolved HDFS-15463.
----------------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

> Add a tool to validate FsImage
> ------------------------------
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: namenode
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: FsImageValidation20200709.patch,
> FsImageValidation20200712.patch, FsImageValidation20200714.patch,
> FsImageValidation20200715.patch, FsImageValidation20200715b.patch,
> FsImageValidation20200715c.patch, FsImageValidation20200717b.patch,
> FsImageValidation20200718.patch, HDFS-15463.000.patch
>
>
> Due to some snapshot-related bugs, an fsimage may become corrupted. Using a
> corrupted fsimage may further result in data loss.
> In some cases, we found incorrect reference counts in a corrupted FsImage.
> One of the goals of the validation tool is to check reference counts.
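A reference-count check of the kind mentioned in the HDFS-15463 description can be sketched as a recount followed by a comparison. This is a minimal sketch under assumptions: it does not read a real FsImage, and the class name, the method, and the flat list of referenced inode ids are hypothetical stand-ins for whatever the actual tool traverses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical recount-and-compare check; not the FsImageValidation tool itself. */
class ReferenceCountCheckSketch {
  /**
   * @param storedCounts  reference count recorded for each inode id
   * @param referencedIds one entry per reference actually found while traversing
   * @return one human-readable line per inode whose stored count is wrong
   */
  static List<String> findMismatches(Map<Long, Integer> storedCounts, List<Long> referencedIds) {
    // Recount the references that really exist.
    Map<Long, Integer> recount = new HashMap<>();
    for (Long id : referencedIds) {
      recount.merge(id, 1, Integer::sum);
    }
    // Compare over the union of ids, in sorted order for stable output.
    TreeSet<Long> allIds = new TreeSet<>(storedCounts.keySet());
    allIds.addAll(recount.keySet());
    List<String> problems = new ArrayList<>();
    for (Long id : allIds) {
      int stored = storedCounts.getOrDefault(id, 0);
      int actual = recount.getOrDefault(id, 0);
      if (stored != actual) {
        problems.add("inode " + id + ": stored=" + stored + " actual=" + actual);
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    Map<Long, Integer> stored = new HashMap<>();
    stored.put(10L, 2);
    stored.put(11L, 1);
    for (String line : findMismatches(stored, Arrays.asList(10L, 11L, 11L))) {
      System.out.println(line);
    }
  }
}
```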
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Attachment: HDFS-15313.branch-2.10.patch

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch,
> HDFS-15313.001.patch, HDFS-15313.branch-2.10.patch,
> HDFS-15313.branch-2.8.patch, HDFS-15313.branch-3.1.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> deleting a snapshot can end up cleaning up inodes from the active fs that can
> be referenced from only one snapshot, because the isLastReference() check for
> the parent dir introduced in HDFS-13101 may return true in certain cases. The
> aim of this Jira is to add a check to ensure that inodes still referenced in
> the active fs are not deleted while a snapshot is being deleted.
[jira] [Updated] (HDFS-15470) Added more unit tests to validate rename behaviour across snapshots
[ https://issues.apache.org/jira/browse/HDFS-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15470:
---------------------------------------
    Attachment: HDFS-15470.002.patch

> Added more unit tests to validate rename behaviour across snapshots
> -------------------------------------------------------------------
>
> Key: HDFS-15470
> URL: https://issues.apache.org/jira/browse/HDFS-15470
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.0.4
>
> Attachments: HDFS-15470.000.patch, HDFS-15470.001.patch,
> HDFS-15470.002.patch
>
>
> HDFS-15313 fixes a critical issue, preventing data in the active fs from
> being deleted by a sequence of snapshot deletes. The idea is to add more
> tests to verify the behaviour.
[jira] [Commented] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158176#comment-17158176 ]

Shashikant Banerjee commented on HDFS-15313:
--------------------------------------------

Thanks [~sodonnell], I have uploaded patches for branch 2.8 as well as 3.1. Please have a look.

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch,
> HDFS-15313.branch-2.8.patch, HDFS-15313.branch-3.1.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> deleting a snapshot can end up cleaning up inodes from the active fs that can
> be referenced from only one snapshot, because the isLastReference() check for
> the parent dir introduced in HDFS-13101 may return true in certain cases. The
> aim of this Jira is to add a check to ensure that inodes still referenced in
> the active fs are not deleted while a snapshot is being deleted.
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Attachment: HDFS-15313.branch-3.1.patch

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch,
> HDFS-15313.branch-2.8.patch, HDFS-15313.branch-3.1.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> deleting a snapshot can end up cleaning up inodes from the active fs that can
> be referenced from only one snapshot, because the isLastReference() check for
> the parent dir introduced in HDFS-13101 may return true in certain cases. The
> aim of this Jira is to add a check to ensure that inodes still referenced in
> the active fs are not deleted while a snapshot is being deleted.
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Attachment: HDFS-15313.branch-2.8.patch

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch,
> HDFS-15313.branch-2.8.patch, HDFS-15313.branch-3.1.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> deleting a snapshot can end up cleaning up inodes from the active fs that can
> be referenced from only one snapshot, because the isLastReference() check for
> the parent dir introduced in HDFS-13101 may return true in certain cases. The
> aim of this Jira is to add a check to ensure that inodes still referenced in
> the active fs are not deleted while a snapshot is being deleted.
[jira] [Commented] (HDFS-15463) Add a tool to validate FsImage
[ https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158030#comment-17158030 ]

Shashikant Banerjee commented on HDFS-15463:
--------------------------------------------

Thanks [~szetszwo] for working on this. HDFS-15463.000.patch -> rebased to latest trunk.

> Add a tool to validate FsImage
> ------------------------------
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: namenode
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Attachments: FsImageValidation20200709.patch,
> FsImageValidation20200712.patch, FsImageValidation20200714.patch,
> FsImageValidation20200715.patch, HDFS-15463.000.patch
>
>
> Due to some snapshot-related bugs, an fsimage may become corrupted. Using a
> corrupted fsimage may further result in data loss.
> In some cases, we found incorrect reference counts in a corrupted FsImage.
> One of the goals of the validation tool is to check reference counts.
[jira] [Updated] (HDFS-15463) Add a tool to validate FsImage
[ https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15463:
---------------------------------------
    Attachment: HDFS-15463.000.patch

> Add a tool to validate FsImage
> ------------------------------
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: namenode
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Attachments: FsImageValidation20200709.patch,
> FsImageValidation20200712.patch, FsImageValidation20200714.patch,
> FsImageValidation20200715.patch, HDFS-15463.000.patch
>
>
> Due to some snapshot-related bugs, an fsimage may become corrupted. Using a
> corrupted fsimage may further result in data loss.
> In some cases, we found incorrect reference counts in a corrupted FsImage.
> One of the goals of the validation tool is to check reference counts.
[jira] [Updated] (HDFS-15463) Add a tool to validate FsImage
[ https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15463:
---------------------------------------
    Status: Patch Available  (was: Open)

> Add a tool to validate FsImage
> ------------------------------
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: namenode
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Attachments: FsImageValidation20200709.patch,
> FsImageValidation20200712.patch, FsImageValidation20200714.patch,
> FsImageValidation20200715.patch, HDFS-15463.000.patch
>
>
> Due to some snapshot-related bugs, an fsimage may become corrupted. Using a
> corrupted fsimage may further result in data loss.
> In some cases, we found incorrect reference counts in a corrupted FsImage.
> One of the goals of the validation tool is to check reference counts.
[jira] [Commented] (HDFS-14504) Rename with Snapshots does not honor quota limit
[ https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157903#comment-17157903 ]

Shashikant Banerjee commented on HDFS-14504:
--------------------------------------------

[~hemanthboyina], sorry for coming back to it late. Patch v2 looks good. +1

> Rename with Snapshots does not honor quota limit
> ------------------------------------------------
>
> Key: HDFS-14504
> URL: https://issues.apache.org/jira/browse/HDFS-14504
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Assignee: Hemanth Boyina
> Priority: Major
> Attachments: HDFS-14504.001.patch, HDFS-14504.002.patch
>
>
> Steps to Reproduce:
>
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2
> 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
> 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
> 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Allowing snapshot on /dir2 succeeded
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
> 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1
> 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Created snapshot /dir2/.snapshot/snap1
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2
> 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex
> 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=4
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2
> 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Created snapshot /dir2/.snapshot/snap2
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3
> 2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey
> 2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=5
> {code}
> // create operation fails here as it has already exceeded the quota limit
> {code}
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3
> 2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Created snapshot /dir2/.snapshot/snap3
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4
> 2019-05-21 15:13:20,715 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> {code}
> // Rename operation succeeds here adding on to the namespace quota
> {code}
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filez
> 2019-05-21 15:13:30,486 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=6
> {code}
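The arithmetic behind the HDFS-14504 transcript can be replayed with a small counter model. This is a sketch under explicit assumptions, not NameNode code: it assumes the directory itself counts as one name, that each rename performed after a new snapshot adds one name to the usage (the snapshot keeps the old name), and that the reported "file count" is the usage the rejected operation would have produced. The class and method names are hypothetical.

```java
/** Toy namespace-quota counter; names and accounting rules are assumptions. */
class RenameQuotaSketch {
  static class QuotaExceededException extends RuntimeException {
    QuotaExceededException(long quota, long count) {
      super("quota=" + quota + " file count=" + count);
    }
  }

  private final long nsQuota;
  private long nsUsed = 1; // assumption: the directory itself counts as one name

  RenameQuotaSketch(long nsQuota) {
    this.nsQuota = nsQuota;
  }

  long used() {
    return nsUsed;
  }

  // Rejects an operation that would push the usage above the quota.
  private void verify(long delta) {
    if (nsUsed + delta > nsQuota) {
      throw new QuotaExceededException(nsQuota, nsUsed + delta);
    }
  }

  /** File creation always checks the quota, as touchz does in the transcript. */
  void createFile() {
    verify(1);
    nsUsed += 1;
  }

  /**
   * Rename of a file captured by a snapshot: the old name stays in the
   * snapshot, so the usage grows by one. With enforceQuota=false this mimics
   * the reported bug; with true it shows the behavior the issue title expects.
   */
  void renameAfterSnapshot(boolean enforceQuota) {
    if (enforceQuota) {
      verify(1);
    }
    nsUsed += 1;
  }

  public static void main(String[] args) {
    RenameQuotaSketch dir2 = new RenameQuotaSketch(3); // -setQuota 3 /dir2
    dir2.createFile();               // touchz file1
    dir2.renameAfterSnapshot(false); // snap1, mv file1 file2
    dir2.renameAfterSnapshot(false); // snap2, mv file2 file3
    dir2.renameAfterSnapshot(false); // snap3, mv file3 file4
    try {
      dir2.createFile();             // touchz filez
    } catch (QuotaExceededException e) {
      System.out.println("touchz rejected: " + e.getMessage());
    }
  }
}
```

Under these assumptions the unchecked renames push the usage to 5 against a quota of 3, which matches the final `file count=6` reported when the last create is attempted.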
[jira] [Updated] (HDFS-15470) Added more unit tests to validate rename behaviour across snapshots
[ https://issues.apache.org/jira/browse/HDFS-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15470:
---------------------------------------
    Attachment: HDFS-15470.001.patch

> Added more unit tests to validate rename behaviour across snapshots
> -------------------------------------------------------------------
>
> Key: HDFS-15470
> URL: https://issues.apache.org/jira/browse/HDFS-15470
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.0.4
>
> Attachments: HDFS-15470.000.patch, HDFS-15470.001.patch
>
>
> HDFS-15313 fixes a critical issue, preventing data in the active fs from
> being deleted by a sequence of snapshot deletes. The idea is to add more
> tests to verify the behaviour.
[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15319:
---------------------------------------
    Fix Version/s: 3.0.4
       Resolution: Fixed
           Status: Resolved  (was: Patch Available)

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.0.4
>
> Attachments: HDFS-15319.000.patch, HDFS-15319.001.patch
>
>
> isInLatestSnapshot() may return true in cases where an inode's ancestors
> might not be in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   // Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
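The question raised by the TODO in HDFS-15319 can be made concrete with a toy ancestor walk. This is a simplified, hypothetical model and not the real INode class: the `Node` type, its boolean fields, and the recursion are assumptions chosen so that the early return at a reference parent is easy to observe.

```java
/** Toy model of the ancestor walk; not the real INode#isInLatestSnapshot(). */
class LatestSnapshotWalkSketch {
  static class Node {
    final Node parent;
    final boolean reference;        // stands in for isReference()
    final boolean recordedInLatest; // whether the parent's latest-snapshot view lists this node

    Node(Node parent, boolean reference, boolean recordedInLatest) {
      this.parent = parent;
      this.reference = reference;
      this.recordedInLatest = recordedInLatest;
    }
  }

  static boolean isInLatestSnapshot(Node n) {
    if (n.parent == null) {
      return true; // root
    }
    if (n.parent.reference) {
      // The walk stops here: ancestors above the reference node are not examined.
      return true;
    }
    if (!isInLatestSnapshot(n.parent)) {
      return false;
    }
    return n.recordedInLatest;
  }

  public static void main(String[] args) {
    Node root = new Node(null, false, true);
    Node dirA = new Node(root, false, false); // not in the latest snapshot
    Node renamed = new Node(dirA, true, true); // reference (renamed) node
    Node file = new Node(renamed, false, false);
    System.out.println("dirA: " + isInLatestSnapshot(dirA));
    System.out.println("file: " + isInLatestSnapshot(file));
  }
}
```

In this model the file below the reference node is reported as being in the latest snapshot even though `dirA` is not, which is the situation the TODO asks about.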
[jira] [Updated] (HDFS-15470) Added more unit tests to validate rename behaviour across snapshots
[ https://issues.apache.org/jira/browse/HDFS-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15470:
---------------------------------------
    Status: Patch Available  (was: Open)

> Added more unit tests to validate rename behaviour across snapshots
> -------------------------------------------------------------------
>
> Key: HDFS-15470
> URL: https://issues.apache.org/jira/browse/HDFS-15470
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.0.4
>
> Attachments: HDFS-15470.000.patch
>
>
> HDFS-15313 fixes a critical issue, preventing data in the active fs from
> being deleted by a sequence of snapshot deletes. The idea is to add more
> tests to verify the behaviour.
[jira] [Updated] (HDFS-15470) Added more unit tests to validate rename behaviour across snapshots
[ https://issues.apache.org/jira/browse/HDFS-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15470:
---------------------------------------
    Attachment: HDFS-15470.000.patch

> Added more unit tests to validate rename behaviour across snapshots
> -------------------------------------------------------------------
>
> Key: HDFS-15470
> URL: https://issues.apache.org/jira/browse/HDFS-15470
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.0.4
>
> Attachments: HDFS-15470.000.patch
>
>
> HDFS-15313 fixes a critical issue, preventing data in the active fs from
> being deleted by a sequence of snapshot deletes. The idea is to add more
> tests to verify the behaviour.
[jira] [Created] (HDFS-15470) Added more unit tests to validate rename behaviour across snapshots
Shashikant Banerjee created HDFS-15470:
------------------------------------------

Summary: Added more unit tests to validate rename behaviour across snapshots
Key: HDFS-15470
URL: https://issues.apache.org/jira/browse/HDFS-15470
Project: Hadoop HDFS
Issue Type: Bug
Components: snapshots
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
Fix For: 3.0.4

HDFS-15313 fixes a critical issue, preventing data in the active fs from
being deleted by a sequence of snapshot deletes. The idea is to add more
tests to verify the behaviour.
[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15319:
---------------------------------------
    Attachment: HDFS-15319.001.patch

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-15319.000.patch, HDFS-15319.001.patch
>
>
> isInLatestSnapshot() may return true in cases where an inode's ancestors
> might not be in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   // Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
[jira] [Commented] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157498#comment-17157498 ]

Shashikant Banerjee commented on HDFS-15319:
--------------------------------------------

Patch v1 removes the TODO added as part of the fix for HDFS-15313, as it does not seem to be a problem.

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-15319.000.patch, HDFS-15319.001.patch
>
>
> isInLatestSnapshot() may return true in cases where an inode's ancestors
> might not be in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   // Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15319:
---------------------------------------
    Status: Patch Available  (was: Open)

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-15319.000.patch
>
>
> isInLatestSnapshot() may return true in cases where an inode's ancestors
> might not be in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   // Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15319:
---------------------------------------
    Attachment: HDFS-15319.000.patch

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-15319.000.patch
>
>
> isInLatestSnapshot() may return true in cases where an inode's ancestors
> might not be in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   // Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
[jira] [Commented] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097210#comment-17097210 ]

Shashikant Banerjee commented on HDFS-15319:
--------------------------------------------

[~szetszwo], can you please have a look?

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-15319.000.patch
>
>
> isInLatestSnapshot() may return true in cases where an inode's ancestors
> might not be in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   // Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15319:
---------------------------------------
    Component/s: snapshots

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Attachments: HDFS-15319.000.patch
>
>
> isInLatestSnapshot() may return true in cases where an inode's ancestors
> might not be in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   // Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
[jira] [Commented] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097200#comment-17097200 ]

Shashikant Banerjee commented on HDFS-15313:

Filed HDFS-15319 to address the issues in isInLatestSnapshot().

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> delete snapshot ends up cleaning up inodes from active fs which can be
> referred from only one snapshot as the isLastReference() check for the parent
> dir introduced in HDFS-13101 may return true in certain cases. The aim of
> this Jira is to add a check to ensure that inodes which are being referred to
> in the active fs do not get deleted while a snapshot is being deleted.
[jira] [Assigned] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee reassigned HDFS-15319:
------------------------------------------
    Assignee: Shashikant Banerjee

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
>
> isInLatestSnapshot() may return true in cases where an inode's ancestors
> might not be in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   // Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15319:
---------------------------------------
    Description:
isInLatestSnapshot() may return true in cases where an inode's ancestors
might not be in the latest snapshot.
{code:java}
// if parent is a reference node, parent must be a renamed node. We can
// stop the check at the reference node.
if (parent != null && parent.isReference()) {
  // TODO: Is it a bug to return true?
  // Some ancestor nodes may not be in the latest snapshot.
  return true;
}
{code}

  was:
{code:java}
// if parent is a reference node, parent must be a renamed node. We can
// stop the check at the reference node.
if (parent != null && parent.isReference()) {
  // TODO: Is it a bug to return true?
  // Some ancestor nodes may not be in the latest snapshot.
  return true;
}
{code}

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Priority: Major
>
> isInLatestSnapshot() may return true in cases where an inode's ancestors
> might not be in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   // Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15319:
---------------------------------------
    Description:
{code:java}
// if parent is a reference node, parent must be a renamed node. We can
// stop the check at the reference node.
if (parent != null && parent.isReference()) {
  // TODO: Is it a bug to return true?
  // Some ancestor nodes may not be in the latest snapshot.
  return true;
}
{code}

  was:The

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Priority: Major
>
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   // Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15319:
---------------------------------------
    Description: The

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Priority: Major
>
> The
[jira] [Created] (HDFS-15319) Fix INode#isInLatestSnapshot
Shashikant Banerjee created HDFS-15319:
---------------------------------------

    Summary: Fix INode#isInLatestSnapshot
    Key: HDFS-15319
    URL: https://issues.apache.org/jira/browse/HDFS-15319
    Project: Hadoop HDFS
    Issue Type: Bug
    Reporter: Shashikant Banerjee
[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API
[ https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15319:
---------------------------------------
    Summary: Fix INode#isInLatestSnapshot() API  (was: Fix INode#isInLatestSnapshot)

> Fix INode#isInLatestSnapshot() API
> ----------------------------------
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Priority: Major
>
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Resolution: Fixed
    Status: Resolved  (was: Patch Available)

Thanks [~szetszwo] for the review. I have committed this.

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> delete snapshot ends up cleaning up inodes from active fs which can be
> referred from only one snapshot as the isLastReference() check for the parent
> dir introduced in HDFS-13101 may return true in certain cases. The aim of
> this Jira is to add a check to ensure that inodes which are being referred to
> in the active fs do not get deleted while a snapshot is being deleted.
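The kind of guard the description asks for can be pictured with a small sketch. This is only an illustration under stated assumptions: `SnapshotDeleteGuardSketch`, `collectReclaimable`, and the use of plain `Long` inode ids are hypothetical, and the committed patch operates on HDFS inode structures that are not reproduced here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of "do not reclaim inodes that the active fs still refers to". */
final class SnapshotDeleteGuardSketch {

  /**
   * Returns the candidate inode ids that may be reclaimed when a snapshot is
   * deleted, skipping every id that is still reachable from the active fs.
   */
  static Set<Long> collectReclaimable(List<Long> candidates,
      Set<Long> activeFsInodeIds) {
    Set<Long> reclaimable = new LinkedHashSet<>();
    for (Long id : candidates) {
      if (activeFsInodeIds.contains(id)) {
        continue; // still live in the active fs: must survive snapshot deletion
      }
      reclaimable.add(id);
    }
    return reclaimable;
  }

  public static void main(String[] args) {
    // Hypothetical ids: 102 is still present in the active fs.
    List<Long> candidates = Arrays.asList(101L, 102L, 103L);
    Set<Long> active = new HashSet<>(Arrays.asList(102L));
    System.out.println(collectReclaimable(candidates, active));
  }
}
```

The point of the sketch is the ordering of checks: membership in the active fs is tested before anything is queued for cleanup, so a misleading "last reference" answer alone cannot cause a live inode to be removed.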
[jira] [Commented] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096576#comment-17096576 ]

Shashikant Banerjee commented on HDFS-15313:

Patch v1 addresses checkstyle issues.

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> delete snapshot ends up cleaning up inodes from active fs which can be
> referred from only one snapshot as the isLastReference() check for the parent
> dir introduced in HDFS-13101 may return true in certain cases. The aim of
> this Jira is to add a check to ensure that inodes which are being referred to
> in the active fs do not get deleted while a snapshot is being deleted.
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Attachment: HDFS-15313.001.patch

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> delete snapshot ends up cleaning up inodes from active fs which can be
> referred from only one snapshot as the isLastReference() check for the parent
> dir introduced in HDFS-13101 may return true in certain cases. The aim of
> this Jira is to add a check to ensure that inodes which are being referred to
> in the active fs do not get deleted while a snapshot is being deleted.
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Description: After HDFS-13101, it was observed in one of our customer
deployments that delete snapshot ends up cleaning up inodes from active fs
which can be referred from only one snapshot as the isLastReference() check
for the parent dir introduced in HDFS-13101 may return true in certain cases.
The aim of this Jira is to add a check to ensure that inodes which are being
referred to in the active fs do not get deleted while a snapshot is being
deleted.

  (was: After HDFS-13101, it was observed that delete snapshot ends up cleaning
up inodes from active fs which can be referred from only one snapshot as the
isLastReference() check for the parent dir introduced in HDFS-13101 may return
true in certain cases. The aim of this Jira is to add a check to ensure that
inodes which are being referred to in the active fs do not get deleted while a
snapshot is being deleted.)

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that
> delete snapshot ends up cleaning up inodes from active fs which can be
> referred from only one snapshot as the isLastReference() check for the parent
> dir introduced in HDFS-13101 may return true in certain cases. The aim of
> this Jira is to add a check to ensure that inodes which are being referred to
> in the active fs do not get deleted while a snapshot is being deleted.
[jira] [Commented] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096271#comment-17096271 ]

Shashikant Banerjee commented on HDFS-15313:

[~szetszwo], [~weichiu], can you please have a look?

cc [~arp], [~msingh]

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch
>
>
> After HDFS-13101, it was observed that delete snapshot ends up cleaning up
> inodes from active fs which can be referred from only one snapshot as the
> isLastReference() check for the parent dir introduced in HDFS-13101 may
> return true in certain cases. The aim of this Jira is to add a check to
> ensure that inodes which are being referred to in the active fs do not get
> deleted while a snapshot is being deleted.
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Description: After HDFS-13101, it was observed that delete snapshot ends up
cleaning up inodes from active fs which can be referred from only one snapshot
as the isLastReference() check for the parent dir introduced in HDFS-13101 may
return true in certain cases. The aim of this Jira is to add a check to ensure
that inodes which are being referred to in the active fs do not get deleted
while a snapshot is being deleted.

  (was: After HDFS-13101, it was observed that delete snapshot ends up cleaning
up inodes from active fs which can be referred from only one snapshot as the
isLastReference() check for the parent dir introduced in HDFS-13101 returns
true. The aim of this Jira is to add a check to ensure that inodes which are
being referred to in the active fs do not get deleted while a snapshot is being
deleted.)

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch
>
>
> After HDFS-13101, it was observed that delete snapshot ends up cleaning up
> inodes from active fs which can be referred from only one snapshot as the
> isLastReference() check for the parent dir introduced in HDFS-13101 may
> return true in certain cases. The aim of this Jira is to add a check to
> ensure that inodes which are being referred to in the active fs do not get
> deleted while a snapshot is being deleted.
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Status: Patch Available  (was: Open)

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch
>
>
> After HDFS-13101, it was observed that delete snapshot ends up cleaning up
> inodes from active fs which can be referred from only one snapshot as the
> isLastReference() check for the parent dir introduced in HDFS-13101 may
> return true in certain cases. The aim of this Jira is to add a check to
> ensure that inodes which are being referred to in the active fs do not get
> deleted while a snapshot is being deleted.
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Attachment: HDFS-15313.000.patch

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch
>
>
> After HDFS-13101, it was observed that delete snapshot ends up cleaning up
> inodes from active fs which can be referred from only one snapshot as the
> isLastReference() check for the parent dir introduced in HDFS-13101 returns
> true. The aim of this Jira is to add a check to ensure that inodes which are
> being referred to in the active fs do not get deleted while a snapshot is
> being deleted.
[jira] [Updated] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
[ https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15313:
---------------------------------------
    Description: After HDFS-13101, it was observed that delete snapshot ends up
cleaning up inodes from active fs which can be referred from only one snapshot
as the isLastReference() check for the parent dir introduced in HDFS-13101
returns true. The aim of this Jira is to add a check to ensure that inodes
which are being referred to in the active fs do not get deleted while a
snapshot is being deleted.

  (was: After HDFS-13101, it was observed that delete snapshot ends up cleaning
up inodes from active fs which can be referred from only one snapshot as the
isLastReference() introduced in HDFS-13101 returns true. The aim of this Jira
is to add a check to ensure that inodes which are being referred to in the
active fs do not get deleted while a snapshot is being deleted.)

> Ensure inodes in active filesytem are not deleted during snapshot delete
> ------------------------------------------------------------------------
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Reporter: Shashikant Banerjee
> Assignee: Shashikant Banerjee
> Priority: Major
> Fix For: 3.4.0
>
>
> After HDFS-13101, it was observed that delete snapshot ends up cleaning up
> inodes from active fs which can be referred from only one snapshot as the
> isLastReference() check for the parent dir introduced in HDFS-13101 returns
> true. The aim of this Jira is to add a check to ensure that inodes which are
> being referred to in the active fs do not get deleted while a snapshot is
> being deleted.
[jira] [Created] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete
Shashikant Banerjee created HDFS-15313:
---------------------------------------

    Summary: Ensure inodes in active filesytem are not deleted during snapshot delete
    Key: HDFS-15313
    URL: https://issues.apache.org/jira/browse/HDFS-15313
    Project: Hadoop HDFS
    Issue Type: Bug
    Components: snapshots
    Reporter: Shashikant Banerjee
    Assignee: Shashikant Banerjee
    Fix For: 3.4.0

After HDFS-13101, it was observed that delete snapshot ends up cleaning up
inodes from active fs which can be referred from only one snapshot as the
isLastReference() introduced in HDFS-13101 returns true. The aim of this Jira
is to add a check to ensure that inodes which are being referred to in the
active fs do not get deleted while a snapshot is being deleted.
[jira] [Commented] (HDFS-14504) Rename with Snapshots does not honor quota limit
[ https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092046#comment-17092046 ]

Shashikant Banerjee commented on HDFS-14504:

Thanks [~hemanthboyina] for the explanation.

[~szetszwo], can you please have a look?

> Rename with Snapshots does not honor quota limit
> ------------------------------------------------
>
> Key: HDFS-14504
> URL: https://issues.apache.org/jira/browse/HDFS-14504
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Assignee: hemanthboyina
> Priority: Major
> Attachments: HDFS-14504.001.patch, HDFS-14504.002.patch
>
>
> Steps to Reproduce:
>
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2
> 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
> 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
> 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> Allowing snapshot on /dir2 succeeded
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
> 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1
> 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> Created snapshot /dir2/.snapshot/snap1
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2
> 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> Found 1 items
> -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex
> 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is
> exceeded: quota=3 file count=4
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2
> 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> Created snapshot /dir2/.snapshot/snap2
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> Found 1 items
> -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3
> 2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey
> 2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is
> exceeded: quota=3 file count=5
> {code}
> // create operation fails here as it has already exceeded the quota limit
> {code}
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3
> 2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> Created snapshot /dir2/.snapshot/snap3
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4
> 2019-05-21 15:13:20,715 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> {code}
> // Rename operation succeeds here adding on to the namespace quota
> {code}
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filez
> 2019-05-21 15:13:30,486 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is
> exceeded: quota=3
[jira] [Comment Edited] (HDFS-14504) Rename with Snapshots does not honor quota limit
[ https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087899#comment-17087899 ]

Shashikant Banerjee edited comment on HDFS-14504 at 4/20/20, 4:25 PM:
----------------------------------------------------------------------

Thanks [~hemanthboyina] for updating the patch.
{code:java}
@Test
public void testRenameAcrossDirWithinSnapshot() throws Exception {
  // snapshottable directory
  String dirr = "/dir";
  Path rootDir = new Path(dirr);
  hdfs.mkdirs(rootDir);
  hdfs.allowSnapshot(rootDir);

  // set quota for source directory under snapshottable root directory
  Path dir2 = new Path(rootDir, "dir2");
  Path fil1 = new Path(dir2, "file1");
  hdfs.mkdirs(dir2);
  hdfs.setQuota(dir2, 3, 0);
  hdfs.create(fil1);
  Path file2 = new Path(dir2, "file2");
  hdfs.rename(fil1, file2);
  Path fil3 = new Path(dir2, "file3");
  hdfs.create(fil3);

  // destination directory under snapshottable root directory
  Path dir1 = new Path(rootDir, "dir1");
  Path dir1fil1 = new Path(dir1, "file1");
  hdfs.mkdirs(dir1);
  hdfs.create(dir1fil1);
  Path dir1fil2 = new Path(dir1, "file2");
  hdfs.rename(dir1fil1, dir1fil2);

  hdfs.createSnapshot(rootDir, "snap1");
  Path filex = new Path(dir2, "filex");
  // create a file after exceeding namespace quota
  LambdaTestUtils.intercept(NSQuotaExceededException.class,
      "The NameSpace quota (directories and files) of "
          + "directory /dir/dir2 is exceeded",
      () -> hdfs.create(filex));

  // Rename across directories within snapshot with quota set on source
  // directory
  assertTrue(hdfs.rename(fil3, dir1));
}
{code}
In the test above, "filex" could not be created in "dir2" because the quota
limit was exceeded, but the rename of "fil3", which exists under the same
directory "dir2", appears to succeed. Ideally it should fail, as it will create
an INodeReference in the dir2 diff list for the snapshot snap1, hence will
exceed the quota limit.

Can you please check?


was (Author: shashikant):
Thanks [~hemanthboyina] for updating the patch.
{code:java}
@Test
public void testRenameAcrossDirWithinSnapshot() throws Exception {
  // snapshottable directory
  String dirr = "/dir";
  Path rootDir = new Path(dirr);
  hdfs.mkdirs(rootDir);
  hdfs.allowSnapshot(rootDir);

  // set quota for source directory under snapshottable root directory
  Path dir2 = new Path(rootDir, "dir2");
  Path fil1 = new Path(dir2, "file1");
  hdfs.mkdirs(dir2);
  hdfs.setQuota(dir2, 3, 0);
  hdfs.create(fil1);
  Path file2 = new Path(dir2, "file2");
  hdfs.rename(fil1, file2);
  Path fil3 = new Path(dir2, "file3");
  hdfs.create(fil3);

  // destination directory under snapshottable root directory
  Path dir1 = new Path(rootDir, "dir1");
  Path dir1fil1 = new Path(dir1, "file1");
  hdfs.mkdirs(dir1);
  hdfs.create(dir1fil1);
  Path dir1fil2 = new Path(dir1, "file2");
  hdfs.rename(dir1fil1, dir1fil2);

  hdfs.createSnapshot(rootDir, "snap1");
  Path filex = new Path(dir2, "filex");
  // create a file after exceeding namespace quota
  LambdaTestUtils.intercept(NSQuotaExceededException.class,
      "The NameSpace quota (directories and files) of "
          + "directory /dir/dir2 is exceeded",
      () -> hdfs.create(filex));

  // Rename across directories within snapshot with quota set on source
  // directory
  assertTrue(hdfs.rename(fil3, dir1));
}
{code}
In the test above, "filex" could not be created in "dir2" because the quota
limit was exceeded, but the rename of "fil3", which exists under the same
directory "dir2", appears to succeed. Ideally it should fail, as it will create
an INodeReference in the dir2 diff list for the snapshot snap1, hence exceeding
the quota limit.

Can you please check?
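The accounting argument in this comment can be sketched with a toy counter. This is a simplified model, not HDFS code: `QuotaRenameSketch` and its methods are hypothetical, and it simply assumes, as the comment argues, that a rename performed after a snapshot leaves one extra entry charged to the source directory. The only HDFS fact relied on is that a directory counts against its own namespace quota.

```java
/** Toy namespace-quota counter; NOT the real HDFS quota implementation. */
final class QuotaRenameSketch {

  /** Thrown when an operation would push the namespace count past the quota. */
  static final class QuotaExceeded extends RuntimeException {
    final long quota;
    final long attemptedCount;

    QuotaExceeded(long quota, long attemptedCount) {
      super("quota=" + quota + " file count=" + attemptedCount);
      this.quota = quota;
      this.attemptedCount = attemptedCount;
    }
  }

  private final long quota;
  private long used = 1; // the directory itself counts toward its own quota

  QuotaRenameSketch(long quota) {
    this.quota = quota;
  }

  long used() {
    return used;
  }

  /** Creating a file always verifies the quota first. */
  void create() {
    charge(true);
  }

  /**
   * Assumption: a rename after a snapshot adds one charged entry (the
   * reference kept for the snapshot). verifyQuota=false models the behavior
   * questioned in the comment; verifyQuota=true models the expected behavior.
   */
  void renameAfterSnapshot(boolean verifyQuota) {
    charge(verifyQuota);
  }

  private void charge(boolean verifyQuota) {
    long next = used + 1;
    if (verifyQuota && next > quota) {
      throw new QuotaExceeded(quota, next);
    }
    used = next;
  }

  public static void main(String[] args) {
    QuotaRenameSketch dir2 = new QuotaRenameSketch(3);
    dir2.create(); // first file
    dir2.create(); // second file, directory is now at its quota
    try {
      dir2.create(); // rejected, like creating "filex" in the test
    } catch (QuotaExceeded e) {
      System.out.println("create rejected: " + e.getMessage());
    }
    dir2.renameAfterSnapshot(false); // unchecked rename still goes through
    System.out.println("used after unchecked rename: " + dir2.used());
  }
}
```

With `verifyQuota` set to true the same rename would be rejected, which is the outcome the comment says the test should assert.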
> Rename with Snapshots does not honor quota limit > > > Key: HDFS-14504 > URL: https://issues.apache.org/jira/browse/HDFS-14504 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14504.001.patch, HDFS-14504.002.patch > > > Steps to Reproduce: > > {code:java} > HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2 > 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2 > 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2 > 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > Allowing snapshot on /dir2 succeeded > HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1 > 2019-05-21
[jira] [Comment Edited] (HDFS-14504) Rename with Snapshots does not honor quota limit
[ https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087899#comment-17087899 ]

Shashikant Banerjee edited comment on HDFS-14504 at 4/20/20, 4:17 PM:
--

Thanks [~hemanthboyina] for updating the patch.

{code:java}
@Test
public void testRenameAcrossDirWithinSnapshot() throws Exception {
  // snapshottable directory
  String dirr = "/dir";
  Path rootDir = new Path(dirr);
  hdfs.mkdirs(rootDir);
  hdfs.allowSnapshot(rootDir);

  // set quota for source directory under snapshottable root directory
  Path dir2 = new Path(rootDir, "dir2");
  Path fil1 = new Path(dir2, "file1");
  hdfs.mkdirs(dir2);
  hdfs.setQuota(dir2, 3, 0);
  hdfs.create(fil1);
  Path file2 = new Path(dir2, "file2");
  hdfs.rename(fil1, file2);
  Path fil3 = new Path(dir2, "file3");
  hdfs.create(fil3);

  // destination directory under snapshottable root directory
  Path dir1 = new Path(rootDir, "dir1");
  Path dir1fil1 = new Path(dir1, "file1");
  hdfs.mkdirs(dir1);
  hdfs.create(dir1fil1);
  Path dir1fil2 = new Path(dir1, "file2");
  hdfs.rename(dir1fil1, dir1fil2);

  hdfs.createSnapshot(rootDir, "snap1");
  Path filex = new Path(dir2, "filex");

  // create a file after exceeding namespace quota
  LambdaTestUtils.intercept(NSQuotaExceededException.class,
      "The NameSpace quota (directories and files) of "
          + "directory /dir/dir2 is exceeded",
      () -> hdfs.create(filex));

  // Rename across directories within snapshot with quota set on source
  // directory
  assertTrue(hdfs.rename(fil3, dir1));
}
{code}

In the test above, "filex" could not be created in "dir2" because the quota limit was exceeded, but the rename of "fil3", which exists under the same directory "dir2", seems to succeed. Ideally it should fail, as it will create an INodeReference in the dir2 diff list for the snapshot snap1, hence exceeding the quota limit. Can you please check?

was (Author: shashikant):
Thanks [~hemanthboyina] for updating the patch.
{code:java}
@Test
public void testRenameAcrossDirWithinSnapshot() throws Exception {
  // snapshottable directory
  String dirr = "/dir";
  Path rootDir = new Path(dirr);
  hdfs.mkdirs(rootDir);
  hdfs.allowSnapshot(rootDir);

  // set quota for source directory under snapshottable root directory
  Path dir2 = new Path(rootDir, "dir2");
  Path fil1 = new Path(dir2, "file1");
  hdfs.mkdirs(dir2);
  hdfs.setQuota(dir2, 3, 0);
  hdfs.create(fil1);
  Path file2 = new Path(dir2, "file2");
  hdfs.rename(fil1, file2);
  Path fil3 = new Path(dir2, "file3");
  hdfs.create(fil3);

  // destination directory under snapshottable root directory
  Path dir1 = new Path(rootDir, "dir1");
  Path dir1fil1 = new Path(dir1, "file1");
  hdfs.mkdirs(dir1);
  hdfs.create(dir1fil1);
  Path dir1fil2 = new Path(dir1, "file2");
  hdfs.rename(dir1fil1, dir1fil2);

  hdfs.createSnapshot(rootDir, "snap1");
  Path filex = new Path(dir2, "filex");

  // create a file after exceeding namespace quota
  LambdaTestUtils.intercept(NSQuotaExceededException.class,
      "The NameSpace quota (directories and files) of "
          + "directory /dir/dir2 is exceeded",
      () -> hdfs.create(filex));

  // Rename across directories within snapshot with quota set on source
  // directory
  assertTrue(hdfs.rename(fil3, dir1));
}
{code}

In the test above, "filex" could not be created in "dir2" because the quota limit was exceeded, but the rename of "fil3", which exists under the same directory "dir2", seems to succeed. Ideally it should fail, as it will create an INodeReference in the dir2 diff list for the snapshot snap1, hence exceeding the quota limit. Can you please check?
[jira] [Commented] (HDFS-14504) Rename with Snapshots does not honor quota limit
[ https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087899#comment-17087899 ]

Shashikant Banerjee commented on HDFS-14504:

Thanks [~hemanthboyina] for updating the patch.

{code:java}
@Test
public void testRenameAcrossDirWithinSnapshot() throws Exception {
  // snapshottable directory
  String dirr = "/dir";
  Path rootDir = new Path(dirr);
  hdfs.mkdirs(rootDir);
  hdfs.allowSnapshot(rootDir);

  // set quota for source directory under snapshottable root directory
  Path dir2 = new Path(rootDir, "dir2");
  Path fil1 = new Path(dir2, "file1");
  hdfs.mkdirs(dir2);
  hdfs.setQuota(dir2, 3, 0);
  hdfs.create(fil1);
  Path file2 = new Path(dir2, "file2");
  hdfs.rename(fil1, file2);
  Path fil3 = new Path(dir2, "file3");
  hdfs.create(fil3);

  // destination directory under snapshottable root directory
  Path dir1 = new Path(rootDir, "dir1");
  Path dir1fil1 = new Path(dir1, "file1");
  hdfs.mkdirs(dir1);
  hdfs.create(dir1fil1);
  Path dir1fil2 = new Path(dir1, "file2");
  hdfs.rename(dir1fil1, dir1fil2);

  hdfs.createSnapshot(rootDir, "snap1");
  Path filex = new Path(dir2, "filex");

  // create a file after exceeding namespace quota
  LambdaTestUtils.intercept(NSQuotaExceededException.class,
      "The NameSpace quota (directories and files) of "
          + "directory /dir/dir2 is exceeded",
      () -> hdfs.create(filex));

  // Rename across directories within snapshot with quota set on source
  // directory
  assertTrue(hdfs.rename(fil3, dir1));
}
{code}

In the test above, "filex" could not be created in "dir2" because the quota limit was exceeded, but the rename of "fil3", which exists under the same directory "dir2", seems to succeed. Ideally it should fail, as it will create an INodeReference in the dir2 diff list for the snapshot snap1, hence exceeding the quota limit. Can you please check?
> Rename with Snapshots does not honor quota limit > > > Key: HDFS-14504 > URL: https://issues.apache.org/jira/browse/HDFS-14504 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: hemanthboyina >Priority: Major > Attachments: HDFS-14504.001.patch, HDFS-14504.002.patch > > > Steps to Reproduce: > > {code:java} > HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2 > 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2 > 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2 > 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > Allowing snapshot on /dir2 succeeded > HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1 > 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1 > 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > Created snapshot /dir2/.snapshot/snap1 > HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2 > 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2 > 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... 
using builtin-java classes where > applicable > Found 1 items > -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2 > HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex > 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > touchz: The NameSpace quota (directories and files) of directory /dir2 is > exceeded: quota=3 file count=4 > HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2 > 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > Created snapshot /dir2/.snapshot/snap2 > HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2 > 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load > native-hadoop library for your platform... using builtin-java classes where > applicable > Found 1 items > -rw-r--r-- 1
[jira] [Comment Edited] (HDFS-14504) Rename with Snapshots does not honor quota limit
[ https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077306#comment-17077306 ]

Shashikant Banerjee edited comment on HDFS-14504 at 4/8/20, 5:52 PM:
-

Thanks [~hemanthboyina] for working on this. There are multiple cases to consider when renaming across directories in a snapshot, such as:

1) Rename across directories within a snapshot with quota set on the source directory
2) Rename within the same directory within a snapshottable root with quota set
3) Rename from a directory under a snapshottable root to a directory with quota set to a directory not under any snapshottable root

The fix might not address all the cases here. Also, disallowing a rename that happens under a snapshotted directory with quota set might be an incompatible change as well.

was (Author: shashikant):
Thanks [~hemanthboyina] for working on this. There are multiple cases to consider when renaming across directories in a snapshot, such as:

1) Rename across directories within a snapshot with quota set on the source directory
2) Rename within the same directory within a snapshottable root with quota set
3) Rename from a directory under a snapshottable root to a directory with quota set to a directory not under any snapshottable root

The fix might not address all the cases here. Also, disallowing a rename that happens under a snapshotted directory with quota set will be an incompatible change as well.
[jira] [Commented] (HDFS-14504) Rename with Snapshots does not honor quota limit
[ https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077306#comment-17077306 ]

Shashikant Banerjee commented on HDFS-14504:

Thanks [~hemanthboyina] for working on this. There are multiple cases to consider when renaming across directories in a snapshot, such as:

1) Rename across directories within a snapshot with quota set on the source directory
2) Rename within the same directory within a snapshottable root with quota set
3) Rename from a directory under a snapshottable root to a directory with quota set to a directory not under any snapshottable root

The fix might not address all the cases here. Also, disallowing a rename that happens under a snapshotted directory with quota set will be an incompatible change as well.

> Rename with Snapshots does not honor quota limit
>
> Key: HDFS-14504
> URL: https://issues.apache.org/jira/browse/HDFS-14504
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Shashikant Banerjee
> Assignee: hemanthboyina
> Priority: Major
> Attachments: HDFS-14504.001.patch
>
> Steps to Reproduce:
>
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2
> 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
> 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
> 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Allowing snapshot on /dir2 succeeded
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
> 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1
> 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Created snapshot /dir2/.snapshot/snap1
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2
> 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex
> 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=4
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2
> 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Created snapshot /dir2/.snapshot/snap2
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Found 1 items
> -rw-r--r-- 1 sbanerjee hadoop 0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3
> 2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey
> 2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is exceeded: quota=3 file count=5
> {code}
> // create operation fails here as it has already exceeded the quota limit
> {code}
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3
> 2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> Created snapshot /dir2/.snapshot/snap3
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4
> 2019-05-21 15:13:20,715 WARN
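The counts reported in the transcript above (quota=3 with file count=4, then file count=5) can be replayed with a small toy model. This is only a sketch under stated assumptions, not HDFS code: `ToyQuotaDirectory` and its methods are hypothetical names, and the model assumes that the directory itself counts as one namespace entry and that renaming a file captured by a snapshot leaves one extra entry behind without any quota check, which is what the reported numbers suggest.

```java
/** Toy model of namespace-quota accounting; hypothetical, not the HDFS implementation. */
final class ToyQuotaDirectory {
  private final long quota;
  // Assumption: the directory itself counts as one namespace entry.
  private long count = 1;

  ToyQuotaDirectory(long quota) {
    this.quota = quota;
  }

  /** Creating a file is quota-checked, like the failing touchz calls in the report. */
  void create() {
    if (count + 1 > quota) {
      throw new IllegalStateException("quota=" + quota + " file count=" + (count + 1));
    }
    count++;
  }

  /**
   * Assumption: renaming a file that a snapshot still refers to keeps the old
   * name alive as an extra entry, and no quota check is applied.
   */
  void renameSnapshottedFile() {
    count++;
  }

  long count() {
    return count;
  }

  public static void main(String[] args) {
    ToyQuotaDirectory dir2 = new ToyQuotaDirectory(3);
    dir2.create();                 // touchz /dir2/file1
    dir2.renameSnapshottedFile();  // snap1, then mv file1 file2
    try {
      dir2.create();               // touchz /dir2/filex
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
    dir2.renameSnapshottedFile();  // snap2, then mv file2 file3: count passes the quota
    try {
      dir2.create();               // touchz /dir2/filey
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In this model the second rename pushes the count to 4 while the quota is 3, which is the behaviour the issue title describes: creates are rejected, but renames are not.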
[jira] [Comment Edited] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999407#comment-16999407 ]

Shashikant Banerjee edited comment on HDFS-15012 at 12/18/19 6:12 PM:
--

Thanks [~ericlin] for helping discover the issue. Thanks [~arp], [~szetszwo], [~weichiu], [~ayushtkn], [~surendrasingh] for the review and feedback. I have committed this. The findbugs issue reported is not related.

was (Author: shashikant):
Thanks [~ericlin] for helping discover the issue. Thanks [~arp], [~szetszwo], [~weichiu], [~ayushtkn], [~surendrasingh] for the review and feedback. I have committed this.

> NN fails to parse Edit logs after applying HDFS-13101
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Labels: release-blocker
> Fix For: 2.8.0, 2.9.0, 3.1.0, 2.10.0, 3.2.0, 3.3.0
> Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
> After applying HDFS-13101, and deleting and creating a large number of
> snapshots, the SNN exited with the error below:
>
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file,
> snapshotName=distcp-3479-31-old,
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists:
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at
[jira] [Updated] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15012:
---
Fix Version/s: 2.8.0
               2.9.0
               3.1.0
               2.10.0
               3.2.0
               3.3.0
   Resolution: Fixed
       Status: Resolved (was: Patch Available)

Thanks [~ericlin] for helping discover the issue. Thanks [~arp], [~szetszwo], [~weichiu], [~ayushtkn], [~surendrasingh] for the review and feedback. I have committed this.

> NN fails to parse Edit logs after applying HDFS-13101
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Labels: release-blocker
> Fix For: 3.3.0, 3.2.0, 2.10.0, 3.1.0, 2.9.0, 2.8.0
> Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
> After applying HDFS-13101, and deleting and creating a large number of
> snapshots, the SNN exited with the error below:
>
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file,
> snapshotName=distcp-3479-31-old,
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists:
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at
[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994244#comment-16994244 ]

Shashikant Banerjee commented on HDFS-15012:

Thanks [~szetszwo]. Patch v1 addresses the checkstyle issues.

> NN fails to parse Edit logs after applying HDFS-13101
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Labels: release-blocker
> Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
> After applying HDFS-13101, and deleting and creating a large number of
> snapshots, the SNN exited with the error below:
>
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file,
> snapshotName=distcp-3479-31-old,
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists:
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that fsimage and edit files were NOT corrupted, as reverting
> HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken
> and fails to parse edit log files.
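The top of the stack trace in the report (`Diff.combinePosterior` calling `Diff.delete`, which calls `Diff.insert`) together with the message `Element already exists` suggests that an element was recorded as deleted while it was already present in the deleted list. The following is a minimal sketch of that failure shape only. `ToyDeletedList` is a hypothetical stand-in, not the HDFS `Diff` class, and it assumes a sorted, duplicate-free list maintained with a binary search.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy stand-in for a sorted "deleted" list; hypothetical, not the HDFS Diff class. */
final class ToyDeletedList {
  private final List<String> deleted = new ArrayList<>();

  /** Records a deleted element, keeping the list sorted and free of duplicates. */
  void delete(String element) {
    int i = Collections.binarySearch(deleted, element);
    if (i >= 0) {
      // The same element is being recorded twice; fail loudly, in the same
      // message shape as the error in the report.
      throw new AssertionError(
          "Element already exists: element=" + element + ", DELETED=" + deleted);
    }
    // binarySearch returns (-(insertion point) - 1) when the key is absent.
    deleted.add(-i - 1, element);
  }

  public static void main(String[] args) {
    ToyDeletedList list = new ToyDeletedList();
    list.delete("partition_isactive=true");
    try {
      list.delete("partition_isactive=true");
    } catch (AssertionError e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Whether the real code reaches this state through replaying `DeleteSnapshotOp` in a different order than the original deletes is not something the toy can show; it only illustrates why a second delete of the same element is treated as a fatal inconsistency.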
[jira] [Updated] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15012:
---------------------------------------
    Attachment: HDFS-15012.001.patch

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Labels: release-blocker
> Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
> After applying HDFS-13101 and then deleting and creating a large number of
> snapshots, the SNN exited with the error below:
>
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, snapshotName=distcp-3479-31-old, RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, RpcCallId=1]
> java.lang.AssertionError: Element already exists: element=partition_isactive=true, DELETED=[partition_isactive=true]
>   at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
>   at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
>   at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
>   at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
>   at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
>   at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
>   at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
>   at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
>   at org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
>   at org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
>   at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
>   at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
>   at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
>   at org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
>   at org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
>   at org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
>   at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
>   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
>   at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
>   at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
>   at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
>   at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
>
> We confirmed that the fsimage and edit files were NOT corrupted, as reverting
> HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 is broken
> and fails to parse the edit log files.
[jira] [Updated] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15012:
---------------------------------------
    Status: Patch Available (was: Open)

Patch v0 adds a unit test and fix to address the issue.

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Labels: release-blocker
> Attachments: HDFS-15012.000.patch
[jira] [Updated] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDFS-15012:
---------------------------------------
    Attachment: HDFS-15012.000.patch

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Assignee: Shashikant Banerjee
> Priority: Blocker
> Labels: release-blocker
> Attachments: HDFS-15012.000.patch
[jira] [Resolved] (HDFS-14869) Data loss in case of distcp using snapshot diff. Replication should include rename records if file was skipped in the previous iteration
[ https://issues.apache.org/jira/browse/HDFS-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee resolved HDFS-14869.
----------------------------------------
    Fix Version/s: 3.1.4
       Resolution: Fixed

Thanks [~aasha] for the contribution and [~ste...@apache.org] for the review. I have committed this.

> Data loss in case of distcp using snapshot diff. Replication should include
> rename records if file was skipped in the previous iteration
> ---------------------------------------------------------------------------
>
> Key: HDFS-14869
> URL: https://issues.apache.org/jira/browse/HDFS-14869
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: distcp
> Reporter: Aasha Medhi
> Assignee: Aasha Medhi
> Priority: Major
> Fix For: 3.1.4
>
> This issue arises when a directory or file is excluded by the exclusion filter
> during distcp replication. If the directory is later renamed to a name that
> the filter does not exclude, the snapshot diff reports only a rename
> operation. The directory is never copied to the target even though it is no
> longer excluded. No error is thrown either, so there is no way to detect the
> issue.
>
> Steps to reproduce
> * Create a directory in hdfs to copy using distcp.
> * Include a staging folder in the directory.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -ls /tmp/tocopy
> Found 4 items
> -rw-r--r-- 3 hdfs hdfs 16 2019-09-12 10:32 /tmp/tocopy/.b.txt
> drwxr-xr-x - hdfs hdfs 0 2019-09-23 09:18 /tmp/tocopy/.staging
> -rw-r--r-- 3 hdfs hdfs 12 2019-09-12 10:32 /tmp/tocopy/a.txt
> -rw-r--r-- 3 hdfs hdfs 4 2019-09-20 08:23 /tmp/tocopy/foo.txt{code}
> * The exclusion filter is set to exclude any staging directory.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ cat /tmp/filter
> .*\.Trash.*
> .*\.staging.*{code}
> * Do a copy using distcp snapshots; the staging directory is not replicated.
> {code:java}
> hadoop jar hadoop-distcp-3.3.0-SNAPSHOT.jar -Dmapreduce.job.user.classpath.first=true -filters /tmp/filter /tmp/tocopy/.snapshot/s1 /tmp/target
> [hdfs@ctr-e141-1563959304486-33995-01-03 root]$ hadoop fs -ls /tmp/target
> Found 3 items
> -rw-r--r-- 3 hdfs hdfs 16 2019-09-24 06:56 /tmp/target/.b.txt
> -rw-r--r-- 3 hdfs hdfs 12 2019-09-24 06:56 /tmp/target/a.txt
> -rw-r--r-- 3 hdfs hdfs 4 2019-09-24 06:56 /tmp/target/foo.txt{code}
> * Rename the staging directory to final.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -mv /tmp/tocopy/.staging /tmp/tocopy/final{code}
> * Do a copy using snapshot diff.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hdfs snapshotDiff /tmp/tocopy s1 s2
> Difference between snapshot s1 and snapshot s2 under directory /tmp/tocopy:
> M .
> R ./.staging -> ./final
> {code}
> * The diff report has only a rename record, and the new final directory is
> never copied.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop jar hadoop-distcp-3.3.0-SNAPSHOT.jar -Dmapreduce.job.user.classpath.first=true -filters /tmp/filter -diff s1 s2 -update /tmp/tocopy /tmp/target
> 19/09/24 07:05:32 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=true, useRdiff=false, fromSnapshot=s1, toSnapshot=s2, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/tmp/tocopy], targetPath=/tmp/target, filtersFile='/tmp/filter', blocksPerChunk=0, copyBufferSize=8192, verboseLog=false, directWrite=false}, sourcePaths=[/tmp/tocopy], targetPathExists=true, preserveRawXattrsfalse
> 19/09/24 07:05:32 INFO client.RMProxy: Connecting to ResourceManager at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050
> 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200
> 19/09/24 07:05:33 INFO tools.DistCp: Number of paths in the copy list: 0
> 19/09/24 07:05:33 INFO client.RMProxy: Connecting to ResourceManager at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050
> 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200
> 19/09/24 07:05:33 INFO
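The failure mode in the HDFS-14869 report comes down to how a rename record interacts with the exclusion filter. The following is a minimal Java sketch of that decision, not the committed patch: the class name `RenameAwareFilter`, the `Action` enum, and the `planForRename` helper are all hypothetical, and the handling of a rename into an excluded name is only one plausible choice. The two regular expressions are the ones shown in `/tmp/filter` above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class RenameAwareFilter {
    /** Hypothetical outcomes for a single rename record of a snapshot diff. */
    enum Action { RENAME_ON_TARGET, COPY_AS_NEW, DELETE_ON_TARGET, SKIP }

    // The two patterns listed in /tmp/filter in the report.
    private static final List<Pattern> FILTERS = Arrays.asList(
        Pattern.compile(".*\\.Trash.*"),
        Pattern.compile(".*\\.staging.*"));

    /** Returns true when any filter pattern matches the whole path. */
    static boolean isExcluded(String path) {
        for (Pattern p : FILTERS) {
            if (p.matcher(path).matches()) {
                return true;
            }
        }
        return false;
    }

    /**
     * Decides what to do with a rename record (source -> target).
     * Assumption: an excluded source was never copied, so replaying the
     * rename on the target cannot work and the new path must be copied.
     */
    static Action planForRename(String source, String target) {
        boolean sourceExcluded = isExcluded(source);
        boolean targetExcluded = isExcluded(target);
        if (sourceExcluded && targetExcluded) {
            return Action.SKIP;              // still filtered out
        }
        if (sourceExcluded) {
            return Action.COPY_AS_NEW;       // the case from the report
        }
        if (targetExcluded) {
            return Action.DELETE_ON_TARGET;  // illustrative choice only
        }
        return Action.RENAME_ON_TARGET;      // ordinary rename
    }

    public static void main(String[] args) {
        System.out.println(planForRename("/tmp/tocopy/.staging", "/tmp/tocopy/final"));
        System.out.println(planForRename("/tmp/tocopy/a.txt", "/tmp/tocopy/b.txt"));
    }
}
```

With the paths from the report, `.staging -> final` falls into the "source excluded, target not excluded" branch, which is exactly the case where a plain rename on the target would silently copy nothing.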
[jira] [Assigned] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101
[ https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee reassigned HDFS-15012:
------------------------------------------
    Assignee: Shashikant Banerjee

> NN fails to parse Edit logs after applying HDFS-13101
> -----------------------------------------------------
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: nn
> Reporter: Eric Lin
> Assignee: Shashikant Banerjee
> Priority: Critical
[jira] [Assigned] (HDDS-2542) Race condition between read and write stateMachineData
[ https://issues.apache.org/jira/browse/HDDS-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee reassigned HDDS-2542:
-----------------------------------------
    Assignee: Shashikant Banerjee

> Race condition between read and write stateMachineData
> ------------------------------------------------------
>
> Key: HDDS-2542
> URL: https://issues.apache.org/jira/browse/HDDS-2542
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Marton Elek
> Assignee: Shashikant Banerjee
> Priority: Critical
>
> The write payload (the chunk itself) is sent to Ratis as an external binary
> byte array. It is not part of the LogEntry and is saved from an async thread
> by calling ContainerStateMachine.writeStateMachineData.
>
> Because the write is asynchronous, the stateMachineData may not yet be
> written when the data has to be sent to the followers in the next heartbeat.
> By design a cache is used to avoid this issue, but there are multiple
> problems with the cache.
>
> First, the current cache size is chunkExecutor.getCorePoolSize(), which is not
> enough. By default this means 60 executor threads and a cache of size 60. But
> with one very slow writer and 59 very fast writers, the cache entries can be
> invalidated before the write.
> In my tests (freon datanode-chunk-writer-generator) I have seen cache misses
> even with a cache size of 5000.
>
> Second, because readStateMachineData and writeStateMachineData are called
> from two different threads, there is a race condition that is independent of
> the cache size. It is possible that the write thread has not yet added the
> data to the cache when the read thread needs it.
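The second problem in HDDS-2542 (the reader arriving before the writer has populated the cache) can be illustrated with a small Java sketch. This is an assumption-laden illustration, not the Ozone fix: the class `PendingWriteCache` and its methods are hypothetical. The idea shown is to register a `CompletableFuture` for a log index on the submitting thread, before the asynchronous write runs, so that a reader waits on the future instead of finding nothing.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

final class PendingWriteCache {
    private final ConcurrentMap<Long, CompletableFuture<byte[]>> pending =
        new ConcurrentHashMap<>();
    private final ExecutorService executor;

    PendingWriteCache(ExecutorService executor) {
        this.executor = executor;
    }

    /**
     * Registers the future synchronously on the caller's thread, then runs
     * the write asynchronously. The writer returns the bytes it has written.
     */
    void startWrite(long logIndex, Supplier<byte[]> writer) {
        pending.put(logIndex, CompletableFuture.supplyAsync(writer, executor));
    }

    /** Blocks until the write for logIndex has finished, or times out. */
    byte[] read(long logIndex, long timeoutMillis) {
        CompletableFuture<byte[]> future = pending.get(logIndex);
        if (future == null) {
            throw new IllegalStateException("no write registered for index " + logIndex);
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Drops the entry once the data is no longer needed by any reader. */
    void release(long logIndex) {
        pending.remove(logIndex);
    }

    /** Small demo: the read is issued while the slow write is still running. */
    static boolean demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            PendingWriteCache cache = new PendingWriteCache(executor);
            byte[] chunk = {1, 2, 3};
            cache.startWrite(7L, () -> {
                try {
                    Thread.sleep(50); // simulate a slow disk write
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                return chunk;
            });
            byte[] seen = cache.read(7L, 5_000);
            cache.release(7L);
            return Arrays.equals(chunk, seen);
        } finally {
            executor.shutdown();
        }
    }
}
```

Because entries are removed explicitly rather than evicted by size, this sketch also sidesteps the first problem (a slow writer losing its entry), at the cost of requiring the caller to release entries reliably.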
[jira] [Updated] (HDDS-2509) Code cleanup in replication package
[ https://issues.apache.org/jira/browse/HDDS-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-2509:
--------------------------------------
    Fix Version/s: 0.5.0
       Resolution: Fixed
           Status: Resolved (was: Patch Available)

Thanks [~adoroszlai] for working on this. I have committed this.

> Code cleanup in replication package
> -----------------------------------
>
> Key: HDDS-2509
> URL: https://issues.apache.org/jira/browse/HDDS-2509
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Reporter: Attila Doroszlai
> Assignee: Attila Doroszlai
> Priority: Major
> Labels: pull-request-available, sonar
> Fix For: 0.5.0
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Fix couple of [issues
> reported|https://sonarcloud.io/project/issues?directories=hadoop-hdds%2Fcontainer-service%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhadoop%2Fozone%2Fcontainer%2Freplication%2Chadoop-hdds%2Fcontainer-service%2Fsrc%2Ftest%2Fjava%2Forg%2Fapache%2Fhadoop%2Fozone%2Fcontainer%2Freplication=hadoop-ozone=false]
> in {{org.apache.hadoop.ozone.container.replication}} package.
[jira] [Assigned] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException
[ https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee reassigned HDDS-2372:
-----------------------------------------
    Assignee: Marton Elek (was: Shashikant Banerjee)

> Datanode pipeline is failing with NoSuchFileException
> -----------------------------------------------------
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Marton Elek
> Assignee: Marton Elek
> Priority: Critical
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Found it on a k8s based test cluster using a simple 3 node cluster and the
> HDDS-2327 freon test. After a while the StateMachine became unhealthy after
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: java.nio.file.NoSuchFileException: /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
> {code}
> Can be reproduced.
[jira] [Commented] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException
[ https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969379#comment-16969379 ]

Shashikant Banerjee commented on HDDS-2372:
--------------------------------------------

In Ratis, raft log entries can get truncated after a leader election. The data write actually happens as part of appending the log entry itself. Currently, if the raft log gets truncated, we don't do any handling for those entries, i.e., we don't delete or validate the chunk files written as part of the log entry. This works because the data always lives in tmp files stamped with the term and log index; these files are not visible and simply remain as garbage even if the corresponding entries in the raft log have been truncated.

If we instead write to the actual chunk file as part of writing the log entry, then when those log entries get truncated we might need to handle this inside Ozone, either by deleting the corresponding chunk files to maintain consistency or by validating the data while updating the RocksDB entries.

> Datanode pipeline is failing with NoSuchFileException
> -----------------------------------------------------
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Marton Elek
> Assignee: Shashikant Banerjee
> Priority: Critical
>
> Found it on a k8s based test cluster using a simple 3 node cluster and the
> HDDS-2327 freon test. After a while the StateMachine became unhealthy after
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: java.nio.file.NoSuchFileException: /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
> {code}
> Can be reproduced.
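To make the "tmp files stamped with the term and log index" remark concrete, here is a small Java sketch that picks out the tmp chunk files belonging to truncated log entries. It rests on assumptions: the file name layout `<chunkName>.tmp.<term>.<logIndex>` is inferred from the path in the error above (`..._chunk_13931.tmp.2.20830`), and the class `TmpChunkNames` and its methods are hypothetical, not Ozone APIs. It also assumes the check runs at truncation time, before any new entry reuses the same log index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TmpChunkNames {
    // Assumed layout: <chunkName>.tmp.<term>.<logIndex>
    private static final Pattern TMP =
        Pattern.compile("^(.+)\\.tmp\\.(\\d+)\\.(\\d+)$");

    /** Returns the log index encoded in a tmp file name, or -1 if none. */
    static long logIndexOf(String fileName) {
        Matcher m = TMP.matcher(fileName);
        return m.matches() ? Long.parseLong(m.group(3)) : -1L;
    }

    /**
     * Lists tmp files whose log index is at or after the first truncated
     * index. These are the files that would otherwise remain as garbage.
     */
    static List<String> staleAfterTruncation(List<String> fileNames,
                                             long firstTruncatedIndex) {
        List<String> stale = new ArrayList<>();
        for (String name : fileNames) {
            long index = logIndexOf(name);
            if (index >= 0 && index >= firstTruncatedIndex) {
                stale.add(name);
            }
        }
        return stale;
    }
}
```

A caller could pass the directory listing of a container's chunk directory and the first truncated index reported by the log, then delete or validate the returned files; which of the two is appropriate is exactly the open question in the comment above.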
[jira] [Commented] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException
[ https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968960#comment-16968960 ]

Shashikant Banerjee commented on HDDS-2372:
--------------------------------------------

Thanks [~aengineer] for the suggestion. Writing to the actual chunk file may require handling Ratis log-entry truncation inside Ozone, which we don't need to handle right now because we always write to tmp chunk files. Even if log entries get truncated inside Ratis, the tmp files are simply left behind as garbage.

> Datanode pipeline is failing with NoSuchFileException
> -----------------------------------------------------
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Reporter: Marton Elek
> Assignee: Shashikant Banerjee
> Priority: Critical
>
> Found it on a k8s based test cluster using a simple 3 node cluster and the
> HDDS-2327 freon test. After a while the StateMachine became unhealthy after
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: java.util.concurrent.ExecutionException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: java.nio.file.NoSuchFileException: /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
> {code}
> Can be reproduced.
[jira] [Updated] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
[ https://issues.apache.org/jira/browse/HDDS-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shashikant Banerjee updated HDDS-2270:
--------------------------------------
    Fix Version/s: 0.5.0
       Resolution: Fixed
           Status: Resolved (was: Patch Available)

Thanks [~adoroszlai] for working on this. I have committed this.

> Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
> ------------------------------------------------------------------------------
>
> Key: HDDS-2270
> URL: https://issues.apache.org/jira/browse/HDDS-2270
> Project: Hadoop Distributed Data Store
> Issue Type: Improvement
> Components: Ozone Datanode
> Reporter: Tsz-wo Sze
> Assignee: Attila Doroszlai
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.5.0
> Time Spent: 20m
> Remaining Estimate: 0h
>
> ContainerStateMachine:
> - In loadSnapshot(..), it first reads the snapshotFile into a byte[] and then
> parses it into ContainerProtos.Container2BCSIDMapProto. The buffer copying
> can be avoided.
> {code}
> try (FileInputStream fin = new FileInputStream(snapshotFile)) {
>   byte[] container2BCSIDData = IOUtils.toByteArray(fin);
>   ContainerProtos.Container2BCSIDMapProto proto =
>       ContainerProtos.Container2BCSIDMapProto
>           .parseFrom(container2BCSIDData);
>   ...
> }
> {code}
> - persistContainerSet(..) has a similar problem.
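The idea behind HDDS-2270 is to parse straight from the stream rather than materializing the whole file as a `byte[]` first. For protobuf-generated Java classes this typically means using the `parseFrom(InputStream)` overload on load and `writeTo(OutputStream)` on save. The sketch below shows the same pattern with only the JDK so that it runs on its own: the class `Container2BcsIdFile` and its simple length-prefixed format are stand-ins invented for illustration, not the Ozone snapshot format.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

final class Container2BcsIdFile {
    /** Writes entries directly to the stream; no intermediate byte[]. */
    static void write(Map<Long, Long> map, OutputStream rawOut) throws IOException {
        DataOutputStream out = new DataOutputStream(new BufferedOutputStream(rawOut));
        out.writeInt(map.size());
        for (Map.Entry<Long, Long> e : map.entrySet()) {
            out.writeLong(e.getKey());
            out.writeLong(e.getValue());
        }
        out.flush();
    }

    /** Parses entries directly from the stream; no full-file buffer copy. */
    static Map<Long, Long> read(InputStream rawIn) throws IOException {
        DataInputStream in = new DataInputStream(new BufferedInputStream(rawIn));
        int count = in.readInt();
        Map<Long, Long> map = new LinkedHashMap<>();
        for (int i = 0; i < count; i++) {
            long containerId = in.readLong();
            long bcsId = in.readLong();
            map.put(containerId, bcsId);
        }
        return map;
    }

    /** Writes a small map to a temp file and reads it back. */
    static boolean roundTrip() {
        Map<Long, Long> original = new LinkedHashMap<>();
        original.put(1L, 100L);
        original.put(2L, 250L);
        File file = null;
        try {
            file = File.createTempFile("container2bcsid", ".bin");
            try (FileOutputStream fout = new FileOutputStream(file)) {
                write(original, fout);
            }
            try (FileInputStream fin = new FileInputStream(file)) {
                return original.equals(read(fin));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (file != null) {
                file.delete();
            }
        }
    }
}
```

The design point is the same in both directions: the serializer and parser consume the stream incrementally through a bounded buffer, so peak memory no longer scales with the size of the snapshot file.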
[jira] [Updated] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc
[ https://issues.apache.org/jira/browse/HDDS-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2407: -- Fix Version/s: 0.5.0 Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~adoroszlai] for the contribution. I have committed this. > Reduce log level of per-node failure in XceiverClientGrpc > - > > Key: HDDS-2407 > URL: https://issues.apache.org/jira/browse/HDDS-2407 > Project: Hadoop Distributed Data Store > Issue Type: Task > Components: Ozone Client >Reporter: Attila Doroszlai >Assignee: Attila Doroszlai >Priority: Minor > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 20m > Remaining Estimate: 0h > > When reading from a pipeline, client should not care if some datanode could > not service the request, as long as the pipeline as a whole is OK. The [log > message|https://github.com/apache/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304] > indicating node failure was [increased to error > level|https://github.com/apache/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288] > in HDDS-1780. This task proposes to change it back to debug. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2388) Teragen test failure due to OM exception
[ https://issues.apache.org/jira/browse/HDDS-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966770#comment-16966770 ] Shashikant Banerjee commented on HDDS-2388: --- [~avijayan], OM was not crashing because of this but because of HDDS-2379. But, this exception showed up quite a few times in the test. > Teragen test failure due to OM exception > > > Key: HDDS-2388 > URL: https://issues.apache.org/jira/browse/HDDS-2388 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Hanisha Koneru >Priority: Major > Fix For: 0.5.0 > > > Ran into below exception while running teragen: > {code:java} > Unable to get delta updates since sequenceNumber 79932 > org.rocksdb.RocksDBException: Requested sequence not yet written in the db > at org.rocksdb.RocksDB.getUpdatesSince(Native Method) > at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3587) > at > org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:338) > at > org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3283) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:404) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handle(OzoneManagerRequestHandler.java:314) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:219) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:134) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:102) > at > 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException
[ https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-2372: - Assignee: Shashikant Banerjee > Datanode pipeline is failing with NoSuchFileException > - > > Key: HDDS-2372 > URL: https://issues.apache.org/jira/browse/HDDS-2372 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Marton Elek >Assignee: Shashikant Banerjee >Priority: Critical > > Found it on a k8s based test cluster using a simple 3 node cluster and > HDDS-2327 freon test. After a while the StateMachine become unhealthy after > this error: > {code:java} > datanode-0 datanode java.util.concurrent.ExecutionException: > java.util.concurrent.ExecutionException: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > java.nio.file.NoSuchFileException: > /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830 > {code} > Can be reproduced. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2388) Teragen test failure due to OM exception
[ https://issues.apache.org/jira/browse/HDDS-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963831#comment-16963831 ] Shashikant Banerjee commented on HDDS-2388: --- cc - [~bharat] > Teragen test failure due to OM exception > > > Key: HDDS-2388 > URL: https://issues.apache.org/jira/browse/HDDS-2388 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Ran into below exception while running teragen: > {code:java} > Unable to get delta updates since sequenceNumber 79932 > org.rocksdb.RocksDBException: Requested sequence not yet written in the db > at org.rocksdb.RocksDB.getUpdatesSince(Native Method) > at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3587) > at > org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:338) > at > org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3283) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:404) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handle(OzoneManagerRequestHandler.java:314) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:219) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:134) > at > org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) > at > org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:102) > at > org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) > at > 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2388) Teragen test failure due to OM exception
Shashikant Banerjee created HDDS-2388: - Summary: Teragen test failure due to OM exception Key: HDDS-2388 URL: https://issues.apache.org/jira/browse/HDDS-2388 Project: Hadoop Distributed Data Store Issue Type: Bug Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Fix For: 0.5.0 Ran into below exception while running teragen: {code:java} Unable to get delta updates since sequenceNumber 79932 org.rocksdb.RocksDBException: Requested sequence not yet written in the db at org.rocksdb.RocksDB.getUpdatesSince(Native Method) at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3587) at org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:338) at org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3283) at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:404) at org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handle(OzoneManagerRequestHandler.java:314) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:219) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:134) at org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72) at org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:102) at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java) at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528) at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984) at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912) at 
java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException
[ https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963813#comment-16963813 ] Shashikant Banerjee edited comment on HDDS-2372 at 10/31/19 9:52 AM: - Thanks [~elek] . I do agree that, there is no synchronisation between readStateMachineData and applyTransaction which may lead to NoSuchFile exception as you suggested but the appendRequest will be retried in the leader and the system should recover thereafter once the commit of writeChunk completes. In teragen testing as well, i ran into same issue but my test did complete. Can you share the logs for this? was (Author: shashikant): Thanks [~elek] . I do agree that, there is no synchronisation between readStateMachineData and applyTransaction which may lead to NoSuchFile exception as you suggested but the appendRequest will be retried in the leader and the system should recover thereafter once the commit of writeChunk completes. In teragen testing as well, i ran into same issue but my test did complete. Can you share the logs/test to reproduce this? > Datanode pipeline is failing with NoSuchFileException > - > > Key: HDDS-2372 > URL: https://issues.apache.org/jira/browse/HDDS-2372 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Marton Elek >Priority: Critical > > Found it on a k8s based test cluster using a simple 3 node cluster and > HDDS-2327 freon test. After a while the StateMachine become unhealthy after > this error: > {code:java} > datanode-0 datanode java.util.concurrent.ExecutionException: > java.util.concurrent.ExecutionException: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > java.nio.file.NoSuchFileException: > /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830 > {code} > Can be reproduced. 
[jira] [Commented] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException
[ https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963815#comment-16963815 ] Shashikant Banerjee commented on HDDS-2372: --- [~szetszwo], to answer your question precisely: when reading state machine data, the read path first checks whether the actual chunk file exists. If it does, the data is read from the actual chunk file; otherwise it is read from the temporary chunk file. > Datanode pipeline is failing with NoSuchFileException > - > > Key: HDDS-2372 > URL: https://issues.apache.org/jira/browse/HDDS-2372 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Marton Elek >Priority: Critical > > Found it on a k8s based test cluster using a simple 3 node cluster and > HDDS-2327 freon test. After a while the StateMachine become unhealthy after > this error: > {code:java} > datanode-0 datanode java.util.concurrent.ExecutionException: > java.util.concurrent.ExecutionException: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > java.nio.file.NoSuchFileException: > /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830 > {code} > Can be reproduced. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
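The read path described in the comment above can be sketched as follows. This is a minimal illustration using java.nio.file only; the class name, method name and parameters are hypothetical and not taken from the Ozone datanode code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkReadSketch {

  // Reads chunk data, preferring the committed chunk file and falling back
  // to the temporary chunk file when the committed one does not exist yet.
  public static byte[] readChunk(Path chunkFile, Path tmpChunkFile) throws IOException {
    Path source = Files.exists(chunkFile) ? chunkFile : tmpChunkFile;
    // The existence check and the read are two separate steps. If the
    // temporary file is renamed in between, this call throws
    // java.nio.file.NoSuchFileException.
    return Files.readAllBytes(source);
  }
}
```

Because the check and the read are not atomic, a concurrent commit that renames the temporary file can still surface as a NoSuchFileException, which matches the missing synchronisation between readStateMachineData and applyTransaction discussed elsewhere in this thread.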
[jira] [Commented] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException
[ https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963813#comment-16963813 ] Shashikant Banerjee commented on HDDS-2372: --- Thanks [~elek] . I do agree that, there is no synchronisation between readStateMachineData and applyTransaction which may lead to NoSuchFile exception as you suggested but the appendRequest will be retried in the leader and the system should recover thereafter once the commit of writeChunk completes. In teragen testing as well, i ran into same issue but my test did complete. Can you share the logs/test to reproduce this? > Datanode pipeline is failing with NoSuchFileException > - > > Key: HDDS-2372 > URL: https://issues.apache.org/jira/browse/HDDS-2372 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Marton Elek >Priority: Critical > > Found it on a k8s based test cluster using a simple 3 node cluster and > HDDS-2327 freon test. After a while the StateMachine become unhealthy after > this error: > {code:java} > datanode-0 datanode java.util.concurrent.ExecutionException: > java.util.concurrent.ExecutionException: > org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: > java.nio.file.NoSuchFileException: > /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830 > {code} > Can be reproduced. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2331) Client OOME due to buffer retention
[ https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955856#comment-16955856 ] Shashikant Banerjee commented on HDDS-2331: --- One more point: per the command line, freon is run with a single thread, which I assume writes only one key at a time. For a 1MB key, there should be at most two ContainerCommandRequestMessage objects (one for writeChunk and one for putBlock), and these should become eligible for garbage collection once the response is received. At no point should 150+ ContainerCommandRequestMessage objects be lying around. > Client OOME due to buffer retention > --- > > Key: HDDS-2331 > URL: https://issues.apache.org/jira/browse/HDDS-2331 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Shashikant Banerjee >Priority: Critical > Attachments: profiler.png > > > Freon random key generator exhausts default heap after just few hundred 1MB > keys. Heap dump on OOME reveals 150+ instances of > {{ContainerCommandRequestMessage}}, each with 16MB {{byte[]}}. > Steps to reproduce: > # Start Ozone cluster with 1 datanode > # Start Freon (5K keys of size 1MB) > Result: OOME after a few hundred keys > {noformat} > $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone > $ docker-compose up -d > $ docker-compose exec scm bash > $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError' > $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 > --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 > --bufferSize 65536 > ... > java.lang.OutOfMemoryError: Java heap space > Dumping heap to java_pid289.hprof ... 
> Heap dump file created [1456141975 bytes in 7.760 secs] > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Deleted] (HDDS-2338) Avoid buffer copy while submitting write chunk request in Ozone Client
[ https://issues.apache.org/jira/browse/HDDS-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee deleted HDDS-2338: -- > Avoid buffer copy while submitting write chunk request in Ozone Client > -- > > Key: HDDS-2338 > URL: https://issues.apache.org/jira/browse/HDDS-2338 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > > Based, on the config value of "ozone.UnsafeByteOperations.enabled" which by > default is set to true we used to avoid buffer copy while submitting write > chunk request to Ratis. With recent changes around byteStringConversion > utility, seems like the config is never passed to BlockOutputStream and it > results in buffer copying every time there is a byteBuffer to byteString > conversion is done in ozone client. This Jira is to pass the appropriate > config value so that buffer copy can be avoided. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDDS-2338) Avoid buffer copy while submitting write chunk request in Ozone Client
[ https://issues.apache.org/jira/browse/HDDS-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-2338: - Assignee: Shashikant Banerjee > Avoid buffer copy while submitting write chunk request in Ozone Client > -- > > Key: HDDS-2338 > URL: https://issues.apache.org/jira/browse/HDDS-2338 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Fix For: 0.5.0 > > > Based, on the config value of "ozone.UnsafeByteOperations.enabled" which by > default is set to true we used to avoid buffer copy while submitting write > chunk request to Ratis. With recent changes around byteStringConversion > utility, seems like the config is never passed to BlockOutputStream and it > results in buffer copying every time there is a byteBuffer to byteString > conversion is done in ozone client. This Jira is to pass the appropriate > config value so that buffer copy can be avoided. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2338) Avoid buffer copy while submitting write chunk request in Ozone Client
Shashikant Banerjee created HDDS-2338: - Summary: Avoid buffer copy while submitting write chunk request in Ozone Client Key: HDDS-2338 URL: https://issues.apache.org/jira/browse/HDDS-2338 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Reporter: Shashikant Banerjee Fix For: 0.5.0 Based on the config value of "ozone.UnsafeByteOperations.enabled", which is set to true by default, we used to avoid a buffer copy while submitting write chunk requests to Ratis. With recent changes around the byteStringConversion utility, it seems the config is never passed to BlockOutputStream, which results in a buffer copy every time a byteBuffer to byteString conversion is done in the Ozone client. This Jira is to pass the appropriate config value so that the buffer copy can be avoided. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
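The difference between the copying and non-copying conversion that HDDS-2338 refers to can be illustrated with plain JDK buffers. This is only a sketch: the real client converts to a protobuf ByteString, whereas the code below uses ByteBuffer as a stand-in, and the class name, method name and boolean parameter (standing in for "ozone.UnsafeByteOperations.enabled") are all hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BufferConversionSketch {

  // Converts user data into a request payload. When unsafeEnabled is true the
  // payload shares the caller's backing array (no copy); otherwise the data is
  // copied first, so later changes to the caller's array cannot leak into it.
  public static ByteBuffer toPayload(byte[] data, boolean unsafeEnabled) {
    if (unsafeEnabled) {
      return ByteBuffer.wrap(data);
    }
    return ByteBuffer.wrap(Arrays.copyOf(data, data.length));
  }
}
```

The point of the issue is that the flag has to reach the place where the conversion happens. If it is never passed down, the conversion silently takes the copying branch on every write.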
[jira] [Comment Edited] (HDDS-2331) Client OOME due to buffer retention
[ https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955814#comment-16955814 ] Shashikant Banerjee edited comment on HDDS-2331 at 10/21/19 7:04 AM: - In Ozone, by default the buffer size is equal to the chunk size(16 MB default). Once a write call happens, a buffer is allocated and data is being written just into the buffer till it gets full/flush/close and then pushed to datanode and the buffer gets only released when watchForCommit call for the respective putBlock call log Index succeeds successfully. So until and unless, watchForCommit call gets acknowledged by Ozone Client, we keep holding onto the buffer so that, in case the ratis request fails, we have the user data cached in the client buffer which can be written over to the next block. We have had multiple discussions around on reducing the default buffer size and implement a true streaming client, but this is still under consideration. [~adoroszlai], for your test, you can try changing the default chunk size to say 1 MB and see if it works well. It might also be possible that buffer release handling got broken with some changes introduced which need to be verified. was (Author: shashikant): In Ozone, by default the buffer size is equal to the chunk size(16 MB default). Once a write call happens, a buffer is allocated and data is being written just into the buffer till it gets full/flush/close and then pushed to datanode and the buffer gets only released when watchForCommit call for the respective putBlock call log Index succeeds successfully. So until and unless, watchForCommit call gets acknowledged by Ratis, we keep holding onto the buffer so that, in case the ratis request fails, we have the user data cached in the client buffer which can be written over to the next block. We have had multiple discussions around this on reducing the default buffer size and implement a true streaming client, but this is still under consideration. 
[~adoroszlai], for your test, you can try changing the default chunk size to say 1 MB and see if it works well. It might also be possible that buffer release handling got broken with some changes introduced which need to be verified. > Client OOME due to buffer retention > --- > > Key: HDDS-2331 > URL: https://issues.apache.org/jira/browse/HDDS-2331 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Shashikant Banerjee >Priority: Critical > Attachments: profiler.png > > > Freon random key generator exhausts default heap after just few hundred 1MB > keys. Heap dump on OOME reveals 150+ instances of > {{ContainerCommandRequestMessage}}, each with 16MB {{byte[]}}. > Steps to reproduce: > # Start Ozone cluster with 1 datanode > # Start Freon (5K keys of size 1MB) > Result: OOME after a few hundred keys > {noformat} > $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone > $ docker-compose up -d > $ docker-compose exec scm bash > $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError' > $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 > --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 > --bufferSize 65536 > ... > java.lang.OutOfMemoryError: Java heap space > Dumping heap to java_pid289.hprof ... > Heap dump file created [1456141975 bytes in 7.760 secs] > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2331) Client OOME due to buffer retention
[ https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955814#comment-16955814 ] Shashikant Banerjee commented on HDDS-2331: --- In Ozone, the buffer size defaults to the chunk size (16 MB). When a write call happens, a buffer is allocated and data is written into it until the buffer is full or a flush/close occurs; the data is then pushed to the datanode. The buffer is released only when the watchForCommit call for the log index of the corresponding putBlock succeeds. Until watchForCommit is acknowledged by Ratis, we hold on to the buffer so that, if the Ratis request fails, the user data is still cached in the client buffer and can be written to the next block. We have had multiple discussions about reducing the default buffer size and implementing a true streaming client, but this is still under consideration. [~adoroszlai], for your test, you can try changing the default chunk size to, say, 1 MB and see if it works well. It is also possible that buffer release handling was broken by some recent change, which needs to be verified. > Client OOME due to buffer retention > --- > > Key: HDDS-2331 > URL: https://issues.apache.org/jira/browse/HDDS-2331 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Shashikant Banerjee >Priority: Critical > Attachments: profiler.png > > > Freon random key generator exhausts default heap after just few hundred 1MB > keys. Heap dump on OOME reveals 150+ instances of > {{ContainerCommandRequestMessage}}, each with 16MB {{byte[]}}. 
> Steps to reproduce: > # Start Ozone cluster with 1 datanode > # Start Freon (5K keys of size 1MB) > Result: OOME after a few hundred keys > {noformat} > $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone > $ docker-compose up -d > $ docker-compose exec scm bash > $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError' > $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 > --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 > --bufferSize 65536 > ... > java.lang.OutOfMemoryError: Java heap space > Dumping heap to java_pid289.hprof ... > Heap dump file created [1456141975 bytes in 7.760 secs] > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
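The retention behaviour described in the comment above (buffers are held until the commit for their log index is acknowledged) can be modelled roughly as below. This is a simplified, single-threaded sketch under stated assumptions: the class and all method names are hypothetical, and the real Ozone client tracks commits through watchForCommit rather than through this API.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.NavigableMap;
import java.util.TreeMap;

public class CommitTrackedBuffers {
  private final int bufferSize;
  // Buffers that were released and can be reused by later writes.
  private final Deque<ByteBuffer> free = new ArrayDeque<>();
  // Buffers still holding user data, keyed by the log index of their request.
  private final TreeMap<Long, ByteBuffer> pending = new TreeMap<>();

  public CommitTrackedBuffers(int bufferSize) {
    this.bufferSize = bufferSize;
  }

  // Hands out a recycled buffer if one is available, otherwise allocates a new one.
  public ByteBuffer allocate() {
    ByteBuffer buffer = free.poll();
    return buffer != null ? buffer : ByteBuffer.allocate(bufferSize);
  }

  // Records that the buffer's data was sent under the given log index.
  public void submitted(long logIndex, ByteBuffer buffer) {
    pending.put(logIndex, buffer);
  }

  // Releases every buffer whose log index is at or below the committed index
  // and returns how many buffers were released.
  public int commitAcknowledged(long committedIndex) {
    NavigableMap<Long, ByteBuffer> done = pending.headMap(committedIndex, true);
    int released = done.size();
    for (ByteBuffer buffer : done.values()) {
      buffer.clear();
      free.add(buffer);
    }
    done.clear(); // the view is backed by the map, so this removes the entries from pending
    return released;
  }

  // Bytes that cannot be reclaimed until more commits are acknowledged.
  public long retainedBytes() {
    return (long) pending.size() * bufferSize;
  }
}
```

In this model the retained memory grows with the number of unacknowledged log indexes multiplied by the buffer size, so a missed or delayed acknowledgement shows up as steadily growing heap usage even when the keys themselves are small.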
[jira] [Assigned] (HDDS-2331) Client OOME due to buffer retention
[ https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-2331: - Assignee: Shashikant Banerjee > Client OOME due to buffer retention > --- > > Key: HDDS-2331 > URL: https://issues.apache.org/jira/browse/HDDS-2331 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Attila Doroszlai >Assignee: Shashikant Banerjee >Priority: Critical > Attachments: profiler.png > > > Freon random key generator exhausts default heap after just few hundred 1MB > keys. Heap dump on OOME reveals 150+ instances of > {{ContainerCommandRequestMessage}}, each with 16MB {{byte[]}}. > Steps to reproduce: > # Start Ozone cluster with 1 datanode > # Start Freon (5K keys of size 1MB) > Result: OOME after a few hundred keys > {noformat} > $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone > $ docker-compose up -d > $ docker-compose exec scm bash > $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError' > $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 > --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 > --bufferSize 65536 > ... > java.lang.OutOfMemoryError: Java heap space > Dumping heap to java_pid289.hprof ... > Heap dump file created [1456141975 bytes in 7.760 secs] > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDDS-2280) HddsUtils#CheckForException should not return null in case the ratis exception cause is not set
[ https://issues.apache.org/jira/browse/HDDS-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954702#comment-16954702 ] Shashikant Banerjee commented on HDDS-2280: --- st > HddsUtils#CheckForException should not return null in case the ratis > exception cause is not set > --- > > Key: HDDS-2280 > URL: https://issues.apache.org/jira/browse/HDDS-2280 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > HddsUtils#CheckForException checks that the cause is set to one of > the defined/expected exceptions. If Ratis throws any other runtime > exception, HddsUtils#CheckForException can return null and lead to a > NullPointerException during a write. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
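The null-return problem described in HDDS-2280 can be illustrated with a small cause-chain lookup. This is a sketch, not the committed fix: the class name, the list of expected exception types and the choice of fallback (returning the original exception) are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.TimeoutException;

public class CauseLookupSketch {

  // Hypothetical stand-in for the set of exception types the client knows how to handle.
  private static final Class<?>[] EXPECTED = {TimeoutException.class, IOException.class};

  // Walks the cause chain looking for an expected exception type. When none is
  // found, the original exception is returned instead of null, so callers can
  // always inspect or rethrow the result without a NullPointerException.
  public static Throwable checkForException(Exception e) {
    Throwable current = e;
    while (current != null) {
      for (Class<?> expected : EXPECTED) {
        if (expected.isInstance(current)) {
          return current;
        }
      }
      current = current.getCause();
    }
    return e;
  }
}
```

With this shape, an unexpected runtime exception without a cause is handed back to the caller as-is rather than being turned into a null.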
[jira] [Commented] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
[ https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954697#comment-16954697 ] Shashikant Banerjee commented on HDDS-2332: --- [~ljain], i think we should timeout all requests in ozone client. > BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future > --- > > Key: HDDS-2332 > URL: https://issues.apache.org/jira/browse/HDDS-2332 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Lokesh Jain >Priority: Major > > BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that > the thread is blocked on the same condition. > {code:java} > 2019-10-18 06:30:38 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition [0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at > java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729) > at > java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439) > at > 
org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190) > at > org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46) > at > org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57) > at java.io.DataOutputStream.write(DataOutputStream.java:107) > - locked <0xa6a75930> (a > org.apache.hadoop.fs.FSDataOutputStream) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77) > - locked <0xa6a75918> (a > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter) > at > org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64) > at > org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670) > at > org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) > at > org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230) > at > org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203) > at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799) > at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347) > at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876) > at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168) > 2019-10-18 07:02:50 > Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode): > "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on > condition 
[0x7fbea96d6000] >java.lang.Thread.State: WAITING (parking) > at sun.misc.Unsafe.park(Native Method) > - parking to wait for <0xe4739888> (a > java.util.concurrent.CompletableFuture$Signaller) > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > at > java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693) > at > java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323) > at >
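The comment above suggests timing out client requests instead of parking forever in CompletableFuture.get(). A minimal sketch of that idea, using only java.util.concurrent; the class name, the method signature and the timeout value are assumptions for illustration and are not taken from BlockOutputStream:

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the wait in BlockOutputStream#waitOnFlushFutures.
final class FlushFutureTimeoutSketch {

  /**
   * Waits for all pending futures, but gives up after timeoutMillis.
   * Returns true if everything completed in time, false on timeout.
   */
  static boolean waitOnFlushFutures(
      List<? extends CompletableFuture<?>> pending, long timeoutMillis)
      throws InterruptedException, ExecutionException {
    CompletableFuture<Void> combined =
        CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0]));
    try {
      // get(timeout, unit) throws TimeoutException instead of blocking forever.
      combined.get(timeoutMillis, TimeUnit.MILLISECONDS);
      return true;
    } catch (TimeoutException e) {
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    CompletableFuture<String> done = CompletableFuture.completedFuture("ok");
    CompletableFuture<String> stuck = new CompletableFuture<>(); // never completed
    System.out.println(waitOnFlushFutures(Arrays.asList(done), 100));        // true
    System.out.println(waitOnFlushFutures(Arrays.asList(done, stuck), 100)); // false
  }
}
```

What the caller does on a timeout (fail the write, retry on another pipeline) is a separate design decision and is not shown here.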
[jira] [Created] (HDDS-2286) Add a log info in ozone client and scm to print the exclusion list during allocate block
Shashikant Banerjee created HDDS-2286: - Summary: Add a log info in ozone client and scm to print the exclusion list during allocate block Key: HDDS-2286 URL: https://issues.apache.org/jira/browse/HDDS-2286 Project: Hadoop Distributed Data Store Issue Type: Bug Affects Versions: 0.5.0 Reporter: Shashikant Banerjee -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDDS-2281) ContainerStateMachine#handleWriteChunk should ignore close container exception
Shashikant Banerjee created HDDS-2281: - Summary: ContainerStateMachine#handleWriteChunk should ignore close container exception Key: HDDS-2281 URL: https://issues.apache.org/jira/browse/HDDS-2281 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 Currently, ContainerStateMachine#applyTransaction ignores the close container exception. Similarly, the ContainerStateMachine#handleWriteChunk call should also ignore the close container exception.
[jira] [Updated] (HDDS-2280) HddsUtils#CheckForException should not return null in case the ratis exception cause is not set
[ https://issues.apache.org/jira/browse/HDDS-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2280: -- Summary: HddsUtils#CheckForException should not return null in case the ratis exception cause is not set (was: HddsUtils#CheckForException may return null in case the ratis exception cause is not set) > HddsUtils#CheckForException should not return null in case the ratis > exception cause is not set > --- > > Key: HDDS-2280 > URL: https://issues.apache.org/jira/browse/HDDS-2280 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > > HddsUtils#CheckForException checks that the cause is set to one of > the defined/expected exceptions. If Ratis throws any other runtime > exception, HddsUtils#CheckForException can return null and lead to a > NullPointerException during the write.
[jira] [Created] (HDDS-2280) HddsUtils#CheckForException may return null in case the ratis exception cause is not set
Shashikant Banerjee created HDDS-2280: - Summary: HddsUtils#CheckForException may return null in case the ratis exception cause is not set Key: HDDS-2280 URL: https://issues.apache.org/jira/browse/HDDS-2280 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee HddsUtils#CheckForException checks that the cause is set to one of the defined/expected exceptions. If Ratis throws any other runtime exception, HddsUtils#CheckForException can return null and lead to a NullPointerException during the write.
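One way to avoid the null return described in HDDS-2280 is to fall back to the original exception when no expected cause is found. The sketch below is only an illustration of that pattern: the list of expected exception types is a made-up stand-in, and the method is not the actual HddsUtils code.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; the real list of expected causes in HddsUtils differs.
final class CheckForExceptionSketch {
  private static final List<Class<? extends Exception>> EXPECTED =
      Arrays.asList(IOException.class, TimeoutException.class);

  /** Returns the first expected cause in the chain, or e itself; never null. */
  static Throwable checkForException(Exception e) {
    Throwable t = e;
    while (t != null) {
      for (Class<? extends Exception> cls : EXPECTED) {
        if (cls.isInstance(t)) {
          return t;
        }
      }
      t = t.getCause();
    }
    // No expected cause was set: hand back the original exception instead of null.
    return e;
  }

  public static void main(String[] args) {
    Exception wrapped = new RuntimeException(new IOException("disk error"));
    Exception bare = new IllegalStateException("cause not set");
    System.out.println(checkForException(wrapped).getClass().getSimpleName()); // IOException
    System.out.println(checkForException(bare).getClass().getSimpleName());    // IllegalStateException
  }
}
```

With this shape the caller can always inspect the returned Throwable without a null check.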
[jira] [Resolved] (HDDS-2266) Avoid evaluation of LOG.trace and LOG.debug statement in the read/write path (Ozone)
[ https://issues.apache.org/jira/browse/HDDS-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-2266. --- Resolution: Fixed Thanks [~swagle] for the contribution. I have committed this. > Avoid evaluation of LOG.trace and LOG.debug statement in the read/write path > (Ozone) > > > Key: HDDS-2266 > URL: https://issues.apache.org/jira/browse/HDDS-2266 > Project: Hadoop Distributed Data Store > Issue Type: Improvement > Components: Ozone CLI, Ozone Manager >Affects Versions: 0.5.0 >Reporter: Siddharth Wagle >Assignee: Siddharth Wagle >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The arguments of LOG.trace and LOG.debug statements are evaluated even when > debug/trace logging is disabled. This jira proposes to wrap all the > trace/debug logging with > LOG.isDebugEnabled and LOG.isTraceEnabled checks to prevent that evaluation.
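The guard proposed in HDDS-2266 can be illustrated as follows. To keep the sketch self-contained it uses java.util.logging (isLoggable with Level.FINE) as a stand-in for the LOG.isDebugEnabled check; the class, the counter and describeChunk are hypothetical names used only to make the cost visible.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in sketch: java.util.logging replaces the logger used in Ozone.
final class GuardedDebugLogSketch {
  private static final Logger LOG = Logger.getLogger("GuardedDebugLogSketch");
  static int expensiveCalls = 0;

  static {
    LOG.setLevel(Level.INFO); // debug-level (FINE) logging is disabled
  }

  // Pretend this is costly, e.g. formatting a large request for the log.
  static String describeChunk(byte[] data) {
    expensiveCalls++;
    return "chunk of " + data.length + " bytes";
  }

  static void writeUnguarded(byte[] data) {
    // The message is built before the logger decides to drop it.
    LOG.log(Level.FINE, "writing " + describeChunk(data));
  }

  static void writeGuarded(byte[] data) {
    // The message is built only if FINE is actually enabled.
    if (LOG.isLoggable(Level.FINE)) {
      LOG.log(Level.FINE, "writing " + describeChunk(data));
    }
  }

  public static void main(String[] args) {
    writeUnguarded(new byte[16]);
    writeGuarded(new byte[16]);
    System.out.println(expensiveCalls); // 1: only the unguarded call paid the cost
  }
}
```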
[jira] [Commented] (HDFS-14869) Data loss in case of distcp using snapshot diff. Replication should include rename records if file was skipped in the previous iteration
[ https://issues.apache.org/jira/browse/HDFS-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948318#comment-16948318 ] Shashikant Banerjee commented on HDFS-14869: [~ste...@apache.org], can you please have a look at this? > Data loss in case of distcp using snapshot diff. Replication should include > rename records if file was skipped in the previous iteration > > > Key: HDFS-14869 > URL: https://issues.apache.org/jira/browse/HDFS-14869 > Project: Hadoop HDFS > Issue Type: Bug > Components: distcp >Reporter: Aasha Medhi >Assignee: Aasha Medhi >Priority: Major > > This issue arises when a directory or file is excluded by exclusion filter > during distcp replication. Later on if the directory is renamed later to a > name which is not excluded by the filter, the snapshot diff reports only a > rename operation. The directory is never copied to target even though its > not excluded now. This also doesn't throw any error so there is no way to > find the issue. > Steps to reproduce > * Create a directory in hdfs to copy using distcp. > * Include a staging folder in the directory. > {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -ls > /tmp/tocopy > Found 4 items > -rw-r--r-- 3 hdfs hdfs 16 2019-09-12 10:32 /tmp/tocopy/.b.txt > drwxr-xr-x - hdfs hdfs 0 2019-09-23 09:18 /tmp/tocopy/.staging > -rw-r--r-- 3 hdfs hdfs 12 2019-09-12 10:32 /tmp/tocopy/a.txt > -rw-r--r-- 3 hdfs hdfs 4 2019-09-20 08:23 /tmp/tocopy/foo.txt{code} > * The exclusion filter is set to exclude any staging directory > {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ cat > /tmp/filter > .*\.Trash.* > .*\.staging.*{code} > * Do a copy using distcp snapshots, the staging directory is not replicated. 
> {code:java} > hadoop jar hadoop-distcp-3.3.0-SNAPSHOT.jar > -Dmapreduce.job.user.classpath.first=true -filters /tmp/filter > /tmp/tocopy/.snapshot/s1 /tmp/target > [hdfs@ctr-e141-1563959304486-33995-01-03 root]$ hadoop fs -ls /tmp/target > Found 3 items > -rw-r--r-- 3 hdfs hdfs 16 2019-09-24 06:56 /tmp/target/.b.txt > -rw-r--r-- 3 hdfs hdfs 12 2019-09-24 06:56 /tmp/target/a.txt > -rw-r--r-- 3 hdfs hdfs 4 2019-09-24 06:56 /tmp/target/foo.txt{code} > * Rename the staging directory to final > {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -mv > /tmp/tocopy/.staging /tmp/tocopy/final{code} > * Do a copy using snapshot diff. > {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hdfs > snapshotDiff /tmp/tocopy s1 s2 > Difference between snapshot s1 and snapshot s2 under directory /tmp/tocopy: > M . > R ./.staging -> ./final > {code} > * The diff report just has a rename record and the new final directory is > never copied.
> {code:java} > [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop jar > hadoop-distcp-3.3.0-SNAPSHOT.jar -Dmapreduce.job.user.classpath.first=true > -filters /tmp/filter -diff s1 s2 -update /tmp/tocopy /tmp/target > 19/09/24 07:05:32 INFO tools.DistCp: Input Options: > DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, > ignoreFailures=false, overwrite=false, append=false, useDiff=true, > useRdiff=false, fromSnapshot=s1, toSnapshot=s2, skipCRC=false, blocking=true, > numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, > copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, > logPath=null, sourceFileListing=null, sourcePaths=[/tmp/tocopy], > targetPath=/tmp/target, filtersFile='/tmp/filter', blocksPerChunk=0, > copyBufferSize=8192, verboseLog=false, directWrite=false}, > sourcePaths=[/tmp/tocopy], targetPathExists=true, preserveRawXattrsfalse > 19/09/24 07:05:32 INFO client.RMProxy: Connecting to ResourceManager at > ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050 > 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History > server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200 > 19/09/24 07:05:33 INFO tools.DistCp: Number of paths in the copy list: 0 > 19/09/24 07:05:33 INFO client.RMProxy: Connecting to ResourceManager at > ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050 > 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History > server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200 > 19/09/24 07:05:33 INFO mapreduce.JobResourceUploader: Disabling Erasure > Coding for path:
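The condition described in HDFS-14869 (a rename whose source was filtered out in the previous run but whose target is not) can be expressed as a small check. This is a sketch only: the class and method names are hypothetical, it assumes the filter matches each regex against the whole path, and it is not the DistCp implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of detecting "rename out of an excluded path".
final class RenameAfterFilterSketch {
  private final List<Pattern> excludes = new ArrayList<>();

  RenameAfterFilterSketch(List<String> regexes) {
    for (String r : regexes) {
      excludes.add(Pattern.compile(r));
    }
  }

  boolean isExcluded(String path) {
    for (Pattern p : excludes) {
      if (p.matcher(path).matches()) {
        return true;
      }
    }
    return false;
  }

  /**
   * A rename record must be treated as a fresh copy when the source was
   * skipped by the filter earlier but the target is no longer excluded.
   */
  boolean renameNeedsCopy(String source, String target) {
    return isExcluded(source) && !isExcluded(target);
  }

  public static void main(String[] args) {
    RenameAfterFilterSketch filter = new RenameAfterFilterSketch(
        Arrays.asList(".*\\.Trash.*", ".*\\.staging.*"));
    System.out.println(
        filter.renameNeedsCopy("/tmp/tocopy/.staging", "/tmp/tocopy/final")); // true
    System.out.println(
        filter.renameNeedsCopy("/tmp/tocopy/a.txt", "/tmp/tocopy/c.txt"));    // false
  }
}
```

The two regexes are the ones from the /tmp/filter file in the reproduction steps.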
[jira] [Resolved] (HDDS-2261) Change readChunk methods to return ByteBuffer
[ https://issues.apache.org/jira/browse/HDDS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-2261. --- Resolution: Fixed > Change readChunk methods to return ByteBuffer > - > > Key: HDDS-2261 > URL: https://issues.apache.org/jira/browse/HDDS-2261 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Major > Labels: pull-request-available > > While refactoring for HDDS-2233 I realized the following: > The KeyValueHandler.handleReadChunk and handleGetSmallFile methods use > ChunkManager.readChunk, which returns a byte[], but both of them (the > only usage points) then convert the returned byte[] to a ByteBuffer, and then to > a ByteString. > ChunkManagerImpl.readChunk, on the other hand, uses > ChunkUtils.readChunk, which converts a ByteBuffer back to a byte[] > to conform to that return type. > I am opening this JIRA to change the internal logic to rely fully on ByteBuffers > instead of converting from ByteBuffer to byte[] and then to ByteBuffer again.
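The round trip described in HDDS-2261 can be sketched with plain java.nio. The method names below are hypothetical stand-ins for the readChunk variants and the data is a fixed string; the ByteString step is left out because it needs the protobuf library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the two readChunk shapes; not the Ozone code.
final class ReadChunkBufferSketch {

  // Stand-in for the low-level read that already produces a ByteBuffer.
  static ByteBuffer readFromDisk() {
    return ByteBuffer.wrap("chunk-data".getBytes(StandardCharsets.UTF_8));
  }

  // Old shape: ByteBuffer -> byte[] (extra copy), which callers wrap again.
  static byte[] readChunkAsArray() {
    ByteBuffer buf = readFromDisk();
    byte[] copy = new byte[buf.remaining()];
    buf.get(copy); // copies every byte into a new array
    return copy;
  }

  // New shape: hand the ByteBuffer through without the intermediate array.
  static ByteBuffer readChunkAsBuffer() {
    return readFromDisk().asReadOnlyBuffer();
  }

  public static void main(String[] args) {
    ByteBuffer viaArray = ByteBuffer.wrap(readChunkAsArray());
    ByteBuffer direct = readChunkAsBuffer();
    System.out.println(viaArray.equals(direct)); // true: same bytes, one copy fewer
  }
}
```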
[jira] [Resolved] (HDDS-2233) Remove ByteStringHelper and refactor the code to the place where it is used
[ https://issues.apache.org/jira/browse/HDDS-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDDS-2233. --- Fix Version/s: 0.5.0 Resolution: Fixed > Remove ByteStringHelper and refactor the code to the place where it is used > > > Key: HDDS-2233 > URL: https://issues.apache.org/jira/browse/HDDS-2233 > Project: Hadoop Distributed Data Store > Issue Type: Bug >Reporter: Istvan Fajth >Assignee: Istvan Fajth >Priority: Critical > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 2h 20m > Remaining Estimate: 0h > > See HDDS-2203, where I reported a race condition. > Later in the discussion we agreed that it is better to refactor the code and > remove the class completely for now, which would also resolve the race > condition.
[jira] [Updated] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis
[ https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2169: -- Resolution: Fixed Status: Resolved (was: Patch Available) > Avoid buffer copies while submitting client requests in Ratis > - > > Key: HDDS-2169 > URL: https://issues.apache.org/jira/browse/HDDS-2169 > Project: Hadoop Distributed Data Store > Issue Type: Improvement >Reporter: Shashikant Banerjee >Assignee: Tsz-wo Sze >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > Currently, when Ozone sends a write request to Ratis, the data is first encoded into a protobuf > object, and the resulting protobuf is then > converted to a ByteString, which internally copies the buffer embedded > inside the protobuf again so that it can be submitted to the Ratis client. > Later, while building the > appendRequestProto for the appendRequest, the data might be copied once more. The idea here is to > let the client pass the raw data (stateMachine data) separately to the Ratis > client without the copying overhead. > > {code:java} > private CompletableFuture sendRequestAsync( > ContainerCommandRequestProto request) { > try (Scope scope = GlobalTracer.get() > .buildSpan("XceiverClientRatis." + request.getCmdType().name()) > .startActive(true)) { > ContainerCommandRequestProto finalPayload = > ContainerCommandRequestProto.newBuilder(request) > .setTraceID(TracingUtil.exportCurrentSpan()) > .build(); > boolean isReadOnlyRequest = HddsUtils.isReadOnly(finalPayload); > // finalPayload already has the byteString data embedded. > // toByteString() involves a copy again. > ByteString byteString = finalPayload.toByteString(); > if (LOG.isDebugEnabled()) { > LOG.debug("sendCommandAsync {} {}", isReadOnlyRequest, > sanitizeForDebug(finalPayload)); > } > return isReadOnlyRequest ? > getClient().sendReadOnlyAsync(() -> byteString) : > getClient().sendAsync(() -> byteString); > } > } > {code}
[jira] [Updated] (HDDS-2210) ContainerStateMachine should not be marked unhealthy if applyTransaction fails with closed container exception
[ https://issues.apache.org/jira/browse/HDDS-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2210: -- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~msingh] for the review. I have committed this change to trunk. > ContainerStateMachine should not be marked unhealthy if applyTransaction > fails with closed container exception > -- > > Key: HDDS-2210 > URL: https://issues.apache.org/jira/browse/HDDS-2210 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Currently, if applyTransaction fails, the stateMachine is marked unhealthy > and the next snapshot creation will fail. As a result, the raftServer > will shut down, leading to pipeline failure. A ClosedContainer exception should > not cause the stateMachine to be marked unhealthy.
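The behaviour requested in HDDS-2210 amounts to filtering one exception type before flipping the health flag. A minimal sketch under that reading; the exception class, the flag and the method names are hypothetical and do not mirror ContainerStateMachine:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; names are illustrative, not from ContainerStateMachine.
final class ApplyTransactionSketch {

  // Stand-in for the closed container error reported by the datanode.
  static final class ClosedContainerException extends Exception {
    ClosedContainerException(String msg) {
      super(msg);
    }
  }

  private final AtomicBoolean healthy = new AtomicBoolean(true);

  /** Called when applying a transaction failed with the given error. */
  void onApplyTransactionFailure(Exception e) {
    if (e instanceof ClosedContainerException) {
      // Expected when a write races with a container close: keep serving.
      return;
    }
    // Any other failure makes later snapshots unsafe.
    healthy.set(false);
  }

  boolean isHealthy() {
    return healthy.get();
  }

  public static void main(String[] args) {
    ApplyTransactionSketch sm = new ApplyTransactionSketch();
    sm.onApplyTransactionFailure(new ClosedContainerException("container 7 closed"));
    System.out.println(sm.isHealthy()); // true
    sm.onApplyTransactionFailure(new IllegalStateException("disk failure"));
    System.out.println(sm.isHealthy()); // false
  }
}
```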
[jira] [Resolved] (HDFS-14492) Snapshot memory leak
[ https://issues.apache.org/jira/browse/HDFS-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee resolved HDFS-14492. Fix Version/s: 3.1.4 Resolution: Fixed Thanks [~jojochuang] for the contribution. I have committed this change to trunk. > Snapshot memory leak > > > Key: HDFS-14492 > URL: https://issues.apache.org/jira/browse/HDFS-14492 > Project: Hadoop HDFS > Issue Type: Bug > Components: snapshots >Affects Versions: 2.6.0 > Environment: CDH5.14.4 >Reporter: Wei-Chiu Chuang >Assignee: Wei-Chiu Chuang >Priority: Major > Fix For: 3.1.4 > > > We recently examined the NameNode heap dump of a big, heavy snapshot user, > trying to trim some fat, and sure enough, we found a memory leak in it: when > snapshots are removed, the corresponding data structures are not removed. > This cluster has 586 million file system objects (286 million files, 287 > million blocks, 13 million directories), using around 132 GB of heap. > While only 44.5 million files have snapshotted copies > (INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have > FileWithSnapshotFeature and FileDiffList. Those inodes had snapshotted copies > at some point in the past, but after the snapshots were removed, those data > structures are still kept in the heap. > INode$Feature = 32.5 bytes on average, FileWithSnapshotFeature = 32 bytes, > FileDiffList = 24 bytes. It may not sound like a lot, but they add up quickly in > large clusters like this. In this cluster, a whopping 13.8 GB of memory could > have been saved: ((32.5 + 32 + 24) bytes * (211997769 - 44572380) =~ > 13.8 GB) if not for this bug. That is more than a 10% saving in heap size.
> Heap histogram for reference: > {noformat} > num #instances #bytes class name > -- > 1: 286418254 27496152384 org.apache.hadoop.hdfs.server.namenode.INodeFile > 2: 28737 18388622528 > org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo > 3: 227899550 17144816120 [B > 4: 287324031 13769408616 > [Lorg.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo; > 5: 71352116 12353841568 [Ljava.lang.Object; > 6: 286322650 9170335840 > [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo; > 7: 235632329 7658462416 > [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature; > 8: 4 7046430816 [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement; > 9: 211997769 6783928608 > org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature > 10: 211997769 5087946456 > org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList > 11: 76586261 3780468856 [I > 12: 44572380 3209211360 > org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy > 13: 58634517 2345380680 java.util.ArrayList > 14: 44572380 2139474240 > org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff > 15: 76582416 1837977984 org.apache.hadoop.hdfs.server.namenode.AclFeature > 16: 12907668 1135874784 > org.apache.hadoop.hdfs.server.namenode.INodeDirectory{noformat} > [~szetszwo] [~arpaga] [~smeng] [~shashikant] any thoughts? > I am thinking that inside > AbstractINodeDiffList#deleteSnapshotDiff() , in addition to cleaning up file > diffs, it should also remove FileWithSnapshotFeature. I am not familiar with > the snapshot implementation, so any guidance is greatly appreciated. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
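The savings estimate in the HDFS-14492 description can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted in the report; reading "gb" as GiB (2^30 bytes) is an assumption, chosen because it matches the quoted 13.8.

```java
import java.util.Locale;

// Reproduces the estimate from the report; "gb" is read as GiB (assumption).
final class SnapshotLeakEstimate {

  static double leakedBytes() {
    // INode$Feature entry + FileWithSnapshotFeature + FileDiffList, per inode.
    double bytesPerInode = 32.5 + 32 + 24;
    // Inodes that still carry the feature minus those with real snapshot copies.
    long leakedInodes = 211_997_769L - 44_572_380L;
    return bytesPerInode * leakedInodes;
  }

  static double toGiB(double bytes) {
    return bytes / (1024.0 * 1024.0 * 1024.0);
  }

  public static void main(String[] args) {
    double gib = toGiB(leakedBytes());
    System.out.println(String.format(Locale.ROOT, "%.1f GiB", gib));              // 13.8 GiB
    System.out.println(String.format(Locale.ROOT, "%.1f%%", 100.0 * gib / 132));  // 10.5%
  }
}
```

The second figure divides by the roughly 132 GB heap mentioned in the report, which is where the "more than 10%" claim comes from.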
[jira] [Updated] (HDDS-2210) ContainerStateMachine should not be marked unhealthy if applyTransaction fails with closed container exception
[ https://issues.apache.org/jira/browse/HDDS-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2210: -- Status: Patch Available (was: Open) > ContainerStateMachine should not be marked unhealthy if applyTransaction > fails with closed container exception > -- > > Key: HDDS-2210 > URL: https://issues.apache.org/jira/browse/HDDS-2210 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 10m > Remaining Estimate: 0h > > Currently, if applyTransaction fails, the stateMachine is marked unhealthy > and the next snapshot creation will fail. As a result, the raftServer > will shut down, leading to pipeline failure. A ClosedContainer exception should > not cause the stateMachine to be marked unhealthy.
[jira] [Created] (HDDS-2210) ContainerStateMachine should not be marked unhealthy if applyTransaction fails with closed container exception
Shashikant Banerjee created HDDS-2210: - Summary: ContainerStateMachine should not be marked unhealthy if applyTransaction fails with closed container exception Key: HDDS-2210 URL: https://issues.apache.org/jira/browse/HDDS-2210 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 Currently, if applyTransaction fails, the stateMachine is marked unhealthy and the next snapshot creation will fail. As a result, the raftServer will shut down, leading to pipeline failure. A ClosedContainer exception should not cause the stateMachine to be marked unhealthy.
[jira] [Assigned] (HDDS-2152) Ozone client fails with OOM while writing a large (~300MB) key.
[ https://issues.apache.org/jira/browse/HDDS-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee reassigned HDDS-2152: - Assignee: Shashikant Banerjee > Ozone client fails with OOM while writing a large (~300MB) key. > --- > > Key: HDDS-2152 > URL: https://issues.apache.org/jira/browse/HDDS-2152 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Aravindan Vijayan >Assignee: Shashikant Banerjee >Priority: Major > Attachments: largekey.png > > > {code} > dd if=/dev/zero of=testfile bs=1024 count=307200 > ozone sh key put /vol1/bucket1/key testfile > {code} > {code} > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at > java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at > java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at > org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:66) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:234) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193) > at > org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96) at > org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:117) > at > org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:55) > at picocli.CommandLine.execute(CommandLine.java:1173) at > picocli.CommandLine.access$800(CommandLine.java:141) > {code}
[jira] [Created] (HDDS-2207) Update Ratis to latest snapshot
Shashikant Banerjee created HDDS-2207: - Summary: Update Ratis to latest snapshot Key: HDDS-2207 URL: https://issues.apache.org/jira/browse/HDDS-2207 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Client Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0 This Jira aims to update Ozone to the latest Ratis snapshot, which has a critical fix for the client's retry behaviour on getting a not-leader exception.
[jira] [Commented] (HDDS-2152) Ozone client fails with OOM while writing a large (~300MB) key.
[ https://issues.apache.org/jira/browse/HDDS-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935959#comment-16935959 ] Shashikant Banerjee commented on HDDS-2152: --- [~jnp], this requires changes in Ratis, as tracked by RATIS-688, plus corresponding changes in the Ozone client as well as the Datanode. > Ozone client fails with OOM while writing a large (~300MB) key. > --- > > Key: HDDS-2152 > URL: https://issues.apache.org/jira/browse/HDDS-2152 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Aravindan Vijayan >Assignee: YiSheng Lien >Priority: Major > Attachments: largekey.png > > > {code} > dd if=/dev/zero of=testfile bs=1024 count=307200 > ozone sh key put /vol1/bucket1/key testfile > {code} > {code} > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at > java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at > java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at > org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:66) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:234) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193) > at > org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96) at > org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:117) > at > org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:55) > at picocli.CommandLine.execute(CommandLine.java:1173) at > picocli.CommandLine.access$800(CommandLine.java:141) > {code}
[jira] [Commented] (HDDS-2152) Ozone client fails with OOM while writing a large (~300MB) key.
[ https://issues.apache.org/jira/browse/HDDS-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935663#comment-16935663 ] Shashikant Banerjee commented on HDDS-2152: --- The issue can be reproduced by writing a 300 MB key, as in the test, with the Java heap set to 256 MB or lower. This issue needs some discussion on how to avoid buffer copies during protobuf conversion, as well as some digging into the areas of code where buffer copies may actually be happening during writes, or where memory may be leaking. > Ozone client fails with OOM while writing a large (~300MB) key. > --- > > Key: HDDS-2152 > URL: https://issues.apache.org/jira/browse/HDDS-2152 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Aravindan Vijayan >Assignee: YiSheng Lien >Priority: Major > Attachments: largekey.png > > > {code} > dd if=/dev/zero of=testfile bs=1024 count=307200 > ozone sh key put /vol1/bucket1/key testfile > {code} > {code} > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at > java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at > java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at > org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:66) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:234) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193) > at > org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96) at > org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:117) > at > org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:55) > at picocli.CommandLine.execute(CommandLine.java:1173) at > picocli.CommandLine.access$800(CommandLine.java:141) > {code}
[jira] [Commented] (HDDS-2152) Ozone client fails with OOM while writing a large (~300MB) key.
[ https://issues.apache.org/jira/browse/HDDS-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933588#comment-16933588 ] Shashikant Banerjee commented on HDDS-2152: --- [~cxorm], do you have a solution/fix to address this? > Ozone client fails with OOM while writing a large (~300MB) key. > --- > > Key: HDDS-2152 > URL: https://issues.apache.org/jira/browse/HDDS-2152 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Reporter: Aravindan Vijayan >Assignee: YiSheng Lien >Priority: Major > > {code} > dd if=/dev/zero of=testfile bs=1024 count=307200 > ozone sh key put /vol1/bucket1/key testfile > {code} > {code} > Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at > java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at > java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at > org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:66) > at > org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:234) > at > org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211) > at > org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193) > at > org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49) > at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96) at > org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:117) > at > org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:55) > at picocli.CommandLine.execute(CommandLine.java:1173) at > picocli.CommandLine.access$800(CommandLine.java:141) > {code}
[jira] [Updated] (HDDS-2153) Add a config to tune max pending requests in Ratis leader
[ https://issues.apache.org/jira/browse/HDDS-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2153: -- Status: Patch Available (was: Open) > Add a config to tune max pending requests in Ratis leader > - > > Key: HDDS-2153 > URL: https://issues.apache.org/jira/browse/HDDS-2153 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Datanode >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 10m > Remaining Estimate: 0h
[jira] [Created] (HDDS-2153) Add a config to tune max pending requests in Ratis leader
Shashikant Banerjee created HDDS-2153: - Summary: Add a config to tune max pending requests in Ratis leader Key: HDDS-2153 URL: https://issues.apache.org/jira/browse/HDDS-2153 Project: Hadoop Distributed Data Store Issue Type: Bug Components: Ozone Datanode Affects Versions: 0.5.0 Reporter: Shashikant Banerjee Assignee: Shashikant Banerjee Fix For: 0.5.0
[jira] [Updated] (HDDS-2032) Ozone client should retry writes in case of any ratis/stateMachine exceptions
[ https://issues.apache.org/jira/browse/HDDS-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shashikant Banerjee updated HDDS-2032: -- Resolution: Fixed Status: Resolved (was: Patch Available) Thanks [~msingh] for the review. I have committed this. > Ozone client should retry writes in case of any ratis/stateMachine exceptions > - > > Key: HDDS-2032 > URL: https://issues.apache.org/jira/browse/HDDS-2032 > Project: Hadoop Distributed Data Store > Issue Type: Bug > Components: Ozone Client >Affects Versions: 0.5.0 >Reporter: Shashikant Banerjee >Assignee: Shashikant Banerjee >Priority: Major > Labels: pull-request-available > Fix For: 0.5.0 > > Time Spent: 1h 40m > Remaining Estimate: 0h > > Currently, the Ozone client retries writes on a different pipeline or container > only for some specific exceptions. If it sees an exception such as > DISK_FULL, CONTAINER_UNHEALTHY, or any corruption, it just aborts the write. > In general, every such exception should be a retriable exception in the > Ozone client, and on some specific exceptions it should take a more > specific action, such as excluding certain containers or pipelines > while retrying, or informing SCM of a corrupt replica.
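The retry behavior requested in HDDS-2032 can be sketched roughly as below. This is a minimal illustration and not the committed patch: the class RetryingWriteSketch, the Failure enum, WriteException, the Writer interface, and the mapping from failure type to exclusion are all hypothetical names and choices made for this example. Only DISK_FULL and CONTAINER_UNHEALTHY are taken from the issue description.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class RetryingWriteSketch {

    /** Hypothetical failure categories; DISK_FULL and CONTAINER_UNHEALTHY are named in the issue. */
    enum Failure { DISK_FULL, CONTAINER_UNHEALTHY, CORRUPTION, OTHER }

    /** Hypothetical exception that records what failed and where. */
    static class WriteException extends IOException {
        final Failure failure;
        final String containerId;
        final String pipelineId;

        WriteException(Failure failure, String containerId, String pipelineId) {
            super(failure + " on container " + containerId + ", pipeline " + pipelineId);
            this.failure = failure;
            this.containerId = containerId;
            this.pipelineId = pipelineId;
        }
    }

    /** Hypothetical write target that must avoid the excluded containers and pipelines. */
    interface Writer {
        void write(byte[] data, Set<String> excludedContainers, Set<String> excludedPipelines)
            throws WriteException;
    }

    /**
     * Treats every WriteException as retriable, up to maxRetries retries.
     * Returns the number of attempts that were needed.
     */
    static int writeWithRetry(Writer writer, byte[] data, int maxRetries) throws IOException {
        Set<String> excludedContainers = new HashSet<>();
        Set<String> excludedPipelines = new HashSet<>();
        WriteException last = null;
        for (int attempt = 1; attempt <= maxRetries + 1; attempt++) {
            try {
                writer.write(data, excludedContainers, excludedPipelines);
                return attempt;
            } catch (WriteException e) {
                last = e;
                switch (e.failure) {
                    case DISK_FULL:
                    case CONTAINER_UNHEALTHY:
                        // Assumption: container-level problems exclude only that container.
                        excludedContainers.add(e.containerId);
                        break;
                    case CORRUPTION:
                        excludedContainers.add(e.containerId);
                        // A real client might also inform SCM of the corrupt replica here.
                        break;
                    default:
                        // Assumption: anything else excludes the whole pipeline.
                        excludedPipelines.add(e.pipelineId);
                }
            }
        }
        throw new IOException("write failed after " + (maxRetries + 1) + " attempts", last);
    }

    public static void main(String[] args) throws IOException {
        // Fails with DISK_FULL until container "c1" has been excluded.
        Writer flaky = (data, containers, pipelines) -> {
            if (!containers.contains("c1")) {
                throw new WriteException(Failure.DISK_FULL, "c1", "p1");
            }
        };
        int attempts = writeWithRetry(flaky, new byte[16], 3);
        System.out.println("succeeded after " + attempts + " attempts");
    }
}
```

In this sketch the exclusion sets grow across attempts, so a retry never lands on a container or pipeline that has already failed; whether the real client scopes exclusions this way is not stated in the issue.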