[jira] [Updated] (HDFS-15480) Ordered snapshot deletion: record snapshot deletion in XAttr

2020-07-20 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15480:
---
Attachment: HDFS-15480.000.patch

> Ordered snapshot deletion: record snapshot deletion in XAttr
> 
>
> Key: HDFS-15480
> URL: https://issues.apache.org/jira/browse/HDFS-15480
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Tsz-wo Sze
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-15480.000.patch
>
>
> In this JIRA, the behavior of deleting the non-earliest snapshots will be 
> changed to marking them as deleted in an XAttr instead of actually deleting 
> them.  Note that
> # The marked-for-deletion snapshots will be garbage collected later on; see 
> HDFS-15481.
> # The marked-for-deletion snapshots will be hidden from users; see HDFS-15482.
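
As an illustration of the "mark, don't delete" idea, here is a minimal client-side sketch. The XAttr key is hypothetical: in this JIRA the marker is maintained internally by the NameNode, and since snapshot paths are read-only, the sketch records the marker on the snapshottable directory instead.

{code:java}
// Hypothetical sketch only; the XAttr key below is invented for illustration.
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SnapshotDeletionMarker {
  // Invented key; the real marker of HDFS-15480 lives inside the NameNode.
  private static final String PREFIX = "user.snapshot.deleted.";

  /** Record the deletion as metadata instead of removing the snapshot. */
  public static void markDeleted(FileSystem fs, Path snapshottableDir,
      String snapshotName) throws Exception {
    byte[] ts = Long.toString(System.currentTimeMillis())
        .getBytes(StandardCharsets.UTF_8);
    fs.setXAttr(snapshottableDir, PREFIX + snapshotName, ts);
  }

  /** A later garbage collector (HDFS-15481) would reclaim marked snapshots. */
  public static boolean isMarkedDeleted(FileSystem fs, Path snapshottableDir,
      String snapshotName) throws Exception {
    return fs.getXAttrs(snapshottableDir).containsKey(PREFIX + snapshotName);
  }
}
{code}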



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15480) Ordered snapshot deletion: record snapshot deletion in XAttr

2020-07-20 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDFS-15480:
--

Assignee: Shashikant Banerjee

> Ordered snapshot deletion: record snapshot deletion in XAttr
> 
>
> Key: HDFS-15480
> URL: https://issues.apache.org/jira/browse/HDFS-15480
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Tsz-wo Sze
>Assignee: Shashikant Banerjee
>Priority: Major
>
> In this JIRA, the behavior of deleting the non-earliest snapshots will be 
> changed to marking them as deleted in an XAttr instead of actually deleting 
> them.  Note that
> # The marked-for-deletion snapshots will be garbage collected later on; see 
> HDFS-15481.
> # The marked-for-deletion snapshots will be hidden from users; see HDFS-15482.






[jira] [Commented] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-07-20 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17161036#comment-17161036
 ] 

Shashikant Banerjee commented on HDFS-15313:


[^HDFS-15313.branch-2.10.001.patch] -> Patch for the 2.10 branch.

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch, 
> HDFS-15313.001.patch, HDFS-15313.branch-2.10.001.patch, 
> HDFS-15313.branch-2.10.patch, HDFS-15313.branch-2.8.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.
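
A toy model of the guard this Jira adds (names invented for illustration, not HDFS internals): even when a reference count claims the snapshot being deleted holds the last reference, an inode still reachable from the active fs must survive the delete.

{code:java}
// Toy model; class and method names are invented.
import java.util.HashSet;
import java.util.Set;

class SnapshotDeleteGuard {
  /** Ids of inodes currently reachable from the active (non-snapshot) fs. */
  private final Set<Long> activeInodeIds = new HashSet<>();

  void addActive(long inodeId) {
    activeInodeIds.add(inodeId);
  }

  /**
   * Decide whether an inode found while cleaning up a snapshot diff may be
   * destroyed. Even if isLastReference() reported true for the parent dir,
   * an inode the active fs still refers to must not be destroyed.
   */
  boolean mayDestroy(long inodeId, boolean isLastReference) {
    return isLastReference && !activeInodeIds.contains(inodeId);
  }
}
{code}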






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-07-20 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Attachment: HDFS-15313.branch-2.10.001.patch

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0, 3.1.5
>
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch, 
> HDFS-15313.001.patch, HDFS-15313.branch-2.10.001.patch, 
> HDFS-15313.branch-2.10.patch, HDFS-15313.branch-2.8.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.






[jira] [Resolved] (HDFS-15463) Add a tool to validate FsImage

2020-07-20 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDFS-15463.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Add a tool to validate FsImage
> --
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: FsImageValidation20200709.patch, 
> FsImageValidation20200712.patch, FsImageValidation20200714.patch, 
> FsImageValidation20200715.patch, FsImageValidation20200715b.patch, 
> FsImageValidation20200715c.patch, FsImageValidation20200717b.patch, 
> FsImageValidation20200718.patch, HDFS-15463.000.patch
>
>
> Due to some snapshot-related bugs, an fsimage may become corrupted.  Using a 
> corrupted fsimage may further result in data loss.
> In some cases, we found that reference counts are incorrect in corrupted 
> FsImages.  One of the goals of the validation tool is to check reference 
> counts.
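
As a sketch of the reference-count check (names invented; the actual logic is in the attached FsImageValidation patches), one can tally how often each inode is targeted by a reference and compare the tally with the count stored in the image:

{code:java}
// Invented names; illustrates the idea, not the attached patch.
import java.util.HashMap;
import java.util.Map;

class RefCountCheck {
  /**
   * @param storedCounts     reference counts as recorded in the fsimage
   * @param referenceTargets the referred inode id of every reference found
   * @return the number of inodes whose stored count disagrees with the image
   */
  static int validate(Map<Long, Integer> storedCounts, long[] referenceTargets) {
    Map<Long, Integer> observed = new HashMap<>();
    for (long id : referenceTargets) {
      observed.merge(id, 1, Integer::sum);
    }
    int errors = 0;
    for (Map.Entry<Long, Integer> e : storedCounts.entrySet()) {
      if (!e.getValue().equals(observed.getOrDefault(e.getKey(), 0))) {
        errors++; // stored count does not match the references actually present
      }
    }
    return errors;
  }
}
{code}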






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-07-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Attachment: HDFS-15313.branch-2.10.patch

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HDFS-15313-branch-3.1.001.patch, HDFS-15313.000.patch, 
> HDFS-15313.001.patch, HDFS-15313.branch-2.10.patch, 
> HDFS-15313.branch-2.8.patch, HDFS-15313.branch-3.1.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.






[jira] [Updated] (HDFS-15470) Added more unit tests to validate rename behaviour across snapshots

2020-07-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15470:
---
Attachment: HDFS-15470.002.patch

> Added more unit tests to validate rename behaviour across snapshots
> ---
>
> Key: HDFS-15470
> URL: https://issues.apache.org/jira/browse/HDFS-15470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.0.4
>
> Attachments: HDFS-15470.000.patch, HDFS-15470.001.patch, 
> HDFS-15470.002.patch
>
>
> HDFS-15313 fixes a critical issue where a sequence of snapshot deletes could 
> delete data in the active fs. The idea is to add more tests to verify the 
> behaviour.
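
Such a test would take roughly the following shape (a simplified sketch against MiniDFSCluster; names are illustrative and this is not the attached patch):

{code:java}
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.junit.Test;

public class RenameAcrossSnapshotsSketch {
  @Test
  public void renameThenDeleteSnapshotsKeepsActiveFile() throws Exception {
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(new Configuration()).build();
    try {
      DistributedFileSystem dfs = cluster.getFileSystem();
      Path dir = new Path("/dir");
      dfs.mkdirs(dir);
      dfs.allowSnapshot(dir);
      dfs.create(new Path(dir, "f0")).close();
      dfs.createSnapshot(dir, "s0");
      // Rename across a snapshot boundary, then snapshot again.
      dfs.rename(new Path(dir, "f0"), new Path(dir, "f1"));
      dfs.createSnapshot(dir, "s1");
      // Deleting the snapshots must not delete the live, renamed file.
      dfs.deleteSnapshot(dir, "s1");
      dfs.deleteSnapshot(dir, "s0");
      assertTrue(dfs.exists(new Path(dir, "f1")));
    } finally {
      cluster.shutdown();
    }
  }
}
{code}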






[jira] [Commented] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-07-15 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158176#comment-17158176
 ] 

Shashikant Banerjee commented on HDFS-15313:


Thanks [~sodonnell], I have uploaded patches for branch-2.8 as well as 
branch-3.1. Please have a look.

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch, 
> HDFS-15313.branch-2.8.patch, HDFS-15313.branch-3.1.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-07-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Attachment: HDFS-15313.branch-3.1.patch

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch, 
> HDFS-15313.branch-2.8.patch, HDFS-15313.branch-3.1.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-07-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Attachment: HDFS-15313.branch-2.8.patch

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.2.2, 3.3.1, 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch, 
> HDFS-15313.branch-2.8.patch, HDFS-15313.branch-3.1.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.






[jira] [Commented] (HDFS-15463) Add a tool to validate FsImage

2020-07-15 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17158030#comment-17158030
 ] 

Shashikant Banerjee commented on HDFS-15463:


Thanks [~szetszwo] for working on this. 

HDFS-15463.000.patch -> rebased to the latest trunk.

> Add a tool to validate FsImage
> --
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: FsImageValidation20200709.patch, 
> FsImageValidation20200712.patch, FsImageValidation20200714.patch, 
> FsImageValidation20200715.patch, HDFS-15463.000.patch
>
>
> Due to some snapshot-related bugs, an fsimage may become corrupted.  Using a 
> corrupted fsimage may further result in data loss.
> In some cases, we found that reference counts are incorrect in corrupted 
> FsImages.  One of the goals of the validation tool is to check reference 
> counts.






[jira] [Updated] (HDFS-15463) Add a tool to validate FsImage

2020-07-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15463:
---
Attachment: HDFS-15463.000.patch

> Add a tool to validate FsImage
> --
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: FsImageValidation20200709.patch, 
> FsImageValidation20200712.patch, FsImageValidation20200714.patch, 
> FsImageValidation20200715.patch, HDFS-15463.000.patch
>
>
> Due to some snapshot-related bugs, an fsimage may become corrupted.  Using a 
> corrupted fsimage may further result in data loss.
> In some cases, we found that reference counts are incorrect in corrupted 
> FsImages.  One of the goals of the validation tool is to check reference 
> counts.






[jira] [Updated] (HDFS-15463) Add a tool to validate FsImage

2020-07-15 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15463:
---
Status: Patch Available  (was: Open)

> Add a tool to validate FsImage
> --
>
> Key: HDFS-15463
> URL: https://issues.apache.org/jira/browse/HDFS-15463
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
> Attachments: FsImageValidation20200709.patch, 
> FsImageValidation20200712.patch, FsImageValidation20200714.patch, 
> FsImageValidation20200715.patch, HDFS-15463.000.patch
>
>
> Due to some snapshot-related bugs, an fsimage may become corrupted.  Using a 
> corrupted fsimage may further result in data loss.
> In some cases, we found that reference counts are incorrect in corrupted 
> FsImages.  One of the goals of the validation tool is to check reference 
> counts.






[jira] [Commented] (HDFS-14504) Rename with Snapshots does not honor quota limit

2020-07-14 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157903#comment-17157903
 ] 

Shashikant Banerjee commented on HDFS-14504:


[~hemanthboyina], sorry for coming back to this late. Patch v2 looks good. +1

> Rename with Snapshots does not honor quota limit
> 
>
> Key: HDFS-14504
> URL: https://issues.apache.org/jira/browse/HDFS-14504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Hemanth Boyina
>Priority: Major
> Attachments: HDFS-14504.001.patch, HDFS-14504.002.patch
>
>
> Steps to Reproduce:
> 
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2
> 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
> 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
> 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Allowing snapshot on /dir2 succeeded
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
> 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1
> 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap1
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2
> 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex
> 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 file count=4
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2
> 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap2
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3
> 2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey
> 2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 file count=5
> {code}
> // create operation fails here as it has already exceeded the quota limit
> {code}
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3
> 2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap3
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4
> 2019-05-21 15:13:20,715 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> {code}
> // Rename operation succeeds here, adding to the namespace usage beyond the quota
> {code}
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filez
> 2019-05-21 15:13:30,486 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 file count=6
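
One reading of the counts above (an interpretation, not stated in the Jira text): while a snapshot captures the old name, each rename retains the old inode, so namespace usage grows by one per snapshot-plus-rename round, yet the rename itself is never quota-checked; only the explicit create is rejected.

{code:java}
// Illustrative arithmetic for the log above. The namespace quota counts
// directories and files including /dir2 itself, so usage starts at 2.
public class QuotaArithmetic {
  public static void main(String[] args) {
    int quota = 3;
    int usage = 2; // /dir2 + file1
    for (int round = 1; round <= 3; round++) {
      usage += 1; // snapshot + rename keeps the old name alive; not quota-checked
      // the subsequent touchz would push usage past the quota and is rejected:
      System.out.printf("touchz rejected: quota=%d file count=%d%n",
          quota, usage + 1);
    }
    // prints file count=4, 5, 6, matching the log above
  }
}
{code}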

[jira] [Updated] (HDFS-15470) Added more unit tests to validate rename behaviour across snapshots

2020-07-14 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15470:
---
Attachment: HDFS-15470.001.patch

> Added more unit tests to validate rename behaviour across snapshots
> ---
>
> Key: HDFS-15470
> URL: https://issues.apache.org/jira/browse/HDFS-15470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.0.4
>
> Attachments: HDFS-15470.000.patch, HDFS-15470.001.patch
>
>
> HDFS-15313 fixes a critical issue where a sequence of snapshot deletes could 
> delete data in the active fs. The idea is to add more tests to verify the 
> behaviour.






[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-07-14 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15319:
---
Fix Version/s: 3.0.4
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.0.4
>
> Attachments: HDFS-15319.000.patch, HDFS-15319.001.patch
>
>
> isInLatestSnapshot() may return true even when some of an inode's ancestors 
> are not in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can 
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   //   Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}
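
A toy model of why the early return is suspect (invented names, not the INode class): membership in the latest snapshot is a property of the whole ancestor chain, so a stricter check keeps walking instead of stopping at a reference node.

{code:java}
// Toy model with invented names; the real logic is INode#isInLatestSnapshot().
class Node {
  Node parent;
  boolean capturedInLatestSnapshot; // whether this node itself is captured

  /** Stricter check: every ancestor must be captured, reference node or not. */
  boolean isInLatestSnapshotStrict() {
    for (Node n = this; n != null; n = n.parent) {
      if (!n.capturedInLatestSnapshot) {
        return false; // an uncaptured ancestor invalidates the claim
      }
    }
    return true;
  }
}
{code}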






[jira] [Updated] (HDFS-15470) Added more unit tests to validate rename behaviour across snapshots

2020-07-14 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15470:
---
Status: Patch Available  (was: Open)

> Added more unit tests to validate rename behaviour across snapshots
> ---
>
> Key: HDFS-15470
> URL: https://issues.apache.org/jira/browse/HDFS-15470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.0.4
>
> Attachments: HDFS-15470.000.patch
>
>
> HDFS-15313 fixes a critical issue where a sequence of snapshot deletes could 
> delete data in the active fs. The idea is to add more tests to verify the 
> behaviour.






[jira] [Updated] (HDFS-15470) Added more unit tests to validate rename behaviour across snapshots

2020-07-14 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15470:
---
Attachment: HDFS-15470.000.patch

> Added more unit tests to validate rename behaviour across snapshots
> ---
>
> Key: HDFS-15470
> URL: https://issues.apache.org/jira/browse/HDFS-15470
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.0.4
>
> Attachments: HDFS-15470.000.patch
>
>
> HDFS-15313 fixes a critical issue where a sequence of snapshot deletes could 
> delete data in the active fs. The idea is to add more tests to verify the 
> behaviour.






[jira] [Created] (HDFS-15470) Added more unit tests to validate rename behaviour across snapshots

2020-07-14 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDFS-15470:
--

 Summary: Added more unit tests to validate rename behaviour across 
snapshots
 Key: HDFS-15470
 URL: https://issues.apache.org/jira/browse/HDFS-15470
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 3.0.4


HDFS-15313 fixes a critical issue where a sequence of snapshot deletes could 
delete data in the active fs. The idea is to add more tests to verify the 
behaviour.






[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-07-14 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15319:
---
Attachment: HDFS-15319.001.patch

> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-15319.000.patch, HDFS-15319.001.patch
>
>
> isInLatestSnapshot() may return true even when some of an inode's ancestors 
> are not in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can 
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   //   Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}






[jira] [Commented] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-07-14 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17157498#comment-17157498
 ] 

Shashikant Banerjee commented on HDFS-15319:


Patch v1 removes the TODO added as part of the fix for HDFS-15313, as it does 
not seem to be a problem.

> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-15319.000.patch, HDFS-15319.001.patch
>
>
> isInLatestSnapshot() may return true even when some of an inode's ancestors 
> are not in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can 
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   //   Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}






[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-05-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15319:
---
Status: Patch Available  (was: Open)

> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-15319.000.patch
>
>
> isInLatestSnapshot() may return true even when some of an inode's ancestors 
> are not in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can 
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   //   Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}






[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-05-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15319:
---
Attachment: HDFS-15319.000.patch

> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-15319.000.patch
>
>
> isInLatestSnapshot() may return true even when some of an inode's ancestors 
> are not in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can 
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   //   Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}






[jira] [Commented] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-05-01 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097210#comment-17097210
 ] 

Shashikant Banerjee commented on HDFS-15319:


[~szetszwo], can you please have a look?

> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-15319.000.patch
>
>
> isInLatestSnapshot() may return true even when some of an inode's ancestors 
> are not in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can 
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   //   Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}






[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-05-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15319:
---
Component/s: snapshots

> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-15319.000.patch
>
>
> isInLatestSnapshot() may return true even when some of an inode's ancestors 
> are not in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can 
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   //   Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}






[jira] [Commented] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-05-01 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17097200#comment-17097200
 ] 

Shashikant Banerjee commented on HDFS-15313:


Filed HDFS-15319 to address the issues in isInLatestSnapshot(). 

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.






[jira] [Assigned] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-05-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDFS-15319:
--

Assignee: Shashikant Banerjee

> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> isInLatestSnapshot() may return true even when some of an inode's ancestors 
> are not in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can 
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   //   Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}






[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-05-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15319:
---
Description: 
isInLatestSnapshot() may return true even when some of an inode's ancestors are 
not in the latest snapshot.
{code:java}
// if parent is a reference node, parent must be a renamed node. We can 
// stop the check at the reference node.
if (parent != null && parent.isReference()) {
  // TODO: Is it a bug to return true?
  //   Some ancestor nodes may not be in the latest snapshot.
  return true;
}
{code}

  was:
{code:java}
// if parent is a reference node, parent must be a renamed node. We can 
// stop the check at the reference node.
if (parent != null && parent.isReference()) {
  // TODO: Is it a bug to return true?
  //   Some ancestor nodes may not be in the latest snapshot.
  return true;
}
{code}


> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Priority: Major
>
> isInLatestSnapshot() may return true even when some of an inode's ancestors 
> are not in the latest snapshot.
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can 
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   //   Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}






[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-05-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15319:
---
Description: 
{code:java}
// if parent is a reference node, parent must be a renamed node. We can 
// stop the check at the reference node.
if (parent != null && parent.isReference()) {
  // TODO: Is it a bug to return true?
  //   Some ancestor nodes may not be in the latest snapshot.
  return true;
}
{code}

  was:The 


> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Priority: Major
>
> {code:java}
> // if parent is a reference node, parent must be a renamed node. We can 
> // stop the check at the reference node.
> if (parent != null && parent.isReference()) {
>   // TODO: Is it a bug to return true?
>   //   Some ancestor nodes may not be in the latest snapshot.
>   return true;
> }
> {code}






[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-05-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15319:
---
Description: The 

> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Priority: Major
>
> The 






[jira] [Created] (HDFS-15319) Fix INode#isInLatestSnapshot

2020-05-01 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDFS-15319:
--

 Summary: Fix INode#isInLatestSnapshot
 Key: HDFS-15319
 URL: https://issues.apache.org/jira/browse/HDFS-15319
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Shashikant Banerjee









[jira] [Updated] (HDFS-15319) Fix INode#isInLatestSnapshot() API

2020-05-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15319:
---
Summary: Fix INode#isInLatestSnapshot() API  (was: Fix 
INode#isInLatestSnapshot)

> Fix INode#isInLatestSnapshot() API
> --
>
> Key: HDFS-15319
> URL: https://issues.apache.org/jira/browse/HDFS-15319
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Priority: Major
>







[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-05-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~szetszwo] for the review. I have committed this.

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.






[jira] [Commented] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-04-30 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096576#comment-17096576
 ] 

Shashikant Banerjee commented on HDFS-15313:


Patch v1 addresses checkstyle issues.

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-04-30 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Attachment: HDFS-15313.001.patch

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch, HDFS-15313.001.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-04-30 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Description: After HDFS-13101, it was observed in one of our customer 
deployments that deleting a snapshot ends up cleaning up inodes from the active 
fs that are referenced by only one snapshot, as the isLastReference() check for 
the parent dir introduced in HDFS-13101 may return true in certain cases. The 
aim of this Jira is to add a check ensuring that inodes still referenced in the 
active fs are not deleted when a snapshot is deleted.  
(was: After HDFS-13101, it was observed that deleting a snapshot ends up 
cleaning up inodes from the active fs that are referenced by only one snapshot, 
as the isLastReference() check for the parent dir introduced in HDFS-13101 may 
return true in certain cases. The aim of this Jira is to add a check ensuring 
that inodes still referenced in the active fs are not deleted when a snapshot 
is deleted.)

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch
>
>
> After HDFS-13101, it was observed in one of our customer deployments that 
> deleting a snapshot ends up cleaning up inodes from the active fs that are 
> referenced by only one snapshot, as the isLastReference() check for the 
> parent dir introduced in HDFS-13101 may return true in certain cases. The aim 
> of this Jira is to add a check ensuring that inodes still referenced in the 
> active fs are not deleted when a snapshot is deleted.






[jira] [Commented] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-04-30 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17096271#comment-17096271
 ] 

Shashikant Banerjee commented on HDFS-15313:


[~szetszwo], [~weichiu], can you please have a look?

cc [~arp], [~msingh]

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch
>
>
> After HDFS-13101, it was observed that deleting a snapshot ends up cleaning 
> up inodes from the active fs that are referenced by only one snapshot, as the 
> isLastReference() check for the parent dir introduced in HDFS-13101 may 
> return true in certain cases. The aim of this Jira is to add a check ensuring 
> that inodes still referenced in the active fs are not deleted when a snapshot 
> is deleted.






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-04-30 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Description: After HDFS-13101, it was observed that deleting a snapshot ends up 
cleaning up inodes from the active fs that are referenced by only one snapshot, 
as the isLastReference() check for the parent dir introduced in HDFS-13101 may 
return true in certain cases. The aim of this Jira is to add a check ensuring 
that inodes still referenced in the active fs are not deleted when a snapshot 
is deleted.  
(was: After HDFS-13101, it was observed that deleting a snapshot ends up 
cleaning up inodes from the active fs that are referenced by only one snapshot, 
as the isLastReference() check for the parent dir introduced in HDFS-13101 
returns true. The aim of this Jira is to add a check ensuring that inodes still 
referenced in the active fs are not deleted when a snapshot is deleted.)

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch
>
>
> After HDFS-13101, it was observed that deleting a snapshot ends up cleaning 
> up inodes from the active fs that are referenced by only one snapshot, as the 
> isLastReference() check for the parent dir introduced in HDFS-13101 may 
> return true in certain cases. The aim of this Jira is to add a check ensuring 
> that inodes still referenced in the active fs are not deleted when a snapshot 
> is deleted.






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-04-30 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Status: Patch Available  (was: Open)

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch
>
>
> After HDFS-13101, it was observed that deleting a snapshot ends up cleaning 
> up inodes from the active fs that are referenced by only one snapshot, as the 
> isLastReference() check for the parent dir introduced in HDFS-13101 may 
> return true in certain cases. The aim of this Jira is to add a check ensuring 
> that inodes still referenced in the active fs are not deleted when a snapshot 
> is deleted.






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-04-30 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Attachment: HDFS-15313.000.patch

> Ensure inodes in active filesystem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15313.000.patch
>
>
> After HDFS-13101, it was observed that deleting a snapshot ends up cleaning 
> up inodes from the active fs that are referenced by only one snapshot, as the 
> isLastReference() check for the parent dir introduced in HDFS-13101 returns 
> true. The aim of this Jira is to add a check ensuring that inodes still 
> referenced in the active fs are not deleted when a snapshot is deleted.






[jira] [Updated] (HDFS-15313) Ensure inodes in active filesystem are not deleted during snapshot delete

2020-04-30 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15313:
---
Description: After HDFS-13101, it was observed that deleting a snapshot could 
end up cleaning up inodes from the active fs that are referred to from only 
one snapshot, because the isLastReference() check for the parent dir 
introduced in HDFS-13101 returns true. The aim of this Jira is to add a check 
ensuring that inodes still referred to from the active fs are not deleted 
while a snapshot is being deleted.  (was: After HDFS-13101, it was observed 
that deleting a snapshot could end up cleaning up inodes from the active fs 
that are referred to from only one snapshot, because the isLastReference() 
introduced in HDFS-13101 returns true. The aim of this Jira is to add a check 
ensuring that inodes still referred to from the active fs are not deleted 
while a snapshot is being deleted.)

> Ensure inodes in active filesytem are not deleted during snapshot delete
> 
>
> Key: HDFS-15313
> URL: https://issues.apache.org/jira/browse/HDFS-15313
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
>
> After HDFS-13101, it was observed that deleting a snapshot could end up 
> cleaning up inodes from the active fs that are referred to from only one 
> snapshot, because the isLastReference() check for the parent dir introduced 
> in HDFS-13101 returns true. The aim of this Jira is to add a check ensuring 
> that inodes still referred to from the active fs are not deleted while a 
> snapshot is being deleted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15313) Ensure inodes in active filesytem are not deleted during snapshot delete

2020-04-30 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDFS-15313:
--

 Summary: Ensure inodes in active filesytem are not deleted during 
snapshot delete
 Key: HDFS-15313
 URL: https://issues.apache.org/jira/browse/HDFS-15313
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: snapshots
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 3.4.0


After HDFS-13101, it was observed that deleting a snapshot could end up 
cleaning up inodes from the active fs that are referred to from only one 
snapshot, because the isLastReference() introduced in HDFS-13101 returns 
true. The aim of this Jira is to add a check ensuring that inodes still 
referred to from the active fs are not deleted while a snapshot is being 
deleted.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-14504) Rename with Snapshots does not honor quota limit

2020-04-24 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17092046#comment-17092046
 ] 

Shashikant Banerjee commented on HDFS-14504:


Thanks [~hemanthboyina] for the explanation. [~szetszwo], can you please have 
a look?

> Rename with Snapshots does not honor quota limit
> 
>
> Key: HDFS-14504
> URL: https://issues.apache.org/jira/browse/HDFS-14504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14504.001.patch, HDFS-14504.002.patch
>
>
> Steps to Reproduce:
> 
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2
> 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
> 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
> 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Allowing snapshot on /dir2 succeeded
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
> 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1
> 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap1
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2
> 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex
> 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 file count=4
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2
> 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap2
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3
> 2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey
> 2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 file count=5
> {code}
> // create operation fails here as it has already exceeded the quota limit
> {code}
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3
> 2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap3
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4
> 2019-05-21 15:13:20,715 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> {code}
> // Rename operation succeeds here adding on to the namespace quota
> {code}
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filez
> 2019-05-21 15:13:30,486 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 

[jira] [Comment Edited] (HDFS-14504) Rename with Snapshots does not honor quota limit

2020-04-20 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087899#comment-17087899
 ] 

Shashikant Banerjee edited comment on HDFS-14504 at 4/20/20, 4:25 PM:
--

Thanks [~hemanthboyina] for updating the patch. 
{code:java}
@Test
public void testRenameAcrossDirWithinSnapshot() throws Exception {
  // snapshottable directory
  String dirr = "/dir";
  Path rootDir = new Path(dirr);
  hdfs.mkdirs(rootDir);
  hdfs.allowSnapshot(rootDir);

  // set quota for source directory under snapshottable root directory
  Path dir2 = new Path(rootDir, "dir2");
  Path fil1 = new Path(dir2, "file1");
  hdfs.mkdirs(dir2);
  hdfs.setQuota(dir2, 3, 0);
  hdfs.create(fil1);
  Path file2 = new Path(dir2, "file2");
  hdfs.rename(fil1, file2);
  Path fil3 = new Path(dir2, "file3");
  hdfs.create(fil3);

  // destination directory under snapshottable root directory
  Path dir1 = new Path(rootDir, "dir1");
  Path dir1fil1 = new Path(dir1, "file1");
  hdfs.mkdirs(dir1);
  hdfs.create(dir1fil1);
  Path dir1fil2 = new Path(dir1, "file2");
  hdfs.rename(dir1fil1, dir1fil2);

  hdfs.createSnapshot(rootDir, "snap1");
  Path filex = new Path(dir2, "filex");
  // create a file after exceeding namespace quota
  LambdaTestUtils.intercept(NSQuotaExceededException.class,
  "The NameSpace quota (directories and files) of "
  + "directory /dir/dir2 is exceeded",
  () -> hdfs.create(filex));

  // Rename across directories within snapshot with quota set on source
  // directory
  assertTrue(hdfs.rename(fil3, dir1));
}
{code}
In the test above, if "filex" cound not created in "dir2" because of exceeding 
quota limit but rename of "fil3" existing under the same directory "dir2" seems 
successful which ideally should fail as it will create  InodeReferene in dir2 
diff list for the snapshot snap1 hence will exceed the quota limit.

Can you plz check?
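A hedged shell sketch of how the accounting can be observed (the paths follow 
the test above and assume that cluster is running):
{code:sh}
# Namespace quota report for the source directory; the QUOTA and
# REM_QUOTA columns show the limit and what is left of it.
hdfs dfs -count -q /dir/dir2
# A rename out of a directory that is captured in an existing snapshot
# leaves an INodeReference behind in that directory's diff list, so the
# namespace count should grow even though no new file is visible.
hdfs dfs -mv /dir/dir2/file3 /dir/dir1/file3
hdfs dfs -count -q /dir/dir2
{code}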


was (Author: shashikant):
Thanks [~hemanthboyina] for updating the patch. 
{code:java}
@Test
public void testRenameAcrossDirWithinSnapshot() throws Exception {
  // snapshottable directory
  String dirr = "/dir";
  Path rootDir = new Path(dirr);
  hdfs.mkdirs(rootDir);
  hdfs.allowSnapshot(rootDir);

  // set quota for source directory under snapshottable root directory
  Path dir2 = new Path(rootDir, "dir2");
  Path fil1 = new Path(dir2, "file1");
  hdfs.mkdirs(dir2);
  hdfs.setQuota(dir2, 3, 0);
  hdfs.create(fil1);
  Path file2 = new Path(dir2, "file2");
  hdfs.rename(fil1, file2);
  Path fil3 = new Path(dir2, "file3");
  hdfs.create(fil3);

  // destination directory under snapshottable root directory
  Path dir1 = new Path(rootDir, "dir1");
  Path dir1fil1 = new Path(dir1, "file1");
  hdfs.mkdirs(dir1);
  hdfs.create(dir1fil1);
  Path dir1fil2 = new Path(dir1, "file2");
  hdfs.rename(dir1fil1, dir1fil2);

  hdfs.createSnapshot(rootDir, "snap1");
  Path filex = new Path(dir2, "filex");
  // create a file after exceeding namespace quota
  LambdaTestUtils.intercept(NSQuotaExceededException.class,
  "The NameSpace quota (directories and files) of "
  + "directory /dir/dir2 is exceeded",
  () -> hdfs.create(filex));

  // Rename across directories within snapshot with quota set on source
  // directory
  assertTrue(hdfs.rename(fil3, dir1));
}
{code}
In the test above, if "filex" cound not created in "dir2" because of exceeding 
quota limit but rename of "fil3" existing under the same directory "dir2" seems 
successful which ideally should fail as it will create  InodeReferene in dir2 
diff list for the snapshot snap1 hence exceeding the quota limit.

Can you plz check?

> Rename with Snapshots does not honor quota limit
> 
>
> Key: HDFS-14504
> URL: https://issues.apache.org/jira/browse/HDFS-14504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14504.001.patch, HDFS-14504.002.patch
>
>
> Steps to Reproduce:
> 
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2
> 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
> 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
> 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Allowing snapshot on /dir2 succeeded
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
> 2019-05-21 

[jira] [Comment Edited] (HDFS-14504) Rename with Snapshots does not honor quota limit

2020-04-20 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087899#comment-17087899
 ] 

Shashikant Banerjee edited comment on HDFS-14504 at 4/20/20, 4:17 PM:
--

Thanks [~hemanthboyina] for updating the patch. 
{code:java}
@Test
public void testRenameAcrossDirWithinSnapshot() throws Exception {
  // snapshottable directory
  String dirr = "/dir";
  Path rootDir = new Path(dirr);
  hdfs.mkdirs(rootDir);
  hdfs.allowSnapshot(rootDir);

  // set quota for source directory under snapshottable root directory
  Path dir2 = new Path(rootDir, "dir2");
  Path fil1 = new Path(dir2, "file1");
  hdfs.mkdirs(dir2);
  hdfs.setQuota(dir2, 3, 0);
  hdfs.create(fil1);
  Path file2 = new Path(dir2, "file2");
  hdfs.rename(fil1, file2);
  Path fil3 = new Path(dir2, "file3");
  hdfs.create(fil3);

  // destination directory under snapshottable root directory
  Path dir1 = new Path(rootDir, "dir1");
  Path dir1fil1 = new Path(dir1, "file1");
  hdfs.mkdirs(dir1);
  hdfs.create(dir1fil1);
  Path dir1fil2 = new Path(dir1, "file2");
  hdfs.rename(dir1fil1, dir1fil2);

  hdfs.createSnapshot(rootDir, "snap1");
  Path filex = new Path(dir2, "filex");
  // create a file after exceeding namespace quota
  LambdaTestUtils.intercept(NSQuotaExceededException.class,
  "The NameSpace quota (directories and files) of "
  + "directory /dir/dir2 is exceeded",
  () -> hdfs.create(filex));

  // Rename across directories within snapshot with quota set on source
  // directory
  assertTrue(hdfs.rename(fil3, dir1));
}
{code}
In the test above, if "filex" cound not created in "dir2" because of exceeding 
quota limit but rename of "fil3" existing under the same directory "dir2" seems 
successful which ideally should fail as it will create  InodeReferene in dir2 
diff list for the snapshot snap1 hence exceeding the quota limit.

Can you plz check?


was (Author: shashikant):
Thanks [~hemanthboyina] for updating the patch. 
{code:java}
@Test
public void testRenameAcrossDirWithinSnapshot() throws Exception {
  // snapshottable directory
  String dirr = "/dir";
  Path rootDir = new Path(dirr);
  hdfs.mkdirs(rootDir);
  hdfs.allowSnapshot(rootDir);

  // set quota for source directory under snapshottable root directory
  Path dir2 = new Path(rootDir, "dir2");
  Path fil1 = new Path(dir2, "file1");
  hdfs.mkdirs(dir2);
  hdfs.setQuota(dir2, 3, 0);
  hdfs.create(fil1);
  Path file2 = new Path(dir2, "file2");
  hdfs.rename(fil1, file2);
  Path fil3 = new Path(dir2, "file3");
  hdfs.create(fil3);

  // destination directory under snapshottable root directory
  Path dir1 = new Path(rootDir, "dir1");
  Path dir1fil1 = new Path(dir1, "file1");
  hdfs.mkdirs(dir1);
  hdfs.create(dir1fil1);
  Path dir1fil2 = new Path(dir1, "file2");
  hdfs.rename(dir1fil1, dir1fil2);

  hdfs.createSnapshot(rootDir, "snap1");
  Path filex = new Path(dir2, "filex");
  // create a file after exceeding namespace quota
  LambdaTestUtils.intercept(NSQuotaExceededException.class,
  "The NameSpace quota (directories and files) of "
  + "directory /dir/dir2 is exceeded",
  () -> hdfs.create(filex));

  // Rename across directories within snapshot with quota set on source
  // directory
  assertTrue(hdfs.rename(fil3, dir1));
}
{code}
In the test above, if "filex" cound not created in "dir2" because of exceeding 
quota limit but rename of "fil3" existing under the same directory "dir2" seems 
successful which ideally should fail as it will crete  InodeReferene in dir2 
diff list for the snapshot snap1 hence exceeding the quota limit.

Can you plz check?

> Rename with Snapshots does not honor quota limit
> 
>
> Key: HDFS-14504
> URL: https://issues.apache.org/jira/browse/HDFS-14504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14504.001.patch, HDFS-14504.002.patch
>
>
> Steps to Reproduce:
> 
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2
> 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
> 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
> 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Allowing snapshot on /dir2 succeeded
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
> 2019-05-21 

[jira] [Commented] (HDFS-14504) Rename with Snapshots does not honor quota limit

2020-04-20 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17087899#comment-17087899
 ] 

Shashikant Banerjee commented on HDFS-14504:


Thanks [~hemanthboyina] for updating the patch. 
{code:java}
@Test
public void testRenameAcrossDirWithinSnapshot() throws Exception {
  // snapshottable directory
  String dirr = "/dir";
  Path rootDir = new Path(dirr);
  hdfs.mkdirs(rootDir);
  hdfs.allowSnapshot(rootDir);

  // set quota for source directory under snapshottable root directory
  Path dir2 = new Path(rootDir, "dir2");
  Path fil1 = new Path(dir2, "file1");
  hdfs.mkdirs(dir2);
  hdfs.setQuota(dir2, 3, 0);
  hdfs.create(fil1);
  Path file2 = new Path(dir2, "file2");
  hdfs.rename(fil1, file2);
  Path fil3 = new Path(dir2, "file3");
  hdfs.create(fil3);

  // destination directory under snapshottable root directory
  Path dir1 = new Path(rootDir, "dir1");
  Path dir1fil1 = new Path(dir1, "file1");
  hdfs.mkdirs(dir1);
  hdfs.create(dir1fil1);
  Path dir1fil2 = new Path(dir1, "file2");
  hdfs.rename(dir1fil1, dir1fil2);

  hdfs.createSnapshot(rootDir, "snap1");
  Path filex = new Path(dir2, "filex");
  // create a file after exceeding namespace quota
  LambdaTestUtils.intercept(NSQuotaExceededException.class,
  "The NameSpace quota (directories and files) of "
  + "directory /dir/dir2 is exceeded",
  () -> hdfs.create(filex));

  // Rename across directories within snapshot with quota set on source
  // directory
  assertTrue(hdfs.rename(fil3, dir1));
}
{code}
In the test above, if "filex" cound not created in "dir2" because of exceeding 
quota limit but rename of "fil3" existing under the same directory "dir2" seems 
successful which ideally should fail as it will crete  InodeReferene in dir2 
diff list for the snapshot snap1 hence exceeding the quota limit.

Can you plz check?

> Rename with Snapshots does not honor quota limit
> 
>
> Key: HDFS-14504
> URL: https://issues.apache.org/jira/browse/HDFS-14504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14504.001.patch, HDFS-14504.002.patch
>
>
> Steps to Reproduce:
> 
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2
> 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
> 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
> 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Allowing snapshot on /dir2 succeeded
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
> 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1
> 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap1
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2
> 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex
> 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 file count=4
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2
> 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap2
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 

[jira] [Comment Edited] (HDFS-14504) Rename with Snapshots does not honor quota limit

2020-04-08 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077306#comment-17077306
 ] 

Shashikant Banerjee edited comment on HDFS-14504 at 4/8/20, 5:52 PM:
-

Thanks [~hemanthboyina] for working on this. There are multiple cases to be 
considered when doing a rename across directories in a snapshot, such as:
1) Rename across directories within a snapshottable root, with quota set on 
the source directory
2) Rename within the same directory under a snapshottable root with quota set
3) Rename from a directory with quota set under a snapshottable root to a 
directory not under any snapshottable root

The fix might not address all the cases here (see the sketch below).
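
A hedged illustration of the three cases; the paths and setup below are 
invented (/snaproot is snapshottable, /plain is not, and a namespace quota 
is set on the source directory):
{code:sh}
hdfs dfsadmin -allowSnapshot /snaproot
hdfs dfsadmin -setQuota 3 /snaproot/src
hdfs dfs -createSnapshot /snaproot s1

# Case 1: rename across directories within the snapshottable root
hdfs dfs -mv /snaproot/src/f1 /snaproot/dst/f1
# Case 2: rename within the same directory under the snapshottable root
hdfs dfs -mv /snaproot/src/f2 /snaproot/src/f2-renamed
# Case 3: rename from the quota-set directory under the snapshottable root
# to a directory outside any snapshottable root
hdfs dfs -mv /snaproot/src/f3 /plain/f3
{code}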

Also, disallowing the rename when it happens under a snapshotted directory 
with quota set might become an incompatible change as well.


was (Author: shashikant):
Thanks [~hemanthboyina] for working on this. There are multiple cases to be 
considered when doing a rename across directories in a snapshot, such as:
1) Rename across directories within a snapshottable root, with quota set on 
the source directory
2) Rename within the same directory under a snapshottable root with quota set
3) Rename from a directory with quota set under a snapshottable root to a 
directory not under any snapshottable root

The fix might not address all the cases here.

Also, disallowing the rename when it happens under a snapshotted directory 
with quota set would be an incompatible change as well.

> Rename with Snapshots does not honor quota limit
> 
>
> Key: HDFS-14504
> URL: https://issues.apache.org/jira/browse/HDFS-14504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14504.001.patch
>
>
> Steps to Reproduce:
> 
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2
> 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
> 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
> 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Allowing snapshot on /dir2 succeeded
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
> 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1
> 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap1
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2
> 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex
> 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 file count=4
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2
> 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap2
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3
> 2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey
> 

[jira] [Commented] (HDFS-14504) Rename with Snapshots does not honor quota limit

2020-04-07 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17077306#comment-17077306
 ] 

Shashikant Banerjee commented on HDFS-14504:


Thanks [~hemanthboyina] for working on this. There are multiple cases to be 
considered when doing a rename across directories in a snapshot, such as:
1) Rename across directories within a snapshottable root, with quota set on 
the source directory
2) Rename within the same directory under a snapshottable root with quota set
3) Rename from a directory with quota set under a snapshottable root to a 
directory not under any snapshottable root

The fix might not address all the cases here.

Also, disallowing the rename when it happens under a snapshotted directory 
with quota set would be an incompatible change as well.

> Rename with Snapshots does not honor quota limit
> 
>
> Key: HDFS-14504
> URL: https://issues.apache.org/jira/browse/HDFS-14504
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-14504.001.patch
>
>
> Steps to Reproduce:
> 
> {code:java}
> HW15685:bin sbanerjee$ ./hdfs dfs -mkdir /dir2
> 2019-05-21 15:08:41,615 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -setQuota 3 /dir2
> 2019-05-21 15:08:57,326 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfsadmin -allowSnapshot /dir2
> 2019-05-21 15:09:47,239 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Allowing snapshot on /dir2 succeeded
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/file1
> 2019-05-21 15:10:01,573 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap1
> 2019-05-21 15:10:16,332 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap1
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file1 /dir2/file2
> 2019-05-21 15:10:49,292 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:11:05,207 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filex
> 2019-05-21 15:11:43,765 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 file count=4
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap2
> 2019-05-21 15:12:05,464 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap2
> HW15685:bin sbanerjee$ ./hdfs dfs -ls /dir2
> 2019-05-21 15:12:25,072 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Found 1 items
> -rw-r--r--   1 sbanerjee hadoop          0 2019-05-21 15:10 /dir2/file2
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file2 /dir2/file3
> 2019-05-21 15:12:35,908 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> HW15685:bin sbanerjee$ ./hdfs dfs -touchz /dir2/filey
> 2019-05-21 15:12:49,998 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> touchz: The NameSpace quota (directories and files) of directory /dir2 is 
> exceeded: quota=3 file count=5
> {code}
> // create operation fails here as it has already exceeded the quota limit
> {code}
> HW15685:bin sbanerjee$ ./hdfs dfs -createSnapshot /dir2 snap3
> 2019-05-21 15:13:07,656 WARN util.NativeCodeLoader: Unable to load 
> native-hadoop library for your platform... using builtin-java classes where 
> applicable
> Created snapshot /dir2/.snapshot/snap3
> HW15685:bin sbanerjee$ ./hdfs dfs -mv /dir2/file3 /dir2/file4
> 2019-05-21 15:13:20,715 WARN 

[jira] [Comment Edited] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-12-18 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16999407#comment-16999407
 ] 

Shashikant Banerjee edited comment on HDFS-15012 at 12/18/19 6:12 PM:
--

Thanks [~ericlin] for helping to discover the issue. Thanks [~arp], 
[~szetszwo], [~weichiu], [~ayushtkn] and [~surendrasingh] for the review and 
feedback. I have committed this. The findbugs issue reported is not related.


was (Author: shashikant):
Thanks [~ericlin] for helping to discover the issue. Thanks [~arp], 
[~szetszwo], [~weichiu], [~ayushtkn] and [~surendrasingh] for the review and 
feedback. I have committed this.

> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: release-blocker
> Fix For: 2.8.0, 2.9.0, 3.1.0, 2.10.0, 3.2.0, 3.3.0
>
> Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
>
> After applying HDFS-13101 and deleting and creating a large number of 
> snapshots, the SNN exited with the error below:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at 
> 

[jira] [Updated] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-12-18 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15012:
---
Fix Version/s: 2.8.0
   2.9.0
   3.1.0
   2.10.0
   3.2.0
   3.3.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~ericlin] for helping to discover the issue. Thanks [~arp], 
[~szetszwo], [~weichiu], [~ayushtkn] and [~surendrasingh] for the review and 
feedback. I have committed this.

> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: release-blocker
> Fix For: 3.3.0, 3.2.0, 2.10.0, 3.1.0, 2.9.0, 2.8.0
>
> Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
>
> After applying HDFS-13101 and deleting and creating a large number of 
> snapshots, the SNN exited with the error below:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at 
> 

[jira] [Commented] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-12-11 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16994244#comment-16994244
 ] 

Shashikant Banerjee commented on HDFS-15012:


Thanks [~szetszwo]. Patch v1 addresses the checkstyle issues.

> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
>
> After applying HDFS-13101 and deleting and creating a large number of 
> snapshots, the SNN exited with the error below:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that the fsimage and edit files were NOT corrupted, as 
> reverting HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 
> is broken and fails to parse the edit log files.
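> As a rough illustration only (the actual trigger is captured by the unit 
> test in the attached patch; the loop below is an invented approximation), 
> the failure mode involves interleaving renames with snapshot creation and 
> deletion, and then replaying the resulting edit log:
> {code:sh}
> hdfs dfsadmin -allowSnapshot /dir
> for i in $(seq 1 50); do
>   hdfs dfs -touchz /dir/f$i
>   hdfs dfs -createSnapshot /dir s$i
>   hdfs dfs -mv /dir/f$i /dir/g$i
>   hdfs dfs -deleteSnapshot /dir s$i
> done
> # a standby NameNode restart then replays these edits, which is where
> # the DeleteSnapshotOp above hit the AssertionError
> {code}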



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Updated] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-12-11 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15012:
---
Attachment: HDFS-15012.001.patch

> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15012.000.patch, HDFS-15012.001.patch
>
>
> After applying HDFS-13101 and deleting and creating a large number of 
> snapshots, the SNN exited with the error below:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that the fsimage and edit files were NOT corrupted, as 
> reverting HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 
> is broken and fails to parse the edit log files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, 

[jira] [Updated] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-12-06 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15012:
---
Status: Patch Available  (was: Open)

Patch v0 adds a unit test and a fix to address the issue.

> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15012.000.patch
>
>
> After applying HDFS-13101 and deleting and creating a large number of 
> snapshots, the SNN exited with the error below:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that the fsimage and edit files were NOT corrupted, as 
> reverting HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 
> is broken and fails to parse the edit log files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: 

[jira] [Updated] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-12-06 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15012:
---
Attachment: HDFS-15012.000.patch

> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Assignee: Shashikant Banerjee
>Priority: Blocker
>  Labels: release-blocker
> Attachments: HDFS-15012.000.patch
>
>
> After applying HDFS-13101 and deleting and creating a large number of 
> snapshots, the SNN exited with the error below:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that the fsimage and edit files were NOT corrupted, since 
> reverting HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 
> is broken and fails to parse edit log files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Resolved] (HDFS-14869) Data loss in case of distcp using snapshot diff. Replication should include rename records if file was skipped in the previous iteration

2019-12-06 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDFS-14869.

Fix Version/s: 3.1.4
   Resolution: Fixed

Thanks [~aasha] for the contribution and [~ste...@apache.org] for the review. I 
have committed this.

> Data loss in case of distcp using snapshot diff. Replication should include 
> rename records if file was skipped in the previous iteration
> 
>
> Key: HDFS-14869
> URL: https://issues.apache.org/jira/browse/HDFS-14869
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
> Fix For: 3.1.4
>
>
> This issue arises when a directory or file is excluded by the exclusion filter 
> during distcp replication. If the directory is later renamed to a name that is 
> not excluded by the filter, the snapshot diff reports only a rename operation. 
> The directory is never copied to the target even though it is no longer 
> excluded. No error is thrown either, so there is no way to detect the issue. 
> Steps to reproduce
>  * Create a directory in hdfs to copy using distcp.
>  * Include a staging folder in the directory.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -ls 
> /tmp/tocopy
> Found 4 items
> -rw-r--r--   3 hdfs hdfs 16 2019-09-12 10:32 /tmp/tocopy/.b.txt
> drwxr-xr-x   - hdfs hdfs  0 2019-09-23 09:18 /tmp/tocopy/.staging
> -rw-r--r--   3 hdfs hdfs 12 2019-09-12 10:32 /tmp/tocopy/a.txt
> -rw-r--r--   3 hdfs hdfs  4 2019-09-20 08:23 /tmp/tocopy/foo.txt{code}
>  * The exclusion filter is set to exclude any staging directory
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ cat 
> /tmp/filter
> .*\.Trash.*
> .*\.staging.*{code}
>  * Do a copy using distcp snapshots; the staging directory is not replicated.
> {code:java}
> hadoop jar hadoop-distcp-3.3.0-SNAPSHOT.jar 
> -Dmapreduce.job.user.classpath.first=true -filters /tmp/filter 
> /tmp/tocopy/.snapshot/s1 /tmp/target
> [hdfs@ctr-e141-1563959304486-33995-01-03 root]$ hadoop fs -ls /tmp/target
> Found 3 items
> -rw-r--r--   3 hdfs hdfs 16 2019-09-24 06:56 /tmp/target/.b.txt
> -rw-r--r--   3 hdfs hdfs 12 2019-09-24 06:56 /tmp/target/a.txt
> -rw-r--r--   3 hdfs hdfs  4 2019-09-24 06:56 /tmp/target/foo.txt{code}
>  * Rename the staging directory to final
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -mv 
> /tmp/tocopy/.staging /tmp/tocopy/final{code}
>  * Do a copy using snapshot diff.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hdfs 
> snapshotDiff /tmp/tocopy s1 s2
> Difference between snapshot s1 and snapshot s2 under directory /tmp/tocopy:
> M .
> R ./.staging -> ./final
> {code}
>  * The diff report contains only a rename record, so the new final directory is 
> never copied.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop jar 
> hadoop-distcp-3.3.0-SNAPSHOT.jar -Dmapreduce.job.user.classpath.first=true 
> -filters /tmp/filter -diff s1 s2 -update /tmp/tocopy /tmp/target
> 19/09/24 07:05:32 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, 
> ignoreFailures=false, overwrite=false, append=false, useDiff=true, 
> useRdiff=false, fromSnapshot=s1, toSnapshot=s2, skipCRC=false, blocking=true, 
> numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, 
> copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, 
> logPath=null, sourceFileListing=null, sourcePaths=[/tmp/tocopy], 
> targetPath=/tmp/target, filtersFile='/tmp/filter', blocksPerChunk=0, 
> copyBufferSize=8192, verboseLog=false, directWrite=false}, 
> sourcePaths=[/tmp/tocopy], targetPathExists=true, preserveRawXattrsfalse
> 19/09/24 07:05:32 INFO client.RMProxy: Connecting to ResourceManager at 
> ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050
> 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History 
> server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200
> 19/09/24 07:05:33 INFO tools.DistCp: Number of paths in the copy list: 0
> 19/09/24 07:05:33 INFO client.RMProxy: Connecting to ResourceManager at 
> ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050
> 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History 
> server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200
> 19/09/24 07:05:33 INFO 

[jira] [Assigned] (HDFS-15012) NN fails to parse Edit logs after applying HDFS-13101

2019-11-27 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDFS-15012:
--

Assignee: Shashikant Banerjee

> NN fails to parse Edit logs after applying HDFS-13101
> -
>
> Key: HDFS-15012
> URL: https://issues.apache.org/jira/browse/HDFS-15012
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: nn
>Reporter: Eric Lin
>Assignee: Shashikant Banerjee
>Priority: Critical
>
> After applying HDFS-13101 and deleting and creating a large number of 
> snapshots, the SNN exited with the below error:
>   
> {code:sh}
> 2019-11-18 08:28:06,528 ERROR 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception 
> on operation DeleteSnapshotOp [snapshotRoot=/path/to/hdfs/file, 
> snapshotName=distcp-3479-31-old, 
> RpcClientId=b16a6cb5-bdbb-45ae-9f9a-f7dc57931f37, Rpc
> CallId=1]
> java.lang.AssertionError: Element already exists: 
> element=partition_isactive=true, DELETED=[partition_isactive=true]
> at org.apache.hadoop.hdfs.util.Diff.insert(Diff.java:193)
> at org.apache.hadoop.hdfs.util.Diff.delete(Diff.java:239)
> at org.apache.hadoop.hdfs.util.Diff.combinePosterior(Diff.java:462)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.initChildren(DirectoryWithSnapshotFeature.java:240)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature$DirectoryDiff$2.iterator(DirectoryWithSnapshotFeature.java:250)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:755)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference.cleanSubtree(INodeReference.java:332)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeReference$WithName.cleanSubtree(INodeReference.java:583)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtreeRecursively(INodeDirectory.java:760)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectoryWithSnapshotFeature.cleanDirectory(DirectoryWithSnapshotFeature.java:753)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.cleanSubtree(INodeDirectory.java:790)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.DirectorySnapshottableFeature.removeSnapshot(DirectorySnapshottableFeature.java:235)
> at 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory.removeSnapshot(INodeDirectory.java:259)
> at 
> org.apache.hadoop.hdfs.server.namenode.snapshot.SnapshotManager.deleteSnapshot(SnapshotManager.java:301)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:688)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:232)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:141)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:903)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:756)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:324)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1144)
> at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:796)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:614)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:676)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:844)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.(NameNode.java:823)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1547)
> at 
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1615)
> {code}
> We confirmed that the fsimage and edit files were NOT corrupted, since 
> reverting HDFS-13101 fixed the issue. So the logic introduced in HDFS-13101 
> is broken and fails to parse edit log files.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2542) Race condition between read and write stateMachineData

2019-11-19 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-2542:
-

Assignee: Shashikant Banerjee

> Race condition between read and write stateMachineData
> --
>
> Key: HDDS-2542
> URL: https://issues.apache.org/jira/browse/HDDS-2542
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Marton Elek
>Assignee: Shashikant Banerjee
>Priority: Critical
>
> The write payload (the chunk itself) is sent to Ratis as an external binary 
> byte array. It is not part of the LogEntry; it is saved from an async thread 
> that calls ContainerStateMachine.writeStateMachineData.
>  
> Because this write is asynchronous, it is possible that the stateMachineData 
> has not yet been written when the data should be sent to the followers in the 
> next heartbeat. By design a cache is used to avoid this issue, but the cache 
> has multiple problems.
> First, the current cache size is chunkExecutor.getCorePoolSize(), which is not 
> enough. By default that means 60 executor threads and a cache of size 60, but 
> with one very slow writer and 59 very fast ones the cache entries can be 
> invalidated before the write completes.
> In my tests (freon datanode-chunk-writer-generator) I have seen cache misses 
> even with cache size 5000.
> Second, since readStateMachineData and writeStateMachineData are called from 
> two different threads, there is a race condition independent of the cache 
> size: it is possible that the write thread has not yet added the data to 
> the cache but the read thread needs it.
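
A minimal sketch of one way to close this race, with illustrative names (this 
is not the actual ContainerStateMachine API): instead of relying on a bounded 
cache, track the pending write future per log index, so readStateMachineData 
can wait on the in-flight write rather than miss the cache.

{code:java}
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class ChunkDataTracker {
  // log index -> future holding the chunk bytes once the async write completes
  private final ConcurrentMap<Long, CompletableFuture<byte[]>> pendingWrites =
      new ConcurrentHashMap<>();

  CompletableFuture<byte[]> writeStateMachineData(long logIndex, byte[] chunk) {
    CompletableFuture<byte[]> future =
        CompletableFuture.supplyAsync(() -> persistToTmpFile(logIndex, chunk));
    pendingWrites.put(logIndex, future);
    // drop the entry only after the write has completed and is readable on disk
    future.whenComplete((data, err) -> pendingWrites.remove(logIndex, future));
    return future;
  }

  CompletableFuture<byte[]> readStateMachineData(long logIndex) {
    CompletableFuture<byte[]> pending = pendingWrites.get(logIndex);
    return pending != null
        ? pending  // wait for the in-flight write instead of racing it
        : CompletableFuture.supplyAsync(() -> readFromDisk(logIndex));
  }

  private byte[] persistToTmpFile(long logIndex, byte[] chunk) {
    // write the chunk to its tmp file here; returning the bytes keeps the
    // sketch self-contained
    return chunk;
  }

  private byte[] readFromDisk(long logIndex) {
    return new byte[0]; // placeholder for the on-disk read
  }
}
{code}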



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2509) Code cleanup in replication package

2019-11-19 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2509:
--
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~adoroszlai] for working on this. I have committed this.

> Code cleanup in replication package
> ---
>
> Key: HDDS-2509
> URL: https://issues.apache.org/jira/browse/HDDS-2509
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available, sonar
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Fix couple of [issues 
> reported|https://sonarcloud.io/project/issues?directories=hadoop-hdds%2Fcontainer-service%2Fsrc%2Fmain%2Fjava%2Forg%2Fapache%2Fhadoop%2Fozone%2Fcontainer%2Freplication%2Chadoop-hdds%2Fcontainer-service%2Fsrc%2Ftest%2Fjava%2Forg%2Fapache%2Fhadoop%2Fozone%2Fcontainer%2Freplication=hadoop-ozone=false]
>  in {{org.apache.hadoop.ozone.container.replication}} package.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException

2019-11-14 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-2372:
-

Assignee: Marton Elek  (was: Shashikant Banerjee)

> Datanode pipeline is failing with NoSuchFileException
> -
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Assignee: Marton Elek
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Found this on a k8s based test cluster using a simple 3 node cluster and the 
> HDDS-2327 freon test. After a while the StateMachine became unhealthy with 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException

2019-11-07 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969379#comment-16969379
 ] 

Shashikant Banerjee commented on HDDS-2372:
---

In Ratis, raft log entries can get truncated after a leader election happens. 
The data write actually happens as part of appending the log entry itself. 
Currently, if the raft log gets truncated, we do not do any handling for those 
entries, i.e. we do not delete or validate the chunk files written as part of 
the log entry, because the data always lives in tmp files stamped with the 
term and log index. These tmp files are not visible and will remain as garbage 
even if the corresponding entries in the raft log have been truncated.

If we wrote to the actual chunk file as part of writing the log itself, then, 
if those log entries got truncated, we would correspondingly have to handle it 
inside Ozone, either by deleting the corresponding chunk files to maintain 
consistency or by validating the data while updating the RocksDB entries.
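
A minimal sketch of the tmp-file scheme described above, with illustrative 
names (the real ChunkManager differs): data is written to a tmp file stamped 
with the term and log index, and only renamed to the visible chunk file when 
the transaction is applied. A truncated log entry simply leaves its tmp file 
behind as garbage, which matches the ".tmp.2.20830"-style path in the report.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class TmpChunkFiles {
  // chunkFile + ".tmp." + term + "." + logIndex, as in the reported path
  static Path tmpName(Path chunkFile, long term, long logIndex) {
    return chunkFile.resolveSibling(
        chunkFile.getFileName() + ".tmp." + term + "." + logIndex);
  }

  // writeStateMachineData: persist to the invisible tmp file
  static void writeTmp(Path chunkFile, long term, long logIndex, byte[] data)
      throws IOException {
    Files.write(tmpName(chunkFile, term, logIndex), data);
  }

  // applyTransaction: make the chunk visible atomically on commit
  static void commit(Path chunkFile, long term, long logIndex)
      throws IOException {
    Files.move(tmpName(chunkFile, term, logIndex), chunkFile,
        StandardCopyOption.ATOMIC_MOVE);
  }
}
{code}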

> Datanode pipeline is failing with NoSuchFileException
> -
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Assignee: Shashikant Banerjee
>Priority: Critical
>
> Found this on a k8s based test cluster using a simple 3 node cluster and the 
> HDDS-2327 freon test. After a while the StateMachine became unhealthy with 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException

2019-11-06 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968960#comment-16968960
 ] 

Shashikant Banerjee commented on HDDS-2372:
---

Thanks [~aengineer] for the suggestion. Writing to the actual chunk file would 
force Ozone to handle truncated log entries from Ratis, which we do not need 
to handle right now because we always write to tmp chunk files. Even if log 
entries get truncated inside Ratis, the tmp files are simply left behind as 
garbage.

> Datanode pipeline is failing with NoSuchFileException
> -
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Assignee: Shashikant Banerjee
>Priority: Critical
>
> Found this on a k8s based test cluster using a simple 3 node cluster and the 
> HDDS-2327 freon test. After a while the StateMachine became unhealthy with 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2270) Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet

2019-11-06 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2270:
--
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~adoroszlai] for working on this. I have committed this.

> Avoid buffer copying in ContainerStateMachine.loadSnapshot/persistContainerSet
> --
>
> Key: HDDS-2270
> URL: https://issues.apache.org/jira/browse/HDDS-2270
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone Datanode
>Reporter: Tsz-wo Sze
>Assignee: Attila Doroszlai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> ContainerStateMachine:
> - In loadSnapshot(..), it first reads the snapshotFile into a byte[] and then 
> parses it into ContainerProtos.Container2BCSIDMapProto.  The buffer copying 
> can be avoided.
> {code}
> try (FileInputStream fin = new FileInputStream(snapshotFile)) {
>   byte[] container2BCSIDData = IOUtils.toByteArray(fin);
>   ContainerProtos.Container2BCSIDMapProto proto =
>   ContainerProtos.Container2BCSIDMapProto
>   .parseFrom(container2BCSIDData);
>   ...
> }
> {code}
> - persistContainerSet(..) has a similar problem.
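
A hedged sketch of the copy-free variant: generated protobuf messages can 
parse directly from an InputStream, so the intermediate byte[] produced by 
IOUtils.toByteArray can be dropped.

{code:java}
// sketch: parse straight from the stream, avoiding the byte[] copy
try (FileInputStream fin = new FileInputStream(snapshotFile)) {
  ContainerProtos.Container2BCSIDMapProto proto =
      ContainerProtos.Container2BCSIDMapProto.parseFrom(fin);
  ...
}
{code}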



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDDS-2407) Reduce log level of per-node failure in XceiverClientGrpc

2019-11-06 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2407:
--
Fix Version/s: 0.5.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~adoroszlai] for the contribution. I have committed this.

> Reduce log level of per-node failure in XceiverClientGrpc
> -
>
> Key: HDDS-2407
> URL: https://issues.apache.org/jira/browse/HDDS-2407
> Project: Hadoop Distributed Data Store
>  Issue Type: Task
>  Components: Ozone Client
>Reporter: Attila Doroszlai
>Assignee: Attila Doroszlai
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When reading from a pipeline, the client should not care if some datanode could 
> not service the request, as long as the pipeline as a whole is OK.  The [log 
> message|https://github.com/apache/hadoop-ozone/blob/2529cee1a7dd27c51cb9aed0dc57af283ff24e26/hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/XceiverClientGrpc.java#L303-L304]
>  indicating node failure was [increased to error 
> level|https://github.com/apache/hadoop-ozone/commit/a79dc4609a975d46a3e051ad6904fb1eb40705ee#diff-b9b6f3ccb12829d90886e041d11395b1R288]
>  in HDDS-1780.  This task proposes to change it back to debug.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2388) Teragen test failure due to OM exception

2019-11-04 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966770#comment-16966770
 ] 

Shashikant Banerjee commented on HDDS-2388:
---

[~avijayan], OM was not crashing because of this but because of HDDS-2379. 
However, this exception showed up quite a few times in the test.

> Teragen test failure due to OM exception
> 
>
> Key: HDDS-2388
> URL: https://issues.apache.org/jira/browse/HDDS-2388
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Hanisha Koneru
>Priority: Major
> Fix For: 0.5.0
>
>
> Ran into the below exception while running teragen:
> {code:java}
> Unable to get delta updates since sequenceNumber 79932 
> org.rocksdb.RocksDBException: Requested sequence not yet written in the db
>   at org.rocksdb.RocksDB.getUpdatesSince(Native Method)
>   at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3587)
>   at 
> org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:338)
>   at 
> org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3283)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:404)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handle(OzoneManagerRequestHandler.java:314)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:219)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:134)
>   at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:102)
>   at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> {code}
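
One possible guard, sketched against the plain RocksDB JNI API (the OM 
plumbing around RDBStore is elided here): compare the requested sequence 
number with the DB's latest before asking for updates, so a caller that is 
ahead of the DB does not trigger this exception.

{code:java}
import org.rocksdb.RocksDB;
import org.rocksdb.RocksDBException;
import org.rocksdb.TransactionLogIterator;

/** Sketch only; the real code would translate this into an OM response. */
final class DeltaUpdates {
  /**
   * Returns null when the requested sequence is ahead of the DB, instead of
   * letting getUpdatesSince throw "Requested sequence not yet written".
   */
  static TransactionLogIterator getUpdatesSafely(RocksDB db, long seq)
      throws RocksDBException {
    if (seq > db.getLatestSequenceNumber()) {
      return null; // nothing newer written yet; the caller can retry later
    }
    return db.getUpdatesSince(seq);
  }
}
{code}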



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException

2019-10-31 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-2372:
-

Assignee: Shashikant Banerjee

> Datanode pipeline is failing with NoSuchFileException
> -
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Assignee: Shashikant Banerjee
>Priority: Critical
>
> Found this on a k8s based test cluster using a simple 3 node cluster and the 
> HDDS-2327 freon test. After a while the StateMachine became unhealthy with 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2388) Teragen test failure due to OM exception

2019-10-31 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963831#comment-16963831
 ] 

Shashikant Banerjee commented on HDDS-2388:
---

cc - [~bharat]

> Teragen test failure due to OM exception
> 
>
> Key: HDDS-2388
> URL: https://issues.apache.org/jira/browse/HDDS-2388
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Ran into the below exception while running teragen:
> {code:java}
> Unable to get delta updates since sequenceNumber 79932 
> org.rocksdb.RocksDBException: Requested sequence not yet written in the db
>   at org.rocksdb.RocksDB.getUpdatesSince(Native Method)
>   at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3587)
>   at 
> org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:338)
>   at 
> org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3283)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:404)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handle(OzoneManagerRequestHandler.java:314)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:219)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:134)
>   at 
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
>   at 
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:102)
>   at 
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984)
>   at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2388) Teragen test failure due to OM exception

2019-10-31 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-2388:
-

 Summary: Teragen test failure due to OM exception
 Key: HDDS-2388
 URL: https://issues.apache.org/jira/browse/HDDS-2388
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
 Fix For: 0.5.0


Ran into the below exception while running teragen:
{code:java}

Unable to get delta updates since sequenceNumber 79932 
org.rocksdb.RocksDBException: Requested sequence not yet written in the db
at org.rocksdb.RocksDB.getUpdatesSince(Native Method)
at org.rocksdb.RocksDB.getUpdatesSince(RocksDB.java:3587)
at 
org.apache.hadoop.hdds.utils.db.RDBStore.getUpdatesSince(RDBStore.java:338)
at 
org.apache.hadoop.ozone.om.OzoneManager.getDBUpdates(OzoneManager.java:3283)
at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.getOMDBUpdates(OzoneManagerRequestHandler.java:404)
at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerRequestHandler.handle(OzoneManagerRequestHandler.java:314)
at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:219)
at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:134)
at 
org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
at 
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:102)
at 
org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:984)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2882)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException

2019-10-31 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963813#comment-16963813
 ] 

Shashikant Banerjee edited comment on HDDS-2372 at 10/31/19 9:52 AM:
-

Thanks [~elek]. I agree that there is no synchronisation between 
readStateMachineData and applyTransaction, which may lead to the 
NoSuchFileException you suggested, but the appendRequest will be retried on 
the leader and the system should recover once the commit of writeChunk 
completes.

In teragen testing I ran into the same issue as well, but my test did 
complete. Can you share the logs for this?


was (Author: shashikant):
Thanks [~elek]. I agree that there is no synchronisation between 
readStateMachineData and applyTransaction, which may lead to the 
NoSuchFileException you suggested, but the appendRequest will be retried on 
the leader and the system should recover once the commit of writeChunk 
completes.

In teragen testing I ran into the same issue as well, but my test did 
complete. Can you share the logs/test to reproduce this?

> Datanode pipeline is failing with NoSuchFileException
> -
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Priority: Critical
>
> Found this on a k8s based test cluster using a simple 3 node cluster and the 
> HDDS-2327 freon test. After a while the StateMachine became unhealthy with 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException

2019-10-31 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963815#comment-16963815
 ] 

Shashikant Banerjee commented on HDDS-2372:
---

[~szetszwo], to answer your question precisely: while reading the data from 
the stateMachine, it first checks whether the chunk file exists. If it does, 
it reads from the actual chunk file; if it does not, it reads from the 
temporary chunk file.
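
A minimal sketch of that read order, with illustrative names (not the real 
ChunkUtils code): prefer the committed chunk file, fall back to the tmp file 
while the write has not yet been applied.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class ChunkReader {
  static byte[] readChunk(Path chunkFile, Path tmpFile) throws IOException {
    if (Files.exists(chunkFile)) {
      return Files.readAllBytes(chunkFile); // commit already happened
    }
    return Files.readAllBytes(tmpFile);     // write not yet applied
  }
}
{code}

The window where the commit has already renamed the tmp file away but the 
reader has already chosen the tmp path is exactly where a NoSuchFileException 
like the one above can surface.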

> Datanode pipeline is failing with NoSuchFileException
> -
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Priority: Critical
>
> Found this on a k8s based test cluster using a simple 3 node cluster and the 
> HDDS-2327 freon test. After a while the StateMachine became unhealthy with 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2372) Datanode pipeline is failing with NoSuchFileException

2019-10-31 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963813#comment-16963813
 ] 

Shashikant Banerjee commented on HDDS-2372:
---

Thanks [~elek]. I agree that there is no synchronisation between 
readStateMachineData and applyTransaction, which may lead to the 
NoSuchFileException you suggested, but the appendRequest will be retried on 
the leader and the system should recover once the commit of writeChunk 
completes.

In teragen testing I ran into the same issue as well, but my test did 
complete. Can you share the logs/test to reproduce this?

> Datanode pipeline is failing with NoSuchFileException
> -
>
> Key: HDDS-2372
> URL: https://issues.apache.org/jira/browse/HDDS-2372
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Marton Elek
>Priority: Critical
>
> Found this on a k8s based test cluster using a simple 3 node cluster and the 
> HDDS-2327 freon test. After a while the StateMachine became unhealthy with 
> this error:
> {code:java}
> datanode-0 datanode java.util.concurrent.ExecutionException: 
> java.util.concurrent.ExecutionException: 
> org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException:
>  java.nio.file.NoSuchFileException: 
> /data/storage/hdds/2a77fab9-9dc5-4f73-9501-b5347ac6145c/current/containerDir0/1/chunks/gGYYgiTTeg_testdata_chunk_13931.tmp.2.20830
>  {code}
> Can be reproduced.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2331) Client OOME due to buffer retention

2019-10-21 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955856#comment-16955856
 ] 

Shashikant Banerjee commented on HDDS-2331:
---

One more point to add here: as per the cmd, freon is run with a single 
thread, which I assume writes only one key at a time. For a 1MB key we should 
have at most two containerCommandRequestMsgs in flight (one for writeChunk and 
one for putBlock), and these objects should be garbage collected once the 
response is received. At no point should 150+ ContainerCommandRequestMessage 
objects be lying around.
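
For scale, assuming the 16 MB buffers mentioned below: 150 retained messages 
x 16 MB is roughly 2.4 GB of chunk data, which by itself exceeds a 
default-sized client heap, so buffer retention rather than allocation rate is 
the likely culprit.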

> Client OOME due to buffer retention
> ---
>
> Key: HDDS-2331
> URL: https://issues.apache.org/jira/browse/HDDS-2331
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Shashikant Banerjee
>Priority: Critical
> Attachments: profiler.png
>
>
> Freon random key generator exhausts the default heap after just a few hundred 
> 1MB keys.  Heap dump on OOME reveals 150+ instances of 
> {{ContainerCommandRequestMessage}}, each with a 16MB {{byte[]}}.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: OOME after a few hundred keys
> {noformat}
> $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
> $ docker-compose up -d
> $ docker-compose exec scm bash
> $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError'
> $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 
> --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 
> --bufferSize 65536
> ...
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid289.hprof ...
> Heap dump file created [1456141975 bytes in 7.760 secs]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Deleted] (HDDS-2338) Avoid buffer copy while submitting write chunk request in Ozone Client

2019-10-21 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee deleted HDDS-2338:
--


> Avoid buffer copy while submitting write chunk request in Ozone Client
> --
>
> Key: HDDS-2338
> URL: https://issues.apache.org/jira/browse/HDDS-2338
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> Based on the config value of "ozone.UnsafeByteOperations.enabled", which is 
> set to true by default, we used to avoid a buffer copy while submitting write 
> chunk requests to Ratis. With the recent changes around the 
> byteStringConversion utility, it seems the config is never passed to 
> BlockOutputStream, which results in a buffer copy every time a byteBuffer to 
> byteString conversion is done in the Ozone client. This Jira is to pass the 
> appropriate config value so that the buffer copy can be avoided.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2338) Avoid buffer copy while submitting write chunk request in Ozone Client

2019-10-21 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-2338:
-

Assignee: Shashikant Banerjee

> Avoid buffer copy while submitting write chunk request in Ozone Client
> --
>
> Key: HDDS-2338
> URL: https://issues.apache.org/jira/browse/HDDS-2338
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 0.5.0
>
>
> Based on the config value of "ozone.UnsafeByteOperations.enabled", which is 
> set to true by default, we used to avoid a buffer copy while submitting write 
> chunk requests to Ratis. With the recent changes around the 
> byteStringConversion utility, it seems the config is never passed to 
> BlockOutputStream, which results in a buffer copy every time a byteBuffer to 
> byteString conversion is done in the Ozone client. This Jira is to pass the 
> appropriate config value so that the buffer copy can be avoided.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDDS-2338) Avoid buffer copy while submitting write chunk request in Ozone Client

2019-10-21 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-2338:
-

 Summary: Avoid buffer copy while submitting write chunk request in 
Ozone Client
 Key: HDDS-2338
 URL: https://issues.apache.org/jira/browse/HDDS-2338
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Shashikant Banerjee
 Fix For: 0.5.0


Based on the config value of "ozone.UnsafeByteOperations.enabled", which is 
set to true by default, we used to avoid a buffer copy while submitting write 
chunk requests to Ratis. With the recent changes around the 
byteStringConversion utility, it seems the config is never passed to 
BlockOutputStream, which results in a buffer copy every time a byteBuffer to 
byteString conversion is done in the Ozone client. This Jira is to pass the 
appropriate config value so that the buffer copy can be avoided.
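
For illustration, the conversion choice in question, sketched with the plain 
protobuf API (Ozone routes this through its byteStringConversion utility, 
elided here):

{code:java}
import java.nio.ByteBuffer;
import com.google.protobuf.ByteString;
import com.google.protobuf.UnsafeByteOperations;

final class ByteStringConversion {
  static ByteString convert(ByteBuffer buffer, boolean unsafeEnabled) {
    return unsafeEnabled
        ? UnsafeByteOperations.unsafeWrap(buffer)  // zero-copy wrap
        : ByteString.copyFrom(buffer);             // defensive copy
  }
}
{code}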



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDDS-2331) Client OOME due to buffer retention

2019-10-21 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955814#comment-16955814
 ] 

Shashikant Banerjee edited comment on HDDS-2331 at 10/21/19 7:04 AM:
-

In Ozone, by default the buffer size is equal to the chunk size (16 MB by 
default). When a write call happens, a buffer is allocated and data is written 
into the buffer until it is full or a flush/close happens, at which point it 
is pushed to the datanode. The buffer is released only once the watchForCommit 
call for the corresponding putBlock log index succeeds. So, until the 
watchForCommit call is acknowledged by the Ozone Client, we keep holding the 
buffer so that, in case the Ratis request fails, the user data is still cached 
in the client buffer and can be written over to the next block.

We have had multiple discussions on reducing the default buffer size and 
implementing a true streaming client, but this is still under consideration. 

[~adoroszlai], for your test, you can try changing the default chunk size to, 
say, 1 MB and see if it works well. It is also possible that the buffer 
release handling got broken by some recent changes, which needs to be 
verified.


was (Author: shashikant):
In Ozone, by default the buffer size is equal to the chunk size (16 MB by 
default). When a write call happens, a buffer is allocated and data is written 
into the buffer until it is full or a flush/close happens, at which point it 
is pushed to the datanode. The buffer is released only once the watchForCommit 
call for the corresponding putBlock log index succeeds. So, until the 
watchForCommit call is acknowledged by Ratis, we keep holding the buffer so 
that, in case the Ratis request fails, the user data is still cached in the 
client buffer and can be written over to the next block.

We have had multiple discussions around this, on reducing the default buffer 
size and implementing a true streaming client, but this is still under 
consideration. 

[~adoroszlai], for your test, you can try changing the default chunk size to, 
say, 1 MB and see if it works well. It is also possible that the buffer 
release handling got broken by some recent changes, which needs to be 
verified.
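
A minimal sketch of that lifecycle, with illustrative names (not the real 
BlockOutputStream/commit-watcher code): a chunk buffer stays tracked until 
watchForCommit acknowledges the putBlock log index it belongs to.

{code:java}
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class CommitWatcher {
  // commit log index -> buffer that may be released once it is acknowledged
  private final Map<Long, ByteBuffer> pendingBuffers = new ConcurrentHashMap<>();

  void track(long logIndex, ByteBuffer chunkBuffer) {
    pendingBuffers.put(logIndex, chunkBuffer);
  }

  CompletableFuture<Void> watchForCommit(long logIndex) {
    return waitForReplication(logIndex)
        // only now can the buffer be dropped; until then it is retained so
        // the data can be rewritten to a new block if the request fails
        .thenRun(() -> pendingBuffers.remove(logIndex));
  }

  private CompletableFuture<Void> waitForReplication(long logIndex) {
    return CompletableFuture.completedFuture(null); // stub for the sketch
  }
}
{code}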

> Client OOME due to buffer retention
> ---
>
> Key: HDDS-2331
> URL: https://issues.apache.org/jira/browse/HDDS-2331
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Shashikant Banerjee
>Priority: Critical
> Attachments: profiler.png
>
>
> Freon random key generator exhausts the default heap after just a few hundred 
> 1MB keys.  Heap dump on OOME reveals 150+ instances of 
> {{ContainerCommandRequestMessage}}, each with a 16MB {{byte[]}}.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: OOME after a few hundred keys
> {noformat}
> $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
> $ docker-compose up -d
> $ docker-compose exec scm bash
> $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError'
> $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 
> --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 
> --bufferSize 65536
> ...
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid289.hprof ...
> Heap dump file created [1456141975 bytes in 7.760 secs]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2331) Client OOME due to buffer retention

2019-10-21 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16955814#comment-16955814
 ] 

Shashikant Banerjee commented on HDDS-2331:
---

In Ozone, by default the buffer size is equal to the chunk size (16 MB by 
default). When a write call happens, a buffer is allocated and data is written 
into the buffer until it is full or a flush/close happens, at which point it 
is pushed to the datanode. The buffer is released only once the watchForCommit 
call for the corresponding putBlock log index succeeds. So, until the 
watchForCommit call is acknowledged by Ratis, we keep holding the buffer so 
that, in case the Ratis request fails, the user data is still cached in the 
client buffer and can be written over to the next block.

We have had multiple discussions around this, on reducing the default buffer 
size and implementing a true streaming client, but this is still under 
consideration. 

[~adoroszlai], for your test, you can try changing the default chunk size to, 
say, 1 MB and see if it works well. It is also possible that the buffer 
release handling got broken by some recent changes, which needs to be 
verified.

> Client OOME due to buffer retention
> ---
>
> Key: HDDS-2331
> URL: https://issues.apache.org/jira/browse/HDDS-2331
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Shashikant Banerjee
>Priority: Critical
> Attachments: profiler.png
>
>
> Freon random key generator exhausts the default heap after just a few hundred 
> 1MB keys.  Heap dump on OOME reveals 150+ instances of 
> {{ContainerCommandRequestMessage}}, each with a 16MB {{byte[]}}.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: OOME after a few hundred keys
> {noformat}
> $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
> $ docker-compose up -d
> $ docker-compose exec scm bash
> $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError'
> $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 
> --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 
> --bufferSize 65536
> ...
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid289.hprof ...
> Heap dump file created [1456141975 bytes in 7.760 secs]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDDS-2331) Client OOME due to buffer retention

2019-10-21 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-2331:
-

Assignee: Shashikant Banerjee

> Client OOME due to buffer retention
> ---
>
> Key: HDDS-2331
> URL: https://issues.apache.org/jira/browse/HDDS-2331
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Attila Doroszlai
>Assignee: Shashikant Banerjee
>Priority: Critical
> Attachments: profiler.png
>
>
> Freon random key generator exhausts the default heap after just a few hundred 
> 1MB keys.  Heap dump on OOME reveals 150+ instances of 
> {{ContainerCommandRequestMessage}}, each with a 16MB {{byte[]}}.
> Steps to reproduce:
> # Start Ozone cluster with 1 datanode
> # Start Freon (5K keys of size 1MB)
> Result: OOME after a few hundred keys
> {noformat}
> $ cd hadoop-ozone/dist/target/ozone-0.5.0-SNAPSHOT/compose/ozone
> $ docker-compose up -d
> $ docker-compose exec scm bash
> $ export HADOOP_OPTS='-XX:+HeapDumpOnOutOfMemoryError'
> $ ozone freon rk --numOfThreads 1 --numOfVolumes 1 --numOfBuckets 1 
> --replicationType RATIS --factor ONE --keySize 1048576 --numOfKeys 5120 
> --bufferSize 65536
> ...
> java.lang.OutOfMemoryError: Java heap space
> Dumping heap to java_pid289.hprof ...
> Heap dump file created [1456141975 bytes in 7.760 secs]
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2280) HddsUtils#CheckForException should not return null in case the ratis exception cause is not set

2019-10-18 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954702#comment-16954702
 ] 

Shashikant Banerjee commented on HDDS-2280:
---

st

> HddsUtils#CheckForException should not return null in case the ratis 
> exception cause is not set
> ---
>
> Key: HDDS-2280
> URL: https://issues.apache.org/jira/browse/HDDS-2280
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> HddsUtils#CheckForException checks that the cause is set to one of the 
> defined/expected exceptions. In case Ratis throws any runtime exception, 
> HddsUtils#CheckForException can return null and lead to a 
> NullPointerException during write.
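
A hedged sketch of the fix direction (the real HddsUtils walks a list of 
expected exception types; IOException stands in for them here): fall back to 
the original throwable instead of ever returning null.

{code:java}
import java.io.IOException;

final class CheckForException {
  static Throwable checkForException(Exception e) {
    for (Throwable cause = e; cause != null; cause = cause.getCause()) {
      if (cause instanceof IOException) { // stand-in for the expected types
        return cause;
      }
    }
    // never return null: fall back to the original exception so callers
    // don't trip over a NullPointerException during write
    return e;
  }
}
{code}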



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDDS-2332) BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future

2019-10-18 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16954697#comment-16954697
 ] 

Shashikant Banerjee commented on HDDS-2332:
---

[~ljain], I think we should time out all requests in the Ozone client.
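
A minimal sketch of that proposal (timeout value and wrapper are 
illustrative): replace the unbounded future.get() with a bounded wait that 
surfaces a timeout instead of hanging.

{code:java}
import java.io.IOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class FlushFutures {
  static void waitOnFlushFuture(CompletableFuture<?> future, long timeoutSec)
      throws IOException {
    try {
      // bounded wait, unlike the plain get() in the jstack above
      future.get(timeoutSec, TimeUnit.SECONDS);
    } catch (TimeoutException e) {
      throw new IOException("flush did not complete within " + timeoutSec + "s", e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IOException("interrupted while waiting for flush", e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
  }
}
{code}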

> BlockOutputStream#waitOnFlushFutures blocks on putBlock combined future
> ---
>
> Key: HDDS-2332
> URL: https://issues.apache.org/jira/browse/HDDS-2332
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Lokesh Jain
>Priority: Major
>
> BlockOutputStream blocks on waitOnFlushFutures call. Two jstacks show that 
> the thread is blocked on the same condition.
> {code:java}
> 2019-10-18 06:30:38
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
>   at 
> java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.waitOnFlushFutures(BlockOutputStream.java:518)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.handleFlush(BlockOutputStream.java:481)
>   at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.close(BlockOutputStream.java:496)
>   at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.close(BlockOutputStreamEntry.java:143)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleFlushOrClose(KeyOutputStream.java:439)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:232)
>   at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:190)
>   at 
> org.apache.hadoop.fs.ozone.OzoneFSOutputStream.write(OzoneFSOutputStream.java:46)
>   at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
>   at java.io.DataOutputStream.write(DataOutputStream.java:107)
>   - locked <0xa6a75930> (a 
> org.apache.hadoop.fs.FSDataOutputStream)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:77)
>   - locked <0xa6a75918> (a 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter)
>   at 
> org.apache.hadoop.examples.terasort.TeraOutputFormat$TeraRecordWriter.write(TeraOutputFormat.java:64)
>   at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:670)
>   at 
> org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
>   at 
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:230)
>   at 
> org.apache.hadoop.examples.terasort.TeraGen$SortGenMapper.map(TeraGen.java:203)
>   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1876)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
> 2019-10-18 07:02:50
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (25.141-b15 mixed mode):
> "main" #1 prio=5 os_prio=0 tid=0x7fbea001a800 nid=0x2a56 waiting on 
> condition [0x7fbea96d6000]
>java.lang.Thread.State: WAITING (parking)
>   at sun.misc.Unsafe.park(Native Method)
>   - parking to wait for  <0xe4739888> (a 
> java.util.concurrent.CompletableFuture$Signaller)
>   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
>   at 
> java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
>   at 
> java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
>   at 
> 

[jira] [Created] (HDDS-2286) Add a log info in ozone client and scm to print the exclusion list during allocate block

2019-10-11 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-2286:
-

 Summary: Add a log info in ozone client and scm to print the 
exclusion list during allocate block
 Key: HDDS-2286
 URL: https://issues.apache.org/jira/browse/HDDS-2286
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
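
A sketch of the kind of log line this asks for (the exclude-list accessors below are assumptions):
{code:java}
// Illustrative info log before allocating a block; the accessors on the
// exclude list are assumptions.
LOG.info("Allocating block. Exclusion list: containers={}, pipelines={}, datanodes={}",
    excludeList.getContainerIds(), excludeList.getPipelineIds(),
    excludeList.getDatanodes());
{code}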









[jira] [Created] (HDDS-2281) ContainerStateMachine#handleWriteChunk should ignore close container exception

2019-10-10 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-2281:
-

 Summary: ContainerStateMachine#handleWriteChunk should ignore 
close container exception 
 Key: HDDS-2281
 URL: https://issues.apache.org/jira/browse/HDDS-2281
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


Currently, ContainerStateMachine#applyTransaction ignores the close container 
exception. Similarly, the ContainerStateMachine#handleWriteChunk call should 
also ignore the close container exception.
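
A minimal sketch of the proposed handling (the result-code check below is illustrative, not the committed patch):
{code:java}
// Illustrative handling in handleWriteChunk: a CLOSED_CONTAINER_IO result
// means the container was closed underneath the write; it should not fail
// the state machine, while any other failure still propagates.
ContainerProtos.Result result = response.getResult();
if (result == ContainerProtos.Result.CLOSED_CONTAINER_IO) {
  LOG.debug("Ignoring close container exception: {}", response.getMessage());
} else if (result != ContainerProtos.Result.SUCCESS) {
  throw new StorageContainerException(response.getMessage(), result);
}
{code}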






[jira] [Updated] (HDDS-2280) HddsUtils#CheckForException should not return null in case the ratis exception cause is not set

2019-10-10 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2280:
--
Summary: HddsUtils#CheckForException should not return null in case the 
ratis exception cause is not set  (was: HddsUtils#CheckForException may return 
null in case the ratis exception cause is not set)

> HddsUtils#CheckForException should not return null in case the ratis 
> exception cause is not set
> ---
>
> Key: HDDS-2280
> URL: https://issues.apache.org/jira/browse/HDDS-2280
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>
> HddsUtils#CheckForException checks whether the cause is set to one of the 
> defined/expected exceptions. In case Ratis throws any runtime exception, 
> HddsUtils#CheckForException can return null and lead to a 
> NullPointerException during a write.






[jira] [Created] (HDDS-2280) HddsUtils#CheckForException may return null in case the ratis exception cause is not set

2019-10-10 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-2280:
-

 Summary: HddsUtils#CheckForException may return null in case the 
ratis exception cause is not set
 Key: HDDS-2280
 URL: https://issues.apache.org/jira/browse/HDDS-2280
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


HddsUtils#CheckForException checks whether the cause is set to one of the 
defined/expected exceptions. In case Ratis throws any runtime exception, 
HddsUtils#CheckForException can return null and lead to a NullPointerException 
during a write.






[jira] [Resolved] (HDDS-2266) Avoid evaluation of LOG.trace and LOG.debug statement in the read/write path (Ozone)

2019-10-10 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-2266.
---
Resolution: Fixed

Thanks [~swagle] for the contribution. I have committed this.

> Avoid evaluation of LOG.trace and LOG.debug statement in the read/write path 
> (Ozone)
> 
>
> Key: HDDS-2266
> URL: https://issues.apache.org/jira/browse/HDDS-2266
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>  Components: Ozone CLI, Ozone Manager
>Affects Versions: 0.5.0
>Reporter: Siddharth Wagle
>Assignee: Siddharth Wagle
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The arguments of LOG.trace and LOG.debug statements will be evaluated even 
> when debug/trace logging is disabled. This Jira proposes to wrap all the 
> trace/debug logging with LOG.isDebugEnabled and LOG.isTraceEnabled checks 
> to prevent the unnecessary evaluation.
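
The proposed pattern, as a generic SLF4J example rather than a specific Ozone call site (buildExpensiveSummary is a hypothetical costly argument):
{code:java}
// Without the guard, buildExpensiveSummary() is evaluated even when DEBUG is
// off; with it, the argument is only computed when the level is enabled.
if (LOG.isDebugEnabled()) {
  LOG.debug("Container state: {}", buildExpensiveSummary(container));
}
{code}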






[jira] [Commented] (HDFS-14869) Data loss in case of distcp using snapshot diff. Replication should include rename records if file was skipped in the previous iteration

2019-10-10 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-14869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16948318#comment-16948318
 ] 

Shashikant Banerjee commented on HDFS-14869:


[~ste...@apache.org], can you please have a look at this?

> Data loss in case of distcp using snapshot diff. Replication should include 
> rename records if file was skipped in the previous iteration
> 
>
> Key: HDFS-14869
> URL: https://issues.apache.org/jira/browse/HDFS-14869
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: distcp
>Reporter: Aasha Medhi
>Assignee: Aasha Medhi
>Priority: Major
>
> This issue arises when a directory or file is excluded by the exclusion filter 
> during distcp replication. If the directory is later renamed to a name which 
> is not excluded by the filter, the snapshot diff reports only a rename 
> operation. The directory is never copied to the target even though it is no 
> longer excluded. No error is thrown either, so there is no way to detect the 
> issue.
> Steps to reproduce
>  * Create a directory in hdfs to copy using distcp.
>  * Include a staging folder in the directory.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -ls 
> /tmp/tocopy
> Found 4 items
> -rw-r--r--   3 hdfs hdfs 16 2019-09-12 10:32 /tmp/tocopy/.b.txt
> drwxr-xr-x   - hdfs hdfs  0 2019-09-23 09:18 /tmp/tocopy/.staging
> -rw-r--r--   3 hdfs hdfs 12 2019-09-12 10:32 /tmp/tocopy/a.txt
> -rw-r--r--   3 hdfs hdfs  4 2019-09-20 08:23 /tmp/tocopy/foo.txt{code}
>  * The exclusion filter is set to exclude any staging directory
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ cat 
> /tmp/filter
> .*\.Trash.*
> .*\.staging.*{code}
>  * Do a copy using distcp snapshots, the staging directory is not replicated.
> {code:java}
> hadoop jar hadoop-distcp-3.3.0-SNAPSHOT.jar 
> -Dmapreduce.job.user.classpath.first=true -filters /tmp/filter 
> /tmp/tocopy/.snapshot/s1 /tmp/target
> [hdfs@ctr-e141-1563959304486-33995-01-03 root]$ hadoop fs -ls /tmp/target
> Found 3 items
> -rw-r--r--   3 hdfs hdfs 16 2019-09-24 06:56 /tmp/target/.b.txt
> -rw-r--r--   3 hdfs hdfs 12 2019-09-24 06:56 /tmp/target/a.txt
> -rw-r--r--   3 hdfs hdfs  4 2019-09-24 06:56 /tmp/target/foo.txt{code}
>  * Rename the staging directory to final
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop fs -mv 
> /tmp/tocopy/.staging /tmp/tocopy/final{code}
>  * Do a copy using snapshot diff.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hdfs snapshotDiff /tmp/tocopy s1 s2
> Difference between snapshot s1 and snapshot s2 under directory /tmp/tocopy:
> M .
> R ./.staging -> ./final
> {code}
>  * The diff report just has a rename record and the new final directory is 
> never copied.
> {code:java}
> [hdfs@ctr-e141-1563959304486-33995-01-03 hadoop-mapreduce]$ hadoop jar 
> hadoop-distcp-3.3.0-SNAPSHOT.jar -Dmapreduce.job.user.classpath.first=true 
> -filters /tmp/filter -diff s1 s2 -update /tmp/tocopy /tmp/target
> 19/09/24 07:05:32 INFO tools.DistCp: Input Options: 
> DistCpOptions{atomicCommit=false, syncFolder=true, deleteMissing=false, 
> ignoreFailures=false, overwrite=false, append=false, useDiff=true, 
> useRdiff=false, fromSnapshot=s1, toSnapshot=s2, skipCRC=false, blocking=true, 
> numListstatusThreads=0, maxMaps=20, mapBandwidth=0.0, 
> copyStrategy='uniformsize', preserveStatus=[BLOCKSIZE], atomicWorkPath=null, 
> logPath=null, sourceFileListing=null, sourcePaths=[/tmp/tocopy], 
> targetPath=/tmp/target, filtersFile='/tmp/filter', blocksPerChunk=0, 
> copyBufferSize=8192, verboseLog=false, directWrite=false}, 
> sourcePaths=[/tmp/tocopy], targetPathExists=true, preserveRawXattrsfalse
> 19/09/24 07:05:32 INFO client.RMProxy: Connecting to ResourceManager at 
> ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050
> 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History 
> server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200
> 19/09/24 07:05:33 INFO tools.DistCp: Number of paths in the copy list: 0
> 19/09/24 07:05:33 INFO client.RMProxy: Connecting to ResourceManager at 
> ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:8050
> 19/09/24 07:05:33 INFO client.AHSProxy: Connecting to Application History 
> server at ctr-e141-1563959304486-33995-01-03.hwx.site/172.27.68.128:10200
> 19/09/24 07:05:33 INFO mapreduce.JobResourceUploader: Disabling Erasure 
> Coding for path: 
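
A minimal sketch of the fix direction named in the summary: a rename whose source was skipped by the filter in the previous iteration is treated as a fresh copy of the (now included) target path. All names below are illustrative, not the actual DistCpSync API.
{code:java}
// Sketch: if a rename's source was excluded by the filter in the previous
// iteration but its target is not excluded, schedule a full copy of the
// target instead of replaying a bare rename (all names are illustrative).
if (diff.isRename()
    && filter.shouldExclude(diff.getSource())
    && !filter.shouldExclude(diff.getTarget())) {
  copyListing.add(diff.getTarget());  // copy the newly visible directory
} else {
  applyDiff(diff);                    // normal rename/modify handling
}
{code}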

[jira] [Resolved] (HDDS-2261) Change readChunk methods to return ByteBuffer

2019-10-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-2261.
---
Resolution: Fixed

> Change readChunk methods to return ByteBuffer
> -
>
> Key: HDDS-2261
> URL: https://issues.apache.org/jira/browse/HDDS-2261
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Major
>  Labels: pull-request-available
>
> During refactoring to HDDS-2233 I realized the following:
> The KeyValueHandler.handleReadChunk and handleGetSmallFile methods use 
> ChunkManager.readChunk, which returns a byte[], but then both of them (the 
> only usage points) convert the returned byte[] to a ByteBuffer, and then to 
> a ByteString.
> ChunkManagerImpl, on the other hand, in readChunk utilizes 
> ChunkUtils.readChunk, which in order to conform to the return type converts a 
> ByteBuffer back to a byte[].
> I open this JIRA to change the internal logic to fully rely on ByteBuffers 
> instead of converting from ByteBuffer to byte[] then to ByteBuffer again.
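
A sketch of the target shape (the exact readChunk signature and the shaded protobuf helper are assumptions):
{code:java}
// Target shape (illustrative): readChunk hands back a ByteBuffer, which the
// handler wraps into a ByteString without the byte[] round trip; unsafeWrap
// avoids the extra copy but requires that the buffer is not reused.
ByteBuffer data = chunkManager.readChunk(container, blockID, chunkInfo);
ByteString byteString = UnsafeByteOperations.unsafeWrap(data);
{code}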






[jira] [Resolved] (HDDS-2233) Remove ByteStringHelper and refactor the code to the place where it is used

2019-10-09 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDDS-2233.
---
Fix Version/s: 0.5.0
   Resolution: Fixed

> Remove ByteStringHelper and refactor the code to the place where it is used
> 
>
> Key: HDDS-2233
> URL: https://issues.apache.org/jira/browse/HDDS-2233
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>Reporter: Istvan Fajth
>Assignee: Istvan Fajth
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> See HDDS-2203 where there is a race condition reported by me.
> Later in the discussion we agreed that it is better to refactor the code and 
> remove the class completely for now, and that would also resolve the race 
> condition.






[jira] [Updated] (HDDS-2169) Avoid buffer copies while submitting client requests in Ratis

2019-10-06 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2169:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Avoid buffer copies while submitting client requests in Ratis
> -
>
> Key: HDDS-2169
> URL: https://issues.apache.org/jira/browse/HDDS-2169
> Project: Hadoop Distributed Data Store
>  Issue Type: Improvement
>Reporter: Shashikant Banerjee
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Currently, while sending write requests to Ratis from Ozone, the data is 
> encoded into a protobuf object, and the resulting protobuf is then converted 
> to a ByteString, which internally copies the buffer embedded inside the 
> protobuf once more so that it can be submitted to the Ratis client. Likewise, 
> while building up the appendRequestProto for the appendRequest, the data 
> might be copied yet again. The idea here is to let the client pass the raw 
> data (stateMachine data) separately to the Ratis client without the copying 
> overhead.
>  
> {code:java}
> private CompletableFuture<RaftClientReply> sendRequestAsync(
> ContainerCommandRequestProto request) {
>   try (Scope scope = GlobalTracer.get()
>   .buildSpan("XceiverClientRatis." + request.getCmdType().name())
>   .startActive(true)) {
> ContainerCommandRequestProto finalPayload =
> ContainerCommandRequestProto.newBuilder(request)
> .setTraceID(TracingUtil.exportCurrentSpan())
> .build();
> boolean isReadOnlyRequest = HddsUtils.isReadOnly(finalPayload);
> //  finalPayload already has the byteString data embedded. 
> ByteString byteString = finalPayload.toByteString(); -> It involves a 
> copy again.
> if (LOG.isDebugEnabled()) {
>   LOG.debug("sendCommandAsync {} {}", isReadOnlyRequest,
>   sanitizeForDebug(finalPayload));
> }
> return isReadOnlyRequest ?
> getClient().sendReadOnlyAsync(() -> byteString) :
> getClient().sendAsync(() -> byteString);
>   }
> }
> {code}
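
For illustration, one way to avoid that extra copy is to wrap the already-filled write buffer instead of re-serializing it; chunkDataBuffer below is a stand-in for the raw stateMachine data, and whether the shaded protobuf permits this here is exactly what needs discussion:
{code:java}
// Illustrative alternative: wrap the already-serialized stateMachine data
// instead of copying it once more via toByteString(); the underlying buffer
// must not be mutated while the ByteString is live.
ByteString stateMachineData = UnsafeByteOperations.unsafeWrap(chunkDataBuffer);
{code}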






[jira] [Updated] (HDDS-2210) ContainerStateMachine should not be marked unhealthy if applyTransaction fails with closed container exception

2019-10-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2210:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~msingh] for the review. I have committed this change to trunk.

> ContainerStateMachine should not be marked unhealthy if applyTransaction 
> fails with closed container exception
> --
>
> Key: HDDS-2210
> URL: https://issues.apache.org/jira/browse/HDDS-2210
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently, if applyTransaction fails, the stateMachine is marked unhealthy 
> and the next snapshot creation will fail. As a result, the raftServer will 
> close down, leading to pipeline failure. The ClosedContainer exception should 
> be ignored while marking the stateMachine unhealthy.






[jira] [Resolved] (HDFS-14492) Snapshot memory leak

2019-10-01 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee resolved HDFS-14492.

Fix Version/s: 3.1.4
   Resolution: Fixed

Thanks [~jojochuang] for the contribution. I have committed this change to 
trunk.

> Snapshot memory leak
> 
>
> Key: HDFS-14492
> URL: https://issues.apache.org/jira/browse/HDFS-14492
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: snapshots
>Affects Versions: 2.6.0
> Environment: CDH5.14.4
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
> Fix For: 3.1.4
>
>
> We recently examined the NameNode heap dump of a big, heavy snapshot user, 
> trying to trim some fat, and sure enough we found a memory leak in it: when 
> snapshots are removed, the corresponding data structures are not removed.
> This cluster has 586 million file system objects (286 million files, 287 
> million blocks, 13 million directories), using around 132gb of heap.
> While only 44.5 million files have snapshotted copies 
> (INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have 
> FileWithSnapshotFeature and FileDiffList. Those inodes had snapshotted copies 
> at some point in the past, but after snapshots are removed, those data 
> structures are still kept in the heap.
> INode$Feature = 32.5 bytes on average, FileWithSnapshotFeature = 32 bytes, 
> FileDiffList = 24 bytes. It may not sound like a lot, but they add up quickly 
> in large clusters like this. In this cluster, a whopping 13.8gb of memory 
> could have been saved: ((32.5 + 32 + 24) bytes * (211997769 - 44572380) =~ 
> 13.8gb) if not for this bug. That is more than 10% of savings in heap size.
> Heap histogram for reference:
> {noformat}
> num #instances #bytes class name
>  --
>  1: 286418254 27496152384 org.apache.hadoop.hdfs.server.namenode.INodeFile
>  2: 28737 18388622528 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>  3: 227899550 17144816120 [B
>  4: 287324031 13769408616 
> [Lorg.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;
>  5: 71352116 12353841568 [Ljava.lang.Object;
>  6: 286322650 9170335840 
> [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>  7: 235632329 7658462416 
> [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>  8: 4 7046430816 [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>  9: 211997769 6783928608 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature
>  10: 211997769 5087946456 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList
>  11: 76586261 3780468856 [I
>  12: 44572380 3209211360 
> org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy
>  13: 58634517 2345380680 java.util.ArrayList
>  14: 44572380 2139474240 
> org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff
>  15: 76582416 1837977984 org.apache.hadoop.hdfs.server.namenode.AclFeature
>  16: 12907668 1135874784 
> org.apache.hadoop.hdfs.server.namenode.INodeDirectory{noformat}
> [~szetszwo] [~arpaga] [~smeng] [~shashikant]  any thoughts?
> I am thinking that inside AbstractINodeDiffList#deleteSnapshotDiff(), in 
> addition to cleaning up file diffs, it should also remove the 
> FileWithSnapshotFeature. I am not familiar with the snapshot implementation, 
> so any guidance is greatly appreciated.
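
For reference, a minimal sketch of that direction (assuming the feature can simply be dropped once its diff list is empty; this is not the committed fix):
{code:java}
// Sketch (not the committed fix): once the last file diff is deleted, drop
// the now-empty snapshot feature so the inode stops carrying
// FileWithSnapshotFeature and FileDiffList.
FileWithSnapshotFeature sf = file.getFileWithSnapshotFeature();
if (sf != null && sf.getDiffs().asList().isEmpty()) {
  file.removeFeature(sf);
}
{code}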






[jira] [Updated] (HDDS-2210) ContainerStateMachine should not be marked unhealthy if applyTransaction fails with closed container exception

2019-09-30 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2210:
--
Status: Patch Available  (was: Open)

> ContainerStateMachine should not be marked unhealthy if applyTransaction 
> fails with closed container exception
> --
>
> Key: HDDS-2210
> URL: https://issues.apache.org/jira/browse/HDDS-2210
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, if applyTransaction fails, the stateMachine is marked unhealthy 
> and the next snapshot creation will fail. As a result, the raftServer will 
> close down, leading to pipeline failure. The ClosedContainer exception should 
> be ignored while marking the stateMachine unhealthy.






[jira] [Created] (HDDS-2210) ContainerStateMachine should not be marked unhealthy if applyTransaction fails with closed container exception

2019-09-30 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-2210:
-

 Summary: ContainerStateMachine should not be marked unhealthy if 
applyTransaction fails with closed container exception
 Key: HDDS-2210
 URL: https://issues.apache.org/jira/browse/HDDS-2210
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


Currently, if applyTransaction fails, the stateMachine is marked unhealthy and 
the next snapshot creation will fail. As a result, the raftServer will close 
down, leading to pipeline failure. The ClosedContainer exception should be 
ignored while marking the stateMachine unhealthy.






[jira] [Assigned] (HDDS-2152) Ozone client fails with OOM while writing a large (~300MB) key.

2019-09-30 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee reassigned HDDS-2152:
-

Assignee: Shashikant Banerjee

> Ozone client fails with OOM while writing a large (~300MB) key.
> ---
>
> Key: HDDS-2152
> URL: https://issues.apache.org/jira/browse/HDDS-2152
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Aravindan Vijayan
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: largekey.png
>
>
> {code}
> dd if=/dev/zero of=testfile bs=1024 count=307200
> ozone sh key put /vol1/bucket1/key testfile
> {code}
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at 
> java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at 
> java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at 
> org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:66)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:234)
>  at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
>  at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211)
>  at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
>  at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
>  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96) at 
> org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:117)
>  at 
> org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:55)
>  at picocli.CommandLine.execute(CommandLine.java:1173) at 
> picocli.CommandLine.access$800(CommandLine.java:141)
> {code}






[jira] [Created] (HDDS-2207) Update Ratis to latest snapshot

2019-09-30 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-2207:
-

 Summary: Update Ratis to latest snapshot
 Key: HDDS-2207
 URL: https://issues.apache.org/jira/browse/HDDS-2207
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Client
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0


This Jira aims to update Ozone to the latest Ratis snapshot, which has a 
critical fix for the retry behaviour on getting a not-leader exception in the 
client.






[jira] [Commented] (HDDS-2152) Ozone client fails with OOM while writing a large (~300MB) key.

2019-09-23 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935959#comment-16935959
 ] 

Shashikant Banerjee commented on HDDS-2152:
---

[~jnp], this requires changes in Ratis as tracked by RATIS-688, plus 
corresponding changes in the Ozone client as well as the Datanode.

> Ozone client fails with OOM while writing a large (~300MB) key.
> ---
>
> Key: HDDS-2152
> URL: https://issues.apache.org/jira/browse/HDDS-2152
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Aravindan Vijayan
>Assignee: YiSheng Lien
>Priority: Major
> Attachments: largekey.png
>
>
> {code}
> dd if=/dev/zero of=testfile bs=1024 count=307200
> ozone sh key put /vol1/bucket1/key testfile
> {code}
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at 
> java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at 
> java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at 
> org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:66)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:234)
>  at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
>  at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211)
>  at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
>  at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
>  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96) at 
> org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:117)
>  at 
> org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:55)
>  at picocli.CommandLine.execute(CommandLine.java:1173) at 
> picocli.CommandLine.access$800(CommandLine.java:141)
> {code}






[jira] [Commented] (HDDS-2152) Ozone client fails with OOM while writing a large (~300MB) key.

2019-09-23 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935663#comment-16935663
 ] 

Shashikant Banerjee commented on HDDS-2152:
---

The issue gets recreated when you try to create/write a key of size 300 MB, as 
in the test, with the Java heap set to 256 MB or lower. This issue needs some 
discussion on how to avoid buffer copies while doing protobuf conversion, as 
well as digging into the areas of the code where an actual buffer copy may be 
happening during writes, or where possible memory leaks exist.
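
As a concrete example of the kind of copy under discussion, converting a filled chunk buffer into a protobuf ByteString with copyFrom duplicates the buffer, while an unsafe wrap does not (a sketch; the shaded protobuf package in Ozone may differ):
{code:java}
// Copying path: allocates and fills a second buffer of the same size.
ByteString copied = ByteString.copyFrom(chunkBuffer);
// Zero-copy path: wraps the existing buffer; the caller must not reuse the
// buffer while the ByteString is live.
ByteString wrapped = UnsafeByteOperations.unsafeWrap(chunkBuffer);
{code}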

> Ozone client fails with OOM while writing a large (~300MB) key.
> ---
>
> Key: HDDS-2152
> URL: https://issues.apache.org/jira/browse/HDDS-2152
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Aravindan Vijayan
>Assignee: YiSheng Lien
>Priority: Major
> Attachments: largekey.png
>
>
> {code}
> dd if=/dev/zero of=testfile bs=1024 count=307200
> ozone sh key put /vol1/bucket1/key testfile
> {code}
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at 
> java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at 
> java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at 
> org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:66)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:234)
>  at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
>  at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211)
>  at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
>  at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
>  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96) at 
> org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:117)
>  at 
> org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:55)
>  at picocli.CommandLine.execute(CommandLine.java:1173) at 
> picocli.CommandLine.access$800(CommandLine.java:141)
> {code}






[jira] [Commented] (HDDS-2152) Ozone client fails with OOM while writing a large (~300MB) key.

2019-09-19 Thread Shashikant Banerjee (Jira)


[ 
https://issues.apache.org/jira/browse/HDDS-2152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933588#comment-16933588
 ] 

Shashikant Banerjee commented on HDDS-2152:
---

[~cxorm], do you have a solution/fix to address this?

> Ozone client fails with OOM while writing a large (~300MB) key.
> ---
>
> Key: HDDS-2152
> URL: https://issues.apache.org/jira/browse/HDDS-2152
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Reporter: Aravindan Vijayan
>Assignee: YiSheng Lien
>Priority: Major
>
> {code}
> dd if=/dev/zero of=testfile bs=1024 count=307200
> ozone sh key put /vol1/bucket1/key testfile
> {code}
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space at 
> java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57) at 
> java.nio.ByteBuffer.allocate(ByteBuffer.java:335) at 
> org.apache.hadoop.hdds.scm.storage.BufferPool.allocateBufferIfNeeded(BufferPool.java:66)
>  at 
> org.apache.hadoop.hdds.scm.storage.BlockOutputStream.write(BlockOutputStream.java:234)
>  at 
> org.apache.hadoop.ozone.client.io.BlockOutputStreamEntry.write(BlockOutputStreamEntry.java:129)
>  at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.handleWrite(KeyOutputStream.java:211)
>  at 
> org.apache.hadoop.ozone.client.io.KeyOutputStream.write(KeyOutputStream.java:193)
>  at 
> org.apache.hadoop.ozone.client.io.OzoneOutputStream.write(OzoneOutputStream.java:49)
>  at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:96) at 
> org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:117)
>  at 
> org.apache.hadoop.ozone.web.ozShell.keys.PutKeyHandler.call(PutKeyHandler.java:55)
>  at picocli.CommandLine.execute(CommandLine.java:1173) at 
> picocli.CommandLine.access$800(CommandLine.java:141)
> {code}






[jira] [Updated] (HDDS-2153) Add a config to tune max pending requests in Ratis leader

2019-09-19 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2153:
--
Status: Patch Available  (was: Open)

> Add a config to tune max pending requests in Ratis leader
> -
>
> Key: HDDS-2153
> URL: https://issues.apache.org/jira/browse/HDDS-2153
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Datanode
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>







[jira] [Created] (HDDS-2153) Add a config to tune max pending requests in Ratis leader

2019-09-19 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDDS-2153:
-

 Summary: Add a config to tune max pending requests in Ratis leader
 Key: HDDS-2153
 URL: https://issues.apache.org/jira/browse/HDDS-2153
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: Ozone Datanode
Affects Versions: 0.5.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 0.5.0
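
A sketch of how such a knob could be wired into the Ratis server properties; the Ozone-side config key below is hypothetical:
{code:java}
import org.apache.ratis.conf.RaftProperties;
import org.apache.ratis.server.RaftServerConfigKeys;

// Sketch: expose the Ratis leader's pending-write element limit through an
// Ozone config; the config key name below is hypothetical.
RaftProperties properties = new RaftProperties();
int maxPendingRequests = conf.getInt(
    "dfs.container.ratis.leader.max.pending.requests", 4096);  // hypothetical key
RaftServerConfigKeys.Write.setElementLimit(properties, maxPendingRequests);
{code}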









[jira] [Updated] (HDDS-2032) Ozone client should retry writes in case of any ratis/stateMachine exceptions

2019-09-18 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDDS-2032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDDS-2032:
--
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks [~msingh] for the review. I have committed this.

> Ozone client should retry writes in case of any ratis/stateMachine exceptions
> -
>
> Key: HDDS-2032
> URL: https://issues.apache.org/jira/browse/HDDS-2032
> Project: Hadoop Distributed Data Store
>  Issue Type: Bug
>  Components: Ozone Client
>Affects Versions: 0.5.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.5.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Currently, the Ozone client retries writes on a different pipeline or 
> container in case of some specific exceptions. But in case it sees an 
> exception such as DISK_FULL, CONTAINER_UNHEALTHY or any corruption, it just 
> aborts the write. In general, every such exception on the client should be 
> retriable in the Ozone client, and on some specific exceptions it should take 
> more specific action, like excluding certain containers or pipelines while 
> retrying, or informing SCM of a corrupt replica, etc.
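
A rough sketch of the retry shape described above (helper names are hypothetical; the real logic would live in the Ozone client's exception handling):
{code:java}
// Illustrative retry shape: every failure is retried up to a limit, and
// specific result codes additionally exclude the failing container or
// pipeline before the next attempt (helper names are hypothetical).
for (int attempt = 0; attempt < maxRetries; attempt++) {
  try {
    writeToCurrentBlock(data);
    return;
  } catch (StorageContainerException e) {
    if (e.getResult() == ContainerProtos.Result.DISK_FULL
        || e.getResult() == ContainerProtos.Result.CONTAINER_UNHEALTHY) {
      excludeList.addContainerId(currentContainerId);
    }
    allocateNewBlock(excludeList);  // retry on a different pipeline/container
  }
}
throw new IOException("Write failed after " + maxRetries + " retries");
{code}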





