[jira] [Updated] (HDFS-15498) Show snapshots deletion status in snapList cmd

2020-07-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15498:
---
Description: 
HDFS-15488 adds a cmd to list all snapshots for a given snapshottable 
directory. With the ordered snapshot deletion config set, a snapshot can be just 
marked as deleted instead of being removed right away. This Jira aims to add the 
deletion status to the cmd output.

 

SAMPLE OUTPUT:
{noformat}
sbanerjee-MBP15:hadoop-3.4.0-SNAPSHOT sbanerjee$ bin/hdfs lsSnapshottableDir
drwxr-xr-x 0 sbanerjee supergroup 0 2020-07-27 11:52 2 65536 /user
sbanerjee-MBP15:hadoop-3.4.0-SNAPSHOT sbanerjee$ bin/hdfs lsSnapshot /user
drwxr-xr-x 0 sbanerjee supergroup 0 2020-07-27 11:52 1 ACTIVE /user/.snapshot/s1
drwxr-xr-x 0 sbanerjee supergroup 0 2020-07-27 11:51 0 DELETED 
/user/.snapshot/s20200727-115156.407{noformat}

  was:HDFS-15488 adds a cmd to list all snapshots for a given snapshottable 
directory. A snapshot can be just marked as deleted with ordered deletion 
config set. This Jira aims to add deletion status to cmd output.


> Show snapshots deletion status in snapList cmd
> --
>
> Key: HDFS-15498
> URL: https://issues.apache.org/jira/browse/HDFS-15498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15498.000.patch
>
>
> HDFS-15488 adds a cmd to list all snapshots for a given snapshottable 
> directory. With the ordered snapshot deletion config set, a snapshot can be just 
> marked as deleted instead of being removed right away. This Jira aims to add the 
> deletion status to the cmd output.
>  
> SAMPLE OUTPUT:
> {noformat}
> sbanerjee-MBP15:hadoop-3.4.0-SNAPSHOT sbanerjee$ bin/hdfs lsSnapshottableDir
> drwxr-xr-x 0 sbanerjee supergroup 0 2020-07-27 11:52 2 65536 /user
> sbanerjee-MBP15:hadoop-3.4.0-SNAPSHOT sbanerjee$ bin/hdfs lsSnapshot /user
> drwxr-xr-x 0 sbanerjee supergroup 0 2020-07-27 11:52 1 ACTIVE 
> /user/.snapshot/s1
> drwxr-xr-x 0 sbanerjee supergroup 0 2020-07-27 11:51 0 DELETED 
> /user/.snapshot/s20200727-115156.407{noformat}
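
For illustration only, a minimal sketch of how a listing row could carry the deletion status shown in the sample output above. The SnapshotRow class, its fields and the two-state DeletionStatus enum are assumptions made for this sketch, not the SnapshotStatus API actually added by HDFS-15488/HDFS-15498.
{code:java}
// Hypothetical sketch; the real SnapshotStatus class added by HDFS-15488 may differ.
enum DeletionStatus { ACTIVE, DELETED }

final class SnapshotRow {
  final String permission, owner, group, path;
  final long modTime;
  final int snapshotId;
  final DeletionStatus status;  // DELETED once ordered deletion marks the snapshot

  SnapshotRow(String permission, String owner, String group, long modTime,
              int snapshotId, DeletionStatus status, String path) {
    this.permission = permission; this.owner = owner; this.group = group;
    this.modTime = modTime; this.snapshotId = snapshotId;
    this.status = status; this.path = path;
  }

  /** Renders one line in the same column order as the sample output above. */
  String render() {
    String time = new java.text.SimpleDateFormat("yyyy-MM-dd HH:mm")
        .format(new java.util.Date(modTime));
    return String.format("%s 0 %s %s 0 %s %d %s %s",
        permission, owner, group, time, snapshotId, status, path);
  }
}
{code}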



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15498) Show snapshots deletion status in snapList cmd

2020-07-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15498:
---
Attachment: HDFS-15498.000.patch

> Show snapshots deletion status in snapList cmd
> --
>
> Key: HDFS-15498
> URL: https://issues.apache.org/jira/browse/HDFS-15498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15498.000.patch
>
>
> HDFS-15488 adds a cmd to list all snapshots for a given snapshottable 
> directory. A snapshot can be just marked as deleted with ordered deletion 
> config set. This Jira aims to add deletion status to cmd output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15498) Show snapshots deletion status in snapList cmd

2020-07-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15498:
---
Description: HDFS-15488 adds a cmd to list all snapshots for a given 
snapshottable directory. A snapshot can be just marked as deleted with ordered 
deletion config set. This Jira aims to add deletion status to cmd output.  
(was: HDFS-15488 adds a cmd to list all snapshots for a given snapshottable 
directory. A snapshot can be just marked as deleted with ordered deletion 
config set. This Jira aims to add an option to show the deletion status.)

> Show snapshots deletion status in snapList cmd
> --
>
> Key: HDFS-15498
> URL: https://issues.apache.org/jira/browse/HDFS-15498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
>
> HDFS-15488 adds a cmd to list all snapshots for a given snapshottable 
> directory. A snapshot can be just marked as deleted with ordered deletion 
> config set. This Jira aims to add deletion status to cmd output.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15498) Show snapshots deletion status in snapList cmd

2020-07-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15498:
---
Summary: Show snapshots deletion status in snapList cmd  (was: Add an 
option in snapList cmd to show snapshots deletion status)

> Show snapshots deletion status in snapList cmd
> --
>
> Key: HDFS-15498
> URL: https://issues.apache.org/jira/browse/HDFS-15498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
>
> HDFS-15488 adds a cmd to list all snapshots for a given snapshottable 
> directory. A snapshot can be just marked as deleted with ordered deletion 
> config set. This Jira aims to add an option to show the deletion status.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15498) Add an option in snapList cmd to show snapshots deletion status

2020-07-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15498:
---
Summary: Add an option in snapList cmd to show snapshots deletion status  
(was: Add an option in snapList cmd to show snapshots which are marked deleted)

> Add an option in snapList cmd to show snapshots deletion status
> ---
>
> Key: HDFS-15498
> URL: https://issues.apache.org/jira/browse/HDFS-15498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
>
> HDFS-15488 adds a cmd to list all snapshots for a given snapshottable 
> directory. A snapshot can be just marked as deleted with ordered deletion 
> config set. This Jira aims to add an option to show the deletion status.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15496) Add UI for deleted snapshots

2020-07-29 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh reassigned HDFS-15496:


Assignee: Vivek Ratnavel Subramanian

> Add UI for deleted snapshots
> 
>
> Key: HDFS-15496
> URL: https://issues.apache.org/jira/browse/HDFS-15496
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Mukul Kumar Singh
>Assignee: Vivek Ratnavel Subramanian
>Priority: Major
>
> Add UI for deleted snapshots
> a) Show the list of snapshots per snapshottable directory
> b) Add deleted status in the JMX output for the Snapshot along with a snap ID
> c) The NN UI should sort the snapshots by snapshot IDs (see the sketch below). 
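
As a rough illustration of item c), here is a sketch of sorting the per-directory snapshot entries by snapshot ID before they are exposed in the JMX output and NN UI. The SnapshotJmxEntry fields below are assumptions for this sketch, not the actual JMX bean.
{code:java}
// Illustrative only; field names are assumptions, not the actual NN JMX bean.
import java.util.Comparator;
import java.util.List;

final class SnapshotJmxEntry {
  final int snapshotId;   // grows monotonically, so it reflects creation order
  final String path;
  final boolean deleted;  // ordered-deletion marker

  SnapshotJmxEntry(int snapshotId, String path, boolean deleted) {
    this.snapshotId = snapshotId;
    this.path = path;
    this.deleted = deleted;
  }

  /** Sorts in place so the UI renders snapshots in snapshot ID (creation) order. */
  static void sortBySnapshotId(List<SnapshotJmxEntry> entries) {
    entries.sort(Comparator.comparingInt(e -> e.snapshotId));
  }
}
{code}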



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15501) Update Apache documentation for new ordered snapshot deletion feature

2020-07-29 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDFS-15501:


 Summary: Update Apache documentation for new ordered snapshot 
deletion feature
 Key: HDFS-15501
 URL: https://issues.apache.org/jira/browse/HDFS-15501
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Mukul Kumar Singh


Update Apache documentation for new ordered snapshot deletion feature.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15500) Add more assertions about ordered deletion of snapshot

2020-07-29 Thread Mukul Kumar Singh (Jira)
Mukul Kumar Singh created HDFS-15500:


 Summary: Add more assertions about ordered deletion of snapshot
 Key: HDFS-15500
 URL: https://issues.apache.org/jira/browse/HDFS-15500
 Project: Hadoop HDFS
  Issue Type: Sub-task
Reporter: Mukul Kumar Singh
Assignee: Tsz-wo Sze


This jira proposes to add new assertions; one assertion to start with is:
a) assert that, with the ordered snapshot deletion flag set to true, the prior 
snapshot in cleanSubtree is null
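
A minimal sketch of the kind of assertion being proposed, assuming that ordered deletion implies cleanSubtree only ever runs against the earliest snapshot; the names below are illustrative and not the real SnapshotManager/INode API.
{code:java}
// Illustrative only; not the real cleanSubtree signature.
final class OrderedDeletionAsserts {
  static final int NO_PRIOR_SNAPSHOT = -1;  // assumed sentinel for "no prior snapshot"

  static void checkPriorSnapshot(boolean orderedDeletionEnabled, int priorSnapshotId) {
    if (orderedDeletionEnabled) {
      assert priorSnapshotId == NO_PRIOR_SNAPSHOT
          : "With ordered snapshot deletion enabled, cleanSubtree must be called"
            + " with no prior snapshot, but got id " + priorSnapshotId;
    }
  }
}
{code}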



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15492) Make trash root inside each snapshottable directory

2020-07-29 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-15492:
-
Parent: HDFS-15477
Issue Type: Sub-task  (was: Improvement)

> Make trash root inside each snapshottable directory
> ---
>
> Key: HDFS-15492
> URL: https://issues.apache.org/jira/browse/HDFS-15492
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: hdfs, hdfs-client
>Affects Versions: 3.2.1
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> We have seen FSImage corruption cases (e.g. HDFS-13101) where files inside 
> a snapshottable directory are moved outside of it. The most common case 
> of this is when trash is enabled and a user deletes a file via the command 
> line without skipTrash.
> This jira aims to create a trash root for each snapshottable directory, the 
> same way encryption zones behave at the moment.
> This will make trash cleanup a little more expensive on the NameNode, as 
> it will have to iterate over all trash roots. But it should be fine as long as 
> there aren't many snapshottable directories.
> I could make this improvement an option and disable it by default if 
> needed, e.g. {{dfs.namenode.snapshot.trashroot.enabled}}.
> One small caveat, though: when disabling (disallowing) snapshots on a 
> snapshottable directory while this improvement is in place, the client 
> should merge the snapshottable directory's trash with that user's trash to 
> ensure proper trash cleanup.
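
For illustration, a sketch (under the assumptions above, not the actual patch) of how a client could resolve a per-directory trash root under a snapshottable directory, analogous to the encryption-zone behaviour and gated by the proposed dfs.namenode.snapshot.trashroot.enabled flag:
{code:java}
// Sketch only: resolve <snapshottable dir>/.Trash/<user> when the proposed flag is on.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

final class SnapshotTrashRootResolver {
  static Path resolveTrashRoot(Configuration conf, Path snapshottableDir,
                               Path homeTrashRoot) throws IOException {
    boolean perDirTrash =
        conf.getBoolean("dfs.namenode.snapshot.trashroot.enabled", false);
    if (!perDirTrash) {
      return homeTrashRoot;  // fall back to the user's home trash
    }
    String user = UserGroupInformation.getCurrentUser().getShortUserName();
    return new Path(new Path(snapshottableDir, ".Trash"), user);
  }
}
{code}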



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-29 Thread Chengwei Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167604#comment-17167604
 ] 

Chengwei Wang edited comment on HDFS-15493 at 7/30/20, 3:01 AM:


Hi [~sodonnell], sorry for missing some messages.
{quote}I mis-understood how it worked, as I thought `awaitTermination(...)` 
threw an exception after the timeout, which is not the case.
{quote}
I think there is a misunderstanding about `awaitTermination(...)`. It is just like 
object.wait(long time): it would just stop blocking rather than throw an 
InterruptedException.
{code:java}
/**
 * Blocks until all tasks have completed execution after a shutdown
 * request, or the timeout occurs, or the current thread is
 * interrupted, whichever happens first.
 *
 * @param timeout the maximum time to wait
 * @param unit the time unit of the timeout argument
 * @return {@code true} if this executor terminated and
 * {@code false} if the timeout elapsed before termination
 * @throws InterruptedException if interrupted while waiting
 */
boolean awaitTermination(long timeout, TimeUnit unit)
throws InterruptedException;
{code}
So, awaitTermination with 1 ms would make the executor shut down quickly. 
{quote}Did you find the runtime was about the same with a single executor with 
4 threads and two executors with a single thread?As my testing showed a small 
improvement with the two single threaded executors case, and as locking 
prevents more than one thread to run concurrently, I think it would be better 
to go with the two executors with a single thread.
{quote}
I understand what you mean about the runtime. Intuitively, using two single-threaded 
executors should perform better than one fixed-size thread pool. But yesterday, after my 
reply, I tested updating the block map and name cache with two single-threaded executors 
and removed the lock; with the same fsimage, the time cost increased to 430s, with 
about 10s+ spent waiting for the two executors to shut down.
So, I'm not sure that using two single-threaded executors would perform better.
 For more info, our fsimage has few snapshots, so loading the fsimage finishes as soon 
as loadINodeDirectorySection finishes. In other words, delaying the shutdown of the 
executors wouldn't help.

 


was (Author: smarthan):
Hi [~sodonnell], sorry for missing some messages.
{quote}I mis-understood how it worked, as I thought `awaitTermination(...)` 
threw an exception after the timeout, which is not the case.
{quote}
I think there is a misunderstanding about `awaitTermination(...)`. It is just like 
object.wait(long time): it would just stop blocking rather than throw an 
InterruptedException.
{code:java}
/**
 * Blocks until all tasks have completed execution after a shutdown
 * request, or the timeout occurs, or the current thread is
 * interrupted, whichever happens first.
 *
 * @param timeout the maximum time to wait
 * @param unit the time unit of the timeout argument
 * @return {@code true} if this executor terminated and
 * {@code false} if the timeout elapsed before termination
 * @throws InterruptedException if interrupted while waiting
 */
boolean awaitTermination(long timeout, TimeUnit unit)
throws InterruptedException;
{code}
So, awaitTermination with 1 ms would make the executor shut down quickly. 
{quote}Did you find the runtime was about the same with a single executor with 
4 threads and two executors with a single thread?As my testing showed a small 
improvement with the two single threaded executors case, and as locking 
prevents more than one thread to run concurrently, I think it would be better 
to go with the two executors with a single thread.
{quote}
 
 
I understand what you mean about the runtime. Intuitively, using two single-threaded 
executors should perform better than one fixed-size thread pool. But yesterday, after my 
reply, I tested updating the block map and name cache with two single-threaded executors 
and removed the lock; with the same fsimage, the time cost increased to 430s, with 
about 10s+ spent waiting for the two executors to shut down.
So, I'm not sure that using two single-threaded executors would perform better.
For more info, our fsimage has few snapshots, so loading the fsimage finishes as soon 
as loadINodeDirectorySection finishes. In other words, delaying the shutdown of the 
executors wouldn't help.

 

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch, fsimage-loading.log
>
>
> While loading the INodeDirectorySection of the fsimage, the loader updates the 
> name cache and block map after adding each inode file to its inode directory. 

[jira] [Commented] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-29 Thread Chengwei Wang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167604#comment-17167604
 ] 

Chengwei Wang commented on HDFS-15493:
--

Hi [~sodonnell], sorry for missing some messages.
{quote}I mis-understood how it worked, as I thought `awaitTermination(...)` 
threw an exception after the timeout, which is not the case.
{quote}
I think there is a misunderstanding about `awaitTermination(...)`. It is just like 
object.wait(long time): it would just stop blocking rather than throw an 
InterruptedException.
{code:java}
/**
 * Blocks until all tasks have completed execution after a shutdown
 * request, or the timeout occurs, or the current thread is
 * interrupted, whichever happens first.
 *
 * @param timeout the maximum time to wait
 * @param unit the time unit of the timeout argument
 * @return {@code true} if this executor terminated and
 * {@code false} if the timeout elapsed before termination
 * @throws InterruptedException if interrupted while waiting
 */
boolean awaitTermination(long timeout, TimeUnit unit)
throws InterruptedException;
{code}
So, awaitTermination with 1 ms would make the executor shut down quickly. 
{quote}Did you find the runtime was about the same with a single executor with 
4 threads and two executors with a single thread?As my testing showed a small 
improvement with the two single threaded executors case, and as locking 
prevents more than one thread to run concurrently, I think it would be better 
to go with the two executors with a single thread.
{quote}
 
 
I understand what you mean about the runtime. Intuitively, using two single-threaded 
executors should perform better than one fixed-size thread pool. But yesterday, after my 
reply, I tested updating the block map and name cache with two single-threaded executors 
and removed the lock; with the same fsimage, the time cost increased to 430s, with 
about 10s+ spent waiting for the two executors to shut down.
So, I'm not sure that using two single-threaded executors would perform better.
For more info, our fsimage has few snapshots, so loading the fsimage finishes as soon 
as loadINodeDirectorySection finishes. In other words, delaying the shutdown of the 
executors wouldn't help.
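
To make the shutdown semantics being discussed concrete, here is a minimal, self-contained sketch (not taken from the HDFS-15493 patch): shutdown() stops new submissions, and awaitTermination() simply returns false on timeout instead of throwing, so polling with a 1 ms timeout lets the caller react quickly once the tasks are done.
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorShutdownExample {
  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    pool.submit(() -> { /* e.g. update the block map */ });
    pool.submit(() -> { /* e.g. update the name cache */ });

    pool.shutdown();  // stop accepting new tasks
    // Returns false while tasks are still running; it does not throw on timeout.
    while (!pool.awaitTermination(1, TimeUnit.MILLISECONDS)) {
      // keep waiting (or log progress) until all tasks have finished
    }
    System.out.println("executor terminated");
  }
}
{code}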

 

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch, fsimage-loading.log
>
>
> While loading the INodeDirectorySection of the fsimage, the loader updates the 
> name cache and block map after adding each inode file to its inode directory. 
> Enabling these steps to run in parallel reduces the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is reduced to 410s.
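
A rough sketch of the idea only (the actual HDFS-15493 patch differs in detail): the loader thread keeps attaching inode files to their parent directories while the name-cache and block-map updates are handed off to background executors.
{code:java}
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ParallelLoaderSketch {
  private final ExecutorService nameCachePool = Executors.newSingleThreadExecutor();
  private final ExecutorService blockMapPool = Executors.newSingleThreadExecutor();

  /** Called after an inode file has been added to its parent directory. */
  void onFileAttached(String name, List<long[]> blockIds) {
    nameCachePool.submit(() -> cacheName(name));         // async name-cache update
    blockMapPool.submit(() -> addToBlockMap(blockIds));  // async block-map update
  }

  private void cacheName(String name) { /* placeholder for NameCache.put(...) */ }
  private void addToBlockMap(List<long[]> blockIds) { /* placeholder for BlocksMap updates */ }
}
{code}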



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15499) Exclude aws-java-sdk-bundle from httpfs pom.xml

2020-07-29 Thread Mingliang Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-15499:
-
Summary: Exclude aws-java-sdk-bundle from httpfs pom.xml  (was: Exclude 
aws-java-sdk-bundle from httpfs)

> Exclude aws-java-sdk-bundle from httpfs pom.xml
> ---
>
> Key: HDFS-15499
> URL: https://issues.apache.org/jira/browse/HDFS-15499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Reporter: Mingliang Liu
>Priority: Major
>
> In [HADOOP-14040] we use the shaded aws-sdk uber-JAR instead of the s3 jar in 
> hadoop-project/pom.xml. After that, we should also update the httpfs `pom.xml` 
> file to exclude the correct jar dependency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15499) Exclude aws-java-sdk-bundle from httpfs

2020-07-29 Thread Mingliang Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mingliang Liu updated HDFS-15499:
-
Description: In [HADOOP-14040] we use the shaded aws-sdk uber-JAR instead 
of the s3 jar in hadoop-project/pom.xml. After that, we should also update the httpfs 
`pom.xml` file to exclude the correct jar dependency.  (was: In 
[[HADOOP-14040]] we use shaded aws-sdk uber-JAR for instead of s3 jar in 
hadoop-project/pom.xml. After that, we should update httpfs `pom.xml` to 
exclude the correct jar dependency.)

> Exclude aws-java-sdk-bundle from httpfs
> ---
>
> Key: HDFS-15499
> URL: https://issues.apache.org/jira/browse/HDFS-15499
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: httpfs
>Reporter: Mingliang Liu
>Priority: Major
>
> In [HADOOP-14040] we use the shaded aws-sdk uber-JAR instead of the s3 jar in 
> hadoop-project/pom.xml. After that, we should also update the httpfs `pom.xml` 
> file to exclude the correct jar dependency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15499) Exclude aws-java-sdk-bundle from httpfs

2020-07-29 Thread Mingliang Liu (Jira)
Mingliang Liu created HDFS-15499:


 Summary: Exclude aws-java-sdk-bundle from httpfs
 Key: HDFS-15499
 URL: https://issues.apache.org/jira/browse/HDFS-15499
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: httpfs
Reporter: Mingliang Liu


In [[HADOOP-14040]] we use the shaded aws-sdk uber-JAR instead of the s3 jar in 
hadoop-project/pom.xml. After that, we should update the httpfs `pom.xml` to 
exclude the correct jar dependency.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15492) Make trash root inside each snapshottable directory

2020-07-29 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDFS-15492:
--
Description: 
We have seen FSImage corruption cases (e.g. HDFS-13101) where files inside a 
snapshottable directory are moved outside of it. The most common case of this 
is when trash is enabled and a user deletes a file via the command line 
without skipTrash.

This jira aims to create a trash root for each snapshottable directory, the same 
way encryption zones behave at the moment.

This will make trash cleanup a little more expensive on the NameNode, as it 
will have to iterate over all trash roots. But it should be fine as long as there 
aren't many snapshottable directories.

I could make this improvement an option and disable it by default if needed, 
e.g. {{dfs.namenode.snapshot.trashroot.enabled}}.

One small caveat, though: when disabling (disallowing) snapshots on a 
snapshottable directory while this improvement is in place, the client should 
merge the snapshottable directory's trash with that user's trash to ensure 
proper trash cleanup.

  was:
We have seen FSImage corruption cases (e.g. HDFS-13101) where files inside one 
snapshottable directories are moved outside of it. The most common case of this 
is when trash is enabled and user deletes some file via the command line 
without skipTrash.

This jira aims to make a trash root for each snapshottable directory, same as 
how encryption zone behaves at the moment.

This will make trash cleanup a little bit more expensive on the NameNode as it 
will be to iterate all trash roots. But should be fine as long as there aren't 
many snapshottable directories.

I could make this improvement as an option and disable it by default if needed, 
such as {{dfs.namenode.snapshot.trashroot.enable}}

One small caveat though, when disabling (disallowing) snapshot on the 
snapshottable directory when this improvement is in place. The client should 
merge the snapshottable directory's trash with that user's trash to ensure 
proper trash cleanup.


> Make trash root inside each snapshottable directory
> ---
>
> Key: HDFS-15492
> URL: https://issues.apache.org/jira/browse/HDFS-15492
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs, hdfs-client
>Affects Versions: 3.2.1
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> We have seen FSImage corruption cases (e.g. HDFS-13101) where files inside 
> a snapshottable directory are moved outside of it. The most common case 
> of this is when trash is enabled and a user deletes a file via the command 
> line without skipTrash.
> This jira aims to create a trash root for each snapshottable directory, the 
> same way encryption zones behave at the moment.
> This will make trash cleanup a little more expensive on the NameNode, as 
> it will have to iterate over all trash roots. But it should be fine as long as 
> there aren't many snapshottable directories.
> I could make this improvement an option and disable it by default if 
> needed, e.g. {{dfs.namenode.snapshot.trashroot.enabled}}.
> One small caveat, though: when disabling (disallowing) snapshots on a 
> snapshottable directory while this improvement is in place, the client 
> should merge the snapshottable directory's trash with that user's trash to 
> ensure proper trash cleanup.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15495) Decommissioning a DataNode with corrupted EC files should not be blocked indefinitely

2020-07-29 Thread Siyao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167526#comment-17167526
 ] 

Siyao Meng edited comment on HDFS-15495 at 7/29/20, 10:20 PM:
--

Awesome! Thanks [~sodonnell] for the writeup and unit test.
I tried your UT and it works - I mean it failed because the decom is hung. So 
we can confirm this is still the case on trunk.


was (Author: smeng):
Awesome! Thanks [~sodonnell] for the writeup and unit test.
I tried your UT and it works - I mean it failed because the decom is hang. So 
we can confirm this is still the case on trunk.

> Decommissioning a DataNode with corrupted EC files should not be blocked 
> indefinitely
> -
>
> Key: HDFS-15495
> URL: https://issues.apache.org/jira/browse/HDFS-15495
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement, ec
>Affects Versions: 3.0.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> Originally discovered in patched CDH 6.2.1 (with a bunch of EC fixes: 
> HDFS-14699, HDFS-14849, HDFS-14847, HDFS-14920, HDFS-14768, HDFS-14946, 
> HDFS-15186).
> When there's an EC file marked as corrupted on NN, if the admin tries to 
> decommission a DataNode having one of the remaining blocks of the corrupted 
> EC file, *the decom will never finish* unless the file is recovered by 
> putting the missing blocks back in:
> {code:title=The endless DatanodeAdminManager check loop, every 30s}
> 2020-07-23 16:36:12,805 TRACE blockmanagement.DatanodeAdminManager: Processed 
> 0 blocks so far this tick
> 2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: 
> Processing Decommission In Progress node 127.0.1.7:5007
> 2020-07-23 16:36:12,806 TRACE blockmanagement.DatanodeAdminManager: Block 
> blk_-9223372036854775728_1013 numExpected=9, numLive=4
> 2020-07-23 16:36:12,806 INFO BlockStateChange: Block: 
> blk_-9223372036854775728_1013, Expected Replicas: 9, live replicas: 4, 
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, 
> maintenance replicas: 0, live entering maintenance replicas: 0, excess 
> replicas: 0, Is Open File: false, Datanodes having this block: 
> 127.0.1.12:5012 127.0.1.10:5010 127.0.1.8:5008 127.0.1.11:5011 127.0.1.7:5007 
> , Current Datanode: 127.0.1.7:5007, Is current datanode decommissioning: 
> true, Is current datanode entering maintenance: false
> 2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: Node 
> 127.0.1.7:5007 still has 1 blocks to replicate before it is a candidate to 
> finish Decommission In Progress.
> 2020-07-23 16:36:12,806 INFO blockmanagement.DatanodeAdminManager: Checked 1 
> blocks and 1 nodes this tick
> {code}
> "Corrupted" file here meaning the EC file doesn't have enough EC blocks in 
> the block group to be reconstructed. e.g. for {{RS-6-3-1024k}}, when there 
> are less than 6 blocks for an EC file, the file can no longer be retrieved 
> correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15495) Decommissioning a DataNode with corrupted EC files should not be blocked indefinitely

2020-07-29 Thread Siyao Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siyao Meng updated HDFS-15495:
--
Description: 
Originally discovered in patched CDH 6.2.1 (with a bunch of EC fixes: 
HDFS-14699, HDFS-14849, HDFS-14847, HDFS-14920, HDFS-14768, HDFS-14946, 
HDFS-15186).

When there's an EC file marked as corrupted on NN, if the admin tries to 
decommission a DataNode having one of the remaining blocks of the corrupted EC 
file, *the decom will never finish* unless the file is recovered by putting the 
missing blocks back in:

{code:title=The endless DatanodeAdminManager check loop, every 30s}
2020-07-23 16:36:12,805 TRACE blockmanagement.DatanodeAdminManager: Processed 0 
blocks so far this tick
2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: Processing 
Decommission In Progress node 127.0.1.7:5007
2020-07-23 16:36:12,806 TRACE blockmanagement.DatanodeAdminManager: Block 
blk_-9223372036854775728_1013 numExpected=9, numLive=4
2020-07-23 16:36:12,806 INFO BlockStateChange: Block: 
blk_-9223372036854775728_1013, Expected Replicas: 9, live replicas: 4, corrupt 
replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, 
maintenance replicas: 0, live entering maintenance replicas: 0, excess 
replicas: 0, Is Open File: false, Datanodes having this block: 127.0.1.12:5012 
127.0.1.10:5010 127.0.1.8:5008 127.0.1.11:5011 127.0.1.7:5007 , Current 
Datanode: 127.0.1.7:5007, Is current datanode decommissioning: true, Is current 
datanode entering maintenance: false
2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: Node 
127.0.1.7:5007 still has 1 blocks to replicate before it is a candidate to 
finish Decommission In Progress.
2020-07-23 16:36:12,806 INFO blockmanagement.DatanodeAdminManager: Checked 1 
blocks and 1 nodes this tick
{code}

"Corrupted" file here meaning the EC file doesn't have enough EC blocks in the 
block group to be reconstructed. e.g. for {{RS-6-3-1024k}}, when there are less 
than 6 blocks for an EC file, the file can no longer be retrieved correctly.

  was:
Originally discovered in patched CDH 6.2.1 (with a bunch of EC fixes: 
HDFS-14699, HDFS-14849, HDFS-14847, HDFS-14920, HDFS-14768, HDFS-14946, 
HDFS-15186).

When there's an EC file marked as corrupted on NN, if the admin tries to 
decommission a DataNode having one of the remaining blocks of the corrupted EC 
file, *the decom will never finish* unless the file is recovered by putting the 
missing blocks back in:

{code:title=The endless DatanodeAdminManager check loop, every 30s}
2020-07-23 16:36:12,805 TRACE blockmanagement.DatanodeAdminManager: Processed 0 
blocks so far this tick
2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: Processing 
Decommission In Progress node 127.0.1.7:5007
2020-07-23 16:36:12,806 TRACE blockmanagement.DatanodeAdminManager: Block 
blk_-9223372036854775728_1013 numExpected=9, numLive=4
2020-07-23 16:36:12,806 INFO BlockStateChange: Block: 
blk_-9223372036854775728_1013, Expected Replicas: 9, live replicas: 4, corrupt 
replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, 
maintenance replicas: 0, live entering maintenance replicas: 0, excess 
replicas: 0, Is Open File: false, Datanodes having this block: 127.0.1.12:5012 
127.0.1.10:5010 127.0.1.8:5008 127.0.1.11:5011 127.0.1.7:5007 , Current 
Datanode: 127.0.1.7:5007, Is current datanode decommissioning: true, Is current 
datanode entering maintenance: false
2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: Node 
127.0.1.7:5007 still has 1 blocks to replicate before it is a candidate to 
finish Decommission In Progress.
2020-07-23 16:36:12,806 INFO blockmanagement.DatanodeAdminManager: Checked 1 
blocks and 1 nodes this tick
{code}

"Corrupted" file here meaning the EC file doesn't have enough EC blocks in the 
block group to be reconstructed. e.g. for {{RS-6-3-1024k}}, when there are less 
than 6 blocks for an EC file, the file can no longer be retrieved correctly.

Will check on trunk as well soon.


> Decommissioning a DataNode with corrupted EC files should not be blocked 
> indefinitely
> -
>
> Key: HDFS-15495
> URL: https://issues.apache.org/jira/browse/HDFS-15495
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement, ec
>Affects Versions: 3.0.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> Originally discovered in patched CDH 6.2.1 (with a bunch of EC fixes: 
> HDFS-14699, HDFS-14849, HDFS-14847, HDFS-14920, HDFS-14768, HDFS-14946, 
> HDFS-15186).
> When there's an EC file marked as corrupted on NN, if the admin tries to 
> decommission a DataNode having one of the remaining blocks of the corrupted 
> EC file, *the decom will never 

[jira] [Commented] (HDFS-15495) Decommissioning a DataNode with corrupted EC files should not be blocked indefinitely

2020-07-29 Thread Siyao Meng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167526#comment-17167526
 ] 

Siyao Meng commented on HDFS-15495:
---

Awesome! Thanks [~sodonnell] for the writeup and unit test.
I tried your UT and it works - I mean it failed because the decom is hang. So 
we can confirm this is still the case on trunk.

> Decommissioning a DataNode with corrupted EC files should not be blocked 
> indefinitely
> -
>
> Key: HDFS-15495
> URL: https://issues.apache.org/jira/browse/HDFS-15495
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement, ec
>Affects Versions: 3.0.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> Originally discovered in patched CDH 6.2.1 (with a bunch of EC fixes: 
> HDFS-14699, HDFS-14849, HDFS-14847, HDFS-14920, HDFS-14768, HDFS-14946, 
> HDFS-15186).
> When there's an EC file marked as corrupted on NN, if the admin tries to 
> decommission a DataNode having one of the remaining blocks of the corrupted 
> EC file, *the decom will never finish* unless the file is recovered by 
> putting the missing blocks back in:
> {code:title=The endless DatanodeAdminManager check loop, every 30s}
> 2020-07-23 16:36:12,805 TRACE blockmanagement.DatanodeAdminManager: Processed 
> 0 blocks so far this tick
> 2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: 
> Processing Decommission In Progress node 127.0.1.7:5007
> 2020-07-23 16:36:12,806 TRACE blockmanagement.DatanodeAdminManager: Block 
> blk_-9223372036854775728_1013 numExpected=9, numLive=4
> 2020-07-23 16:36:12,806 INFO BlockStateChange: Block: 
> blk_-9223372036854775728_1013, Expected Replicas: 9, live replicas: 4, 
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, 
> maintenance replicas: 0, live entering maintenance replicas: 0, excess 
> replicas: 0, Is Open File: false, Datanodes having this block: 
> 127.0.1.12:5012 127.0.1.10:5010 127.0.1.8:5008 127.0.1.11:5011 127.0.1.7:5007 
> , Current Datanode: 127.0.1.7:5007, Is current datanode decommissioning: 
> true, Is current datanode entering maintenance: false
> 2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: Node 
> 127.0.1.7:5007 still has 1 blocks to replicate before it is a candidate to 
> finish Decommission In Progress.
> 2020-07-23 16:36:12,806 INFO blockmanagement.DatanodeAdminManager: Checked 1 
> blocks and 1 nodes this tick
> {code}
> "Corrupted" file here meaning the EC file doesn't have enough EC blocks in 
> the block group to be reconstructed. e.g. for {{RS-6-3-1024k}}, when there 
> are less than 6 blocks for an EC file, the file can no longer be retrieved 
> correctly.
> Will check on trunk as well soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15329) Provide FileContext based ViewFSOverloadScheme implementation

2020-07-29 Thread Uma Maheswara Rao G (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167508#comment-17167508
 ] 

Uma Maheswara Rao G commented on HDFS-15329:


Hi [~abhishekd], sorry for the late reply. I missed your comment. Please take it up 
if you want to work on it. I can help with the reviews!
Thanks a lot for offering to help. I have been spending some time integrating 
ViewFileSystemOverloadScheme with Hive.
So far, the issues I have found are: with encryption zones enabled, we/Hive may 
need some special handling, as they create HDFS-specific shims.
It also looks like ORC is using fileId-specific APIs from HDFS. Apart from that, basic 
queries passed. How is the integration going in your case? Any components 
enabled with ViewFileSystemOverloadScheme? Just curious to know :-)


> Provide FileContext based ViewFSOverloadScheme implementation
> -
>
> Key: HDFS-15329
> URL: https://issues.apache.org/jira/browse/HDFS-15329
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, hdfs, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>
> This Jira to track for FileContext based ViewFSOverloadScheme implementation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15444) mkdir should not create dir in fallback if the dir already in mount Path

2020-07-29 Thread Uma Maheswara Rao G (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167506#comment-17167506
 ] 

Uma Maheswara Rao G commented on HDFS-15444:


[~jianghuazhu], thank you for the question.
It's already covered in mkdirs of ViewFileSystem specific [implementation | 
https://github.com/apache/hadoop/blob/5d8600e80ad7864b332b60d5a01585fdf00848ee/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFileSystem.java#L1429].
 But it should be an issue with mkdir in ViewFs.java.

The scenario is:
  mount link /a/b/c --> hdfs://nn1/a/b/c
  fallback --> hdfs://nn1/

Now, if you try to create hdfs://nn1/a/b, it might end up creating a dir in 
hdfs://nn1/a/b; it should instead just return saying the dir already exists in the mount 
table, because it will check the parent dir first, that is hdfs://nn1/a/. Since /a is 
an internal dir, it will go to InternalViewFS#mkdir with the path /b to create.
Here it should check whether the internal ViewFS dir /a has an existing child matching 
the path /b. Yes, the path /a/b is already a mount link, so it should not 
create this path in the fallback. We did not have that children check in 
[ViewFs.java|https://github.com/apache/hadoop/blob/5d8600e80ad7864b332b60d5a01585fdf00848ee/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/viewfs/ViewFs.java#L1185]
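
A test-style sketch of the scenario described above (not the actual fix); ConfigUtil.addLink/addLinkFallback are the standard viewfs helpers, though exact signatures may vary by Hadoop version. The expectation is that mkdir of /a/b reports the dir as already existing in the mount table instead of creating it in the fallback fs.
{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.fs.viewfs.ConfigUtil;

public class ViewFsFallbackMkdirSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    ConfigUtil.addLink(conf, "/a/b/c", new URI("hdfs://nn1/a/b/c"));  // mount link
    ConfigUtil.addLinkFallback(conf, new URI("hdfs://nn1/"));         // fallback

    FileContext vfc = FileContext.getFileContext(new URI("viewfs:///"), conf);
    // /a/b is an ancestor of the mount point, so this should NOT be created in
    // the fallback; it should just be treated as already existing in the mount.
    vfc.mkdir(new Path("/a/b"), FsPermission.getDirDefault(), true);
  }
}
{code}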
 



> mkdir should not create dir in fallback if the dir already in mount Path
> 
>
> Key: HDFS-15444
> URL: https://issues.apache.org/jira/browse/HDFS-15444
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15498) Add an option in snapList cmd to show snapshots which are marked deleted

2020-07-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15498:
---
Description: HDFS-15488 adds a cmd to list all snapshots for a given 
snapshottable directory. A snapshot can be just marked as deleted with ordered 
deletion config set. This Jira aims to add an option to show the deletion 
status.

> Add an option in snapList cmd to show snapshots which are marked deleted
> 
>
> Key: HDFS-15498
> URL: https://issues.apache.org/jira/browse/HDFS-15498
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
>
> HDFS-15488 adds a cmd to list all snapshots for a given snapshottable 
> directory. A snapshot can be just marked as deleted with ordered deletion 
> config set. This Jira aims to add an option to show the deletion status.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-15498) Add an option in snapList cmd to show snapshots which are marked deleted

2020-07-29 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDFS-15498:
--

 Summary: Add an option in snapList cmd to show snapshots which are 
marked deleted
 Key: HDFS-15498
 URL: https://issues.apache.org/jira/browse/HDFS-15498
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: snapshots
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee
 Fix For: 3.4.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15014) RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport

2020-07-29 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167374#comment-17167374
 ] 

Chao Sun commented on HDFS-15014:
-

Thanks [~fengnanli]. Closing this as duplicate.

> RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport 
> -
>
> Key: HDFS-15014
> URL: https://issues.apache.org/jira/browse/HDFS-15014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chao Sun
>Priority: Major
>
> Currently the {{chooseDatanode}} call (which is shared by {{open}}, 
> {{create}}, {{append}} and {{getFileChecksum}}) in RBF WebHDFS calls 
> {{getDatanodeReport}} from ALL downstream namenodes:
> {code}
>   private DatanodeInfo chooseDatanode(final Router router,
>   final String path, final HttpOpParam.Op op, final long openOffset,
>   final String excludeDatanodes) throws IOException {
> // We need to get the DNs as a privileged user
> final RouterRpcServer rpcServer = getRPCServer(router);
> UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
> RouterRpcServer.setCurrentUser(loginUser);
> DatanodeInfo[] dns = null;
> try {
>   dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
> } catch (IOException e) {
>   LOG.error("Cannot get the datanodes from the RPC server", e);
> } finally {
>   // Reset ugi to remote user for remaining operations.
>   RouterRpcServer.resetCurrentUser();
> }
> HashSet<DatanodeInfo> excludes = new HashSet<>();
> if (excludeDatanodes != null) {
>   Collection<String> collection =
>   getTrimmedStringCollection(excludeDatanodes);
>   for (DatanodeInfo dn : dns) {
> if (collection.contains(dn.getName())) {
>   excludes.add(dn);
> }
>   }
> }
> ...
> {code}
> The {{getDatanodeReport}} call is very expensive (particularly in a large cluster) 
> as it needs to lock the {{DatanodeManager}}, which is also shared by calls such 
> as processing heartbeats. Check HDFS-14366 for a similar issue.
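
One possible mitigation, sketched purely for illustration (not necessarily what the duplicating jira implemented): cache the LIVE datanode report on the router side for a short TTL so chooseDatanode() does not fan a getDatanodeReport call out to every downstream namenode on each WebHDFS request. The TTL value and the ReportFetcher interface below are assumptions for this sketch.
{code:java}
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

final class CachedDatanodeReport {
  private static final long TTL_MS = 10_000;  // assumed refresh interval

  interface ReportFetcher {
    DatanodeInfo[] fetchLive() throws IOException;  // the expensive fan-out RPC
  }

  private final AtomicReference<DatanodeInfo[]> cached = new AtomicReference<>();
  private volatile long lastFetchMs;

  DatanodeInfo[] get(ReportFetcher fetcher) throws IOException {
    long now = System.currentTimeMillis();
    DatanodeInfo[] dns = cached.get();
    if (dns == null || now - lastFetchMs > TTL_MS) {
      dns = fetcher.fetchLive();  // refresh from the downstream namenodes
      cached.set(dns);
      lastFetchMs = now;
    }
    return dns;
  }
}
{code}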



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-15014) RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport

2020-07-29 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-15014.
-
Resolution: Duplicate

> RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport 
> -
>
> Key: HDFS-15014
> URL: https://issues.apache.org/jira/browse/HDFS-15014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chao Sun
>Priority: Major
>
> Currently the {{chooseDatanode}} call (which is shared by {{open}}, 
> {{create}}, {{append}} and {{getFileChecksum}}) in RBF WebHDFS calls 
> {{getDatanodeReport}} from ALL downstream namenodes:
> {code}
>   private DatanodeInfo chooseDatanode(final Router router,
>   final String path, final HttpOpParam.Op op, final long openOffset,
>   final String excludeDatanodes) throws IOException {
> // We need to get the DNs as a privileged user
> final RouterRpcServer rpcServer = getRPCServer(router);
> UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
> RouterRpcServer.setCurrentUser(loginUser);
> DatanodeInfo[] dns = null;
> try {
>   dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
> } catch (IOException e) {
>   LOG.error("Cannot get the datanodes from the RPC server", e);
> } finally {
>   // Reset ugi to remote user for remaining operations.
>   RouterRpcServer.resetCurrentUser();
> }
> HashSet<DatanodeInfo> excludes = new HashSet<>();
> if (excludeDatanodes != null) {
>   Collection<String> collection =
>   getTrimmedStringCollection(excludeDatanodes);
>   for (DatanodeInfo dn : dns) {
> if (collection.contains(dn.getName())) {
>   excludes.add(dn);
> }
>   }
> }
> ...
> {code}
> The {{getDatanodeReport}} call is very expensive (particularly in a large cluster) 
> as it needs to lock the {{DatanodeManager}}, which is also shared by calls such 
> as processing heartbeats. Check HDFS-14366 for a similar issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15488) Add a command to list all snapshots for a snaphottable root with snapshot Ids

2020-07-29 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167343#comment-17167343
 ] 

Hudson commented on HDFS-15488:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18478 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18478/])
HDFS-15488. Add a command to list all snapshots for a snaphottable root 
(github: rev 68287371ccc66da80e6a3d7981ae6c7ce7238920)
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs.cmd
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HDFSCommands.md
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/ClientNamenodeProtocol.proto
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/snapshot/LsSnapshot.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/site/markdown/HdfsSnapshots.md
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelperClient.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/tools/AdminHelper.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/SnapshotStatus.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/snapshot/SnapshotManager.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/dev-support/findbugsExcludeFile.xml
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java
* (add) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestListSnapshot.java
* (edit) hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolTranslatorPB.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterSnapshot.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/metrics/NameNodeMetrics.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirSnapshotOp.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/ClientNamenodeProtocolServerSideTranslatorPB.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSOpsCountStatistics.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-client/src/test/java/org/apache/hadoop/hdfs/protocol/TestReadOnly.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java
* (edit) hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/hdfs.proto


> Add a command to list all snapshots for a snaphottable root with snapshot Ids
> -
>
> Key: HDFS-15488
> URL: https://issues.apache.org/jira/browse/HDFS-15488
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15488.000.patch
>
>
> Currently, the way to list snapshots is to do an ls on the snapshottable 
> root's /.snapshot directory. Since the creation time is not 
> recorded, there is no way to actually figure out the chronological order of 
> snapshots. The idea here is to add a command to list snapshots for a 
> snapshottable directory along with snapshot IDs, which grow monotonically as 
> snapshots are created in the system. With the snapshot ID, it will be helpful 
> to figure out the chronology of snapshots in the system.
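
For reference, a hypothetical client-side usage sketch inferred from the file list above; the DistributedFileSystem method name getSnapshotListing() and the SnapshotStatus accessors are assumptions, not confirmed against the committed API (the confirmed user-facing entry point is the `hdfs lsSnapshot <snapshotDir>` command shown earlier in this thread).
{code:java}
// Hypothetical usage; the method and class names below are assumptions.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.SnapshotStatus;

public class ListSnapshotsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // fs.defaultFS assumed to be HDFS
    Path snapshotRoot = new Path("/user");
    try (FileSystem fs = snapshotRoot.getFileSystem(conf)) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      // Assumed API added by HDFS-15488; returns one entry per snapshot.
      SnapshotStatus[] snapshots = dfs.getSnapshotListing(snapshotRoot);
      for (SnapshotStatus s : snapshots) {
        // Snapshot IDs grow monotonically, so they give the creation order.
        System.out.println(s);
      }
    }
  }
}
{code}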



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15495) Decommissioning a DataNode with corrupted EC files should not be blocked indefinitely

2020-07-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167341#comment-17167341
 ] 

Stephen O'Donnell commented on HDFS-15495:
--

Here is a unit test which can be added to TestDecommissionWithStriped to 
reproduce the issue - the test will fail as it times out:


{code:java}
// Note: the timeout value and the loop body below were mangled by the mail
// archive; this is a best-effort reconstruction.
@Test(timeout = 120000)
public void testDecomCorruptFile() throws Exception {
  int fileLen = 10 * 1024 * 1024;

  Path file = new Path(ecDir, "testcorruptfile");

  writeStripedFile(dfs, file, fileLen);

  LocatedBlocks locatedBlocks =
      StripedFileTestUtil.getLocatedBlocks(file, dfs);

  LocatedStripedBlock lastBlock =
      (LocatedStripedBlock) locatedBlocks.getLastLocatedBlock();
  DatanodeInfo[] storageInfos = lastBlock.getLocations();

  // Kill PARITY_BLOCKS + 1 datanodes to corrupt the EC file
  ArrayList<DataNode> dns = cluster.getDataNodes();
  for (int i = 0; i < parityBlocks + 1; i++) {
    cluster.stopDataNode(storageInfos[i].getXferAddr());
  }

  // Decommission a node holding one of the remaining blocks; the test then
  // times out because the decommission never completes.
  ArrayList<DatanodeInfo> decom = new ArrayList<>();
  decom.add(storageInfos[parityBlocks + 1]);
  decommissionNode(0, decom, AdminStates.DECOMMISSIONED);
} {code}
 

> Decommissioning a DataNode with corrupted EC files should not be blocked 
> indefinitely
> -
>
> Key: HDFS-15495
> URL: https://issues.apache.org/jira/browse/HDFS-15495
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement, ec
>Affects Versions: 3.0.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> Originally discovered in patched CDH 6.2.1 (with a bunch of EC fixes: 
> HDFS-14699, HDFS-14849, HDFS-14847, HDFS-14920, HDFS-14768, HDFS-14946, 
> HDFS-15186).
> When there's an EC file marked as corrupted on NN, if the admin tries to 
> decommission a DataNode having one of the remaining blocks of the corrupted 
> EC file, *the decom will never finish* unless the file is recovered by 
> putting the missing blocks back in:
> {code:title=The endless DatanodeAdminManager check loop, every 30s}
> 2020-07-23 16:36:12,805 TRACE blockmanagement.DatanodeAdminManager: Processed 
> 0 blocks so far this tick
> 2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: 
> Processing Decommission In Progress node 127.0.1.7:5007
> 2020-07-23 16:36:12,806 TRACE blockmanagement.DatanodeAdminManager: Block 
> blk_-9223372036854775728_1013 numExpected=9, numLive=4
> 2020-07-23 16:36:12,806 INFO BlockStateChange: Block: 
> blk_-9223372036854775728_1013, Expected Replicas: 9, live replicas: 4, 
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, 
> maintenance replicas: 0, live entering maintenance replicas: 0, excess 
> replicas: 0, Is Open File: false, Datanodes having this block: 
> 127.0.1.12:5012 127.0.1.10:5010 127.0.1.8:5008 127.0.1.11:5011 127.0.1.7:5007 
> , Current Datanode: 127.0.1.7:5007, Is current datanode decommissioning: 
> true, Is current datanode entering maintenance: false
> 2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: Node 
> 127.0.1.7:5007 still has 1 blocks to replicate before it is a candidate to 
> finish Decommission In Progress.
> 2020-07-23 16:36:12,806 INFO blockmanagement.DatanodeAdminManager: Checked 1 
> blocks and 1 nodes this tick
> {code}
> "Corrupted" file here meaning the EC file doesn't have enough EC blocks in 
> the block group to be reconstructed. e.g. for {{RS-6-3-1024k}}, when there 
> are less than 6 blocks for an EC file, the file can no longer be retrieved 
> correctly.
> Will check on trunk as well soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-15495) Decommissioning a DataNode with corrupted EC files should not be blocked indefinitely

2020-07-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167200#comment-17167200
 ] 

Stephen O'Donnell edited comment on HDFS-15495 at 7/29/20, 4:12 PM:


There are a few things to think about here.

For non EC blocks:

 * If a block is missing it will not block replication, as it will not be on 
the DN in question and hence will not be checked.
 * If a block is under-replicated already, then decommission should proceed OK, 
provided the block can be made perfectly replicated.
 * Decommission will block if there are not enough nodes on a cluster to make 
the blocks perfectly replicated - e.g. decommissioning 1 node from a 3-node cluster.

For EC, a missing block is more complicated. Consider a 6-3 EC file.

 * If 1 to 3 blocks are already lost, the file is still readable. If you 
decommission a host holding a block, I think it will first reconstruct the 
missing 1 to 3 blocks, and then schedule a simple copy of the block on the 
decommissioning node.
 * If more than 3 blocks are lost, then it will not be able to complete the first 
step, so it will never get to the second step and will likely hang (I have not 
tested this myself yet).
-
Looking at the code, I think the NN does not check if there are sufficient EC 
block sources before it schedules the reconstruction work on a DN - it is left 
to the DN to figure that part out and fail the task.

It looks like we might need to do something a bit smarter in ErasureCodingWork 
to allow the block being decommissioned to be copied to a new DN even if EC 
reconstruction cannot happen. Something would also need to change in the 
Decommission logic to notice the file is corrupt and also handle the local 
block, and not wait for the file to be healthy.-

I looked into this a bit more, and I think it will be tricky to fix. When the 
EC file is corrupted, it goes into the LowRedundancyBlocks list, but in the 
QUEUE_WITH_CORRUPT_BLOCKS queue.

Then when the decommission monitor checks the block, it sees it as "needing 
replication", but it also sees it is already in neededReconstruction, therefore 
it does not add it to the list of blocks the BlockManager needs to replicate.

The decommission monitor relies on 
`BlockManager.computeBlockReconstructionWork` to take care of the 
under-replication. It never considers corrupt blocks, as it knows it cannot 
reconstruct them.

Therefore, we have an already corrupt EC file stuck in the 
needingReplication#CORRUPT queue, and a decommission monitor which needs the 
block to simply be copied off the DN it is currently on, but nothing will ever 
do that.
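
To make the interaction concrete, here is a hedged, self-contained illustration 
of the deadlock (hypothetical names, not the actual NameNode classes):
{code:java}
import java.util.HashMap;
import java.util.Map;

public class DecomDeadlockSketch {
  enum Priority { LOW_REDUNDANCY, CORRUPT }

  // Stand-in for LowRedundancyBlocks: block id -> the priority queue it sits in.
  static final Map<String, Priority> neededReconstruction = new HashMap<>();

  // Decommission monitor: defers any block that is already pending reconstruction.
  static boolean monitorSchedulesCopy(String blockId) {
    return !neededReconstruction.containsKey(blockId);
  }

  // Reconstruction pass: never touches the corrupt queue, as it cannot rebuild it.
  static boolean reconstructionPassHandles(String blockId) {
    return neededReconstruction.get(blockId) != Priority.CORRUPT;
  }

  public static void main(String[] args) {
    String blockId = "blk_-9223372036854775728_1013";
    neededReconstruction.put(blockId, Priority.CORRUPT);
    // Both print false: neither path ever copies the replica off the
    // decommissioning DN, so decommission never completes.
    System.out.println("monitor schedules a copy? " + monitorSchedulesCopy(blockId));
    System.out.println("reconstruction pass handles it? "
        + reconstructionPassHandles(blockId));
  }
}
{code}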


was (Author: sodonnell):
There are a few things to think about here.

For non EC blocks:

 * If a block is missing it will not block replication, as it will not be on 
the DN in question and hence will not be checked.
 * If a block is under-replicated already, then decommission should proceed OK, 
provided the block can be made perfectly replicated.
 * Decommission will block if there are not enough nodes on a cluster to make 
the blocks perfectly replicated - e.g. decommissioning 1 node from a 3-node cluster.

For EC, a missing block is more complicated. Consider a 6-3 EC file.

 * If 1 to 3 blocks are already lost, the file is still readable. If you 
decommission a host holding a block, I think it will first reconstruct the 
missing 1 to 3 blocks, and then schedule a simple copy of the block on the 
decommissioning node.
 * If more than 3 blocks are lost, then it will not be able to complete the first 
step, so it will never get to the second step and will likely hang (I have not 
tested this myself yet).

Looking at the code, I think the NN does not check if there are sufficient EC 
block sources before it schedules the reconstruction work on a DN - it is left 
to the DN to figure that part out and fail the task.

It looks like we might need to do something a bit smarter in ErasureCodingWork 
to allow the block being decommissioned to be copied to a new DN even if EC 
reconstruction cannot happen. Something would also need to change in the 
Decommission logic to notice the file is corrupt and also handle the local 
block, and not wait for the file to be healthy.

You could argue that decommission should not care about the health of the EC 
file - it should just ensure any blocks on the decommissioning hosts get copied 
elsewhere before decommission can complete.

> Decommissioning a DataNode with corrupted EC files should not be blocked 
> indefinitely
> -
>
> Key: HDFS-15495
> URL: https://issues.apache.org/jira/browse/HDFS-15495
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement, ec
>Affects Versions: 3.0.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>  

[jira] [Updated] (HDFS-15488) Add a command to list all snapshots for a snapshottable root with snapshot Ids

2020-07-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15488:
---
Fix Version/s: 3.4.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Add a command to list all snapshots for a snapshottable root with snapshot Ids
> -
>
> Key: HDFS-15488
> URL: https://issues.apache.org/jira/browse/HDFS-15488
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Fix For: 3.4.0
>
> Attachments: HDFS-15488.000.patch
>
>
> Currently, the way to list snapshots is to do an ls on the .snapshot directory 
> under a snapshottable root. Since creation time is not recorded, there is no 
> way to figure out the chronological order of snapshots. The idea here is to 
> add a command to list snapshots for a snapshottable directory along with 
> snapshot Ids, which grow monotonically as snapshots are created in the system. 
> With the snapshot Id, it will be helpful to figure out the chronology of 
> snapshots in the system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-13934) Multipart uploaders to be created through API call to FileSystem/FileContext, not service loader

2020-07-29 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reopened HDFS-13934:
-

> Multipart uploaders to be created through API call to FileSystem/FileContext, 
> not service loader
> 
>
> Key: HDFS-13934
> URL: https://issues.apache.org/jira/browse/HDFS-13934
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, fs/s3, hdfs
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.3.1
>
>
> the Multipart Uploaders are created via service loaders. This is troublesome
> # HADOOP-12636, HADOOP-13323, HADOOP-13625 highlight how the load process 
> forces the transient loading of dependencies.  If a dependent class cannot be 
> loaded (e.g. aws-sdk is not on the classpath), that service won't load. 
> Without error handling round the load process, this stops any uploader from 
> loading. Even with that error handling, the performance hit of that load, 
> especially with reshaded dependencies, hurts performance (HADOOP-13138).
> # it makes wrapping the load with any filter impossible, stops transitive 
> binding through viewFS, mocking, etc.
> # It complicates security in a kerberized world. If you have an FS instance 
> of user A, then you should be able to create an MPU instance with that user's 
> permissions. currently, if a service were to try to create one, you'd be 
> looking at doAs() games around the service loading, and a more complex bind 
> process.
> Proposed
> # remove the service loader mech entirely
> # add to FS & FC a createMultipartUploader(path) call, which will create one 
> bound to the current FS, with its permissions, DTs, etc. (a hedged usage sketch 
> follows below)
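
A minimal usage sketch of the proposed call, assuming a builder-style return and 
that the uploader is Closeable (both assumptions here, not confirmed API):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.MultipartUploader;
import org.apache.hadoop.fs.Path;

public class MpuSketch {
  public static void main(String[] args) throws Exception {
    Path dest = new Path(args[0]);
    // The uploader is created from the FileSystem instance itself, so it inherits
    // that instance's user, permissions and delegation tokens; no service loader.
    FileSystem fs = dest.getFileSystem(new Configuration());
    try (MultipartUploader uploader = fs.createMultipartUploader(dest).build()) {
      // ... startUpload / putPart / complete against 'uploader' ...
    }
  }
}
{code}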



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15484) Add option in enum Rename to support batch rename

2020-07-29 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167208#comment-17167208
 ] 

Steve Loughran commented on HDFS-15484:
---


# make the new interface one in hadoop-common, so other stores can implement it
# add a new path capability; filestores which implement the API return true from 
hasPathCapability(), to allow for transitive probes of support (a hedged probe 
sketch follows below)
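
A hedged sketch of such a probe from the client side; the capability name 
{{fs.capability.batch.rename}} is an assumption for illustration, not a defined 
constant:
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BatchRenameProbe {
  public static void main(String[] args) throws Exception {
    Path base = new Path("hdfs://nn1/dir1");
    FileSystem fs = base.getFileSystem(new Configuration());
    // hasPathCapability() lets callers (and anything wrapping the FS, e.g. viewfs)
    // discover support without instanceof checks against a concrete FileSystem.
    if (fs.hasPathCapability(base, "fs.capability.batch.rename")) {
      System.out.println("batch rename supported - issue one batched RPC");
    } else {
      System.out.println("not supported - fall back to one rename per file");
    }
  }
}
{code}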

> Add option in enum Rename to support batch rename
> 
>
> Key: HDFS-15484
> URL: https://issues.apache.org/jira/browse/HDFS-15484
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsclient, namenode, performance
>Affects Versions: 3.3.0
>Reporter: Yang Yun
>Assignee: Yang Yun
>Priority: Minor
> Attachments: HDFS-15484.001.patch, HDFS-15484.new_method.patch
>
>
> Sometimes we need to rename many files after a task. Add a new option in enum 
> Rename to support batch rename, which needs only one RPC and one lock. For 
> example,
> rename(new Path("/dir1/f1::/dir2/f2"), new Path("/dir3/f1::dir4/f4"), 
> Rename.BATCH)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15495) Decommissioning a DataNode with corrupted EC files should not be blocked indefinitely

2020-07-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167200#comment-17167200
 ] 

Stephen O'Donnell commented on HDFS-15495:
--

There are a few things to think about here.

For non EC blocks:

 * If a block is missing it will not block replication, as it will not be on 
the DN in question and hence will not be checked.
 * If a block is under-replicated already, then decommission should proceed OK, 
provided the block can be made perfectly replicated.
 * Decommission will block if there are not enough nodes on a cluster to make 
the blocks perfectly replicated - e.g. decommissioning 1 node from a 3-node cluster.

For EC, a missing block is more complicated. Consider a 6-3 EC file.

 * If 1 to 3 blocks are already lost, the file is still readable. If you 
decommission a host holding a block, I think it will first reconstruct the 
missing 1 to 3 blocks, and then schedule a simple copy of the block on the 
decommissioning node.
 * If more than 3 blocks are lost, then it will not be able to complete the first 
step, so it will never get to the second step and will likely hang (I have not 
tested this myself yet).

Looking at the code, I think the NN does not check if there are sufficient EC 
block sources before it schedules the reconstruction work on a DN - it is left 
to the DN to figure that part out and fail the task.

It looks like we might need to do something a bit smarter in ErasureCodingWork 
to allow the block being decommissioned to be copied to a new DN even if EC 
reconstruction cannot happen. Something would also need to change in the 
Decommission logic to notice the file is corrupt and also handle the local 
block, and not wait for the file to be healthy.

You could argue that decommission should not care about the health of the EC 
file - it should just ensure any blocks on the decommissioning hosts get copied 
elsewhere before decommission can complete.

> Decommissioning a DataNode with corrupted EC files should not be blocked 
> indefinitely
> -
>
> Key: HDFS-15495
> URL: https://issues.apache.org/jira/browse/HDFS-15495
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement, ec
>Affects Versions: 3.0.0
>Reporter: Siyao Meng
>Assignee: Siyao Meng
>Priority: Major
>
> Originally discovered in patched CDH 6.2.1 (with a bunch of EC fixes: 
> HDFS-14699, HDFS-14849, HDFS-14847, HDFS-14920, HDFS-14768, HDFS-14946, 
> HDFS-15186).
> When there's an EC file marked as corrupted on NN, if the admin tries to 
> decommission a DataNode having one of the remaining blocks of the corrupted 
> EC file, *the decom will never finish* unless the file is recovered by 
> putting the missing blocks back in:
> {code:title=The endless DatanodeAdminManager check loop, every 30s}
> 2020-07-23 16:36:12,805 TRACE blockmanagement.DatanodeAdminManager: Processed 
> 0 blocks so far this tick
> 2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: 
> Processing Decommission In Progress node 127.0.1.7:5007
> 2020-07-23 16:36:12,806 TRACE blockmanagement.DatanodeAdminManager: Block 
> blk_-9223372036854775728_1013 numExpected=9, numLive=4
> 2020-07-23 16:36:12,806 INFO BlockStateChange: Block: 
> blk_-9223372036854775728_1013, Expected Replicas: 9, live replicas: 4, 
> corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, 
> maintenance replicas: 0, live entering maintenance replicas: 0, excess 
> replicas: 0, Is Open File: false, Datanodes having this block: 
> 127.0.1.12:5012 127.0.1.10:5010 127.0.1.8:5008 127.0.1.11:5011 127.0.1.7:5007 
> , Current Datanode: 127.0.1.7:5007, Is current datanode decommissioning: 
> true, Is current datanode entering maintenance: false
> 2020-07-23 16:36:12,806 DEBUG blockmanagement.DatanodeAdminManager: Node 
> 127.0.1.7:5007 still has 1 blocks to replicate before it is a candidate to 
> finish Decommission In Progress.
> 2020-07-23 16:36:12,806 INFO blockmanagement.DatanodeAdminManager: Checked 1 
> blocks and 1 nodes this tick
> {code}
> "Corrupted" file here meaning the EC file doesn't have enough EC blocks in 
> the block group to be reconstructed. e.g. for {{RS-6-3-1024k}}, when there 
> are less than 6 blocks for an EC file, the file can no longer be retrieved 
> correctly.
> Will check on trunk as well soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15493) Update block map and name cache in parallel while loading fsimage.

2020-07-29 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167143#comment-17167143
 ] 

Stephen O'Donnell commented on HDFS-15493:
--

{quote}
I had tested loading the caches and blocks with two single-threaded executors. 
Same as your test result, there was a long wait for the executors to terminate, 
so the time cost was not better than the one executor with four threads.
{quote}

Did you find the runtime was about the same with a single executor with 4 
threads and two executors with a single thread each? As my testing showed a 
small improvement in the two single-threaded executors case, and as locking 
prevents more than one thread from running concurrently, I think it would be 
better to go with the two executors with a single thread each. I think the time 
required for the executors to shut down should be about the same in both cases.

I also made an earlier comment on this code:

{code}
  if (blocksMapUpdateExecutor != null) {
    blocksMapUpdateExecutor.shutdown();
    try {
      while (!blocksMapUpdateExecutor.isTerminated()) {
        blocksMapUpdateExecutor.awaitTermination(1, TimeUnit.MILLISECONDS);
      }
    } catch (InterruptedException e) {
      LOG.error("Interrupted waiting for blocksMap update threads.", e);
      throw new IOException(e);
    }
  }
{code}

I misunderstood how it worked, as I thought `awaitTermination(...)` threw an 
exception after the timeout, which is not the case. However, I think it makes 
sense to wait 500 or 1000ms rather than 1ms, and log a message indicating the 
executor has not yet shut down. Or, we could time how long the shutdown takes 
and log a message after it completes. That way we will get some visibility into 
how long the executors take to catch up.
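
A hedged sketch of that suggestion, reusing the names from the snippet above 
({{Time}} being org.apache.hadoop.util.Time):
{code:java}
if (blocksMapUpdateExecutor != null) {
  blocksMapUpdateExecutor.shutdown();
  long start = Time.monotonicNow();
  try {
    // Poll once per second and log progress, instead of spinning every millisecond.
    while (!blocksMapUpdateExecutor.awaitTermination(1, TimeUnit.SECONDS)) {
      LOG.info("Still waiting for the blocksMap update executor to terminate");
    }
    LOG.info("blocksMap update executor terminated after {} ms",
        Time.monotonicNow() - start);
  } catch (InterruptedException e) {
    LOG.error("Interrupted waiting for blocksMap update threads.", e);
    throw new IOException(e);
  }
}
{code}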

Also, for info, I ran my tests on trunk and the image also had some snapshots 
which will have extended the load time.

> Update block map and name cache in parallel while loading fsimage.
> --
>
> Key: HDFS-15493
> URL: https://issues.apache.org/jira/browse/HDFS-15493
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Reporter: Chengwei Wang
>Priority: Major
> Attachments: HDFS-15493.001.patch, fsimage-loading.log
>
>
> While loading the INodeDirectorySection of the fsimage, it updates the name 
> cache and block map after adding an inode file to its inode directory. Running 
> these steps in parallel would reduce the time cost of fsimage loading.
> In our test case, with patches HDFS-13694 and HDFS-14617, the time cost to load 
> the fsimage (220M files & 240M blocks) is 470s; with this patch, the time cost 
> is reduced to 410s.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15483) Ordered snapshot deletion: Disallow rename between two snapshottable directories

2020-07-29 Thread Mukul Kumar Singh (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukul Kumar Singh updated HDFS-15483:
-
Status: Patch Available  (was: Open)

> Ordered snapshot deletion: Disallow rename between two snapshottable 
> directories
> 
>
> Key: HDFS-15483
> URL: https://issues.apache.org/jira/browse/HDFS-15483
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Reporter: Tsz-wo Sze
>Assignee: Shashikant Banerjee
>Priority: Major
>
> With the ordered snapshot deletion feature, only the *earliest* snapshot can 
> be actually deleted from the file system.  If renaming between snapshottable 
> directories is allowed, only the earliest snapshot among all the 
> snapshottable directories can be actually deleted.  In such a case, an 
> individual snapshottable directory may not be able to free up the resources by 
> itself.
> Therefore, we propose disallowing renaming between snapshottable directories 
> in this JIRA.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13934) Multipart uploaders to be created through API call to FileSystem/FileContext, not service loader

2020-07-29 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167094#comment-17167094
 ] 

Ayush Saxena commented on HDFS-13934:
-

Hi [~ste...@apache.org], [~fabbri] 
 This tends to break {{TestHDFSContractMultipartUploader.testConcurrentUploads}}.
 It should be a test-only issue, I guess:
{code:java}
eventually(timeToBecomeConsistentMillis(),
() -> verifyFileLength(file, size2),
new LambdaTestUtils.ProportionalRetryInterval(
CONSISTENCY_INTERVAL, timeToBecomeConsistentMillis() == 0 ?
CONSISTENCY_INTERVAL :
timeToBecomeConsistentMillis())); // This is 0 for HDFS
{code}
The reason is that {{timeToBecomeConsistentMillis()}} is 0 for HDFS, and there is 
a {{Precondition}} check in {{LambdaTestUtils.ProportionalRetryInterval}} making 
sure it shouldn't be 0.
 Earlier, {{LambdaTestUtils.FixedRetryInterval}} was being used, which didn't 
have this issue.
 Should we change back to it, or handle this specifically for HDFS?
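
A hedged sketch of the "change back" option, assuming the single-argument 
{{FixedRetryInterval(int)}} constructor the earlier version of the test used:
{code:java}
eventually(timeToBecomeConsistentMillis(),
    () -> verifyFileLength(file, size2),
    // Fixed interval: no proportional maths, so a 0ms consistency window is fine.
    new LambdaTestUtils.FixedRetryInterval(CONSISTENCY_INTERVAL));
{code}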

This test is even extended by {{ITestS3AContractMultipartUploader}}, which I 
cannot verify, and any change here would impact that as well. So it would be 
good if either of you could help fix this.

Ref : 
[https://builds.apache.org/job/PreCommit-HDFS-Build/29566/testReport/org.apache.hadoop.fs.contract.hdfs/TestHDFSContractMultipartUploader/testConcurrentUploads/]

> Multipart uploaders to be created through API call to FileSystem/FileContext, 
> not service loader
> 
>
> Key: HDFS-13934
> URL: https://issues.apache.org/jira/browse/HDFS-13934
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, fs/s3, hdfs
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.3.1
>
>
> the Multipart Uploaders are created via service loaders. This is troublesome
> # HADOOP-12636, HADOOP-13323, HADOOP-13625 highlight how the load process 
> forces the transient loading of dependencies.  If a dependent class cannot be 
> loaded (e.g. aws-sdk is not on the classpath), that service won't load. 
> Without error handling round the load process, this stops any uploader from 
> loading. Even with that error handling, the performance hit of that load, 
> especially with reshaded dependencies, hurts performance (HADOOP-13138).
> # it makes wrapping the load with any filter impossible, stops transitive 
> binding through viewFS, mocking, etc.
> # It complicates security in a kerberized world. If you have an FS instance 
> of user A, then you should be able to create an MPU instance with that user's 
> permissions. currently, if a service were to try to create one, you'd be 
> looking at doAs() games around the service loading, and a more complex bind 
> process.
> Proposed
> # remove the service loader mech entirely
> # add to FS & FC as createMultipartUploader(path) call, which will create one 
> bound to the current FS, with its permissions, DTs, etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15497) Make snapshot limit on global as well per snapshot root directory configurable

2020-07-29 Thread Shashikant Banerjee (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shashikant Banerjee updated HDFS-15497:
---
Attachment: HDFS-15497.000.patch

> Make snapshot limit on global as well per snapshot root directory configurable
> --
>
> Key: HDFS-15497
> URL: https://issues.apache.org/jira/browse/HDFS-15497
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: snapshots
>Affects Versions: 3.4.0
>Reporter: Shashikant Banerjee
>Assignee: Shashikant Banerjee
>Priority: Major
> Attachments: HDFS-15497.000.patch
>
>
> Currently, there is no configurable limit imposed on the number of snapshots 
> retained in the system, either at the filesystem level or per snapshottable 
> root directory. Too many snapshots in the system can potentially bloat up the 
> namespace, and with the ordered deletion feature on, too many snapshots per 
> snapshottable root directory will make the deletion of the oldest snapshot 
> more expensive. This Jira aims to impose these configurable limits.
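
A hedged sketch of how such limits might be read once introduced; the config key 
names below are hypothetical placeholders, not the ones the patch defines:
{code:java}
import org.apache.hadoop.conf.Configuration;

public class SnapshotLimitSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Hypothetical keys for illustration only.
    int perRootLimit = conf.getInt("dfs.namenode.snapshot.max.limit.per.root", 65536);
    int fsWideLimit = conf.getInt("dfs.namenode.snapshot.filesystem.limit", 65536);
    System.out.println("per-root limit: " + perRootLimit
        + ", filesystem-wide limit: " + fsWideLimit);
  }
}
{code}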



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15438) Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy

2020-07-29 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167085#comment-17167085
 ] 

Ayush Saxena commented on HDFS-15438:
-

Can you check the test failures? A couple of them seem related.


> Setting dfs.disk.balancer.max.disk.errors = 0 will fail the block copy
> --
>
> Key: HDFS-15438
> URL: https://issues.apache.org/jira/browse/HDFS-15438
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: balancer & mover
>Reporter: AMC-team
>Priority: Major
> Attachments: HDFS-15438.000.patch, HDFS-15438.001.patch
>
>
> In HDFS disk balancer, the config parameter 
> "dfs.disk.balancer.max.disk.errors" is to control the value of maximum number 
> of errors we can ignore for a specific move between two disks before it is 
> abandoned.
> The parameter can accept value that >= 0. And setting the value to 0 should 
> mean no error tolerance. However, setting the value to 0 will simply don't do 
> the block copy even there is no disk error occur because the while loop 
> condition *item.getErrorCount() < getMaxError(item)* will not satisfied.
> {code:java}
> // Gets the next block that we can copy
> private ExtendedBlock getBlockToCopy(FsVolumeSpi.BlockIterator iter,
>     DiskBalancerWorkItem item) {
>   while (!iter.atEnd() && item.getErrorCount() < getMaxError(item)) {
>     try {
>       ... // get the block
>     } catch (IOException e) {
>       item.incErrorCount();
>     }
>   }
>   if (item.getErrorCount() >= getMaxError(item)) {
>     item.setErrMsg("Error count exceeded.");
>     LOG.info("Maximum error count exceeded. Error count: {} Max error:{} ",
>         item.getErrorCount(), item.getMaxDiskErrors());
>   }
> {code}
> *How to fix*
> Change the while loop condition to support value 0.
>   
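
One possible shape of that change, as a hedged sketch (an assumption about the 
intended semantics of 0, not the attached patch):
{code:java}
// With maxErrors == 0 the copy loop still starts, but the move is abandoned as
// soon as the first error is recorded.
int maxErrors = getMaxError(item);
while (!iter.atEnd()
    && (maxErrors == 0
        ? item.getErrorCount() == 0
        : item.getErrorCount() < maxErrors)) {
  // ... get and copy the next block, incrementing the error count on IOException ...
}
{code}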



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15229) Truncate info should be logged at INFO level

2020-07-29 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167044#comment-17167044
 ] 

Hadoop QA commented on HDFS-15229:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
11s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 36m 
12s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
18s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
54s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
30s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 28s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
48s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  3m 
31s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
28s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
50s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 18s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  3m 
11s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 99m 42s{color} 
| {color:red} hadoop-hdfs in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
41s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}187m  6s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDecommissionWithBackoffMonitor |
|   | hadoop.hdfs.TestFileChecksum |
|   | hadoop.hdfs.server.namenode.TestNameNodeRespectsBindHostKeys |
|   | hadoop.fs.contract.hdfs.TestHDFSContractMultipartUploader |
|   | hadoop.tools.TestHdfsConfigFields |
|   | hadoop.hdfs.TestDFSStripedInputStream |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | ClientAPI=1.40 ServerAPI=1.40 base: 
https://ci-hadoop.apache.org/job/PreCommit-HDFS-Build/13/artifact/out/Dockerfile
 |
| JIRA Issue | HDFS-15229 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/13008636/HDFS-15229.001.patch |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite 
unit shadedclient findbugs checkstyle |
| uname | Linux 71e02f86de5f 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 

[jira] [Created] (HDFS-15497) Make snapshot limit on global as well per snapshot root directory configurable

2020-07-29 Thread Shashikant Banerjee (Jira)
Shashikant Banerjee created HDFS-15497:
--

 Summary: Make snapshot limit on global as well per snapshot root 
directory configurable
 Key: HDFS-15497
 URL: https://issues.apache.org/jira/browse/HDFS-15497
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: snapshots
Affects Versions: 3.4.0
Reporter: Shashikant Banerjee
Assignee: Shashikant Banerjee


Currently, there is no configurable limit imposed on the number of snapshots 
retained in the system, either at the filesystem level or per snapshottable 
root directory. Too many snapshots in the system can potentially bloat up the 
namespace, and with the ordered deletion feature on, too many snapshots per 
snapshottable root directory will make the deletion of the oldest snapshot more 
expensive. This Jira aims to impose these configurable limits.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2020-07-29 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166921#comment-17166921
 ] 

Fengnan Li commented on HDFS-15423:
---

[~csun] I will work on this one.

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>
> In {{RouterWebHdfsMethods}} and for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}}, and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct as it 
> should pick a DN for the specific cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-15423) RBF: WebHDFS create shouldn't choose DN from all sub-clusters

2020-07-29 Thread Fengnan Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fengnan Li reassigned HDFS-15423:
-

Assignee: Fengnan Li

> RBF: WebHDFS create shouldn't choose DN from all sub-clusters
> -
>
> Key: HDFS-15423
> URL: https://issues.apache.org/jira/browse/HDFS-15423
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf, webhdfs
>Reporter: Chao Sun
>Assignee: Fengnan Li
>Priority: Major
>
> In {{RouterWebHdfsMethods}} and for a {{CREATE}} call, {{chooseDatanode}} 
> first gets all DNs via {{getDatanodeReport}}, and then randomly picks one from 
> the list via {{getRandomDatanode}}. This logic doesn't seem correct as it 
> should pick a DN for the specific cluster(s) of the input {{path}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-15014) RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport

2020-07-29 Thread Fengnan Li (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166918#comment-17166918
 ] 

Fengnan Li commented on HDFS-15014:
---

[~csun] We can close this one since HDFS-15417 is closed?

> RBF: WebHdfs chooseDatanode shouldn't call getDatanodeReport 
> -
>
> Key: HDFS-15014
> URL: https://issues.apache.org/jira/browse/HDFS-15014
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Chao Sun
>Priority: Major
>
> Currently the {{chooseDatanode}} call (which is shared by {{open}}, 
> {{create}}, {{append}} and {{getFileChecksum}}) in RBF WebHDFS calls 
> {{getDatanodeReport}} from ALL downstream namenodes:
> {code}
>   private DatanodeInfo chooseDatanode(final Router router,
>   final String path, final HttpOpParam.Op op, final long openOffset,
>   final String excludeDatanodes) throws IOException {
> // We need to get the DNs as a privileged user
> final RouterRpcServer rpcServer = getRPCServer(router);
> UserGroupInformation loginUser = UserGroupInformation.getLoginUser();
> RouterRpcServer.setCurrentUser(loginUser);
> DatanodeInfo[] dns = null;
> try {
>   dns = rpcServer.getDatanodeReport(DatanodeReportType.LIVE);
> } catch (IOException e) {
>   LOG.error("Cannot get the datanodes from the RPC server", e);
> } finally {
>   // Reset ugi to remote user for remaining operations.
>   RouterRpcServer.resetCurrentUser();
> }
> HashSet<DatanodeInfo> excludes = new HashSet<>();
> if (excludeDatanodes != null) {
>   Collection<String> collection =
>   getTrimmedStringCollection(excludeDatanodes);
>   for (DatanodeInfo dn : dns) {
> if (collection.contains(dn.getName())) {
>   excludes.add(dn);
> }
>   }
> }
> ...
> {code}
> The {{getDatanodeReport}} is very expensive (particularly in a large cluster) 
> as it needs to lock the {{DatanodeManager}}, which is also shared by calls such 
> as processing heartbeats. Check HDFS-14366 for a similar issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org