[jira] [Updated] (HDFS-15394) Add all available fs.viewfs.overload.scheme.target.&lt;scheme&gt;.impl classes in core-default.xml by default.

2020-06-16 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15394:
---
Fix Version/s: 3.3.1

> Add all available fs.viewfs.overload.scheme.target.&lt;scheme&gt;.impl classes in 
> core-default.xml by default.
> ---
>
> Key: HDFS-15394
> URL: https://issues.apache.org/jira/browse/HDFS-15394
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: configuration, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> This proposes to add all available 
> fs.viewfs.overload.scheme.target.&lt;scheme&gt;.impl classes in core-default.xml, 
> so that users need not configure them.
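As a sketch of what shipping these defaults looks like, the following hypothetical core-site.xml/core-default.xml fragment maps the overloaded hdfs scheme to its target implementation (the property keys and class names follow the ViewFSOverloadScheme convention; verify them against your Hadoop version):

```xml
<!-- Overload the hdfs scheme with ViewFileSystemOverloadScheme. -->
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.fs.viewfs.ViewFileSystemOverloadScheme</value>
</property>
<!-- Target filesystem implementation used for hdfs:// mount targets; with
     this entry present in core-default.xml, users need not configure it. -->
<property>
  <name>fs.viewfs.overload.scheme.target.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
```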



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-15387) FSUsage$DF should consider ViewFSOverloadScheme in processPath

2020-06-16 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15387:
---
Fix Version/s: 3.3.1

> FSUsage$DF should consider ViewFSOverloadScheme in processPath
> --
>
> Key: HDFS-15387
> URL: https://issues.apache.org/jira/browse/HDFS-15387
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Minor
> Fix For: 3.3.1, 3.4.0
>
>
> Currently, for calculating DF, processPath checks whether the scheme is 
> ViewFS; if so, it gets the status from all mounted filesystems and 
> aggregates it. If not, it directly calls fs.getStatus.
> Here we should treat ViewFSOverloadScheme via the ViewFS flow as well.






[jira] [Updated] (HDFS-15321) Make DFSAdmin tool to work with ViewFSOverloadScheme

2020-06-16 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15321:
---
Fix Version/s: 3.4.0
   3.3.1

> Make DFSAdmin tool to work with ViewFSOverloadScheme
> 
>
> Key: HDFS-15321
> URL: https://issues.apache.org/jira/browse/HDFS-15321
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: dfsadmin, fs, viewfs
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> When we enable ViewFSOverloadScheme and use hdfs as the overloaded scheme, 
> users work with hdfs URIs. But here DFSAdmin expects the impl class to be 
> DistributedFileSystem. If the impl class is ViewFSOverloadScheme, it will 
> fail.
> So, when the impl is ViewFSOverloadScheme, we should get the corresponding 
> child hdfs filesystem to make DFSAdmin work.
> This Jira makes DFSAdmin work with ViewFSOverloadScheme.






[jira] [Updated] (HDFS-15389) DFSAdmin should close filesystem and dfsadmin -setBalancerBandwidth should work with ViewFSOverloadScheme

2020-06-16 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15389:
---
Fix Version/s: 3.3.1

> DFSAdmin should close filesystem and dfsadmin -setBalancerBandwidth should 
> work with ViewFSOverloadScheme 
> --
>
> Key: HDFS-15389
> URL: https://issues.apache.org/jira/browse/HDFS-15389
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: dfsadmin, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Ayush Saxena
>Assignee: Ayush Saxena
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15389-01.patch
>
>
> Two issues here:
> First: prior to HDFS-15321, when DFSAdmin was closed, the FileSystem 
> associated with it was closed as part of the close method. But post 
> HDFS-15321, the {{FileSystem}} isn't stored as part of {{FsShell}}, hence 
> during close the FileSystem stays open and isn't closed.
> * This is the reason for the failure of TestDFSHAAdmin
> Second: {{DfsAdmin -setBalancerBandwidth}} doesn't work with 
> {{ViewFSOverloadScheme}}, since setBalancerBandwidth calls {{getFS()}} 
> rather than {{getDFS()}}, which resolves the scheme in HDFS-15321.






[jira] [Updated] (HDFS-15330) Document the ViewFSOverloadScheme details in ViewFS guide

2020-06-16 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15330:
---
Fix Version/s: 3.3.1

> Document the ViewFSOverloadScheme details in ViewFS guide
> -
>
> Key: HDFS-15330
> URL: https://issues.apache.org/jira/browse/HDFS-15330
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> This Jira tracks the documentation of the ViewFSOverloadScheme usage guide.






[jira] [Updated] (HDFS-15322) Make NflyFS to work when ViewFsOverloadScheme's scheme and target uris schemes are same.

2020-06-16 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15322:
---
Fix Version/s: 3.3.1

> Make NflyFS to work when ViewFsOverloadScheme's scheme and target uris 
> schemes are same.
> 
>
> Key: HDFS-15322
> URL: https://issues.apache.org/jira/browse/HDFS-15322
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: fs, nflyFs, viewfs, viewfsOverloadScheme
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> Currently an Nfly mount link will not work when we use ViewFSOverloadScheme, 
> because when the configured scheme is hdfs and the target URIs' scheme is 
> also hdfs, it faces the same looping issue we discussed in the design. We 
> need to use FsGetter to handle the looping.






[jira] [Updated] (HDFS-15306) Make mount-table to read from central place ( Let's say from HDFS)

2020-06-16 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15306:
---
Fix Version/s: 3.3.1

> Make mount-table to read from central place ( Let's say from HDFS)
> --
>
> Key: HDFS-15306
> URL: https://issues.apache.org/jira/browse/HDFS-15306
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: configuration, hadoop-client
>Affects Versions: 3.2.1
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
>
> ViewFsOverloadScheme should be able to read mount-table.xml configuration 
> from remote servers.
>  Below are the options discussed in the design doc:
>  # XInclude and HTTP Server (including WebHDFS)
>  # Hadoop Compatible FS (*HCFS*)
>  a) Keep the mount-table in a Hadoop compatible FS
>  b) Read the mount-table from a Hadoop compatible FS using XInclude
> We prefer options 1 and 2a. For 1 we don't need to modify any code, so this 
> Jira can cover 2a.
>  
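For option 1 (XInclude with an HTTP server), a client configuration could pull a centrally hosted mount-table without any code change. A hypothetical sketch, with a placeholder URL:

```xml
<?xml version="1.0"?>
<!-- Hadoop Configuration files support XML Inclusions (XInclude). The href
     below is a placeholder for a centrally hosted mount-table.xml, e.g. on
     an HTTP server or WebHDFS endpoint. -->
<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="http://central-config.example.com/mount-table.xml"/>
</configuration>
```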






[jira] [Commented] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-16 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137978#comment-17137978
 ] 

Hudson commented on HDFS-15372:
---

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #18354 (See 
[https://builds.apache.org/job/Hadoop-trunk-Commit/18354/])
HDFS-15372. Files in snapshots no longer see attribute provider (weichiu: rev 
730a39d1388548f22f76132a6734d61c24c3eb72)
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestINodeAttributeProvider.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/INodesInPath.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSPermissionChecker.java
* (edit) 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java


> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>
>
> Given a cluster with an authorization provider configured (e.g. Sentry), 
> where the paths covered by the provider are snapshottable, there was a 
> change in behaviour in how the provider permissions and ACLs are applied to 
> files in snapshots between the 2.x branch and Hadoop 3.0.
> E.g., suppose we have the snapshottable path /data, which is Sentry-managed. 
> The ACLs below are provided by Sentry:
> {code}
> hadoop fs -getfacl -R /data
> # file: /data
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::---
> group:flume:rwx
> user:hive:rwx
> group:hive:rwx
> group:testgroup:rwx
> mask::rwx
> other::--x
> /data/tab1
> {code}
> After taking a snapshot, the files in the snapshot do not see the provider 
> permissions:
> {code}
> hadoop fs -getfacl -R /data/.snapshot
> # file: /data/.snapshot
> # owner: 
> # group: 
> user::rwx
> group::rwx
> other::rwx
> # file: /data/.snapshot/snap1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> # file: /data/.snapshot/snap1/tab1
> # owner: hive
> # group: hive
> user::rwx
> group::rwx
> other::--x
> {code}
> However, pre-Hadoop 3.0 (when the attribute provider code was extensively 
> refactored), snapshots did get the provider permissions.
> The reason is this code in FSDirectory.java which ultimately calls the 
> attribute provider and passes the path we want permissions for:
> {code}
>   INodeAttributes getAttributes(INodesInPath iip)
>   throws IOException {
> INode node = FSDirectory.resolveLastINode(iip);
> int snapshot = iip.getPathSnapshotId();
> INodeAttributes nodeAttrs = node.getSnapshotINode(snapshot);
> UserGroupInformation ugi = NameNode.getRemoteUser();
> INodeAttributeProvider ap = this.getUserFilteredAttributeProvider(ugi);
> if (ap != null) {
>   // permission checking sends the full components array including the
>   // first empty component for the root.  however file status
>   // related calls are expected to strip out the root component according
>   // to TestINodeAttributeProvider.
>   byte[][] components = iip.getPathComponents();
>   components = Arrays.copyOfRange(components, 1, components.length);
>   nodeAttrs = ap.getAttributes(components, nodeAttrs);
> }
> return nodeAttrs;
>   }
> {code}
> The line:
> {code}
> INode node = FSDirectory.resolveLastINode(iip);
> {code}
> This picks the last resolved inode, and if you then call 
> node.getPathComponents for a path like '/data/.snapshot/snap1/tab1', it will 
> return /data/tab1. It resolves the snapshot path to its original location, 
> but it's still the snapshot inode.
> However, the logic passes 'iip.getPathComponents', which returns 
> "/user/.snapshot/snap1/tab", to the provider.
> The pre-Hadoop 3.0 code passes the inode directly to the provider, and hence 
> it only ever sees the path as "/user/data/tab1".
> It is debatable which path should be passed to the provider in the case of 
> snapshots: /user/.snapshot/snap1/tab or /data/tab1. However, as the 
> behaviour has changed, I feel we should ensure the old behaviour is 
> retained.
> It would also be fairly easy to provide a config switch so the provider gets 
> the full snapshot path or the resolved path.
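The component-stripping step quoted above can be illustrated in isolation. A self-contained sketch (the path and helper class are illustrative, not Hadoop code):

```java
import java.util.Arrays;

public class ComponentsDemo {
    public static void main(String[] args) {
        // Path components as the permission checker produces them for
        // "/data/.snapshot/snap1/tab1": note the leading empty component
        // representing the root.
        byte[][] components = {
            "".getBytes(), "data".getBytes(), ".snapshot".getBytes(),
            "snap1".getBytes(), "tab1".getBytes()
        };

        // FSDirectory.getAttributes strips the root component before calling
        // the attribute provider, so the provider sees the snapshot path,
        // not the resolved /data/tab1 path.
        byte[][] stripped = Arrays.copyOfRange(components, 1, components.length);

        StringBuilder sb = new StringBuilder();
        for (byte[] c : stripped) {
            sb.append('/').append(new String(c));
        }
        System.out.println(sb); // prints "/data/.snapshot/snap1/tab1"
    }
}
```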




[jira] [Updated] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-16 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-15372:
---
Fix Version/s: 3.4.0
   3.3.1
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Resolving the jira. Thanks [~sodonnell] for finding the issue and fixing it, 
and [~hemanthboyina] for reviewing the patch.

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Fix For: 3.3.1, 3.4.0
>
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>






[jira] [Commented] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-16 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137975#comment-17137975
 ] 

Wei-Chiu Chuang commented on HDFS-15372:


Committed the change to trunk and branch-3.3. There are pretty big conflicts 
cherrypicking to the branch-3.2 due to HDFS-14743.

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>






[jira] [Commented] (HDFS-15372) Files in snapshots no longer see attribute provider permissions

2020-06-16 Thread Wei-Chiu Chuang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17137971#comment-17137971
 ] 

Wei-Chiu Chuang commented on HDFS-15372:


+1 from me. Will commit later.

> Files in snapshots no longer see attribute provider permissions
> ---
>
> Key: HDFS-15372
> URL: https://issues.apache.org/jira/browse/HDFS-15372
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Stephen O'Donnell
>Assignee: Stephen O'Donnell
>Priority: Major
> Attachments: HDFS-15372.001.patch, HDFS-15372.002.patch, 
> HDFS-15372.003.patch, HDFS-15372.004.patch, HDFS-15372.005.patch
>






[jira] [Updated] (HDFS-15289) Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table

2020-06-16 Thread Uma Maheswara Rao G (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-15289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uma Maheswara Rao G updated HDFS-15289:
---
Description: 
ViewFS provides flexibility to mount different filesystem types with mount 
points configuration table. This approach is solving the scalability problems, 
but users need to reconfigure the filesystem to ViewFS and to its scheme.  This 
will be problematic in the case of paths persisted in meta stores, ex: Hive. In 
systems like Hive, it will store uris in meta store. So, changing the file 
system scheme will create a burden to upgrade/recreate meta stores. In our 
experience many users are not ready to change that.  

Router based federation is another implementation to provide coordinated mount 
points for HDFS federation clusters. Even though this provides flexibility to 
handle mount points easily, it will not allow other (non-HDFS) file systems to 
be mounted. So, it does not solve the purpose when users want to mount 
external (non-HDFS) filesystems.

So, the problem here is: Even though many users want to adapt to the scalable 
fs options available, technical challenges of changing schemes (ex: in meta 
stores) in deployments are obstructing them. 

So, we propose to allow the hdfs scheme in a ViewFS-like client-side mount 
system and let users create mount links without changing URI paths. 
I will upload a detailed design doc shortly.

  was:
ViewFS provides flexibility to mount different filesystem types with mount 
points configuration table. Additionally viewFS provides flexibility to 
configure any fs (not only HDFS) scheme in mount table mapping. This approach 
is solving the scalability problems, but users need to reconfigure the 
filesystem to ViewFS and to its scheme.  This will be problematic in the case 
of paths persisted in meta stores, ex: Hive. In systems like Hive, it will 
store uris in meta store. So, changing the file system scheme will create a 
burden to upgrade/recreate meta stores. In our experience many users are not 
ready to change that.  

Router based federation is another implementation to provide coordinated mount 
points for HDFS federation clusters. Even though this provides flexibility to 
handle mount points easily, this will not allow other(non-HDFS) file systems to 
mount. So, this does not solve the purpose when users want to mount 
external(non-HDFS) filesystems.

So, the problem here is: Even though many users want to adapt to the scalable 
fs options available, technical challenges of changing schemes (ex: in meta 
stores) in deployments are obstructing them. 

So, we propose to allow hdfs scheme in ViewFS like client side mount system and 
provision user to create mount links without changing URI paths. 
I will upload detailed design doc shortly.


> Allow viewfs mounts with HDFS/HCFS scheme and centralized mount table
> -
>
> Key: HDFS-15289
> URL: https://issues.apache.org/jira/browse/HDFS-15289
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: fs
>Affects Versions: 3.2.0
>Reporter: Uma Maheswara Rao G
>Assignee: Uma Maheswara Rao G
>Priority: Major
> Attachments: ViewFSOverloadScheme - V1.0.pdf, ViewFSOverloadScheme.png
>
>
> ViewFS provides flexibility to mount different filesystem types with mount 
> points configuration table. This approach is solving the scalability 
> problems, but users need to reconfigure the filesystem to ViewFS and to its 
> scheme.  This will be problematic in the case of paths persisted in meta 
> stores, ex: Hive. In systems like Hive, it will store uris in meta store. So, 
> changing the file system scheme will create a burden to upgrade/recreate meta 
> stores. In our experience many users are not ready to change that.  
> Router based federation is another implementation to provide coordinated 
> mount points for HDFS federation clusters. Even though this provides 
> flexibility to handle mount points easily, it will not allow other 
> (non-HDFS) file systems to be mounted. So, it does not solve the purpose 
> when users want to mount external (non-HDFS) filesystems.
> So, the problem here is: Even though many users want to adapt to the scalable 
> fs options available, technical challenges of changing schemes (ex: in meta 
> stores) in deployments are obstructing them. 
> So, we propose to allow the hdfs scheme in a ViewFS-like client-side mount 
> system and let users create mount links without changing URI paths. 
> I will upload a detailed design doc shortly.




[jira] [Commented] (HDFS-8538) Change the default volume choosing policy to AvailableSpaceVolumeChoosingPolicy

2020-06-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-8538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17136810#comment-17136810
 ] 

Stephen O'Donnell commented on HDFS-8538:
-

As this is an old Jira and there were some objections to this change 
historically, I asked on the mailing list to see if the community was happy to 
make this change.

To add some context: in the 5 years since this change was suggested, at 
Cloudera we have seen about 1000 clusters running with the Available Space 
policy enabled, and we have not seen any issues caused by it. It feels like 
this policy should be the default, as we have to change it more often than not.

From the mailing list, I got 3 +1's to move forward with making 
AvailableSpaceVolumeChoosingPolicy the default from 3.4.0, and no objections.

Archives are at:

http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/202004.mbox/browser

and

http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/202005.mbox/browser

With that in mind I will move forward on this in a week or so, unless anyone 
objects here.
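Until the default changes, the policy can be enabled explicitly. A sketch of the usual hdfs-site.xml entries (key and class names as commonly documented for this policy; verify the threshold value against your Hadoop version's defaults):

```xml
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
<property>
  <!-- Volumes whose free space differs by less than this many bytes
       are treated as balanced and used round-robin -->
  <name>dfs.datanode.available-space-volume-choosing-policy.balanced-space-threshold</name>
  <value>10737418240</value>
</property>
```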

> Change the default volume choosing policy to 
> AvailableSpaceVolumeChoosingPolicy
> ---
>
> Key: HDFS-8538
> URL: https://issues.apache.org/jira/browse/HDFS-8538
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Affects Versions: 2.7.0
>Reporter: Andrew Wang
>Assignee: Andrew Wang
>Priority: Major
> Attachments: hdfs-8538.001.patch
>
>
> Datanodes with different sized disks almost always want the available space 
> policy. Users with homogeneous disks are unaffected.
> Since this code has baked for a while, let's change it to be the default.






[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136797#comment-17136797
 ] 

Stephen O'Donnell commented on HDFS-15406:
--

Going from 3m20s to 52 seconds with such a simple change is a big win.
It would be great to see how long the sort call takes on this line:

{code}
Collections.sort(bl); // Sort based on blockId
{code}

My feeling is that we can just drop this call, since FoldedTreeSet is already 
sorted, but it might be best to create a new method on FsDatasetSpi called 
getSortedFinalizedBlocks to make the contract obvious, as there may be other 
implementations of FsDatasetSpi which do not keep the blocks sorted.

If you don't want to investigate that as part of this Jira, we can create a 
sub-jira for it. It might be a trivial change to improve things a bit further, 
perhaps saving about another 5 seconds.
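A JDK-only sketch of the suggested shape (the method name getSortedFinalizedBlocks mirrors the proposal, but the Replica record and Dataset interface here are simplified stand-ins, not Hadoop's FsDatasetSpi):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Minimal stand-in for a finalized replica, ordered by block id.
record Replica(long blockId) implements Comparable<Replica> {
    public int compareTo(Replica o) { return Long.compare(blockId, o.blockId); }
}

interface Dataset {
    Collection<Replica> getFinalizedBlocks();

    // Default contract: copy and sort, for implementations whose
    // backing storage is not sorted.
    default List<Replica> getSortedFinalizedBlocks() {
        List<Replica> copy = new ArrayList<>(getFinalizedBlocks());
        Collections.sort(copy);
        return copy;
    }
}

// Implementation whose backing store (a TreeSet, standing in for
// FoldedTreeSet) is already sorted, so it can skip the O(n log n) sort.
class SortedDataset implements Dataset {
    private final TreeSet<Replica> replicas = new TreeSet<>();
    void add(long id) { replicas.add(new Replica(id)); }
    public Collection<Replica> getFinalizedBlocks() { return replicas; }
    public List<Replica> getSortedFinalizedBlocks() {
        return new ArrayList<>(replicas); // already in blockId order
    }
}
```

Callers depend only on the "sorted" contract, so each implementation is free to satisfy it as cheaply as its storage allows.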

> Improve the speed of Datanode Block Scan
> 
>
> Key: HDFS-15406
> URL: https://issues.apache.org/jira/browse/HDFS-15406
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: hemanthboyina
>Assignee: hemanthboyina
>Priority: Major
> Attachments: HDFS-15406.001.patch
>
>
> In our customer cluster we have approx 10M blocks in one datanode.
> For the Datanode to scan all of them, it takes nearly 5 minutes:
> {code:java}
> 2020-06-10 12:17:06,869 | INFO  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | BlockPool BP-1104115233-**.**.**.**-1571300215588 Total blocks: 
> 11149530, missing metadata files:472, missing block files:472, missing blocks 
> in memory:0, mismatched blocks:0 | DirectoryScanner.java:473
> 2020-06-10 12:17:06,869 | WARN  | 
> java.util.concurrent.ThreadPoolExecutor$Worker@3b4bea70[State = -1, empty 
> queue] | Lock held time above threshold: lock identifier: 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl 
> lockHeldTimeMs=329854 ms. Suppressed 0 lock warnings. The stack trace is: 
> java.lang.Thread.getStackTrace(Thread.java:1559)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> org.apache.hadoop.util.InstrumentedLock.unlock(InstrumentedLock.java:133)
> org.apache.hadoop.util.AutoCloseableLock.release(AutoCloseableLock.java:84)
> org.apache.hadoop.util.AutoCloseableLock.close(AutoCloseableLock.java:96)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.scan(DirectoryScanner.java:475)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.reconcile(DirectoryScanner.java:375)
> org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.run(DirectoryScanner.java:320)
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> java.lang.Thread.run(Thread.java:748)
>  | InstrumentedLock.java:143 {code}






[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-16 Thread hemanthboyina (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136730#comment-17136730
 ] 

hemanthboyina commented on HDFS-15406:
--

Thanks [~sodonnell] for the comment.
{quote}1. After you started caching `getBaseURI()` did it improve the runtime 
of both the getDiskReport() step and compare with in-memory step?{quote}
Yes, caching getBaseURI() improved the time taken in both places; more details 
below.
{quote}2. Looking at the code on trunk, I don't think we create any scanInfo 
objects under the lock in the compare sections unless there is a difference. If 
this change improved your runtime under the lock from 6m -> 52 seconds, is this 
because there is a large number of differences between disk and memory on your 
cluster for some reason?
{quote}
I think you are referring to the creation of ScanInfo objects, because creating 
a ScanInfo object uses vol.getBaseUri(). Even when we do not create ScanInfo 
objects, we still call getBaseURI() three times inside the lock, via 
info.getBlockFile(), info.getGenStamp() and memBlock.compareWith(info). So if 
there is a large number of differences between disk and in-memory, the calls to 
getBaseURI() are even more frequent. By caching getBaseURI() we save at least 
three calls inside the lock and one outside it, so there was a huge decrease in 
lock hold time. We tried creating 11M blocks in our independent cluster: the 
lock was held for 3 min 20 sec before caching, and 52 sec after caching 
getBaseURI().
{quote}3. Did you capture any profiles (flame chart or debug log messages) 
to see how long each part of the code under the lock runs for? I am interested 
in these lines in DirectoryScanner#scan():
{quote}
I didn't capture profiles, but I could see dataset.getFinalizedBlocks(bpid) in 
the stack trace, as it took more than 600 ms on every iteration.
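The caching idea can be illustrated with JDK types only (Volume and ScanInfo here are simplified stand-ins, not Hadoop's classes): resolve the volume's base URI once, and let every ScanInfo reuse the cached value instead of recomputing it per access.

```java
import java.net.URI;
import java.util.concurrent.atomic.AtomicInteger;

class Volume {
    static final AtomicInteger uriCalls = new AtomicInteger(); // instrumentation only
    private final String root;
    Volume(String root) { this.root = root; }
    // Stands in for the comparatively expensive vol.getBaseURI() call.
    URI getBaseUri() {
        uriCalls.incrementAndGet();
        return URI.create("file:" + root);
    }
}

class ScanInfo {
    private final URI baseUri;     // cached once at construction
    private final String blockPath;
    ScanInfo(URI baseUri, String blockPath) {
        this.baseUri = baseUri;
        this.blockPath = blockPath;
    }
    // Every later access reuses the cached URI, so no repeated URI
    // resolution happens inside the dataset lock.
    String blockFile() { return baseUri.resolve(blockPath).toString(); }
}
```

With the URI fetched once per volume up front, the per-block work inside the lock drops to plain string resolution.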

 







[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136570#comment-17136570
 ] 

Stephen O'Donnell commented on HDFS-15406:
--

Thinking about this some more, as `dataset.getFinalizedBlocks(bpid);` makes a 
new copy of all the finalized blocks in the block pool, do we even need to hold 
the DN lock while we compare the differences between on disk and in memory? 
From the scan step, we have captured a snapshot of what is on disk. After 
calling `dataset.getFinalizedBlocks(bpid);` we have taken a snapshot of in 
memory. The two snapshots are never 100% in sync as things are always changing 
as the disk is scanned.

We are only comparing finalized blocks, so they should not really change:

* If a block is deleted after our snapshot, our snapshot will not see it and 
that is OK.
* A finalized block could be appended. If that happens both the genstamp and 
length will change, but that should be handled by reconcile when it calls 
`FSDatasetImpl.checkAndUpdate()`, and there is nothing stopping blocks being 
appended after they have been scanned from disk, but before they have been 
compared with memory.

I am not 100% sure about this, but my suspicion is that we can do a lot of this 
work outside of the lock, as checkAndUpdate() re-checks any differences later 
under the lock on a block-by-block basis.







[jira] [Comment Edited] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136552#comment-17136552
 ] 

Stephen O'Donnell edited comment on HDFS-15406 at 6/16/20, 11:09 AM:
-

[~hemanthboyina], this is a good find. Can I just clarify:

1. After you started caching `getBaseURI()` did it improve the runtime of both 
the getDiskReport() step and compare with in-memory step?

2. Looking at the code on trunk, I don't think we create any scanInfo objects 
under the lock in the compare sections unless there is a difference. If this 
change improved your runtime under the lock from 6m -> 52 seconds, is this 
because there is a large number of differences between disk and memory on your 
cluster for some reason?

3. Did you capture any profiles (flame chart or debug log messages) to see 
how long each part of the code under the lock runs for? I am interested in 
these lines in DirectoryScanner#scan():

{code}
  final List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid);
  Collections.sort(bl); // Sort based on blockId
{code}

You mentioned this DN has 11M blocks, so I imagine forming this list of 
ReplicaInfo and then sorting it takes some time, several seconds at least. 
Based on tests I did here, sorting 11M blocks would probably take about 5 
seconds:

https://issues.apache.org/jira/browse/HDFS-15140?focusedCommentId=17023077&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17023077

Inside the replicaMap, the ReplicaInfo are stored in a FoldedTreeSet, which is 
a sorted structure. We should be able to get an iterator on it and avoid the 
need to create this new ReplicaInfo list and sort it. It would require some 
changes to the subsequent code if we used an Iterator, but I suspect we can 
just drop the sort with no further changes.

If you are able to test this on your real datanode, it would be interesting to 
see how long it takes for the getFinalizedBlocks and then for the sort to see 
if this shaves some more time off under the lock.


was (Author: sodonnell):
[~hemanthboyina], this is a good find. Can I just clarify:

1. After you started caching `getBaseURI()` did it improve the runtime of both 
the getDiskReport() step and compare with in-memory step?

2. Looking at the code on trunk, I don't think we create any scanInfo objects 
under the lock in the compare sections unless there is a difference. If this 
change improved your runtime under the lock from 6m -> 52 seconds, is this 
because there is a large number of differences between disk and memory on your 
cluster for some reason?

3. Did you capture any profiles (flame chart or debug log messages) to see 
how long each part of the code under the lock runs for? I am interested in 
these lines:

{code}
  final List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid);
  Collections.sort(bl); // Sort based on blockId
{code}

You mentioned this DN has 11M blocks, so I imagine forming this list of 
ReplicaInfo and then sorting it takes some time, several seconds at least. 
Based on tests I did here, sorting 11M blocks would probably take about 5 
seconds:

https://issues.apache.org/jira/browse/HDFS-15140?focusedCommentId=17023077&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17023077

Inside the replicaMap, the ReplicaInfo are stored in a FoldedTreeSet, which is 
a sorted structure. We should be able to get an iterator on it and avoid the 
need to create this new ReplicaInfo list and sort it. It would require some 
changes to the subsequent code if we used an Iterator, but I suspect we can 
just drop the sort with no further changes.

If you are able to test this on your real datanode, it would be interesting to 
see how long it takes for the getFinalizedBlocks and then for the sort to see 
if this shaves some more time off under the lock.


[jira] [Commented] (HDFS-15406) Improve the speed of Datanode Block Scan

2020-06-16 Thread Stephen O'Donnell (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136552#comment-17136552
 ] 

Stephen O'Donnell commented on HDFS-15406:
--

[~hemanthboyina], this is a good find. Can I just clarify:

1. After you started caching `getBaseURI()` did it improve the runtime of both 
the getDiskReport() step and compare with in-memory step?

2. Looking at the code on trunk, I don't think we create any scanInfo objects 
under the lock in the compare sections unless there is a difference. If this 
change improved your runtime under the lock from 6m -> 52 seconds, is this 
because there is a large number of differences between disk and memory on your 
cluster for some reason?

3. Did you capture any profiles (flame chart or debug log messages) to see 
how long each part of the code under the lock runs for? I am interested in 
these lines:

{code}
  final List<ReplicaInfo> bl = dataset.getFinalizedBlocks(bpid);
  Collections.sort(bl); // Sort based on blockId
{code}

You mentioned this DN has 11M blocks, so I imagine forming this list of 
ReplicaInfo and then sorting it takes some time, several seconds at least. 
Based on tests I did here, sorting 11M blocks would probably take about 5 
seconds:

https://issues.apache.org/jira/browse/HDFS-15140?focusedCommentId=17023077&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17023077

Inside the replicaMap, the ReplicaInfo are stored in a FoldedTreeSet, which is 
a sorted structure. We should be able to get an iterator on it and avoid the 
need to create this new ReplicaInfo list and sort it. It would require some 
changes to the subsequent code if we used an Iterator, but I suspect we can 
just drop the sort with no further changes.

If you are able to test this on your real datanode, it would be interesting to 
see how long it takes for the getFinalizedBlocks and then for the sort to see 
if this shaves some more time off under the lock.
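A JDK-only sketch of the iterator idea (TreeSet stands in for FoldedTreeSet, since both iterate in ascending order; this is not Hadoop code): walking the sorted structure directly avoids materializing and re-sorting a multi-million-entry list.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NavigableSet;

class SortedIteration {
    // Current approach: O(n) extra memory for the copy plus an
    // O(n log n) sort, redundant when the source is already sorted.
    static List<Long> copyAndSort(Collection<Long> replicas) {
        List<Long> bl = new ArrayList<>(replicas);
        Collections.sort(bl); // sort based on blockId
        return bl;
    }

    // Iterator approach: a TreeSet's iterator already yields ascending
    // order, so the merge-style compare loop can consume it as-is.
    static List<Long> viaIterator(NavigableSet<Long> replicas) {
        List<Long> out = new ArrayList<>(replicas.size());
        for (Iterator<Long> it = replicas.iterator(); it.hasNext(); ) {
            out.add(it.next());
        }
        return out;
    }
}
```

Both paths produce identical ordering; the iterator path simply skips the sort, which for 11M entries is the several-second cost measured above.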







[jira] [Commented] (HDFS-15346) DistCpFedBalance implementation

2020-06-16 Thread Yiqun Lin (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136480#comment-17136480
 ] 

Yiqun Lin commented on HDFS-15346:
--

LGTM, +1. Will commit this the day after tomorrow if there are no further 
comments.

> DistCpFedBalance implementation
> ---
>
> Key: HDFS-15346
> URL: https://issues.apache.org/jira/browse/HDFS-15346
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Jinglun
>Assignee: Jinglun
>Priority: Major
> Attachments: HDFS-15346.001.patch, HDFS-15346.002.patch, 
> HDFS-15346.003.patch, HDFS-15346.004.patch, HDFS-15346.005.patch, 
> HDFS-15346.006.patch, HDFS-15346.007.patch, HDFS-15346.008.patch, 
> HDFS-15346.009.patch, HDFS-15346.010.patch, HDFS-15346.011.patch, 
> HDFS-15346.012.patch
>
>
> The patch in HDFS-15294 is too big to review, so we split it into two 
> patches. This is the second one. Details can be found at HDFS-15294.






[jira] [Commented] (HDFS-15346) DistCpFedBalance implementation

2020-06-16 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17136433#comment-17136433
 ] 

Hadoop QA commented on HDFS-15346:
--

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
22s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
1s{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green}  0m 
 0s{color} | {color:green} The patch appears to include 15 new or modified test 
files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m  
4s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 21m 
14s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 18m  
3s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  5m 
53s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
16m 21s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  4m 
34s{color} | {color:green} trunk passed {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  5m 
16s{color} | {color:blue} Used deprecated FindBugs config; considering 
switching to SpotBugs. {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
31s{color} | {color:blue} branch/hadoop-project no findbugs output file 
(findbugsXml.xml) {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
31s{color} | {color:blue} branch/hadoop-assemblies no findbugs output file 
(findbugsXml.xml) {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
32s{color} | {color:blue} branch/hadoop-tools/hadoop-tools-dist no findbugs 
output file (findbugsXml.xml) {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
30s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
49s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 17m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 17m 
32s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
55s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  6m 
37s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} shellcheck {color} | {color:green}  0m 
 0s{color} | {color:green} There were no new shellcheck issues. {color} |
| {color:green}+1{color} | {color:green} shelldocs {color} | {color:green}  0m 
34s{color} | {color:green} There were no new shelldocs issues. {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} xml {color} | {color:green}  0m  
6s{color} | {color:green} The patch has no ill-formed XML file. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m 39s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  5m 
14s{color} | {color:green} the patch passed {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
32s{color} | {color:blue} hadoop-project has no data from findbugs {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
32s{color} | {color:blue} hadoop-assemblies has no data from findbugs {color} |
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m 
33s{color} | {color:blue} hadoop-tools/hadoop-tools-dist has no