[jira] [Commented] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr
[ https://issues.apache.org/jira/browse/HDFS-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138521#comment-16138521 ]

Yongjun Zhang commented on HDFS-12295:
--

Hi [~daryn], I would really appreciate it if you could review the discussion here and comment. Thanks a lot.

> NameNode to support file path prefix /.reserved/bypassExtAttr
> -
>
> Key: HDFS-12295
> URL: https://issues.apache.org/jira/browse/HDFS-12295
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: hdfs, namenode
> Reporter: Yongjun Zhang
> Assignee: Yongjun Zhang
> Attachments: HDFS-12295.001.patch, HDFS-12295.001.patch
>
>
> Let NameNode support the prefix /.reserved/bypassExtAttr, so a client can add
> this prefix to a path before calling getFileStatus, e.g. /a/b/c becomes
> /.reserved/bypassExtAttr/a/b/c. NN will parse the path at the very beginning,
> and bypass the external attribute provider if the prefix is there.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
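A minimal sketch of the NameNode-side parsing described in the issue, assuming a hypothetical `BypassPrefixResolver` helper (this is not the code in HDFS-12295.001.patch): the prefix is stripped once, up front, and the caller gets back the real path plus a flag saying whether to skip the external attribute provider.

```java
import java.util.Objects;

/** Hypothetical helper: strips /.reserved/bypassExtAttr from an incoming path. */
final class BypassPrefixResolver {
  static final String BYPASS_PREFIX = "/.reserved/bypassExtAttr";

  /** The real path, plus whether the external attribute provider should be skipped. */
  static final class Resolved {
    final String path;
    final boolean bypassExternalProvider;

    Resolved(String path, boolean bypassExternalProvider) {
      this.path = path;
      this.bypassExternalProvider = bypassExternalProvider;
    }
  }

  private BypassPrefixResolver() {}

  /** Parses the path once, before any other path resolution happens. */
  static Resolved resolve(String src) {
    Objects.requireNonNull(src, "src");
    if (src.equals(BYPASS_PREFIX)) {
      // The bare prefix refers to the root directory.
      return new Resolved("/", true);
    }
    if (src.startsWith(BYPASS_PREFIX + "/")) {
      // Keep the '/' that follows the prefix: "/.reserved/bypassExtAttr/a/b/c" -> "/a/b/c".
      return new Resolved(src.substring(BYPASS_PREFIX.length()), true);
    }
    // No prefix (or a look-alike such as "/.reserved/bypassExtAttrX"): resolve normally.
    return new Resolved(src, false);
  }
}
```

For example, resolving `/.reserved/bypassExtAttr/a/b/c` yields `/a/b/c` with the bypass flag set, while `/a/b/c` on its own comes back unchanged with the flag cleared.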
[jira] [Commented] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr
[ https://issues.apache.org/jira/browse/HDFS-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138480#comment-16138480 ]

Yongjun Zhang commented on HDFS-12295:
--

Hi [~chris.douglas],

One problem is that the AccessControlEnforcer in INodeAttributeProvider does consult the external attribute provider for the attributes used in permission checking. This might be an issue. Say we have two users: userX is the user who issues the copy command, and copyUser is the dedicated user that runs copies on behalf of userX. When userX runs a copy by sending the request to copyUser, the copy runs as copyUser. If the external provider disallows userX from accessing a certain file but HDFS allows it, then we can still copy the file.

My original thinking about the HDFS-12202 and HDFS-12295 approaches was that when we do permission checking, we still get the attributes from the external provider, but when we copy the attributes, we get them from HDFS. If we simply state that the external attribute provider doesn't control file access during a copy, that would be fine. But from a user's perspective, is it OK to ignore the external provider for permission checking?

I'm also looking into how to detect the copyUser effectively and correctly on the NN side. Thanks.
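The split described in this comment (permission checks still go through the external provider for the original requester, while the attributes that get copied are read from HDFS) can be modeled in a few lines. This is a toy sketch with hypothetical names; in particular it assumes the NameNode already knows the original requester (userX), which the comment notes is still an open question.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Toy model (hypothetical): the provider decides access, HDFS supplies copied attributes. */
final class CopyRequestModel {
  /** Paths each user may read, according to the external attribute provider. */
  private final Map<String, Set<String>> providerReadable;
  /** Owner stored in HDFS itself for each path (what should land at the destination). */
  private final Map<String, String> hdfsOwner;

  CopyRequestModel(Map<String, Set<String>> providerReadable, Map<String, String> hdfsOwner) {
    this.providerReadable = new HashMap<>(providerReadable);
    this.hdfsOwner = new HashMap<>(hdfsOwner);
  }

  /**
   * Returns the HDFS owner to replicate for the path. The access check is done for the
   * original requester against the provider, not for the service account running the copy.
   */
  String ownerToCopy(String requester, String path) {
    Set<String> readable = providerReadable.getOrDefault(requester, Collections.<String>emptySet());
    if (!readable.contains(path)) {
      // Without this check, a copy run as copyUser could leak files the provider denies to userX.
      throw new SecurityException(requester + " may not read " + path);
    }
    return hdfsOwner.get(path);
  }
}
```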
[jira] [Commented] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr
[ https://issues.apache.org/jira/browse/HDFS-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138002#comment-16138002 ]

Yongjun Zhang commented on HDFS-12295:
--

Thanks a lot for your comment, [~chris.douglas], and thanks to [~asuresh] for the earlier suggestion too!

After giving it further thought, I now think this is a very interesting idea and the simplest to implement. Say, for a given dedicated user, we can just assume the external attribute provider is disconnected (we can add a check in HDFS instead of letting the provider pass through), and the only thing this user can do is distcp.

Some subtleties: there are three cases:
1. distcp from one cluster to another, where the source cluster has an external attribute provider
2. distcp within the same cluster, where the source path is managed by the external attribute provider
3. the "hadoop fs -cp" command, with arguments as in 2

Any of these cases could have the problem we are trying to solve here. But to be safe, we can require that if the external attribute provider is enabled, files can only be copied by the dedicated user. Like you said, we could provide a service to do distcp and "hadoop fs -cp", and run the service as the dedicated user. Let me explore this further.
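A minimal sketch of the "assume the provider is disconnected for the dedicated user" idea, with the check living in HDFS rather than in the provider. The class name, the `dedicatedCopyUser` setting and the `Function`-based attribute sources are all hypothetical stand-ins; a real hook would sit wherever the NameNode consults its INodeAttributeProvider.

```java
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical: chooses where a path's attributes come from, based on the caller. */
final class DedicatedUserAttributeSource {
  private final String dedicatedCopyUser;               // hypothetical configuration value
  private final Function<String, String> hdfsOwner;     // attributes stored in HDFS
  private final Function<String, String> providerOwner; // attributes from the external provider

  DedicatedUserAttributeSource(String dedicatedCopyUser,
                               Function<String, String> hdfsOwner,
                               Function<String, String> providerOwner) {
    this.dedicatedCopyUser = Objects.requireNonNull(dedicatedCopyUser);
    this.hdfsOwner = Objects.requireNonNull(hdfsOwner);
    this.providerOwner = Objects.requireNonNull(providerOwner);
  }

  String ownerFor(String callerUser, String path) {
    if (dedicatedCopyUser.equals(callerUser)) {
      // HDFS decides: the provider is never called for the dedicated copy user.
      return hdfsOwner.apply(path);
    }
    return providerOwner.apply(path);
  }
}
```

The trade-off, also listed as a con of option 3 in the summary of proposals, is that the dedicated user can never see the provider's view of a path.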
[jira] [Commented] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr
[ https://issues.apache.org/jira/browse/HDFS-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137674#comment-16137674 ]

Chris Douglas commented on HDFS-12295:
--

If the subtree under {{/.reserved/bypassExtAttr}} is read-only, that should address many of the issues that [~daryn] raised. As long as only split generation uses this API, that limits the cases that break when this feature is used.

The requirement for this feature (any user can perform backup-style copies using distcp) may be too broad. Your objective is to avoid cluttering the destination namesystem with xattrs from the external attribute provider at the source. Relying on _all_ users to set this flag correctly is unlikely to achieve this. What you want is the opposite: copying data between these clusters should, by default, take the path that reads the raw xattrs.

The less-invasive solutions attempt to relax the requirement that all users run distcp directly. While the user-facing solution satisfies all the requirements, it relies on cooperative users. Would it be feasible to add a layer of indirection in the deployments that need this functionality? If so, then we can make inter-cluster copies available to all users without changing the internals of HDFS. [Repeating|https://issues.apache.org/jira/browse/HDFS-12202?focusedCommentId=16120861&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16120861] from HDFS-12202, the {{distcp}} command can be swapped out in 3.x. In deployments with this requirement, users can contact a service to schedule an inter-cluster transfer. That backup user could not only be a special case in the NameNode plugin; it could also help users avoid copying data from encryption zones into unprotected clusters (HDFS-6509).

If that's not feasible, can this use case be supported by extending MAPREDUCE-6007? If the src/dst are under {{/.reserved/raw}}, then omitting the external attribute provider is reasonable behavior.
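The two rules in this comment (the bypass subtree is read-only, and paths under /.reserved/raw skip the external provider) could be enforced in one place. A sketch, assuming a hypothetical `Op` enum standing in for NameNode RPCs; a real implementation would more likely raise an access-control style exception than `UnsupportedOperationException`.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical policy for the reserved prefixes discussed above. */
final class ReservedPrefixPolicy {
  /** Stand-in for NameNode operations; not an actual HDFS type. */
  enum Op { GET_FILE_STATUS, LIST_STATUS, GET_XATTRS, CREATE, DELETE, RENAME, SET_PERMISSION }

  static final String BYPASS_PREFIX = "/.reserved/bypassExtAttr";
  static final String RAW_PREFIX = "/.reserved/raw";

  private static final Set<Op> READ_ONLY_OPS =
      EnumSet.of(Op.GET_FILE_STATUS, Op.LIST_STATUS, Op.GET_XATTRS);

  private ReservedPrefixPolicy() {}

  static boolean isUnder(String path, String prefix) {
    return path.equals(prefix) || path.startsWith(prefix + "/");
  }

  /** Only read operations may use the bypass prefix, so delete/rename cannot diverge. */
  static void checkAllowed(Op op, String path) {
    if (isUnder(path, BYPASS_PREFIX) && !READ_ONLY_OPS.contains(op)) {
      throw new UnsupportedOperationException(op + " is not permitted under " + BYPASS_PREFIX);
    }
  }

  /** Whether the external attribute provider should be skipped for this path. */
  static boolean skipExternalProvider(String path) {
    return isUnder(path, BYPASS_PREFIX) || isUnder(path, RAW_PREFIX);
  }
}
```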
[jira] [Commented] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr
[ https://issues.apache.org/jira/browse/HDFS-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16131290#comment-16131290 ]

Yongjun Zhang commented on HDFS-12295:
--

About 2: users don't manually add the prefix to command-line parameters; rather, it's the implementation of distcp, "hadoop fs -cp", etc. that adds the prefix (before calling getFileStatus and listStatus). So the path string "inconsistency" may only appear inside HDFS core code, which may not be too bad. What do you think, [~daryn]? Thanks.
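On the client side, the copy tool (not the user) would rewrite each source path before calling getFileStatus and listStatus. A sketch using `java.net.URI`, with a hypothetical `CopyToolPaths.withBypassPrefix` helper; the rewritten URI could then be wrapped in an `org.apache.hadoop.fs.Path` and passed to `FileSystem#getFileStatus` or `FileSystem#listStatus`.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper used by a copy tool to request provider-free metadata. */
final class CopyToolPaths {
  static final String BYPASS_PREFIX = "/.reserved/bypassExtAttr";

  private CopyToolPaths() {}

  /** Inserts the bypass prefix in front of an absolute path, keeping scheme and authority. */
  static URI withBypassPrefix(URI src) {
    String path = src.getPath();
    if (path == null || !path.startsWith("/")) {
      throw new IllegalArgumentException("expected an absolute path: " + src);
    }
    if (path.equals(BYPASS_PREFIX) || path.startsWith(BYPASS_PREFIX + "/")) {
      return src; // already prefixed; do not add it twice
    }
    String newPath = path.equals("/") ? BYPASS_PREFIX : BYPASS_PREFIX + path;
    try {
      return new URI(src.getScheme(), src.getAuthority(), newPath, src.getQuery(), src.getFragment());
    } catch (URISyntaxException e) {
      throw new IllegalArgumentException(e);
    }
  }
}
```

Because the helper is idempotent, a tool can apply it on every recursive listStatus call without worrying about stacking prefixes.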
[jira] [Commented] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr
[ https://issues.apache.org/jira/browse/HDFS-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129986#comment-16129986 ]

Yongjun Zhang commented on HDFS-12295:
--

Hi [~daryn],

The proposed solution here tries to address distcp; your comment made me aware that "hadoop fs -cp" has the same problem to solve. Thanks again for that.

There are several proposals so far:

1. HDFS-12202: add a new set of interfaces for getFileStatus and listStatus, and call them where needed to solve the problem (distcp, "hadoop fs -cp", etc.).
Pros: clear interface, no confusion.
Cons: the change is too wide. We have to introduce dummy implementations for FileSystems that don't support an attribute provider.

2. HDFS-12294: encode the additional parameter in the path string itself, and extract the prefix from the path string. Add the prefix where needed to solve the problem (distcp, "hadoop fs -cp", etc.).
Pros: no need to change the FileSystem interface.
Cons: potentially inconsistent path strings in different places, since the prefix is only relevant to certain operations.

3. Let the external attribute provider fall through to HDFS for a certain user. This is discussed in a HDFS-12202 comment.
Pros: maybe simpler to implement.
Cons: potentially won't work (the same user may want to get data from the attribute provider, and other users need to run distcp and "hadoop fs -cp" too).

[~daryn], [~chris.douglas], [~asuresh], [~andrew.wang], [~manojg], thanks for your earlier comments. Do you think my summary above is reasonable? Any better ideas or further thoughts to share? I really appreciate it.
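To make the "dummy implementation" cost of option 1 concrete, here is a sketch of what the extra overload might look like. It uses a hypothetical `StatusSource` interface and `SimpleStatus` class in place of Hadoop's FileSystem and FileStatus; since FileSystem is an abstract class, there the fallback would be a concrete method rather than a Java default method.

```java
/** Hypothetical minimal status object standing in for FileStatus. */
final class SimpleStatus {
  final String path;
  final String owner;

  SimpleStatus(String path, String owner) {
    this.path = path;
    this.owner = owner;
  }
}

/** Hypothetical stand-in for the FileSystem API surface touched by option 1. */
interface StatusSource {
  SimpleStatus getFileStatus(String path);

  /**
   * Proposed overload. File systems without an attribute provider inherit this
   * fallback, which simply ignores the flag (the "dummy implementation").
   */
  default SimpleStatus getFileStatus(String path, boolean bypassExternalProvider) {
    return getFileStatus(path);
  }
}

/** HDFS-like source that actually honors the flag. */
final class ProviderAwareSource implements StatusSource {
  @Override
  public SimpleStatus getFileStatus(String path) {
    return getFileStatus(path, false);
  }

  @Override
  public SimpleStatus getFileStatus(String path, boolean bypassExternalProvider) {
    // Pretend the provider overrides ownership unless it is bypassed.
    return new SimpleStatus(path, bypassExternalProvider ? "hdfs-owner" : "provider-owner");
  }
}

/** A file system that knows nothing about attribute providers. */
final class PlainSource implements StatusSource {
  @Override
  public SimpleStatus getFileStatus(String path) {
    return new SimpleStatus(path, "plain-owner");
  }
}
```

A listStatus overload would need the same treatment, which is where the "change is too wide" concern comes from.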
[jira] [Commented] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr
[ https://issues.apache.org/jira/browse/HDFS-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129342#comment-16129342 ]

Yongjun Zhang commented on HDFS-12295:
--

Hi [~daryn], thanks a lot for your comments!

Good point about move and similar operations; we can add that. For delete, the prefix is like a no-op. There are quite a few other operations, so this change will become ugly too.

What's your view of HDFS-12202? It doesn't need to change all the other operations, but it suggests adding a new set of APIs for getFileStatus and listStatus. This is a hard sell too, because we have to introduce the set of APIs and provide dummy implementations for all FileSystems even if they don't care. Would you please share your thoughts about it anyway? Right now the only application is that distcp decides whether to add the prefix before calling the getFileStatus and listStatus methods.

About the copy-n-paste: I did it as a quick draft to illustrate the idea for discussion, and will definitely put it in a more centralized location.

I'm not sure whether using the superuser to avoid consulting external permissions will work. Would you please see my discussion with [~chris.douglas] in HDFS-12294? Many thanks.
[jira] [Commented] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr
[ https://issues.apache.org/jira/browse/HDFS-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126000#comment-16126000 ]

Daryn Sharp commented on HDFS-12295:
--

If this is to be implemented, which I'm not sure it should be, having only a subset of operations understand the path prefix is too fragile and creates semantic inconsistencies. E.g. getFileStatus on /.reserved/bypassExtAttr/dir/file effectively operates on /dir/file; however, delete, rename, etc. operate on /.reserved/bypassExtAttr/dir/file. That's a showstopper.

Copy-n-paste of the same chunk of code should be a red flag that something is wrong and that it needs to be implemented in a more centralized location.

I'd almost rather the use case be handled via the superuser not being subject to the external permissions.
[jira] [Commented] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr
[ https://issues.apache.org/jira/browse/HDFS-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124991#comment-16124991 ]

Hadoop QA commented on HDFS-12295:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 15s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|| || || || trunk Compile Tests ||
| 0 | mvndep | 0m 27s | Maven dependency ordering for branch |
| +1 | mvninstall | 13m 55s | trunk passed |
| +1 | compile | 1m 28s | trunk passed |
| +1 | checkstyle | 0m 41s | trunk passed |
| +1 | mvnsite | 1m 28s | trunk passed |
| -1 | findbugs | 1m 24s | hadoop-hdfs-project/hadoop-hdfs-client in trunk has 2 extant Findbugs warnings. |
| -1 | findbugs | 1m 40s | hadoop-hdfs-project/hadoop-hdfs in trunk has 9 extant Findbugs warnings. |
| +1 | javadoc | 1m 1s | trunk passed |
|| || || || Patch Compile Tests ||
| 0 | mvndep | 0m 7s | Maven dependency ordering for patch |
| +1 | mvninstall | 1m 21s | the patch passed |
| +1 | compile | 1m 23s | the patch passed |
| +1 | javac | 1m 23s | the patch passed |
| +1 | checkstyle | 0m 39s | the patch passed |
| +1 | mvnsite | 1m 23s | the patch passed |
| -1 | whitespace | 0m 0s | The patch has 3 line(s) that end in whitespace. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| +1 | findbugs | 3m 15s | the patch passed |
| +1 | javadoc | 0m 57s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 1m 12s | hadoop-hdfs-client in the patch passed. |
| -1 | unit | 64m 8s | hadoop-hdfs in the patch failed. |
| +1 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 98m 29s | |

|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.TestDFSStripedInputStreamWithRandomECPolicy |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:14b5c93 |
| JIRA Issue | HDFS-12295 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881657/HDFS-12295.001.patch |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux e75ca4a98908 3.13.0-119-generic #166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 7769e96 |
| Default Java | 1.8.0_144 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HDFS-Build/20675
[jira] [Commented] (HDFS-12295) NameNode to support file path prefix /.reserved/bypassExtAttr
[ https://issues.apache.org/jira/browse/HDFS-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16124754#comment-16124754 ]

Hadoop QA commented on HDFS-12295:
--

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 0s | Docker mode activated. |
| -1 | patch | 0m 5s | HDFS-12295 does not apply to trunk. Rebase required? Wrong Branch? See https://wiki.apache.org/hadoop/HowToContribute for help. |

|| Subsystem || Report/Notes ||
| JIRA Issue | HDFS-12295 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12881633/HDFS-12295.001.patch |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/20673/console |
| Powered by | Apache Yetus 0.6.0-SNAPSHOT http://yetus.apache.org |

This message was automatically generated.