[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089900#comment-14089900 ] Charles Lamb commented on MAPREDUCE-6007: - Thanks! So, I'll . Add a test like the one above . fix the doc as you mentioned . Add a makeQualified call . Put quotes around the paths in the exception messages > Create a new option for distcp -p which causes raw.* namespace extended > attributes to not be preserved > -- > > Key: MAPREDUCE-6007 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Affects Versions: fs-encryption >Reporter: Charles Lamb >Assignee: Charles Lamb > Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, > MAPREDUCE-6007.003.patch > > > As part of the Data at Rest Encryption work (HDFS-6134), we need to create a > new option for distcp which causes raw.* namespace extended attributes to not > be preserved. See the doc in HDFS-6509 for details. The default for this > option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089890#comment-14089890 ] Andrew Wang commented on MAPREDUCE-6007: Okay, so I need to stop rushing this :) If you fix my above test by removing the "raw" path components, you'll see that the target path isn't being qualified before being checked. Try adding this near the top of SimpleCopyListing#validatePaths: {code} # Qualify the target path before checking targetPath = targetFS.makeQualified(targetPath); final boolean targetIsReservedRaw = Path.getPathWithoutSchemeAndAuthority(targetPath).toString(). startsWith(HDFS_RESERVED_RAW_DIRECTORY_NAME); {code} > Create a new option for distcp -p which causes raw.* namespace extended > attributes to not be preserved > -- > > Key: MAPREDUCE-6007 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Affects Versions: fs-encryption >Reporter: Charles Lamb >Assignee: Charles Lamb > Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, > MAPREDUCE-6007.003.patch > > > As part of the Data at Rest Encryption work (HDFS-6134), we need to create a > new option for distcp which causes raw.* namespace extended attributes to not > be preserved. See the doc in HDFS-6509 for details. The default for this > option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089878#comment-14089878 ] Andrew Wang commented on MAPREDUCE-6007: Eh, I looked at the test output, and it's complaining about "raw/src doesn't exist". I guess distcp doesn't support relative paths? In that case, +1 pending the doc change. > Create a new option for distcp -p which causes raw.* namespace extended > attributes to not be preserved > -- > > Key: MAPREDUCE-6007 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Affects Versions: fs-encryption >Reporter: Charles Lamb >Assignee: Charles Lamb > Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, > MAPREDUCE-6007.003.patch > > > As part of the Data at Rest Encryption work (HDFS-6134), we need to create a > new option for distcp which causes raw.* namespace extended attributes to not > be preserved. See the doc in HDFS-6509 for details. The default for this > option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14089876#comment-14089876 ] Andrew Wang commented on MAPREDUCE-6007: Nice work, this is way simpler. I think we're pretty close. * In the md.vm file, let's scratch the change to the table, I think the section is enough by itself. * Not sure we're fully qualifying relative paths correctly. I wrote a small test which I expected to work. Could you confirm? I think we just need to qualify the src paths with the src FileSystem first. {code} @Test public void testWorkingDir() throws Exception { final Path wd = fs.getWorkingDirectory(); try { fs.setWorkingDirectory(new Path("/.reserved/raw/")); doTestPreserveRawXAttrs("raw/src", "raw/dest", "-px", true, true, DistCpConstants.SUCCESS); } finally { fs.setWorkingDirectory(wd); } } {code} > Create a new option for distcp -p which causes raw.* namespace extended > attributes to not be preserved > -- > > Key: MAPREDUCE-6007 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Affects Versions: fs-encryption >Reporter: Charles Lamb >Assignee: Charles Lamb > Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch, > MAPREDUCE-6007.003.patch > > > As part of the Data at Rest Encryption work (HDFS-6134), we need to create a > new option for distcp which causes raw.* namespace extended attributes to not > be preserved. See the doc in HDFS-6509 for details. The default for this > option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088300#comment-14088300 ] Andrew Wang commented on MAPREDUCE-6007: bq. the only typo above is the last line which should be "no raw xattrs are preserved" If none of these flags are specified, AFAIK neither non-raw or raw xattrs are preserved, i.e. no xattrs. Yes? bq. I convinced myself that a relative path could never be relative to /.reserved/raw since you can't set your working directory to that. AFAIK you can set your wd to whatever you want, and you can have ".." in absolute paths too. We need to make sure that this path is fully normalized if we're doing a prefix check. Paths from a FileStatus are normalized, but paths coming from the user (like the ones coming out of a DistCpOptions) are suspect. setTargetPathExists has one of these suspect checks. Doc * This is hard to read, could we expand this into a separate section and a new table? I'd particularly like to see a fuller explanation of what happens with different dst options. CopyListing * Let's improve the InvalidInputException message. Paths don't really "specify" something, you could say "starts with" or something instead. We should also print the target path. * I don't quite understand this error either, why is a {{/.r/r}} src and {{-pd}} not okay? The exception also mentions the target not starting with {{/.r/r}}, but that's not part of the if check. * Line longer than 80chars * I expected to see a check that was "if (-p || -px) && !-pd && src is /.r/r, then also check that the dst supports xattrs and is /.r/r". I wish there was a way to test that it's HDFS too, but looking for dest having /.r/r is probably good enough. CopyMapper * Can we expand the block comment to say that toCopyListingFileStatus is used to filter xattrs, and passing copyXAttrs in twice is okay because we already did it earlier? The double passing looks weird, though logically correct. DistCp: * I really don't like setting the DISABLERAWXATTRS flag in setTargetPathExists, since the expectation is that Options flags are set by the user. This method is also not named such that doing this there makes sense. We have the target path via the DistCpOptions, so let's be explicit and verbose with the checks instead. This is quite possibly why the CopyListing check is confusing to me. * To expand on the above, -px means preserving all xattrs, while -pxd means preserving non-raw xattrs. Then we have {{toCopyListingFileStatus}} where the {{preserveXAttrs}} parameter actually means "preserve non-raw xattrs". This is also definitely confusing... DistCpOptionSwitch: * XATTR is not a standard capitalization style, let's lower case it as xattr here. XAttr isn't standard either, but that ship has sailed. Test * I'd like tests for weird src and dst paths, i.e. relative or containing ".."s * We could also test the "no preserve flags" behavior, that no xattrs at all are preserved. > Create a new option for distcp -p which causes raw.* namespace extended > attributes to not be preserved > -- > > Key: MAPREDUCE-6007 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Affects Versions: fs-encryption >Reporter: Charles Lamb >Assignee: Charles Lamb > Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch > > > As part of the Data at Rest Encryption work (HDFS-6134), we need to create a > new option for distcp which causes raw.* namespace extended attributes to not > be preserved. See the doc in HDFS-6509 for details. The default for this > option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14087131#comment-14087131 ] Hadoop QA commented on MAPREDUCE-6007: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12659910/MAPREDUCE-6007.002.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:red}-1 javac{color:red}. The patch appears to cause the build to fail. Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/4789//console This message is automatically generated. > Create a new option for distcp -p which causes raw.* namespace extended > attributes to not be preserved > -- > > Key: MAPREDUCE-6007 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Affects Versions: fs-encryption >Reporter: Charles Lamb >Assignee: Charles Lamb > Attachments: MAPREDUCE-6007.001.patch, MAPREDUCE-6007.002.patch > > > As part of the Data at Rest Encryption work (HDFS-6134), we need to create a > new option for distcp which causes raw.* namespace extended attributes to not > be preserved. See the doc in HDFS-6509 for details. The default for this > option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (MAPREDUCE-6007) Create a new option for distcp -p which causes raw.* namespace extended attributes to not be preserved
[ https://issues.apache.org/jira/browse/MAPREDUCE-6007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14085543#comment-14085543 ] Andrew Wang commented on MAPREDUCE-6007: Hi Charles, thanks for the patch, I had to take notes while reviewing this patch, the behavior is kind of complicated. We have a variety of flags that can be specified, and the destination FS can have different levels of support. It'd be very useful to specify this behavior in gory detail in the DistCp documentation. Check me on this though: Options: {noformat} -px : preserve raw and non-raw xattrs -pr : no xattrs are preserved -p : preserve raw xattrs -pxr: preserve non-raw xattrs : no xattrs are preserved {noformat} Behavior with a given src and dst, varying levels of dst support: * raw src, raw dst: the options apply as specified above * raw src, not-raw dst, dst supports xattrs but no {{/reserved/.raw}}: we will fail to set raw xattrs at runtime. * raw src, dst doesn't support xattrs: if {{-pX}} is specified, throws an exception. Else, silently discards raw xattrs. Some discussion on the above: * If the src is {{/reserved/.raw}}, the user is expecting preservation of raw xattrs when {{-p}} or {{-pX}} is specified. In this scenario, we should test that the dest is {{/.reserved/raw}} and that it's present on the dstFS. * There might be other weird cases, haven't thought through all of them Some code review comments: Misc: - We have both {{noPreserveRaw}} and {{preserveRaw}} booleans, can we standardize on one everywhere? I'd like a negative one, call it {{disableRaw}} or {{excludeRaw}} since it better captures the meaning of the flag. {{exclude}} feels a bit better IMO, but it looks like {{-pe}} is taken. - What's the expected behavior when the dest doesn't support xattrs or reserved raw, or supports xattrs but not reserved raw? - CopyListing, this is where we'd also test to see if the destFS has a /.reserved/raw directory - CopyMapper, two periods in the block comment Documentation: - I don't want to tie raw preservation just to encryption since we might also use it for compression, how about this instead: {quote} d: disable preservation of raw namespace extended attributes ... raw namespace extended attributes are preserved by default if supported. Specifying -pd disables preservation of these xattrs. {quote} - As noted above, it'd be good to have the expected preservation behavior laid out in the distcp documentation. DistCp: {code} if (!Path.getPathWithoutSchemeAndAuthority(target).toString(). {code} What if the target is a relative path here? Test: - Any reason this isn't part of the existing XAttr test? They seem pretty similar, and you also added a PXD test to the existing test. - Don't need to do makeFilesAndDirs inO the BeforeClass - Doesn't there need to be a non-raw attribute set so you can test some of these combinations? - Can we test what happens when the dest FS doesn't support xattrs or raw xattrs? > Create a new option for distcp -p which causes raw.* namespace extended > attributes to not be preserved > -- > > Key: MAPREDUCE-6007 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-6007 > Project: Hadoop Map/Reduce > Issue Type: New Feature > Components: distcp >Affects Versions: fs-encryption >Reporter: Charles Lamb >Assignee: Charles Lamb > Attachments: MAPREDUCE-6007.001.patch > > > As part of the Data at Rest Encryption work (HDFS-6134), we need to create a > new option for distcp which causes raw.* namespace extended attributes to not > be preserved. See the doc in HDFS-6509 for details. The default for this > option will be to preserve raw.* xattrs. -- This message was sent by Atlassian JIRA (v6.2#6252)