[jira] [Commented] (MAPREDUCE-6749) MR AM should reuse containers for Map/Reduce Tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574159#comment-15574159 ]

Devaraj K commented on MAPREDUCE-6749:
--------------------------------------

Thanks [~Naganarasimha] for the responses and discussion.

bq. hope we can create a new branch and get the changes in there, so that it's easier for others to have a look before it gets into trunk or the mainstream branches

Sure, I will create a branch for this.

bq. If there were a way for an admin to enforce it, that would be useful. If it's just a client-level configuration, it only adds to the already long list of configurations, and users will not be clear about what to set. Besides, would it be better to have just one setting for how many tasks can reuse a given container, rather than separate settings for Map and Reduce?

I am thinking that we can provide a comprehensive way to control this feature; I agree that it will be yet another configuration for the user to set. We can discuss this in MAPREDUCE-6772/MAPREDUCE-6773.

bq. Btw, it would also be good to introduce a metric for the number of Map or Reduce tasks that have reused containers.

Good thought, we can have metrics for this.

bq. This is the problem we generally faced: it is difficult for customers to understand that the entire log is not for the task attempt, so I was wondering whether there is a better approach to this.

We could display only the part of the container log generated for a task attempt in the JHS Web UI, discarding the other task attempts' logs, instead of the whole container log. Does this sound OK, or is there a better way?
> MR AM should reuse containers for Map/Reduce Tasks
> --------------------------------------------------
>
> Key: MAPREDUCE-6749
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6749
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: applicationmaster, mrv2
> Reporter: Devaraj K
> Assignee: Devaraj K
> Attachments: MAPREDUCE-6749-Container Reuse-v0.pdf
>
>
> In continuation of MAPREDUCE-3902, the MR AM should reuse containers
> for Map/Reduce Tasks, similar to the JVM Reuse feature we had in MRv1.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
[jira] [Commented] (MAPREDUCE-6749) MR AM should reuse containers for Map/Reduce Tasks
[ https://issues.apache.org/jira/browse/MAPREDUCE-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573131#comment-15573131 ]

Naganarasimha G R commented on MAPREDUCE-6749:
----------------------------------------------

Hi [~devaraj.k],
Thanks for the detailed explanation.
As this would involve considerable changes in core code, I hope we can create a new branch and get the changes in there, so that it's easier for others to have a look before it gets into trunk or the mainstream branches.

bq. I think the limit configuration for the number of map/reduce reuse containers would allow other applications to start running without waiting for the Job to be finished when reuse is enabled. If there is a big Job running which could occupy the entire cluster, and then any high priority application gets submitted, this limit for map/reduce containers would probably give room for the high priority application to start running without preempting the containers of the previous Job. By default there is no limit on the number of containers to be reused, and if any user/Job wants to have this constraint they can configure it.

Yes, I understand, thanks for the explanation. The issue is how an application would know the right configuration for these: from the application's own point of view, it would always seem right to run all tasks in the given container rather than launch more containers. If there were a way for an admin to enforce it, that would be useful. If it's just a client-level configuration, it only adds to the already long list of configurations, and users will not be clear about what to set. Besides, would it be better to have just one setting for how many tasks can reuse a given container, rather than separate settings for Map and Reduce?
Btw, it would also be good to introduce a metric for the number of Map or Reduce tasks that have reused containers.

bq. If you want to try this feature, you can apply MAPREDUCE-6773, MAPREDUCE-6781, MAPREDUCE-6784, MAPREDUCE-6785, MAPREDUCE-6786 and then try it.

Sure Deva, I will try it over the weekend and update you; I have already started to take a look at them.

bq. Here we should note that the whole container log displayed for a TaskAttempt is not applicable to that TaskAttempt, and the part of the log that applies to it can be identified easily.

This is the problem we generally faced: it is difficult for customers to understand that the entire log is not for the task attempt, so I was wondering whether there is a better approach to this.
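The single per-container limit and the reuse metric raised in this comment can be pictured with the sketch below. It is only an illustration under stated assumptions: `ContainerReuseTracker`, `NO_LIMIT`, and `tryAssign` are hypothetical names, not part of the MR AM or of the patches listed above, and containers are identified by plain strings here rather than by YARN container IDs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: one limit on how many tasks may run in a container
 * (shared by Map and Reduce), plus a counter usable as a reuse metric.
 */
final class ContainerReuseTracker {
  /** Assumed sentinel for "no limit", matching the default described in the thread. */
  static final int NO_LIMIT = -1;

  private final int maxTasksPerContainer;
  private final Map<String, Integer> tasksRun = new HashMap<>();
  private long reusedTaskCount; // tasks that ran in an already-used container

  ContainerReuseTracker(int maxTasksPerContainer) {
    this.maxTasksPerContainer = maxTasksPerContainer;
  }

  /** Records a task assignment and returns true, or returns false if the limit is reached. */
  boolean tryAssign(String containerId) {
    int alreadyRun = tasksRun.getOrDefault(containerId, 0);
    if (maxTasksPerContainer != NO_LIMIT && alreadyRun >= maxTasksPerContainer) {
      return false; // caller would release the container instead of reusing it
    }
    tasksRun.put(containerId, alreadyRun + 1);
    if (alreadyRun > 0) {
      reusedTaskCount++; // only the second and later tasks count as reuse
    }
    return true;
  }

  long getReusedTaskCount() {
    return reusedTaskCount;
  }

  public static void main(String[] args) {
    ContainerReuseTracker tracker = new ContainerReuseTracker(2);
    System.out.println(tracker.tryAssign("container_1"));
    System.out.println(tracker.tryAssign("container_1"));
    System.out.println(tracker.tryAssign("container_1"));
    System.out.println(tracker.getReusedTaskCount());
  }
}
```

Whether such a limit is set by the client or enforced by an admin is exactly the open question in the thread; the sketch takes no position on where the value comes from.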
[jira] [Commented] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571790#comment-15571790 ]

Hadoop QA commented on MAPREDUCE-6792:
--------------------------------------

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 18s | Docker mode activated. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| +1 | mvninstall | 7m 41s | trunk passed |
| +1 | compile | 0m 23s | trunk passed |
| +1 | checkstyle | 0m 16s | trunk passed |
| +1 | mvnsite | 0m 30s | trunk passed |
| +1 | mvneclipse | 0m 13s | trunk passed |
| +1 | findbugs | 0m 49s | trunk passed |
| +1 | javadoc | 0m 22s | trunk passed |
| +1 | mvninstall | 0m 22s | the patch passed |
| +1 | compile | 0m 21s | the patch passed |
| +1 | javac | 0m 21s | the patch passed |
| +1 | checkstyle | 0m 14s | hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 0 new + 4 unchanged - 1 fixed = 4 total (was 5) |
| +1 | mvnsite | 0m 26s | the patch passed |
| +1 | mvneclipse | 0m 11s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | findbugs | 0m 57s | the patch passed |
| +1 | javadoc | 0m 19s | the patch passed |
| +1 | unit | 2m 30s | hadoop-mapreduce-client-core in the patch passed. |
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 16m 47s | |

|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12833057/MAPREDUCE-6792.1.patch |
| JIRA Issue | MAPREDUCE-6792 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux dd6e11e806bd 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 901eca0 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
| Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6758/testReport/ |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6758/console |
| Powered by | Apache Yetus 0.3.0 http://yetus.apache.org |

This message was automatically generated.

> Allow user's full principal name as owner of MapReduce staging directory in
> JobSubmissionFiles#JobStagingDir()
> --
>
> Key:
[jira] [Updated] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated MAPREDUCE-6792:
---------------------------------
    Target Version/s: 3.0.0-alpha1, 2.9.0  (was: 3.0.0-alpha1)

> Allow user's full principal name as owner of MapReduce staging directory in
> JobSubmissionFiles#JobStagingDir()
> ---------------------------------------------------------------------------
>
> Key: MAPREDUCE-6792
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6792
> Project: Hadoop Map/Reduce
> Issue Type: Improvement
> Components: client
> Reporter: Santhosh G Nayak
> Assignee: Santhosh G Nayak
> Attachments: MAPREDUCE-6792.1.patch
>
>
> Background -
> Currently, {{JobSubmissionFiles#JobStagingDir()}} assumes that the file owner
> returned by {{FileSystem#getFileStatus()}} is always the user's short
> principal name, which is true for HDFS. But some HDFS-compatible file systems,
> such as [Azure Data Lake Store (ADLS)|https://azure.microsoft.com/en-in/services/data-lake-store/],
> work in multi-tenant environments and can have users with the same name
> belonging to different domains, for example {{us...@company1.com}} and
> {{us...@company2.com}}. It would be ambiguous if
> {{FileSystem#getFileStatus()}} returned only the user's short principal name
> (without the domain name) as the owner of the file/directory.
> The following code block allows only the short user principal name as owner. It
> simply fails, saying that the ownership on the staging directory is not as
> expected, if the owner returned by {{FileStatus#getOwner()}} is not equal to
> the short principal name of the current user.
> {code}
> String realUser;
> String currentUser;
> UserGroupInformation ugi = UserGroupInformation.getLoginUser();
> realUser = ugi.getShortUserName();
> currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
> if (fs.exists(stagingArea)) {
>   FileStatus fsStatus = fs.getFileStatus(stagingArea);
>   String owner = fsStatus.getOwner();
>   if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>     throw new IOException("The ownership on the staging directory " +
>         stagingArea + " is not as expected. " +
>         "It is owned by " + owner + ". The directory must " +
>         "be owned by the submitter " + currentUser + " or " +
>         "by " + realUser);
>   }
> {code}
> The proposal is to remove the strict restriction to the short principal name by
> allowing the user's full principal name as owner of the staging area directory in
> {{JobSubmissionFiles#JobStagingDir()}}.
[jira] [Updated] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated MAPREDUCE-6792:
---------------------------------
    Status: Patch Available  (was: Open)

Submitting the patch to kick off the Jenkins test.

The patch looks good overall. Several comments:
1. {{fileOwner.equalsIgnoreCase(currentUser.getUserName())}} - I think the current assumption in Hadoop is that user names are case sensitive, so user and USER are treated as different users. In AzureFS or other similar cloud-based file systems, does that assumption change, especially for the domain name? If not, we should keep the check case sensitive here.
2. The exception message includes all possible user names, which can be duplicated when login user = real user (when no proxy user is used). So we should do a quick check and only log both when login user != real user, shouldn't we?
3. It would be great if we could figure out some way to add a unit test for the use case we are adding here.
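Review comment 2 (avoid naming the same user twice in the exception message) can be sketched as follows. This is a minimal illustration, not the code from MAPREDUCE-6792.1.patch: `StagingOwnerMessage` and `build` are hypothetical names, and the user names are passed in as plain strings instead of being read from `UserGroupInformation`.

```java
/**
 * Hypothetical sketch: builds the ownership error message and mentions the
 * real (login) user only when it differs from the current user.
 */
final class StagingOwnerMessage {

  static String build(String stagingArea, String owner,
                      String currentUser, String realUser) {
    StringBuilder sb = new StringBuilder()
        .append("The ownership on the staging directory ").append(stagingArea)
        .append(" is not as expected. It is owned by ").append(owner)
        .append(". The directory must be owned by the submitter ")
        .append(currentUser);
    // Case-sensitive comparison, so "user" and "USER" stay distinct users.
    if (!currentUser.equals(realUser)) {
      sb.append(" or by ").append(realUser);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // The names and path below are made up for the demonstration.
    System.out.println(build("/tmp/staging", "bob", "alice", "alice"));
    System.out.println(build("/tmp/staging", "bob", "alice", "hadoop"));
  }
}
```

A test of this shape needs no file system, which may also be one way to approach comment 3.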
[jira] [Commented] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571558#comment-15571558 ]

Junping Du commented on MAPREDUCE-6792:
---------------------------------------

Hi [~snayak], thanks for reporting the issue and for your patch contribution! I think the issue reported here is valid, so I have assigned the JIRA to you. I will review your patch and post my comments soon.
[jira] [Updated] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Junping Du updated MAPREDUCE-6792:
---------------------------------
    Assignee: Santhosh G Nayak
[jira] [Updated] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()
[ https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh G Nayak updated MAPREDUCE-6792:
---------------------------------------
    Attachment: MAPREDUCE-6792.1.patch

Attaching a patch containing the proposed changes.
[jira] [Created] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()
Santhosh G Nayak created MAPREDUCE-6792:
-------------------------------------------

Summary: Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()
Key: MAPREDUCE-6792
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6792
Project: Hadoop Map/Reduce
Issue Type: Improvement
Components: client
Reporter: Santhosh G Nayak

Background -
Currently, {{JobSubmissionFiles#JobStagingDir()}} assumes that the file owner returned by {{FileSystem#getFileStatus()}} is always the user's short principal name, which is true for HDFS. But some HDFS-compatible file systems, such as [Azure Data Lake Store (ADLS)|https://azure.microsoft.com/en-in/services/data-lake-store/], work in multi-tenant environments and can have users with the same name belonging to different domains, for example {{us...@company1.com}} and {{us...@company2.com}}. It would be ambiguous if {{FileSystem#getFileStatus()}} returned only the user's short principal name (without the domain name) as the owner of the file/directory.
The following code block allows only the short user principal name as owner. It simply fails, saying that the ownership on the staging directory is not as expected, if the owner returned by {{FileStatus#getOwner()}} is not equal to the short principal name of the current user.
{code}
String realUser;
String currentUser;
UserGroupInformation ugi = UserGroupInformation.getLoginUser();
realUser = ugi.getShortUserName();
currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
if (fs.exists(stagingArea)) {
  FileStatus fsStatus = fs.getFileStatus(stagingArea);
  String owner = fsStatus.getOwner();
  if (!(owner.equals(currentUser) || owner.equals(realUser))) {
    throw new IOException("The ownership on the staging directory " +
        stagingArea + " is not as expected. " +
        "It is owned by " + owner + ". The directory must " +
        "be owned by the submitter " + currentUser + " or " +
        "by " + realUser);
  }
{code}
The proposal is to remove the strict restriction to the short principal name by allowing the user's full principal name as owner of the staging area directory in {{JobSubmissionFiles#JobStagingDir()}}.
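As a rough illustration of the proposal, the check could accept either form of the name. The sketch below is an assumption-laden stand-in, not the attached patch: `StagingOwnerCheck` and `isOwner` are hypothetical, the names are plain strings, and in Hadoop they would presumably come from `FileStatus#getOwner()`, `UserGroupInformation#getShortUserName()` and `UserGroupInformation#getUserName()`.

```java
/**
 * Hypothetical sketch: treats the staging directory as correctly owned when
 * the owner equals either the short or the full principal name of the user.
 */
final class StagingOwnerCheck {

  /**
   * @param owner     owner string reported by the file system
   * @param shortName short principal name, e.g. "alice" (made-up example)
   * @param fullName  full principal name, e.g. "alice@example.com" (made-up example)
   */
  static boolean isOwner(String owner, String shortName, String fullName) {
    // String.equals keeps the comparison case sensitive.
    return owner != null
        && (owner.equals(shortName) || owner.equals(fullName));
  }

  public static void main(String[] args) {
    System.out.println(isOwner("alice", "alice", "alice@example.com"));
    System.out.println(isOwner("alice@example.com", "alice", "alice@example.com"));
    System.out.println(isOwner("alice@other.example", "alice", "alice@example.com"));
  }
}
```

With this shape, a file system that reports full principal names distinguishes same-named users in different domains, while HDFS-style short names continue to pass.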