[jira] [Commented] (MAPREDUCE-6749) MR AM should reuse containers for Map/Reduce Tasks

2016-10-13 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15574159#comment-15574159
 ] 

Devaraj K commented on MAPREDUCE-6749:
--

Thanks [~Naganarasimha] for the responses and discussion.
bq. I hope we can create a new branch and get the changes in there, so that it's 
easier for others to have a look before they go into trunk or the mainstream 
branches.
Sure, I will create a branch for this.

bq. If there were a way for the admin to enforce it, it would be useful. If it's 
just a client-level configuration, it just adds to an already long list of 
configurations, and users will not be clear what to configure. Besides, would 
it be better to just limit how many tasks can reuse a given container, and 
avoid separate limits for Map and Reduce?
I think we can provide a comprehensive way to control this feature, though I 
agree it will be another configuration for the user to set. We can discuss 
this in MAPREDUCE-6772/MAPREDUCE-6773.
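One possible shape for admin enforcement, sketched under assumptions: the AM reads both the job's requested reuse limit and a hypothetical cluster-wide ceiling set by the admin, and uses the smaller of the two. The class name and the "value <= 0 means unlimited" convention are illustrative only; no existing Hadoop configuration keys are implied.

```java
/**
 * Hypothetical sketch: resolve the effective per-container reuse limit from a
 * job-level request and an admin-level ceiling. A value <= 0 means "no limit".
 * Neither value corresponds to a real Hadoop configuration key.
 */
final class ReuseLimitResolver {
  static final int UNLIMITED = 0;

  /** Returns the job's requested limit, clamped to the admin ceiling. */
  static int effectiveLimit(int jobRequested, int adminMax) {
    boolean jobUnlimited = jobRequested <= 0;
    boolean adminUnlimited = adminMax <= 0;
    if (adminUnlimited) {
      // No ceiling: whatever the job asked for is used as-is.
      return jobUnlimited ? UNLIMITED : jobRequested;
    }
    // Ceiling applies: an unlimited or larger job request is capped.
    return jobUnlimited ? adminMax : Math.min(jobRequested, adminMax);
  }
}
```

With this rule a job can always ask for less reuse than the admin allows, but never more.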

bq. By the way, it could also be good to introduce a metric for the number of 
Map or Reduce tasks that have reused containers.
Good thought, we can have metrics for this.


bq. This was a problem we generally faced: it was difficult for customers to 
understand that the entire log is not for the task attempt, so I was wondering 
whether there is a better approach to this.
We could display only the part of the container log generated for a given task 
attempt in the JHS Web UI, instead of the whole container log, by discarding the 
other task attempts' logs. Does this sound OK, or is there a better way?
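A rough sketch of the per-attempt log view, assuming (hypothetically) that the task JVM records where each attempt's output starts and ends in the shared container log. Character offsets into an in-memory string stand in for file offsets here, and none of the names are existing Hadoop APIs.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: when one container runs several task attempts,
 * remember where each attempt's output starts and ends in the shared
 * container log, so a UI can show only that attempt's part.
 */
final class AttemptLogIndex {
  // attemptId -> {startOffset, endOffset}; endOffset is -1 while running.
  private final Map<String, int[]> ranges = new HashMap<>();

  /** Record the log offset at which an attempt started writing. */
  void attemptStarted(String attemptId, int offset) {
    ranges.put(attemptId, new int[] {offset, -1});
  }

  /** Record the log offset at which an attempt finished writing. */
  void attemptFinished(String attemptId, int offset) {
    int[] range = ranges.get(attemptId);
    if (range == null) {
      throw new IllegalStateException("Unknown attempt: " + attemptId);
    }
    range[1] = offset;
  }

  /** Return only the slice of the container log that belongs to the attempt. */
  String sliceFor(String attemptId, String containerLog) {
    int[] range = ranges.get(attemptId);
    if (range == null) {
      return "";
    }
    int start = Math.min(range[0], containerLog.length());
    // An attempt that has not finished yet extends to the end of the log.
    int end = range[1] < 0 ? containerLog.length()
        : Math.min(range[1], containerLog.length());
    return containerLog.substring(start, Math.max(start, end));
  }
}
```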


> MR AM should reuse containers for Map/Reduce Tasks
> --
>
> Key: MAPREDUCE-6749
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6749
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: applicationmaster, mrv2
>Reporter: Devaraj K
>Assignee: Devaraj K
> Attachments: MAPREDUCE-6749-Container Reuse-v0.pdf
>
>
> As a continuation of MAPREDUCE-3902, the MR AM should reuse containers for 
> Map/Reduce tasks, similar to the JVM Reuse feature we had in MRv1.
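To make the idea concrete, here is a minimal sketch of an AM-side reuse decision, assuming one pending queue per task type and an optional cap on how many tasks a single container may run. All names are hypothetical, and the sketch is not taken from the attached design document.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of AM-side container reuse: when a task finishes,
 * hand the same container the next pending task of the same type instead of
 * releasing it and requesting a new one, up to a per-container cap.
 */
final class ReuseScheduler {
  enum TaskType { MAP, REDUCE }

  private final Map<TaskType, Deque<String>> pending = new HashMap<>();
  private final Map<String, Integer> tasksRun = new HashMap<>();
  private final int maxTasksPerContainer; // <= 0 means unlimited

  ReuseScheduler(int maxTasksPerContainer) {
    this.maxTasksPerContainer = maxTasksPerContainer;
    for (TaskType t : TaskType.values()) {
      pending.put(t, new ArrayDeque<>());
    }
  }

  void addPending(TaskType type, String taskId) {
    pending.get(type).addLast(taskId);
  }

  /** Record that a container has started running a task. */
  void taskLaunched(String containerId) {
    tasksRun.merge(containerId, 1, Integer::sum);
  }

  /**
   * Called when a task of the given type finishes in a container. Returns the
   * next task id to run in the same container, or null if the container
   * should be released.
   */
  String onTaskFinished(String containerId, TaskType type) {
    int used = tasksRun.getOrDefault(containerId, 0);
    boolean underCap = maxTasksPerContainer <= 0 || used < maxTasksPerContainer;
    Deque<String> queue = pending.get(type);
    if (!underCap || queue.isEmpty()) {
      tasksRun.remove(containerId);
      return null; // release the container back to the RM
    }
    String next = queue.pollFirst();
    taskLaunched(containerId);
    return next;
  }
}
```

A container only picks up tasks of the type it already ran, which keeps the sketch simple and respects that map and reduce containers can be sized differently (`mapreduce.map.memory.mb` vs `mapreduce.reduce.memory.mb`).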



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org



[jira] [Commented] (MAPREDUCE-6749) MR AM should reuse containers for Map/Reduce Tasks

2016-10-13 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15573131#comment-15573131
 ] 

Naganarasimha G R commented on MAPREDUCE-6749:
--

Hi [~devaraj.k],
Thanks for the detailed explanation. As it would involve considerable 
modifications to core code, I hope we can create a new branch and get the 
changes in there, so that it's easier for others to have a look before they go 
into trunk or the mainstream branches.

bq. I think the limit configuration for the number of map/reduce reuse containers 
would allow other applications to start running without waiting for the Job to 
finish when reuse is enabled. If a big Job is running that could occupy the 
entire cluster and a high-priority application then gets submitted, this limit 
on map/reduce containers would probably give the high-priority application room 
to start running without preempting the containers of the previous Job. By 
default there is no limit on the number of containers to be reused, and any 
user/Job that wants this constraint can configure it.
Yes, I understand, thanks for the explanation. But the issue would be how the 
application knows what the right configuration is; from the application's own 
perspective, it would always seem right to run all the tasks in the given 
containers rather than launch more containers. If there were a way for the 
admin to enforce it, it would be useful. If it's just a client-level 
configuration, it just adds to an already long list of configurations, and 
users will not be clear what to configure. Besides, would it be better to just 
limit how many tasks can reuse a given container, and avoid separate limits for 
Map and Reduce?
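The two limits under discussion are different: the one Devaraj describes caps how many containers a job keeps for reuse (per task type), while the question above caps how many tasks run in one container. A minimal sketch of the former, with hypothetical names: one instance per task type would give separate map and reduce limits, and a single shared instance would give one combined limit.

```java
/**
 * Hypothetical sketch of a job-level cap on how many containers a job may
 * keep for reuse at once, so that other (e.g. high-priority) applications
 * can get resources without preemption. A cap <= 0 means no limit, matching
 * the described default.
 */
final class ReusedContainerBudget {
  private final int maxReusedContainers;
  private int reusedNow;

  ReusedContainerBudget(int maxReusedContainers) {
    this.maxReusedContainers = maxReusedContainers;
  }

  /** Try to keep a freed container for reuse; false means release it. */
  synchronized boolean tryRetain() {
    if (maxReusedContainers > 0 && reusedNow >= maxReusedContainers) {
      return false;
    }
    reusedNow++;
    return true;
  }

  /** Called when a retained container is finally released. */
  synchronized void released() {
    if (reusedNow > 0) {
      reusedNow--;
    }
  }

  synchronized int inUse() {
    return reusedNow;
  }
}
```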

By the way, it could also be good to introduce a metric for the number of Map 
or Reduce tasks that have reused containers.
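A minimal sketch of such counters, assuming the AM knows at launch time whether the target container has already run a task. These are plain `AtomicLong` counters for illustration; a real implementation would presumably register them with the AM's existing metrics instead.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of counters the AM could expose: how many map and
 * reduce tasks ran in a reused container rather than a freshly launched one.
 * These are plain counters, not Hadoop metrics sources.
 */
final class ContainerReuseMetrics {
  private final AtomicLong mapsInReusedContainers = new AtomicLong();
  private final AtomicLong reducesInReusedContainers = new AtomicLong();

  /** Record a task launch; only launches into an already-used container count. */
  void taskLaunched(boolean isMap, boolean containerReused) {
    if (!containerReused) {
      return;
    }
    if (isMap) {
      mapsInReusedContainers.incrementAndGet();
    } else {
      reducesInReusedContainers.incrementAndGet();
    }
  }

  long mapsReused() {
    return mapsInReusedContainers.get();
  }

  long reducesReused() {
    return reducesInReusedContainers.get();
  }
}
```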

bq. If you want to try this feature, you can apply MAPREDUCE-6773, 
MAPREDUCE-6781, MAPREDUCE-6784, MAPREDUCE-6785 and MAPREDUCE-6786 and then try 
it.
Sure, Deva, I will try it over the weekend and update you; I have already started 
to take a look at them.

bq. Here we should note that the whole container log displayed for a TaskAttempt 
does not all apply to that TaskAttempt, and the part that applies to it can be 
identified easily.
This was a problem we generally faced: it was difficult for customers to 
understand that the entire log is not for the task attempt, so I was wondering 
whether there is a better approach to this.











[jira] [Commented] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()

2016-10-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571790#comment-15571790
 ] 

Hadoop QA commented on MAPREDUCE-6792:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 18s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 41s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 23s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 16s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 30s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 22s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 21s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 14s {color} | {color:green} hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: The patch generated 0 new + 4 unchanged - 1 fixed = 4 total (was 5) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 0m 26s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 30s {color} | {color:green} hadoop-mapreduce-client-core in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 18s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 16m 47s {color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker |  Image:yetus/hadoop:9560f25 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12833057/MAPREDUCE-6792.1.patch |
| JIRA Issue | MAPREDUCE-6792 |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  findbugs  checkstyle  |
| uname | Linux dd6e11e806bd 3.13.0-95-generic #142-Ubuntu SMP Fri Aug 12 17:00:09 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 901eca0 |
| Default Java | 1.8.0_101 |
| findbugs | v3.0.0 |
|  Test Results | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6758/testReport/ |
| modules | C: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core U: hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core |
| Console output | https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/6758/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.




[jira] [Updated] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()

2016-10-13 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6792:
--
Target Version/s: 3.0.0-alpha1, 2.9.0  (was: 3.0.0-alpha1)

> Allow user's full principal name as owner of MapReduce staging directory in 
> JobSubmissionFiles#JobStagingDir()
> --
>
> Key: MAPREDUCE-6792
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6792
> Project: Hadoop Map/Reduce
>  Issue Type: Improvement
>  Components: client
>Reporter: Santhosh G Nayak
>Assignee: Santhosh G Nayak
> Attachments: MAPREDUCE-6792.1.patch
>
>
> Background - 
> Currently, {{JobSubmissionFiles#JobStagingDir()}} assumes that the file owner 
> returned as part of {{FileSystem#getFileStatus()}} is always the user's short 
> principal name, which is true for HDFS. But some HDFS-compatible file systems, 
> like [Azure Data Lake Store (ADLS)|https://azure.microsoft.com/en-in/services/data-lake-store/], 
> work in multi-tenant environments and can have users with the same name 
> belonging to different domains, for example {{us...@company1.com}} and 
> {{us...@company2.com}}. It would be ambiguous if 
> {{FileSystem#getFileStatus()}} returned only the user's short principal name 
> (without the domain name) as the owner of the file/directory. 
> The following code block allows only the short user principal name as owner. 
> It simply fails, saying that the ownership of the staging directory is not as 
> expected, if the owner returned by {{FileStatus#getOwner()}} is equal to 
> neither the current user's nor the login (real) user's short principal name.
> {code}
> // Excerpt: fs and stagingArea are defined earlier in the method.
> String realUser;
> String currentUser;
> UserGroupInformation ugi = UserGroupInformation.getLoginUser();
> realUser = ugi.getShortUserName();
> currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
> if (fs.exists(stagingArea)) {
>   FileStatus fsStatus = fs.getFileStatus(stagingArea);
>   String owner = fsStatus.getOwner();
>   // Only the short names of the current or login user are accepted.
>   if (!(owner.equals(currentUser) || owner.equals(realUser))) {
>     throw new IOException("The ownership on the staging directory " +
>         stagingArea + " is not as expected. " +
>         "It is owned by " + owner + ". The directory must " +
>         "be owned by the submitter " + currentUser + " or " +
>         "by " + realUser);
>   }
>   // ... rest of the method omitted
> }
> {code}
> The proposal is to remove the strict restriction to the short principal name 
> by also allowing the user's full principal name as the owner of the staging 
> area directory in {{JobSubmissionFiles#JobStagingDir()}}.
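A minimal sketch of the relaxed check, not taken from the attached patch: the owner is accepted if it equals either the short or the full principal name of the current user or of the login (real) user. In Hadoop those names would presumably come from `UserGroupInformation#getShortUserName()` and `#getUserName()`; here they are plain string parameters so the logic can run on its own, and the class name is hypothetical.

```java
/**
 * Hypothetical sketch of the relaxed staging-directory ownership check:
 * the directory may be owned by either the short or the full principal name
 * of the current (submitting) user or of the real (login) user.
 */
final class StagingDirOwnerCheck {
  static boolean isAcceptableOwner(String owner,
      String currentShort, String currentFull,
      String realShort, String realFull) {
    // Case-sensitive comparisons, matching the existing equals() checks.
    return owner.equals(currentShort) || owner.equals(currentFull)
        || owner.equals(realShort) || owner.equals(realFull);
  }
}
```

For example, with hypothetical names `alice` and `alice@example.com`, either form is accepted, while `alice@other.example` is rejected.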






[jira] [Updated] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()

2016-10-13 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6792:
--
Status: Patch Available  (was: Open)

Submitting the patch to kick off the Jenkins tests.
The patch looks good overall. Several comments:
1. {{fileOwner.equalsIgnoreCase(currentUser.getUserName())}} - I think our 
current assumption in Hadoop is that user names are case sensitive, so "user" 
and "USER" are treated as different users. In AzureFS or other similar 
cloud-based file systems, do we change that assumption, especially for the 
domain name? If not, we should keep the check case sensitive here.
2. The exception message includes all possible user names, which could be 
duplicated when login user = real user (i.e., when no proxy user is used). So 
we should do a quick check and only log both when login user != real user, 
shouldn't we?
3. It would be great if we could figure out a way to add a unit test for the 
use case we are adding here.
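Review points 1 and 2 can be sketched together, assuming the list of acceptable owner names has already been computed: compare case-sensitively, and build the error message from a de-duplicated, order-preserving set so that a name is printed only once when the users coincide. The class and method names are hypothetical, and the message wording is only modeled on the existing one.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical sketch for review points 1 and 2: keep the owner comparison
 * case sensitive, and list each acceptable owner name only once in the error
 * message.
 */
final class OwnerErrorMessage {
  static boolean matchesAny(String owner, String... acceptable) {
    for (String name : acceptable) {
      if (owner.equals(name)) { // case sensitive: "user" != "USER"
        return true;
      }
    }
    return false;
  }

  static String build(String stagingArea, String owner, String... acceptable) {
    // LinkedHashSet drops duplicates while keeping the original order.
    Set<String> names = new LinkedHashSet<>();
    for (String name : acceptable) {
      names.add(name);
    }
    return "The ownership on the staging directory " + stagingArea
        + " is not as expected. It is owned by " + owner
        + ". The directory must be owned by " + String.join(" or ", names);
  }
}
```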




[jira] [Commented] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()

2016-10-13 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15571558#comment-15571558
 ] 

Junping Du commented on MAPREDUCE-6792:
---

Hi [~snayak], thanks for reporting the issue and for your patch contribution! I 
think the issue reported here is valid, so I am assigning the JIRA to you. I will 
review your patch and put up my comments soon.




[jira] [Updated] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()

2016-10-13 Thread Junping Du (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junping Du updated MAPREDUCE-6792:
--
Assignee: Santhosh G Nayak




[jira] [Updated] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()

2016-10-13 Thread Santhosh G Nayak (JIRA)

 [ 
https://issues.apache.org/jira/browse/MAPREDUCE-6792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh G Nayak updated MAPREDUCE-6792:

Attachment: MAPREDUCE-6792.1.patch

Attaching a patch containing the proposed changes.




[jira] [Created] (MAPREDUCE-6792) Allow user's full principal name as owner of MapReduce staging directory in JobSubmissionFiles#JobStagingDir()

2016-10-13 Thread Santhosh G Nayak (JIRA)
Santhosh G Nayak created MAPREDUCE-6792:
---

 Summary: Allow user's full principal name as owner of MapReduce 
staging directory in JobSubmissionFiles#JobStagingDir()
 Key: MAPREDUCE-6792
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6792
 Project: Hadoop Map/Reduce
  Issue Type: Improvement
  Components: client
Reporter: Santhosh G Nayak


Background - 
Currently, {{JobSubmissionFiles#JobStagingDir()}} assumes that the file owner 
returned as part of {{FileSystem#getFileStatus()}} is always the user's short 
principal name, which is true for HDFS. But some HDFS-compatible file systems, 
like [Azure Data Lake Store (ADLS)|https://azure.microsoft.com/en-in/services/data-lake-store/], 
work in multi-tenant environments and can have users with the same name 
belonging to different domains, for example {{us...@company1.com}} and 
{{us...@company2.com}}. It would be ambiguous if {{FileSystem#getFileStatus()}} 
returned only the user's short principal name (without the domain name) as the 
owner of the file/directory.

The following code block allows only the short user principal name as owner. It 
simply fails, saying that the ownership of the staging directory is not as 
expected, if the owner returned by {{FileStatus#getOwner()}} is equal to neither 
the current user's nor the login (real) user's short principal name.
{code}
// Excerpt: fs and stagingArea are defined earlier in the method.
String realUser;
String currentUser;
UserGroupInformation ugi = UserGroupInformation.getLoginUser();
realUser = ugi.getShortUserName();
currentUser = UserGroupInformation.getCurrentUser().getShortUserName();
if (fs.exists(stagingArea)) {
  FileStatus fsStatus = fs.getFileStatus(stagingArea);
  String owner = fsStatus.getOwner();
  // Only the short names of the current or login user are accepted.
  if (!(owner.equals(currentUser) || owner.equals(realUser))) {
    throw new IOException("The ownership on the staging directory " +
        stagingArea + " is not as expected. " +
        "It is owned by " + owner + ". The directory must " +
        "be owned by the submitter " + currentUser + " or " +
        "by " + realUser);
  }
  // ... rest of the method omitted
}
{code}
The proposal is to remove the strict restriction to the short principal name by 
also allowing the user's full principal name as the owner of the staging area 
directory in {{JobSubmissionFiles#JobStagingDir()}}.


