[jira] [Commented] (YARN-10601) The Yarn client should use the UGI who created the Yarn client for obtaining a delegation token for the remote log dir

2021-01-31 Thread Prabhu Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276106#comment-17276106
 ] 

Prabhu Joseph commented on YARN-10601:
--

[~fritsi] Thanks for the details.

>> As you can see submitApplication is not invoked inside an ugi.doAs block

Why is submitApplication not invoked inside a ugi.doAs block? If log 
aggregation needs to happen as the submitterUser, the job also has to be 
submitted by the submitterUser, right?

> The Yarn client should use the UGI who created the Yarn client for obtaining 
> a delegation token for the remote log dir
> --
>
> Key: YARN-10601
> URL: https://issues.apache.org/jira/browse/YARN-10601
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: log-aggregation
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Daniel Fritsi
>Priority: Critical
>
> It seems there was a bug introduced in YARN-10333 in this section of 
> *{color:#0747A6}{{addLogAggregationDelegationToken}}{color}*:
> {code:java}
> Path remoteRootLogDir = fileController.getRemoteRootLogDir();
> FileSystem fs = remoteRootLogDir.getFileSystem(conf);
> final org.apache.hadoop.security.token.Token<?>[] finalTokens =
>     fs.addDelegationTokens(masterPrincipal, credentials);
> {code}
> *{color:#0747A6}{{remoteRootLogDir.getFileSystem}}{color}* simply does this:
> {code:java}
> public FileSystem getFileSystem(Configuration conf) throws IOException {
>   return FileSystem.get(this.toUri(), conf);
> }
> {code}
> As far as I know it's customary to create a YarnClient instance via 
> *{color:#0747A6}{{YarnClient.createYarnClient()}}{color}* in a 
> UserGroupInformation.doAs block if you would like to use it with a different 
> user than the current one. E.g.:
> {code:java}
> YarnClient yarnClient = ugi.doAs(new PrivilegedExceptionAction<YarnClient>() {
>     @Override
>     public YarnClient run() throws Exception {
>         // Runs as the user represented by ugi
>         YarnClient yarnClient = YarnClient.createYarnClient();
>         yarnClient.init(conf);
>         yarnClient.start();
>         return yarnClient;
>     }
> });
> {code}
> If this statement is correct then I think YarnClient should save the 
> *{color:#0747A6}{{UserGroupInformation.getCurrentUser()}}{color}* when the 
> YarnClient is being created and the 
> *{color:#0747A6}{{remoteRootLogDir.getFileSystem(conf)}}{color}* call should 
> be made inside an ugi.doAs block with that saved user.
> A more concrete example:
> {code:java}
> public YarnClient createYarnClient(UserGroupInformation ugi, Configuration conf)
>         throws Exception {
>     return ugi.doAs((PrivilegedExceptionAction<YarnClient>) () -> {
>         // Here I am the submitterUser (see below)
>         YarnClient yarnClient = YarnClient.createYarnClient();
>         yarnClient.init(conf);
>         yarnClient.start();
>         return yarnClient;
>     });
> }
>
> public void run() throws Exception {
>     // Here I am the serviceUser
>     // ...
>     Configuration conf = ...
>     // ...
>     UserGroupInformation ugi = getSubmitterUser();
>     // ...
>     YarnClient yarnClient = createYarnClient(ugi, conf);
>     // ...
>     ApplicationSubmissionContext context = ...
>     // ...
>     // Not inside ugi.doAs, so this call runs as the serviceUser
>     yarnClient.submitApplication(context);
> }
> {code}
> As you can see, *{color:#0747A6}{{submitApplication}}{color}* is not invoked 
> inside an ugi.doAs block, and submitApplication is the method that eventually 
> invokes *{color:#0747A6}{{addLogAggregationDelegationToken}}{color}*. That's 
> why we need to save the UGI during the YarnClient creation and create the 
> FileSystem instance inside an ugi.doAs with that saved user. Otherwise Yarn 
> will try to get a delegation token with an incorrect user (serviceUser) 
> instead of the submitterUser.
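
The reasoning in the description can be illustrated with a small self-contained model that has no Hadoop dependency. Everything in it is an assumption made for illustration: `FakeUgi` is a hypothetical stand-in for UserGroupInformation (a thread-local "current user" plus a `doAs`), and `FakeYarnClient` is a hypothetical client that records the current user when it is constructed. It is a sketch of the proposal, not the actual YarnClient code or the eventual patch.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for UserGroupInformation; NOT the Hadoop class. */
final class FakeUgi {
    // The process starts out as the service user in this model.
    private static final ThreadLocal<String> CURRENT =
            ThreadLocal.withInitial(() -> "serviceUser");

    private final String name;

    FakeUgi(String name) {
        this.name = name;
    }

    static FakeUgi getCurrentUser() {
        return new FakeUgi(CURRENT.get());
    }

    String getName() {
        return name;
    }

    /** Runs the action with this user as the current user, then restores the previous user. */
    <T> T doAs(Supplier<T> action) {
        String previous = CURRENT.get();
        CURRENT.set(name);
        try {
            return action.get();
        } finally {
            CURRENT.set(previous);
        }
    }
}

/** Hypothetical client that remembers which user created it. */
final class FakeYarnClient {
    // Captured once, at construction time.
    private final FakeUgi creator = FakeUgi.getCurrentUser();

    /** Models the reported behaviour: the token is requested as whoever calls submit. */
    String tokenOwnerWithoutSavedUgi() {
        return FakeUgi.getCurrentUser().getName();
    }

    /** Models the proposal: re-enter doAs with the user saved at creation time. */
    String tokenOwnerWithSavedUgi() {
        return creator.doAs(() -> FakeUgi.getCurrentUser().getName());
    }
}

final class UgiCaptureDemo {
    public static void main(String[] args) {
        FakeUgi submitter = new FakeUgi("submitterUser");
        // The client is created inside doAs, the submit-time calls happen outside it.
        FakeYarnClient client = submitter.doAs(FakeYarnClient::new);
        System.out.println(client.tokenOwnerWithoutSavedUgi()); // serviceUser
        System.out.println(client.tokenOwnerWithSavedUgi());    // submitterUser
    }
}
```

In this model the alternative raised in the comment above, wrapping the submit call itself in `doAs`, would also yield the submitterUser; the saved-UGI variant only removes the need for the caller to remember to do so.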



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (YARN-10603) Failed to reinitialize for recovered container

2021-01-31 Thread kyungwan nam (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276087#comment-17276087
 ] 

kyungwan nam edited comment on YARN-10603 at 2/1/21, 6:33 AM:
--

I've attached a patch. This patch works well in our cluster. 
Please review and comment.
Thanks.


was (Author: kyungwan nam):
I've attached a patch.
Please review and comment.
Thanks

> Failed to reinitialize for recovered container
> --
>
> Key: YARN-10603
> URL: https://issues.apache.org/jira/browse/YARN-10603
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-10603.001.patch
>
>
> A container reinitialization request does not work after the NM is restarted.
> I found the following problems:
> - When a recovered container is terminated, it exits, because termination 
> always produces either CONTAINER_EXITED_WITH_FAILURE or CONTAINER_EXITED_WITH_SUCCESS.
> - The container's *recoveredStatus* is set at the time of NM recovery and is 
> never changed, even after the container is terminated.
> As a result, the newly reinitialized container is launched as a recovered 
> container, which does not work.
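
The second problem in the description can be pictured with a toy model. This is only a sketch under stated assumptions: `ModelContainer`, its `RecoveredStatus` enum, and the `resetOnTermination` switch are hypothetical names invented here, not NodeManager classes, and resetting the status on termination is just one conceivable remedy, not necessarily what YARN-10603.001.patch does.

```java
/** Hypothetical model of a container whose recovery status can go stale. */
final class ModelContainer {
    enum RecoveredStatus { REQUESTED, LAUNCHED }

    private RecoveredStatus recoveredStatus = RecoveredStatus.REQUESTED;
    private final boolean resetOnTermination;

    ModelContainer(boolean resetOnTermination) {
        this.resetOnTermination = resetOnTermination;
    }

    /** Models NM recovery finding a container that was already running. */
    void recoverAsLaunched() {
        recoveredStatus = RecoveredStatus.LAUNCHED;
    }

    /** Models termination; the status is only cleared when the switch is on. */
    void terminate() {
        if (resetOnTermination) {
            recoveredStatus = RecoveredStatus.REQUESTED;
        }
    }

    /** Returns which launch path a relaunch (reinitialization) would take. */
    String relaunchPath() {
        return recoveredStatus == RecoveredStatus.LAUNCHED ? "recovered" : "fresh";
    }
}

final class RecoveredStatusDemo {
    public static void main(String[] args) {
        ModelContainer stale = new ModelContainer(false);
        stale.recoverAsLaunched();
        stale.terminate();
        System.out.println(stale.relaunchPath()); // recovered: status was never changed

        ModelContainer reset = new ModelContainer(true);
        reset.recoverAsLaunched();
        reset.terminate();
        System.out.println(reset.relaunchPath()); // fresh
    }
}
```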






[jira] [Updated] (YARN-10603) Failed to reinitialize for recovered container

2021-01-31 Thread kyungwan nam (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kyungwan nam updated YARN-10603:

Attachment: YARN-10603.001.patch

> Failed to reinitialize for recovered container
> --
>
> Key: YARN-10603
> URL: https://issues.apache.org/jira/browse/YARN-10603
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: kyungwan nam
>Assignee: kyungwan nam
>Priority: Major
> Attachments: YARN-10603.001.patch
>
>
> A container reinitialization request does not work after the NM is restarted.
> I found the following problems:
> - When a recovered container is terminated, it exits, because termination 
> always produces either CONTAINER_EXITED_WITH_FAILURE or CONTAINER_EXITED_WITH_SUCCESS.
> - The container's *recoveredStatus* is set at the time of NM recovery and is 
> never changed, even after the container is terminated.
> As a result, the newly reinitialized container is launched as a recovered 
> container, which does not work.






[jira] [Created] (YARN-10603) Failed to reinitialize for recovered container

2021-01-31 Thread kyungwan nam (Jira)
kyungwan nam created YARN-10603:
---

 Summary: Failed to reinitialize for recovered container
 Key: YARN-10603
 URL: https://issues.apache.org/jira/browse/YARN-10603
 Project: Hadoop YARN
  Issue Type: Bug
Reporter: kyungwan nam
Assignee: kyungwan nam


A container reinitialization request does not work after the NM is restarted.

I found the following problems:

- When a recovered container is terminated, it exits, because termination 
always produces either CONTAINER_EXITED_WITH_FAILURE or CONTAINER_EXITED_WITH_SUCCESS.
- The container's *recoveredStatus* is set at the time of NM recovery and is 
never changed, even after the container is terminated.
As a result, the newly reinitialized container is launched as a recovered 
container, which does not work.






[jira] [Commented] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used

2021-01-31 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275871#comment-17275871
 ] 

Hadoop QA commented on YARN-10532:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 18s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  0s{color} | {color:green}{color} | {color:green} No case conflicting files found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  1s{color} | {color:green}{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} | {color:green}test4tests{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 39s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 59s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 50s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 50s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 17m 12s{color} | {color:green}{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 40s{color} | {color:green}{color} | {color:green} trunk passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 37s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 47s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs config; considering switching to SpotBugs. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 45s{color} | {color:red}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/564/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager-warnings.html{color} | {color:red} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager in trunk has 1 extant findbugs warnings. {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 54s{color} | {color:green}{color} | {color:green} the patch passed with JDK Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 54s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 46s{color} | {color:green}{color} | {color:green} the patch passed with JDK Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 46s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  0m 45s{color} | {color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/564/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt{color} | {color:orange} hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager: The patch generated 19 new + 305 unchanged - 0 fixed = 324 total (was 305) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 48s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | 

[jira] [Commented] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used

2021-01-31 Thread zhuqi (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17275832#comment-17275832
 ] 

zhuqi commented on YARN-10532:
--

[~wangda] [~gandras]

I have uploaded a new patch to make the code clearer.

I have created a new policy ("maybe we can make it runnable by default so we 
don't have to create another config", as [~wangda] suggested).

The policy simply monitors each queue's last-used time and deletes queues when 
needed. We can enable it by adding AutoDeletionForExpiredQueuePolicy to the 
"scheduler.monitor.policies" configuration.

I also handled deletion of ParentQueues that have no child queues.

And I removed the reinitialize-related logic; I think we don't need it when 
auto deletion is enabled by default.

Let me know if you have any other thoughts.

Thanks.
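
The idea behind such a policy (track when each queue was last used, delete the ones idle for longer than a threshold) can be sketched in a few lines. This is a minimal model under stated assumptions, not the code in the patch: `ExpiredQueueTracker`, `markUsed`, and `removeExpired` are hypothetical names, timestamps are passed in explicitly instead of read from a clock, and the 5 minute threshold is taken from the issue description.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of an expiry policy for auto-created queues. */
final class ExpiredQueueTracker {
    private final long expiryMillis;
    // Queue path -> timestamp (ms) at which the queue was last used.
    private final Map<String, Long> lastUsed = new LinkedHashMap<>();

    ExpiredQueueTracker(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Records that a queue was used at the given time. */
    void markUsed(String queue, long nowMillis) {
        lastUsed.put(queue, nowMillis);
    }

    /** Removes and returns every queue idle for longer than the expiry time. */
    List<String> removeExpired(long nowMillis) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastUsed.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > expiryMillis) {
                removed.add(entry.getKey());
                it.remove();
            }
        }
        return removed;
    }

    boolean isTracked(String queue) {
        return lastUsed.containsKey(queue);
    }
}

final class ExpiredQueueDemo {
    public static void main(String[] args) {
        ExpiredQueueTracker tracker = new ExpiredQueueTracker(5 * 60 * 1000L);
        tracker.markUsed("root.users.alice", 0L);
        tracker.markUsed("root.users.bob", 200_000L);
        // At t = 350 s, alice has been idle for 350 s and bob for 150 s.
        System.out.println(tracker.removeExpired(350_000L)); // [root.users.alice]
    }
}
```

A periodic monitor would call `removeExpired` on each tick; handling parent queues that are left without children is outside this sketch.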

 

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is 
> not being used
> 
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10532.001.patch, YARN-10532.002.patch, 
> YARN-10532.003.patch, YARN-10532.004.patch, YARN-10532.005.patch, 
> YARN-10532.006.patch, YARN-10532.007.patch, YARN-10532.008.patch
>
>
> It's better if we can delete auto-created queues when they are not in use for 
> a period of time (like 5 mins). It will be helpful when we have a large 
> number of auto-created queues (e.g. from 500 users), but only a small subset 
> of queues are actively used.






[jira] [Updated] (YARN-10532) Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is not being used

2021-01-31 Thread zhuqi (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhuqi updated YARN-10532:
-
Attachment: YARN-10532.008.patch

> Capacity Scheduler Auto Queue Creation: Allow auto delete queue when queue is 
> not being used
> 
>
> Key: YARN-10532
> URL: https://issues.apache.org/jira/browse/YARN-10532
> Project: Hadoop YARN
>  Issue Type: Sub-task
>Reporter: Wangda Tan
>Assignee: zhuqi
>Priority: Major
> Attachments: YARN-10532.001.patch, YARN-10532.002.patch, 
> YARN-10532.003.patch, YARN-10532.004.patch, YARN-10532.005.patch, 
> YARN-10532.006.patch, YARN-10532.007.patch, YARN-10532.008.patch
>
>
> It's better if we can delete auto-created queues when they are not in use for 
> a period of time (like 5 mins). It will be helpful when we have a large 
> number of auto-created queues (e.g. from 500 users), but only a small subset 
> of queues are actively used.






[jira] [Updated] (YARN-10602) YRAN job's state is FINISHED,the FinalStatus is UNDEFINED

2021-01-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

欧自力 updated YARN-10602:
---
Issue Type: Improvement  (was: Bug)

> YRAN job's state is FINISHED,the FinalStatus is UNDEFINED
> -
>
> Key: YARN-10602
> URL: https://issues.apache.org/jira/browse/YARN-10602
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: api, resourcemanager, restapi
>Affects Versions: 3.1.1
> Environment: ||ResourceManager version:|3.1.1.3.1.5.0-152 |
> ||Hadoop version:|3.1.1.3.1.5.0-152|
>Reporter: 欧自力
>Priority: Major
>  Labels: patch
> Attachments: UNDEFINED.png, UNDEFINED.txt
>
>
> When a Tez task finishes, the YARN API reports the state as FINISHED but the 
> FinalStatus as UNDEFINED. Has anyone else had this problem?
> Please see below. This is what I get when I query the status through 
> http://rm:8088/ws/v1/cluster/apps/application_1612017156073_24137
>  {color:#4c9aff}{color}
>  {color:#4c9aff}application_1612017156073_24137{color}
>  {color:#4c9aff}datadev{color}
>  {color:#4c9aff}HIVE-08babb5c-0a46-45db-892f-67aae26c4b57{color}
>  {color:#4c9aff}common{color}
>  {color:#4c9aff}{color:#de350b}FINISHED{color}{color}
>  {color:#de350b}UNDEFINED{color}
>  100.0
>  {color:#4c9aff}History{color}
>  
> {color:#4c9aff}[http://wx12-dsj-master002:8088/proxy/application_1612017156073_24137/]{color}
>  {color:#4c9aff}Session stats:submittedDAGs=1, successfulDAGs=1, 
> failedDAGs=0, killedDAGs=0 {color}
>  {color:#4c9aff}1612017156073{color}
>  {color:#4c9aff}TEZ{color}
>  
> {color:#4c9aff}hive_20210131142041_6adab368-2ffe-4469-ad96-58918b8f80a0,userid=datadev{color}
>  {color:#4c9aff}0{color}
>  {color:#4c9aff}1612074042309{color}
>  {color:#4c9aff}1612074064373{color}
>  {color:#4c9aff}22064{color}
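
A client that polls this endpoint could flag the combination described above. The sketch below assumes the response was requested as JSON and is passed in as a string; `AppStatusCheck` and its methods are hypothetical helpers written for this illustration, and the regular expression is a shortcut for flat string fields, not a real JSON parser. The sample body is trimmed to three fields, with values copied from the report.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that inspects an application status response body. */
final class AppStatusCheck {
    /** Returns the value of a string field such as "state", or null if it is absent. */
    static String field(String json, String name) {
        Pattern p = Pattern.compile(
                "\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** True when the application is FINISHED but its final status is still UNDEFINED. */
    static boolean finishedButUndefined(String json) {
        return "FINISHED".equals(field(json, "state"))
                && "UNDEFINED".equals(field(json, "finalStatus"));
    }
}

final class AppStatusDemo {
    public static void main(String[] args) {
        // Trimmed sample body; values taken from the report above.
        String body = "{\"app\":{\"id\":\"application_1612017156073_24137\","
                + "\"state\":\"FINISHED\",\"finalStatus\":\"UNDEFINED\"}}";
        System.out.println(AppStatusCheck.finishedButUndefined(body)); // true
    }
}
```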






[jira] [Updated] (YARN-10602) YRAN job's state is FINISHED,the FinalStatus is UNDEFINED

2021-01-31 Thread Jira


 [ 
https://issues.apache.org/jira/browse/YARN-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

欧自力 updated YARN-10602:
---
Remaining Estimate: (was: 1h)
 Original Estimate: (was: 1h)

> YRAN job's state is FINISHED,the FinalStatus is UNDEFINED
> -
>
> Key: YARN-10602
> URL: https://issues.apache.org/jira/browse/YARN-10602
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: api, resourcemanager, restapi
>Affects Versions: 3.1.1
> Environment: ||ResourceManager version:|3.1.1.3.1.5.0-152 |
> ||Hadoop version:|3.1.1.3.1.5.0-152|
>Reporter: 欧自力
>Priority: Major
>  Labels: patch
> Attachments: UNDEFINED.png, UNDEFINED.txt
>
>
> When a Tez task finishes, the YARN API reports the state as FINISHED but the 
> FinalStatus as UNDEFINED. Has anyone else had this problem?
> Please see below. This is what I get when I query the status through 
> http://rm:8088/ws/v1/cluster/apps/application_1612017156073_24137
>  {color:#4c9aff}{color}
>  {color:#4c9aff}application_1612017156073_24137{color}
>  {color:#4c9aff}datadev{color}
>  {color:#4c9aff}HIVE-08babb5c-0a46-45db-892f-67aae26c4b57{color}
>  {color:#4c9aff}common{color}
>  {color:#4c9aff}{color:#de350b}FINISHED{color}{color}
>  {color:#de350b}UNDEFINED{color}
>  100.0
>  {color:#4c9aff}History{color}
>  
> {color:#4c9aff}[http://wx12-dsj-master002:8088/proxy/application_1612017156073_24137/]{color}
>  {color:#4c9aff}Session stats:submittedDAGs=1, successfulDAGs=1, 
> failedDAGs=0, killedDAGs=0 {color}
>  {color:#4c9aff}1612017156073{color}
>  {color:#4c9aff}TEZ{color}
>  
> {color:#4c9aff}hive_20210131142041_6adab368-2ffe-4469-ad96-58918b8f80a0,userid=datadev{color}
>  {color:#4c9aff}0{color}
>  {color:#4c9aff}1612074042309{color}
>  {color:#4c9aff}1612074064373{color}
>  {color:#4c9aff}22064{color}


