[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-12-07 Thread Konstantin Shvachko (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Shvachko updated YARN-6959:
--
Fix Version/s: (was: 2.7.1)
   2.7.5

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 3.0.0-beta1, 2.7.5
>
> Attachments: YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959-branch-2.8.002.patch, YARN-6959.005.patch, 
> YARN-6959.yarn_nm.log.zip, YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> ResourceRequests from a previous attempt may be recorded into the current
> attempt's ResourceRequests. These mis-recorded ResourceRequests may confuse
> the AM container request and allocation for the current attempt.
> *Issue Pipeline:*
> {code:java}
> // The precondition check for the incoming attempt id is executed here.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // The earlier precondition check for the attempt id may be outdated here,
> // i.e. currentAttempt may not be the attempt that attemptId refers to.
> // For example, attemptId may refer to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // ResourceRequests from the previous attempt may be recorded into the
> // current attempt's ResourceRequests.
> currentAttempt.updateResourceRequests(ask) ->
> // The RM may allocate the wrong AM container for the current attempt,
> // because its ResourceRequests may come from the previous attempt (any
> // ResourceRequest the previous AM asked for), and there is no logic that
> // matches the original AM container ResourceRequest against the
> // amContainerAllocation returned below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> After this patch, the RM always records ResourceRequests from different
> attempts into different SchedulerApplicationAttempt.AppSchedulingInfo
> objects.
> So even if the RM still records ResourceRequests from an old attempt at some
> point, they are recorded in the old AppSchedulingInfo object and do not
> affect the current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is confusing;
> it would be better to rename it to getCurrentApplicationAttempt. We should
> also reconsider whether there are other bugs related to
> getApplicationAttempt.
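
The difference between shared and per-attempt bookkeeping described in the issue text above can be sketched with a toy model. This is only an illustration under stated assumptions: the classes SharedPerApplication and SeparatePerAttempt, the string keys, and the string "asks" are all hypothetical and are not the real Hadoop YARN types or the actual patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these are hypothetical classes, not the real Hadoop YARN code.
public class AttemptRequestSketch {

  // Shared bookkeeping: one request list per application, reused across attempts.
  static final class SharedPerApplication {
    private final Map<String, List<String>> requestsByApp = new HashMap<>();

    void record(String appId, int attemptNumber, String ask) {
      // attemptNumber is ignored, so a stale caller writes into the live list.
      requestsByApp.computeIfAbsent(appId, k -> new ArrayList<>()).add(ask);
    }

    List<String> requestsSeenBy(String appId, int attemptNumber) {
      return requestsByApp.getOrDefault(appId, new ArrayList<>());
    }
  }

  // Separate bookkeeping: one request list per (application, attempt) pair.
  static final class SeparatePerAttempt {
    private final Map<String, List<String>> requestsByAttempt = new HashMap<>();

    private static String key(String appId, int attemptNumber) {
      return appId + "#" + attemptNumber;
    }

    void record(String appId, int attemptNumber, String ask) {
      // A stale caller can only write into its own attempt's list.
      requestsByAttempt
          .computeIfAbsent(key(appId, attemptNumber), k -> new ArrayList<>())
          .add(ask);
    }

    List<String> requestsSeenBy(String appId, int attemptNumber) {
      return requestsByAttempt.getOrDefault(key(appId, attemptNumber), new ArrayList<>());
    }
  }

  public static void main(String[] args) {
    // Attempt 1 has failed and attempt 2 is current when a late ask from attempt 1 arrives.
    SharedPerApplication shared = new SharedPerApplication();
    shared.record("app_1", 1, "late ask from attempt 1");
    System.out.println(shared.requestsSeenBy("app_1", 2)); // [late ask from attempt 1]

    SeparatePerAttempt separate = new SeparatePerAttempt();
    separate.record("app_1", 1, "late ask from attempt 1");
    System.out.println(separate.requestsSeenBy("app_1", 2)); // []
  }
}
```

In the shared variant the late ask becomes visible to attempt 2, which is the kind of confusion the issue describes. In the per-attempt variant it stays with attempt 1.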



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-31 Thread Andrew Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Wang updated YARN-6959:
--
Fix Version/s: (was: 3.0.0-alpha4)
   3.0.0-beta1

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-beta1
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959-branch-2.8.002.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-28 Thread Allen Wittenauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allen Wittenauer updated YARN-6959:
---
Target Version/s: 3.0.0-alpha4, 2.7.1, 2.8.0  (was: 2.8.0, 2.7.1, 
3.0.0-alpha4)
Release Note: ResourceManager will now record ResourceRequests from 
different attempts into different objects.

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959-branch-2.8.002.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-18 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.8.002.patch

Update 2.8 patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959-branch-2.8.002.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.7.006.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.7.006.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959.yarn_nm.log.zip, YARN-6959.yarn_rm.log.zip
>
>



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959.003.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959-branch-2.7.003.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959-branch-2.7.001.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959.002.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959-branch-2.7.002.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>



[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959.004.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>
> *Issue Summary:*
> Previous attempt ResourceRequest may be recorded into current attempt 
> ResourceRequests. These mis-recorded ResourceRequests may confuse AM 
> Container Request and Allocation for current attempt.
> *Issue Pipeline:*
> {code:java}
> // The precondition check for the incoming attempt id runs here.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // The earlier precondition check may be outdated by now, i.e. currentAttempt
> // may not be the attempt that attemptId refers to. For example, attemptId
> // may belong to the previous attempt.
> currentAttempt = scheduler.getApplicationAttempt(attemptId) ->
> // A previous attempt's ResourceRequest may be recorded into the current
> // attempt's ResourceRequests.
> currentAttempt.updateResourceRequests(ask) ->
> // RM may allocate the wrong AM Container for the current attempt, because its
> // ResourceRequests may come from the previous attempt (any ResourceRequest the
> // previous AM asked for), and no logic matches the original AM Container
> // ResourceRequest against the amContainerAllocation returned below.
> AMContainerAllocatedTransition.transition(...) ->
> amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
> {code}
> *Patch Correctness:*
> After this patch, RM records ResourceRequests from different attempts into 
> different SchedulerApplicationAttempt.AppSchedulingInfo objects. 
> So even if RM still records ResourceRequests from an old attempt at some 
> point, they are recorded in the old AppSchedulingInfo object, which does not 
> affect the current attempt's resource requests and allocation.
> *Concerns:*
> The getApplicationAttempt function in AbstractYarnScheduler is confusing. It 
> would be better to rename it to getCurrentApplicationAttempt, and to check 
> whether there are any other bugs related to getApplicationAttempt.
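
The race described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class and method names (StaleAttemptSketch, ToyScheduler, AttemptInfo, recordOnCurrent, recordPerAttempt) are hypothetical and do not come from the Hadoop code base or from the attached patches. AttemptInfo loosely stands in for a per-attempt AppSchedulingInfo, and the two record methods contrast a lookup that ignores the caller's attempt id with one that keeps each attempt's requests in its own object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model; none of these names come from the Hadoop code base.
public class StaleAttemptSketch {

    /** Per-attempt request store, loosely standing in for AppSchedulingInfo. */
    public static final class AttemptInfo {
        public final List<String> requests = new ArrayList<>();
    }

    public static final class ToyScheduler {
        private final Map<Integer, AttemptInfo> infoByAttempt = new HashMap<>();
        private int currentAttemptId = -1;

        /** Registers a new attempt and makes it the current one. */
        public void startAttempt(int attemptId) {
            infoByAttempt.put(attemptId, new AttemptInfo());
            currentAttemptId = attemptId;
        }

        /** Racy variant: the caller's attempt id is ignored, so a stale ask
         *  from an older attempt lands on the current attempt. */
        public void recordOnCurrent(int attemptId, String ask) {
            infoByAttempt.get(currentAttemptId).requests.add(ask);
        }

        /** Isolated variant: each ask is recorded in the object that belongs
         *  to the attempt id it was sent with. */
        public void recordPerAttempt(int attemptId, String ask) {
            infoByAttempt.get(attemptId).requests.add(ask);
        }

        public List<String> requestsOf(int attemptId) {
            return infoByAttempt.get(attemptId).requests;
        }
    }

    public static void main(String[] args) {
        // Attempt 1 is replaced by attempt 2, then a late ask from attempt 1 arrives.
        ToyScheduler racy = new ToyScheduler();
        racy.startAttempt(1);
        racy.startAttempt(2);
        racy.recordOnCurrent(1, "ask-from-attempt-1");
        System.out.println("racy, attempt 2 sees: " + racy.requestsOf(2));

        ToyScheduler isolated = new ToyScheduler();
        isolated.startAttempt(1);
        isolated.startAttempt(2);
        isolated.recordPerAttempt(1, "ask-from-attempt-1");
        System.out.println("isolated, attempt 2 sees: " + isolated.requestsOf(2));
    }
}
```

In the racy variant the late ask ends up among attempt 2's requests, which is the mis-recording the issue summary describes. In the isolated variant it stays with attempt 1's object, so attempt 2's requests are untouched.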






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959-branch-2.7.004.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.005.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959.001.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.002.patch, YARN-6959.003.patch, 
> YARN-6959.004.patch, YARN-6959.005.patch, YARN-6959-branch-2.7.001.patch, 
> YARN-6959-branch-2.7.002.patch, YARN-6959-branch-2.7.003.patch, 
> YARN-6959-branch-2.7.004.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.7.005.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.002.patch, YARN-6959.003.patch, 
> YARN-6959.004.patch, YARN-6959.005.patch, YARN-6959-branch-2.7.001.patch, 
> YARN-6959-branch-2.7.002.patch, YARN-6959-branch-2.7.003.patch, 
> YARN-6959-branch-2.7.004.patch, YARN-6959-branch-2.7.005.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-16 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.7.004.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959-branch-2.7.002.patch, 
> YARN-6959-branch-2.7.003.patch, YARN-6959-branch-2.7.004.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-15 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.7.003.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959-branch-2.7.002.patch, 
> YARN-6959-branch-2.7.003.patch, YARN-6959-branch-2.8.001.patch, 
> YARN-6959.yarn_nm.log.zip, YARN-6959.yarn_rm.log.zip
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-15 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.7.002.patch
YARN-6959-branch-2.8.001.patch

Add updated patch for 2.7 and new patch for 2.8.

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959-branch-2.7.002.patch, 
> YARN-6959-branch-2.8.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-15 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Target Version/s: 3.0.0-alpha4, 2.7.1, 2.8.0  (was: 2.7.1, 3.0.0-alpha4)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-15 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Fix Version/s: 2.8.0

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.8.0, 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-09 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959.yarn_nm.log.zip

Add NM log for this issue.

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959.yarn_nm.log.zip, 
> YARN-6959.yarn_rm.log.zip
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-08 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959.yarn_rm.log.zip

RM log for this bug.

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch, YARN-6959.yarn_rm.log.zip
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-08 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959.005.patch

The UT failure is not caused by the patch. Re-triggering Jenkins since the UT is not stable.

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959.005.patch, 
> YARN-6959-branch-2.7.001.patch
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: (was: YARN-6959.004.patch)

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959-branch-2.7.001.patch
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959.004.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959-branch-2.7.001.patch
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Component/s: fairscheduler
 capacity scheduler

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: capacity scheduler, fairscheduler, scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959-branch-2.7.001.patch
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959.004.patch

Adjust Style

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959.004.patch, YARN-6959-branch-2.7.001.patch
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959.003.patch

Re-trigger QA using the same patch as 002.

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959.003.patch, YARN-6959-branch-2.7.001.patch
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959.002.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959.002.patch, 
> YARN-6959-branch-2.7.001.patch
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959.001.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959.001.patch, YARN-6959-branch-2.7.001.patch
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Attachment: YARN-6959-branch-2.7.001.patch

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
> Attachments: YARN-6959-branch-2.7.001.patch
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Target Version/s: 3.0.0-alpha4, 2.7.1  (was: 2.7.1)
   Fix Version/s: 3.0.0-alpha4

> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
>






[jira] [Updated] (YARN-6959) RM may allocate wrong AM Container for new attempt

2017-08-07 Thread Yuqi Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-6959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuqi Wang updated YARN-6959:

Description: 
*Issue Summary:*
A ResourceRequest from a previous attempt may be recorded into the current 
attempt's ResourceRequests. These mis-recorded ResourceRequests may confuse the 
AM Container request and allocation for the current attempt.


*Issue Pipeline:*

{code:java}
// The precondition check for the incoming attempt id runs here.
ApplicationMasterService.allocate() ->

scheduler.allocate(attemptId, ask, ...) ->

// By this point the earlier precondition check may be outdated, i.e.
// currentAttempt may not be the attempt that attemptId refers to,
// for example when attemptId belongs to the previous attempt.
currentAttempt = scheduler.getApplicationAttempt(attemptId) ->

// ResourceRequests from the previous attempt may be recorded into the
// current attempt's ResourceRequests.
currentAttempt.updateResourceRequests(ask) ->

// The RM may allocate the wrong AM Container for the current attempt, because
// its ResourceRequests may come from the previous attempt (they can be anything
// the previous AM asked for), and there is no logic that matches the original
// AM Container ResourceRequest against the amContainerAllocation returned below.
AMContainerAllocatedTransition.transition(...) ->
amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
{code}


*Patch Correctness:*
After this patch, the RM always records ResourceRequests from different 
attempts into different SchedulerApplicationAttempt.AppSchedulingInfo objects.
So even if the RM still records ResourceRequests from an old attempt at any 
time, they are recorded in the old AppSchedulingInfo object and do not affect 
the current attempt's resource requests and allocation.
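
The isolation argument above can be illustrated with a small standalone model. This is only a sketch and not Hadoop code: AttemptIsolationSketch, SchedulingInfo, App, recordIntoCurrent and recordIntoOwn are hypothetical names invented for the illustration, and plain strings stand in for ResourceRequests. It contrasts recording a late ask into whatever attempt is current with recording it into the object owned by the attempt that sent it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, not Hadoop code: every name below is illustrative.
class AttemptIsolationSketch {

    /** Stand-in for AppSchedulingInfo: the pending asks of one attempt. */
    static final class SchedulingInfo {
        final List<String> requests = new ArrayList<>();
    }

    /** Stand-in for the scheduler's per-application state. */
    static final class App {
        private final Map<Integer, SchedulingInfo> infoByAttempt = new HashMap<>();
        private int currentAttemptId = -1;

        /** Registers a new attempt with its own SchedulingInfo and makes it current. */
        void startAttempt(int attemptId) {
            infoByAttempt.put(attemptId, new SchedulingInfo());
            currentAttemptId = attemptId;
        }

        /** Shared shape: the ask lands in the current attempt, whatever id the caller sent. */
        void recordIntoCurrent(int attemptId, String ask) {
            infoByAttempt.get(currentAttemptId).requests.add(ask);
        }

        /** Isolated shape: the ask lands in the object owned by the sender's attempt. */
        void recordIntoOwn(int attemptId, String ask) {
            SchedulingInfo info = infoByAttempt.get(attemptId);
            if (info != null) {
                info.requests.add(ask);
            }
        }

        List<String> requestsOf(int attemptId) {
            return infoByAttempt.get(attemptId).requests;
        }

        List<String> currentRequests() {
            return requestsOf(currentAttemptId);
        }
    }

    public static void main(String[] args) {
        // Attempt 1 is replaced by attempt 2, then a late ask from attempt 1 arrives.
        App shared = new App();
        shared.startAttempt(1);
        shared.startAttempt(2);
        shared.recordIntoCurrent(1, "late ask from attempt 1");
        System.out.println("shared, current attempt sees: " + shared.currentRequests());

        App isolated = new App();
        isolated.startAttempt(1);
        isolated.startAttempt(2);
        isolated.recordIntoOwn(1, "late ask from attempt 1");
        System.out.println("isolated, current attempt sees: " + isolated.currentRequests());
    }
}
```

In the isolated variant the late ask is still recorded, but only in the old attempt's object, so the current attempt's request list stays untouched.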


*Concerns:*
The getApplicationAttempt function in AbstractYarnScheduler is confusing; it 
would be better to rename it to getCurrentApplicationAttempt. We should also 
reconsider whether there are any other bugs related to getApplicationAttempt.
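
To make the naming concern concrete, here is a minimal sketch of one way the two lookups could be kept apart. It is an assumption for illustration only, not the AbstractYarnScheduler API: StrictAttemptLookupSketch, Attempt and App are hypothetical, and returning an Optional is a design choice of this sketch rather than anything proposed in the patch.

```java
import java.util.Optional;

// Hypothetical model, not Hadoop code: every name below is illustrative.
class StrictAttemptLookupSketch {

    static final class Attempt {
        final int attemptId;

        Attempt(int attemptId) {
            this.attemptId = attemptId;
        }
    }

    static final class App {
        private Attempt current;

        void setCurrent(Attempt attempt) {
            current = attempt;
        }

        /** The name says what is returned: whichever attempt is current right now. */
        Attempt getCurrentApplicationAttempt() {
            return current;
        }

        /** Strict lookup: yields the attempt only when the requested id matches it. */
        Optional<Attempt> getApplicationAttempt(int attemptId) {
            return Optional.ofNullable(current)
                    .filter(attempt -> attempt.attemptId == attemptId);
        }
    }

    public static void main(String[] args) {
        App app = new App();
        app.setCurrent(new Attempt(1));
        app.setCurrent(new Attempt(2)); // attempt 1 is now stale

        System.out.println(app.getApplicationAttempt(1).isPresent()); // false
        System.out.println(app.getApplicationAttempt(2).isPresent()); // true
    }
}
```

With this split, a caller holding a stale attempt id gets an empty result instead of silently operating on the newer attempt.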

  was:
*Issue Summary:*
A ResourceRequest from a previous attempt may be recorded in the current 
attempt's ResourceRequests. These mis-recorded ResourceRequests can corrupt 
the AM Container request and allocation for the current attempt.

*Issue Pipeline:*

{code:java}
// ApplicationMasterService.allocate() runs the precondition check on the
// incoming attempt id.
ApplicationMasterService.allocate() ->

scheduler.allocate(attemptId, ask, ...) ->

// By this point the earlier precondition check on the attempt id may be
// outdated, i.e. currentAttempt may not be the attempt that attemptId
// refers to. For example, attemptId may refer to the previous attempt.
currentAttempt = scheduler.getApplicationAttempt(attemptId) ->

// A ResourceRequest from the previous attempt may be recorded in the
// current attempt's ResourceRequests.
currentAttempt.updateResourceRequests(ask) ->

// RM may allocate the wrong AM Container for the current attempt, because
// its ResourceRequests may come from the previous attempt and can be
// anything the previous AM asked for, and there is no logic that matches
// the original AM Container ResourceRequest against the
// amContainerAllocation returned below.
AMContainerAllocatedTransition.transition(...) ->
amContainerAllocation = scheduler.allocate(currentAttemptId, ...)
{code}

*Patch Correctness:*
After this patch, RM always records ResourceRequests from different attempts 
in different SchedulerApplicationAttempt.AppSchedulingInfo objects. So even 
if RM still records ResourceRequests from an old attempt at some point, they 
are recorded in the old AppSchedulingInfo object and do not affect the 
current attempt's resource requests and allocation.

*Concerns:*
The getApplicationAttempt function in AbstractYarnScheduler is confusing; we 
should rename it to getCurrentApplicationAttempt and reconsider whether 
there are any other bugs related to getApplicationAttempt.


> RM may allocate wrong AM Container for new attempt
> --
>
> Key: YARN-6959
> URL: https://issues.apache.org/jira/browse/YARN-6959
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: scheduler
>Affects Versions: 2.7.1
>Reporter: Yuqi Wang
>Assignee: Yuqi Wang
>  Labels: patch
> Fix For: 2.7.1, 3.0.0-alpha4
>
>
> *Issue Summary:*
> A ResourceRequest from a previous attempt may be recorded in the current 
> attempt's ResourceRequests. These mis-recorded ResourceRequests can corrupt 
> the AM Container request and allocation for the current attempt.
> *Issue Pipeline:*
> {code:java}
> // Executing precondition check for the incoming attempt id.
> ApplicationMasterService.allocate() ->
> scheduler.allocate(attemptId, ask, ...) ->
> // Previous precondition check for the attempt id may be outdated here, 
> // i.e. the currentAttempt may