[jira] [Comment Edited] (YARN-6920) Fix TestNMClient failure due to YARN-6706

2017-08-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16117367#comment-16117367
 ] 

Arun Suresh edited comment on YARN-6920 at 8/7/17 9:48 PM:
---

Kicked Jenkins off again for this patch - to verify that the last successful 
run was not just a one-off.
Will commit it after, if successful - based on [~jianhe]'s LGTM.



> Fix TestNMClient failure due to YARN-6706
> -----------------------------------------
>
> Key: YARN-6920
> URL: https://issues.apache.org/jira/browse/YARN-6920
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-6920.001.patch, YARN-6920.002.patch, 
> YARN-6920.003.patch, YARN-6920.004.patch
>
>
> Looks like {{TestNMClient}} has been failing for a while. Opening this JIRA 
> to track the fix.





[jira] [Comment Edited] (YARN-6920) Fix TestNMClient failure due to YARN-6706

2017-08-03 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113973#comment-16113973
 ] 

Arun Suresh edited comment on YARN-6920 at 8/4/17 5:46 AM:
---

Thanks for taking a look, [~jianhe].
bq. ..it is possible that a different container gets started later on 
SCHEDULE_CONTAINER event ?
It is possible, but the following invariants hold:
# *Total resources of Guaranteed containers ALLOCATED on a Node cannot exceed 
the Node capacity*: The RM ensures that Guaranteed containers are never 
over-allocated on an NM.
# *Total (Opportunistic + Guaranteed) resources of RUNNING containers cannot 
exceed Node capacity*: The ContainerScheduler enforces this.
# *Running Opportunistic containers will be preempted to make room for 
Guaranteed containers*: Also enforced by the ContainerScheduler.

Given these, we don't really have to worry about a different container starting 
in the meanwhile. If the new container is Guaranteed, then the Node should have 
the resources to begin with; and if it is Opportunistic, it will likely be 
preempted when our ReInitializing container is restarted.
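
To make that concrete, here is a minimal, hypothetical sketch of how invariants 
2 and 3 interact when a Guaranteed container is started. The names and 
structure are illustrative only; this is not the actual ContainerScheduler code:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of invariants 2 and 3; not the real ContainerScheduler. */
public class ContainerSchedulerSketch {
  private final long nodeCapacityMb = 8192;
  private long runningGuaranteedMb = 0;
  // Memory of each RUNNING Opportunistic container, oldest first.
  private final Deque<Long> runningOpportunisticMb = new ArrayDeque<>();

  /** Start a Guaranteed container, preempting Opportunistic ones as needed. */
  public void startGuaranteed(long requestMb) {
    // Invariant 2: total RUNNING resources may not exceed node capacity.
    while (runningGuaranteedMb + opportunisticTotal() + requestMb > nodeCapacityMb
        && !runningOpportunisticMb.isEmpty()) {
      // Invariant 3: preempt a RUNNING Opportunistic container to make room.
      runningOpportunisticMb.poll();
    }
    // Invariant 1 makes this safe: the RM never allocates Guaranteed
    // containers beyond node capacity, so once all Opportunistic containers
    // are preempted, the Guaranteed set alone always fits.
    runningGuaranteedMb += requestMb;
  }

  private long opportunisticTotal() {
    return runningOpportunisticMb.stream().mapToLong(Long::longValue).sum();
  }
}
{code}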

bq.  And for service container, user should be expected to always use 
Guaranteed type.
Yup. There is already an {{enforceExecutionType}} field in 
{{ResourceRequest::ExecutionTypeRequest}} that an AM can use to ensure that the 
container it receives against this request is of Guaranteed type.
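
For illustration, an AM-side request could look roughly like this. This is a 
sketch against the YARN records API; the exact {{newInstance}} overload may 
differ across Hadoop versions, and the helper name is hypothetical:

{code:java}
import org.apache.hadoop.yarn.api.records.*;

/** Hypothetical AM-side helper; names here are illustrative. */
public class GuaranteedRequestSketch {
  static ResourceRequest newGuaranteedRequest() {
    // enforceExecutionType = true tells the RM it must not satisfy this
    // request with a container of a different execution type.
    ExecutionTypeRequest execType =
        ExecutionTypeRequest.newInstance(ExecutionType.GUARANTEED, true);
    return ResourceRequest.newInstance(
        Priority.newInstance(0),       // request priority
        ResourceRequest.ANY,           // no locality constraint
        Resource.newInstance(1024, 1), // 1 GB, 1 vcore
        1,                             // one container
        true,                          // relax locality
        null,                          // no node-label expression
        execType);
  }
}
{code}

If {{enforceExecutionType}} is left false, the RM is free to satisfy the 
request with a different execution type than the one asked for.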



> Fix TestNMClient failure due to YARN-6706
> -----------------------------------------
>
> Key: YARN-6920
> URL: https://issues.apache.org/jira/browse/YARN-6920
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Arun Suresh
> Attachments: YARN-6920.001.patch, YARN-6920.002.patch, 
> YARN-6920.003.patch, YARN-6920.004.patch
>
>
> Looks like {{TestNMClient}} has been failing for a while. Opening this JIRA 
> to track the fix.





[jira] [Comment Edited] (YARN-6920) fix TestNMClient failure due to YARN-6706

2017-08-01 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-6920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109197#comment-16109197
 ] 

Arun Suresh edited comment on YARN-6920 at 8/1/17 4:31 PM:
---

It looks like it's been failing since YARN-6706. [~haibochen], can you take a 
look?



> fix TestNMClient failure due to YARN-6706
> -----------------------------------------
>
> Key: YARN-6920
> URL: https://issues.apache.org/jira/browse/YARN-6920
> Project: Hadoop YARN
>  Issue Type: Bug
>Reporter: Arun Suresh
>Assignee: Haibo Chen
>
> Looks like {{TestNMClient}} has been failing for a while. Opening this JIRA 
> to track the fix.


