[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-11-15 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1585#comment-1585
 ] 

Arun Suresh edited comment on YARN-4597 at 11/15/16 9:51 AM:
-

Thanks again for the review [~kkaranasos],
I've updated the patch based on your suggestions except the following:

1.
bq. ..the fields of the OpportunisticContainersStatus() can still be updated 
during the getOpportunisticContainersStatus(). To avoid synchronization, we 
could set the fields using an event, and then in the 
getOpportunisticContainersStatus() we would just return the object.
Given that the opportunisticContainerStatus is meant for reporting, I think we 
can live with minor temporary inconsistencies. An event is probably overkill.
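To illustrate the trade-off, a simplified sketch (illustrative names only, not 
the actual OpportunisticContainersStatus class): the status fields are updated 
without locking, so a reader may see values that are momentarily out of sync 
with each other, which is acceptable for reporting.
{code}
// Simplified sketch of the reporting trade-off described above.
public class OpportunisticStatusSketch {
  private volatile int queuedOpportContainers;
  private volatile int runningOpportContainers;

  // Called from the scheduler path whenever the counts change.
  void update(int queued, int running) {
    this.queuedOpportContainers = queued;
    this.runningOpportContainers = running;
  }

  // Called from the heartbeat/reporting path. No locking, so the two fields
  // may be momentarily inconsistent with each other.
  int[] snapshot() {
    return new int[] {queuedOpportContainers, runningOpportContainers};
  }
}
{code}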

2.
bq. In the SchedulerNode, I still suggest to put the ++numContainers and the 
--numContainers inside the if statements. If I remember well, these fields are 
used for the web UI, so there will be a disconnect between the resources used 
(referring only to guaranteed containers) and the number of containers 
(referring to both guaranteed and opportunistic at the moment). The stats for 
the opportunistic containers are carried by the opportunisticContainersStatus, 
so we are good with reporting them too.
I still feel numContainers SHOULD include the opportunistic containers. 
Otherwise, we should change the 'numContainers' field name to 
'numGuaranteedContainers' - but I am less inclined to make any more changes in 
the RM for this patch. I am actually of the opinion that since users can see 
the actual opp. container count, by virtue of YARN-2995, the total count would 
be more valuable than just the guaranteed container count. Also, in the future, 
we might have other container execution types, and I would not want a specific 
counter on the SchedulerNode for each of these.
Let's revisit this if we do find the UI jarring.
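A minimal sketch of the accounting being argued for (illustrative names, not 
the real SchedulerNode): the container count covers every execution type, while 
resource accounting covers only GUARANTEED containers.
{code}
// Illustrative sketch only; the real SchedulerNode tracks Resource objects and
// more state. numContainers counts all execution types, while the allocated
// resources reflect only GUARANTEED containers.
enum ExecType { GUARANTEED, OPPORTUNISTIC }

class NodeAccountingSketch {
  private int numContainers;       // all execution types
  private long allocatedMemoryMB;  // guaranteed containers only

  void allocate(ExecType type, long memoryMB) {
    ++numContainers;
    if (type == ExecType.GUARANTEED) {
      allocatedMemoryMB += memoryMB;
    }
  }

  void release(ExecType type, long memoryMB) {
    --numContainers;
    if (type == ExecType.GUARANTEED) {
      allocatedMemoryMB -= memoryMB;
    }
  }
}
{code}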



was (Author: asuresh):
Thanks again for the review [~kkaranasos],
I've updated the patch based on your suggestions except the following:

1.
bq. ..the fields of the OpportunisticContainersStatus() can still be updated 
during the getOpportunisticContainersStatus(). To avoid synchronization, we 
could set the fields using an event, and then in the 
getOpportunisticContainersStatus() we would just return the object.
Given that the opportunisticContainerStatus is meant for reporting, I think we 
can live with minor temporary inconsistencies. An event is probably an overkill.

2.
bq. In the SchedulerNode, I still suggest to put the ++numContainers and the 
--numContainers inside the if statements. If I remember well, these fields are 
used for the web UI, so there will be a disconnect between the resources used 
(referring only to guaranteed containers) and the number of containers 
(referring to both guaranteed and opportunistic at the moment). The stats for 
the opportunistic containers are carried by the opportunisticContainersStatus, 
so we are good with reporting them too.
I still feel numContainers SHOULD include the opportunistic containers. Else, 
we should change the 'numContainers' field name to 'numGuaranteedContainers' - 
but I am less inclined to make anymore changes in the RM for this patch. I am 
actually of the opinion that since users can see the actual opp. container 
count, by virtue of YARN-2995, the total count would be more valuable than just 
the guaranteed container count. Also in the future, we might have other 
container types, and I would not want a specific counter on the SchedulerNode 
for each of these.


> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chris Douglas
>Assignee: Arun Suresh
>  Labels: oct16-hard
> Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, 
> YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, 
> YARN-4597.009.patch, YARN-4597.010.patch, YARN-4597.011.patch, 
> YARN-4597.012.patch, YARN-4597.013.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.






[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-11-13 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15661566#comment-15661566
 ] 

Arun Suresh edited comment on YARN-4597 at 11/13/16 2:31 PM:
-

Updating patch addressing [~kkaranasos]'s and [~kasha]'s comments.

Thanks both of you for the detailed reviews.

I've responded to Karthik's comments directly on github.
[~kkaranasos], I've address your comments except the following.

1.
bq. queuedOpportunisticContainers will have concurrency issues. We are updating 
it when containers arrive but also in the shedQueuedOpportunisticContainers.
Good catch. I've fixed it in the latest patch by sending a 
'SHED_QUEUED_CONTAINERS' event to the ContainerScheduler when the Node HB 
response from the RM asks to shed queued containers. In addition to preserving 
the fact that ContainerScheduler operations are serialized, it also ensures 
that the Node HB thread is not blocked.
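A rough sketch of that hand-off, assuming a single scheduler thread (the class
and event payload here are illustrative, not the patch's actual types): the
heartbeat thread only enqueues an event, and the scheduler thread is the only
one that touches the queue.
{code}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch of the SHED_QUEUED_CONTAINERS hand-off described above.
class ShedQueuedContainersSketch {
  // Event payload: the queue length the RM currently allows on this node.
  private final BlockingQueue<Integer> shedEvents = new LinkedBlockingQueue<>();
  private final Deque<String> queuedOpportunistic = new ArrayDeque<>();

  // Node heartbeat thread: never mutates scheduler state, never blocks on it.
  void onHeartbeatResponse(int maxAllowedQueueLength) {
    shedEvents.add(maxAllowedQueueLength);
  }

  // Scheduler thread: the single place where the queue is mutated, which keeps
  // ContainerScheduler operations serialized.
  void schedulerLoop() throws InterruptedException {
    while (!Thread.currentThread().isInterrupted()) {
      int maxAllowed = shedEvents.take();
      while (queuedOpportunistic.size() > maxAllowed) {
        String containerId = queuedOpportunistic.removeLast();
        System.out.println("Shedding queued opportunistic container " + containerId);
      }
    }
  }
}
{code}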

2.
bq. shedQueuedOpportunisticContainers: numAllowed is the number of allowed 
containers in the queue. Instead, we are killing numAllowed containers. In 
other words, we should not kill numAllowed, but 
queuedOpportunisticContainers.size() - numAllowed.
Even though I agreed with you offline, I took a look again, and actually the 
logic is correct. The {{numAllowed}} counter is initialized to maxAllowed and 
then decremented in the loop. Containers are killed only AFTER its value goes 
<= 0. In any case, I've added a test case in 'TestContainerSchedulerQueuing' to 
verify that this actually works.
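For illustration, a simplified stand-alone version of that counting scheme 
(assumed logic, not the patch's code): exactly size() - maxAllowed containers 
end up killed.
{code}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Simplified stand-in for the shedding logic described above: numAllowed starts
// at the allowed queue length and is decremented per queued container; only the
// containers seen after it drops below zero are killed.
class ShedLogicSketch {
  static void shedQueuedOpportunisticContainers(Deque<String> queued, int maxAllowed) {
    int numAllowed = maxAllowed;
    Iterator<String> it = queued.iterator();
    while (it.hasNext()) {
      String containerId = it.next();
      if (--numAllowed < 0) {
        it.remove();
        System.out.println("Killing queued container " + containerId);
      }
    }
  }

  public static void main(String[] args) {
    Deque<String> queued = new ArrayDeque<>();
    for (int i = 1; i <= 5; i++) {
      queued.add("container_" + i);
    }
    shedQueuedOpportunisticContainers(queued, 2); // kills containers 3, 4 and 5
    System.out.println("Still queued: " + queued); // [container_1, container_2]
  }
}
{code}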

3.
bq. line 252, indeed we can now do extraOpportContainersToKill -> 
opportContainersToKill, as Karthik mentioned at a comment.
I think 'extra' is still apt, since (as I mentioned to Karthik) these are 
'extra' opportunistic containers over and above what is already present in 
'oppContainersToKill'.

4.
bq. queuedGuaranteedContainers and queuedOpportunisticContainers: I think we 
should use queues. I don't think we retrieve the container by the key anywhere 
either ways.
Karthik mentioned this in his comments too. A LinkedHashMap is essentially an 
indexed queue. Additionally, there is actually one case where we need to 
retrieve by key: when the AM asks to kill a container that is queued. 
Furthermore, queue re-ordering etc. might be easier with a map... Let's keep it 
as a LinkedHashMap unless we find it is detrimental in some way.
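A small sketch of that dual use, with simplified types (a String value stands 
in for the queued container object): insertion order doubles as queue order, 
while removal by container id stays cheap.
{code}
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a LinkedHashMap used both as a FIFO queue (insertion
// order) and as an index keyed by container id.
class QueuedContainersSketch {
  private final Map<String, String> queuedOpportunisticContainers =
      new LinkedHashMap<>();

  void queue(String containerId, String containerInfo) {
    queuedOpportunisticContainers.put(containerId, containerInfo);
  }

  // Keyed access: needed when a stop request arrives for a container that has
  // not started yet.
  String removeQueued(String containerId) {
    return queuedOpportunisticContainers.remove(containerId);
  }

  // Queue-like access: pull the oldest queued container when resources free up.
  String pollOldest() {
    Iterator<Map.Entry<String, String>> it =
        queuedOpportunisticContainers.entrySet().iterator();
    if (!it.hasNext()) {
      return null;
    }
    Map.Entry<String, String> head = it.next();
    it.remove();
    return head.getKey();
  }
}
{code}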

5.
bq. oppContainersMarkedForKill: could be a Set, right?
Technically yes, but I would have to modify Container to add equals and 
hashCode too, which I felt was too much of a hassle. I prefer to keep it as it 
is.

6.
bq. fields of the opportunisticContainersStatus are set in different places. 
Due to that, when we call getOpportunisticContainersStatus() we may see an 
inconsistent object. Let's set the fields only in the 
getOpportunisticContainersStatus().
Agreed... I've also added the oppMem, oppCores & numOpp values in the 
NodeManagerMetrics.

7.
bq. There seem to be two redundant parameters at YarnConfiguration at the 
moment: NM_CONTAINER_QUEUING_MIN_QUEUE_LENGTH and 
NM_OPPORTUNISTIC_CONTAINERS_MAX_QUEUE_LENGTH. If I am not missing something, we 
should keep one of the two.
Actually, they are a bit different. NM_CONTAINER_QUEUING_MIN_QUEUE_LENGTH and 
the corresponding max value are used by the RM to calculate a limit value for 
the queue; it is possible for the queue to momentarily go above that. 
NM_OPPORTUNISTIC_CONTAINERS_MAX_QUEUE_LENGTH, on the other hand, is used in the 
NM to prevent queuing beyond that value. It is a hard configuration limit 
([~kasha] had requested it).
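A hedged sketch of how the two settings differ in effect (the clamping formula 
and method names are assumptions for illustration, not taken from the patch): 
the RM-derived limit is advisory, while the NM-side maximum is a hard cap 
checked before anything is queued.
{code}
// Illustrative only: the exact formula the RM uses is an assumption here.
class QueueLimitSketch {
  // RM side: an estimated queue length clamped between the configured min and
  // max; the node's queue may briefly exceed the resulting target.
  static int rmComputedQueueLimit(int estimatedQueueLength,
                                  int configuredMin, int configuredMax) {
    return Math.max(configuredMin, Math.min(configuredMax, estimatedQueueLength));
  }

  // NM side: a hard cap; once it is reached, new opportunistic containers are
  // rejected rather than queued (the AM may re-request them).
  static boolean mayQueueOnNode(int currentQueueLength, int nmHardMaxQueueLength) {
    return currentQueueLength < nmHardMaxQueueLength;
  }
}
{code}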
 




was (Author: asuresh):
Updating patch addressing [~kkaranasos]'s and [~kasha]'s comments

I've responded to Karthik's comments directly on github.

[~kkaranasos],
bq. queuedOpportunisticContainers will have concurrency issues. We are updating 
it when containers arrive but also in the shedQueuedOpportunisticContainers.
Good catch. Ive fixed it in the latest patch by sending a 
'SHED_QUEUED_CONTAINERS' event to the ContainerScheduler when the Node HB 
response from the RM asks to shed queued containers. In addition to preserving 
the fact that ContainerScheduler operations are serialized, it also ensures 
that the Node HB thread is not blocked.

bq. shedQueuedOpportunisticContainers: numAllowed is the number of allowed 
containers in the queue. Instead, we are killing numAllowed containers. In 
other words, we should not kill numAllowed, but 
queuedOpportunisticContainers.size() - numAllowed.
Even though I agreed with you offline, I took a look again, and actually the 
logic is correct. The {{numAllowed}} counter is initialized to maxAllowed and 
the decremented in the loop. Containers are killed only AFTER it's value goes 
<= 0. In any case, I've added a testcase in 'TestContainerSchedulerQueuing' to 
verify that this actually works.

bq. line 252, indeed we can now do extraOpportContainersToKill -> 

[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-11-11 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15656417#comment-15656417
 ] 

Arun Suresh edited comment on YARN-4597 at 11/11/16 3:11 PM:
-

Appreciate the review [~kkaranasos],

1.
bq. The Container has two new methods (sendLaunchEvent and sendKillEvent), 
which are public and are not following..
sendKillEvent is used by the Scheduler (which is in another package) to kill a 
container. Since this patch introduces an external entity that launches and 
kills a container, viz. the Scheduler, I feel it is apt to keep both as public 
methods. I prefer it to 'dispatcher.getEventHandler().handle..'. 
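A sketch of the API shape being defended here (simplified; the Dispatcher 
interface and event names are placeholders, not the actual ContainerImpl): the 
scheduler calls small public methods on the container, and the container hides 
the dispatcher plumbing behind them.
{code}
// Illustrative sketch; event names and the Dispatcher interface are placeholders.
class ContainerSketch {
  interface Dispatcher {
    void handle(String eventType, String containerId);
  }

  private final String containerId;
  private final Dispatcher dispatcher;

  ContainerSketch(String containerId, Dispatcher dispatcher) {
    this.containerId = containerId;
    this.dispatcher = dispatcher;
  }

  // Called by the ContainerScheduler (in another package) when resources are
  // available for this container.
  public void sendLaunchEvent() {
    dispatcher.handle("CONTAINER_LAUNCH", containerId);
  }

  // Called by the ContainerScheduler when the container must be preempted or
  // shed from the queue.
  public void sendKillEvent() {
    dispatcher.handle("CONTAINER_KILL", containerId);
  }
}
{code}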

2.
The Container needs to be added to the {{nodeUpdateQueue}} if the container is 
to be moved from the ACQUIRED to the RUNNING state (this is a state transition 
all containers should go through). Regarding the {{launchedContainers}}, let's 
have both Opportunistic and Guaranteed containers flow through a common 
code-path... and introduce specific behaviors in subsequent patches as and when 
required.

3.
bq. In the OpportunisticContainerAllocatorAMService we are now calling the 
SchedulerNode::allocate, and then we do not update the used resources but we do 
update some other counters, which leads to inconsistencies.
Hmmm... I do see that numContainers is not decremented correctly on release. 
Thanks... but it looks like it would more likely just impact reporting/UI, 
nothing functional (will update the patch). Can you specify which other 
counters specifically? Like I mentioned above, let's run all containers through 
as much of the common code path as possible before we add new counters etc.

4.
bq. Maybe as part of a different JIRA, we should at some point extend the 
container.metrics in the ContainerImpl to keep track of the scheduled/queued 
containers.
Yup.. +1 to that.

The rest of your comments make sense... will update patch.

bq. let's stress-test the code in a cluster before committing to make sure 
everything is good
It has been tested on a 3-node cluster with MR Pi jobs (using opportunistic 
containers) and I didn't hit any major issues. We can always open follow-up 
JIRAs for specific performance-related issues as and when we find them. 
Besides, stress-testing is not really a precondition to committing a patch.



was (Author: asuresh):
Appreciate the review [~kkaranasos],

1.
bq. The Container has two new methods (sendLaunchEvent and sendKillEvent), 
which are public and are not following..
sendKillEvent is used by the Scheduler (which is in another package) to kill a 
container. Since this patch introduces an external entity that launches and 
kills a container, viz. the Scheduler, I feel it is apt to keep both as public 
methods. I prefer it to 'dispatcher.getEventHandler().handle..'. 

2.
The Container needs to be added to the {{nodeUpdateQueue}} if the container is 
to be move from ACQUIRED to RUNNING state (this is a state transition all 
containers should go thru). Regarding the {{launchedContainers}}, Lets have 
both Opportunistic and Guaranteed containers flow through a common code-path... 
and introduce specific behaviors if required in subsequent patches as and when 
required.

3.
bq. In the OpportunisticContainerAllocatorAMService we are now calling the 
SchedulerNode::allocate, and then we do not update the used resources but we do 
update some other counters, which leads to inconsistencies.
Hmmm... I do see that the numContainers are not decremented correctly when 
release. Thanks... but it looks like it would more likely just impact reporting 
 / UI, nothing functional (Will update the patch). Can you specify which other 
counters specifically ? Like I mentioned in the previous patch.. lets run all 
containers thru as much of the common code path before we add new counters etc.

4.
bq. Maybe as part of a different JIRA, we should at some point extend the 
container.metrics in the ContainerImpl to keep track of the scheduled/queued 
containers.
Yup.. +1 to that.

The rest of your comments make sense... will update patch.

bq. let's stress-test the code in a cluster before committing to make sure 
everything is good
It has been tested on a 3 node cluster and MR Pi jobs (with opportunistic 
containers) and I didn't hit any major issues. We can always open follow-up 
JIRAs for specific performance related issues as and when we find it. Besides, 
stess-testing is not really a precondition to committing a patch.


> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chris Douglas
>Assignee: Arun Suresh
>  Labels: oct16-hard
> Attachments: YARN-4597.001.patch, 

[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-11-09 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15651248#comment-15651248
 ] 

Arun Suresh edited comment on YARN-4597 at 11/9/16 3:38 PM:


The new test failures are caused by an issue with YARN-5833. Kicking Jenkins 
again after resolving it..


was (Author: asuresh):
The new test failures are caused by an issue with YARN-5833. Kicking it off 
again..

> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chris Douglas
>Assignee: Arun Suresh
>  Labels: oct16-hard
> Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, 
> YARN-4597.006.patch, YARN-4597.007.patch, YARN-4597.008.patch, 
> YARN-4597.009.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.






[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-11-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645284#comment-15645284
 ] 

Arun Suresh edited comment on YARN-4597 at 11/7/16 8:58 PM:


Rebased patch after YARN-2995 and latest commits to trunk.

[~jianhe], thanks for the comments..
bq. maybe rename ContainerScheduler#runningContainers to scheduledContainers
Given that the SCHEDULED state comes before RUNNING, and the 
*runningContainers* collection actually holds containers that have been marked 
to run by the scheduler, I am thinking scheduledContainers may not be apt here. 
How about *scheduledToRunContainers*?

bq. The ContainerLaunch#killedBeforeStart flag, looks like the exising flag 
'shouldLaunchContainer' serves the same purpose, can we reuse that ? if so, the 
container#isMarkedToKill is also not needed.
Hmm.. my understanding is that it is slightly different. 
*shouldLaunchContainer* is, IIUC, used during the recovery process to signal 
that the container should not be launched. What *killedBeforeStart* aims to do 
is notify the *ContainerLaunch* (which runs in a different thread) that the 
Scheduler, which had requested to start the container earlier, changed its mind 
at the last moment and decided to kill it. Using shouldLaunchContainer would 
also cause the CONTAINER_LAUNCH event to be fired, which I do not want.
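A minimal sketch of that idea, assuming a single flag checked by the launch 
thread (names are illustrative, not the real ContainerLaunch): the scheduler 
can revoke a launch it already requested, and no launch event is fired for a 
revoked container.
{code}
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of killedBeforeStart; not the actual ContainerLaunch.
class ContainerLaunchSketch implements Runnable {
  private final AtomicBoolean killedBeforeStart = new AtomicBoolean(false);
  private final String containerId;

  ContainerLaunchSketch(String containerId) {
    this.containerId = containerId;
  }

  // Called from the scheduler thread if it changes its mind before launch.
  void markKilledBeforeStart() {
    killedBeforeStart.set(true);
  }

  @Override
  public void run() {
    if (killedBeforeStart.get()) {
      // Skip the launch entirely; no launch event is fired.
      System.out.println(containerId + " was killed before start; not launching");
      return;
    }
    System.out.println("Launching " + containerId);
  }
}
{code}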

bq. NodeManager#containerScheduler variable not used, remove
Done

bq. I think this comment is not addressed ? "In case we exceed the max-queue 
length, we are killing the container directly instead of queueing the 
container, in this case, we should not store the container as queued?"
Yeah.. I meant to comment on it. This is actually the desired behavior. Once 
the queue limit is reached, no new opportunistic containers should be queued. 
The AM is free to request the container again; the MRAppMaster, for example, 
re-requests the same task as a GUARANTEED container.
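A tiny sketch of that admission rule under the stated assumption (a plain 
bounded queue; not the actual ContainerScheduler): a container arriving when 
the queue is full is rejected rather than stored as queued.
{code}
import java.util.ArrayDeque;
import java.util.Deque;

// Illustrative sketch of the queue-full behavior described above.
class QueueAdmissionSketch {
  private final Deque<String> queuedOpportunistic = new ArrayDeque<>();
  private final int maxQueueLength;

  QueueAdmissionSketch(int maxQueueLength) {
    this.maxQueueLength = maxQueueLength;
  }

  void onOpportunisticContainerArrived(String containerId) {
    if (queuedOpportunistic.size() >= maxQueueLength) {
      // Not stored as queued; the AM may re-request it (possibly as GUARANTEED).
      System.out.println("Queue full; rejecting " + containerId);
      return;
    }
    queuedOpportunistic.add(containerId);
  }
}
{code}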

Hope this made sense?


was (Author: asuresh):
Rebased patch after YARN-2995 and latest commits to trunk.

[~jianhe], thanks for the comments..
bq. maybe rename ContainerScheduler#runningContainers to scheduledContainers
Given that we the SCHEDULED state is a state that comes before RUNNING. and the 
*runningContainers* collection actually holds containers that have been marked 
to run by the scheduler, am thinking scheduled containers may not be apt here. 
How about *scheduledToRunContainers* ?

bq. The ContainerLaunch#killedBeforeStart flag, looks like the exising flag 
'shouldLaunchContainer' serves the same purpose, can we reuse that ? if so, the 
container#isMarkedToKill is also not needed.
Hmm.. my understanding is that it is slightly different. The 
*shouldLaunchContainer* is IIUC, used during the recovery process to signal 
that the container should not be launched. What *killedBeforeStart* aims to do 
is to notify the *ContainerLaunch* (which runs in a different thread) that the 
Scheduler, which had requested to start the container earlier, in the last 
moment changed its mind and decided to kill it. Using shouldLaunchContainer 
also causes the CONTAINER_LAUNCH event to be fired which I do not want.

bq. NodeManager#containerScheduler variable not used, remove
Done

bq. I think this comment is not addressed ? "In case we exceed the max-queue 
length, we are killing the container directly instead of queueing the 
container, in this case, we should not store the container as queued?"
Yeah.. meant to comment on it. This is actually the desired behavior. Once 
queue limit is reached, no new opportunistic containers should also be queued. 
The AM is free to request it again. The MRAppMaster, for eg. re-requests the 
same task as a GUARANTEED container.

Hope this made sense ?

> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chris Douglas
>Assignee: Arun Suresh
>  Labels: oct16-hard
> Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, 
> YARN-4597.006.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.






[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-11-07 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15645284#comment-15645284
 ] 

Arun Suresh edited comment on YARN-4597 at 11/7/16 8:13 PM:


Rebased patch after YARN-2995 and latest commits to trunk.

[~jianhe], thanks for the comments..
bq. maybe rename ContainerScheduler#runningContainers to scheduledContainers
Given that the SCHEDULED state comes before RUNNING, and the 
*runningContainers* collection actually holds containers that have been marked 
to run by the scheduler, I am thinking scheduledContainers may not be apt here. 
How about *scheduledToRunContainers*?

bq. The ContainerLaunch#killedBeforeStart flag, looks like the exising flag 
'shouldLaunchContainer' serves the same purpose, can we reuse that ? if so, the 
container#isMarkedToKill is also not needed.
Hmm.. my understanding is that it is slightly different. 
*shouldLaunchContainer* is, IIUC, used during the recovery process to signal 
that the container should not be launched. What *killedBeforeStart* aims to do 
is notify the *ContainerLaunch* (which runs in a different thread) that the 
Scheduler, which had requested to start the container earlier, changed its mind 
at the last moment and decided to kill it. Using shouldLaunchContainer would 
also cause the CONTAINER_LAUNCH event to be fired, which I do not want.

bq. NodeManager#containerScheduler variable not used, remove
Done

bq. I think this comment is not addressed ? "In case we exceed the max-queue 
length, we are killing the container directly instead of queueing the 
container, in this case, we should not store the container as queued?"
Yeah.. I meant to comment on it. This is actually the desired behavior. Once 
the queue limit is reached, no new opportunistic containers should be queued. 
The AM is free to request the container again; the MRAppMaster, for example, 
re-requests the same task as a GUARANTEED container.

Hope this made sense?


was (Author: asuresh):
Rebased patch after YARN-2995 and latest commits to trunk.

[~jianhe], thanks for the comments..
bq. maybe rename ContainerScheduler#runningContainers to scheduledContainers
Given that we the SCHEDULED state is a state that comes before RUNNING. and the 
*runningContainers* collection actually holds containers that have been marked 
to run by the scheduler, am thinking scheduled containers may not be apt here. 
How about *scheduledToRunContainers* ?

bq. The ContainerLaunch#killedBeforeStart flag, looks like the exising flag 
'shouldLaunchContainer' serves the same purpose, can we reuse that ? if so, the 
container#isMarkedToKill is also not needed.
Hmm.. my understanding is that it a slightly different. The 
*shouldLaunchContainer* is IIUC, used during the recovery process to signal 
that the container should be launched or not. What *killedBeforeStart* aims to 
do is to notify the *ContainerLaunch* (which runs in a different thread) that 
the Scheduler might have requested to start the container earlier, but in the 
last minute decided to kill it. Using shouldLaunchContainer also causes the 
CONTAINER_LAUNCH event to be fired which I do not want.

bq. NodeManager#containerScheduler variable not used, remove
Done

bq. I think this comment is not addressed ? "In case we exceed the max-queue 
length, we are killing the container directly instead of queueing the 
container, in this case, we should not store the container as queued?"
Yeah.. meant to comment on it. This is actually the desired behavior. Once 
queue limit is reached, no new opportunistic containers should also be queued. 
The AM is free to request it again. The MRAppMaster, for eg. re-requests the 
same task as a GUARANTEED container.

Hope this made sense ?

> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: New Feature
>  Components: nodemanager
>Reporter: Chris Douglas
>Assignee: Arun Suresh
>  Labels: oct16-hard
> Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch, YARN-4597.004.patch, YARN-4597.005.patch, 
> YARN-4597.006.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.






[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-10-23 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15599316#comment-15599316
 ] 

Arun Suresh edited comment on YARN-4597 at 10/23/16 8:22 AM:
-

Updated the patch based on [~jianhe]'s suggestions:
# Reverted the unnecessary state transitions
# Reverted test cases to wait for RUNNING
# Removed unused imports.
# Fixed test cases
# Added some javadoc for the {{ResourceUtilizationManager}}

I've also updated the pull request. Attached the consolidated patch here to 
kick off Jenkins.

bq. why do you need to add the additional check for SCHEDULED state ?
{code}
  // Process running containers
  if (remoteContainer.getState() == ContainerState.RUNNING ||
      remoteContainer.getState() == ContainerState.SCHEDULED) {
{code}
This is actually needed. We need the RM to move the container out of the 
ACQUIRED state. Currently, that happens only if the container is in the RUNNING 
state; we need to do so for the SCHEDULED state as well.




was (Author: asuresh):
Updated the patch based on [~jianhe]'s suggestions.
I've also updated the pull request. Attached the consolidated patch here to 
kick of Jenkins.

bq. why do you need to add the additional check for SCHEDULED state ?
{code}
  // Process running containers
  if (remoteContainer.getState() == ContainerState.RUNNING ||
  remoteContainer.getState() == ContainerState.SCHEDULED) {
{code}
This is actually needed. We need the RM to move the container out of the 
ACQUIRED state. Currently, it happens only if container is in the RUNNING 
state. We need to do so for SCHEDULED state as well.



> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Douglas
>Assignee: Arun Suresh
> Attachments: YARN-4597.001.patch, YARN-4597.002.patch, 
> YARN-4597.003.patch, YARN-4597.004.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.






[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-10-20 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15593945#comment-15593945
 ] 

Arun Suresh edited comment on YARN-4597 at 10/21/16 4:31 AM:
-

[~jianhe], thanks again for taking a look.

bq. I think there might be some behavior change or bug for scheduling 
guaranteed containers when the oppotunistic-queue is enabled. Previously, when 
launching container, NM will not check for current vmem usage, and cpu usage. 
It assumes what RM allocated can be launched. Now, NM will check these limits 
and won't launch the container if hits the limit.
Yup, we do a *hasResources* check only at the start of a container and when a 
container is killed. We assumed that the resources requested by a container are 
constant; essentially, we considered only the actual *allocated* resources, 
which we assume will not vary during the lifetime of the container... which 
implies there is no point in checking this at any time other than the start and 
kill of containers.
But like you stated, if we consider container resource *utilization*, based on 
the work [~kasha] is doing in YARN-1011, then yes, we should have a timer 
thread that periodically checks the vmem and cpu usage and starts (and kills) 
containers based on that.
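For concreteness, a minimal allocation-based sketch of the check described 
above (memory only, illustrative names): capacity is derived from allocated 
resources, so it only needs re-evaluation when a container starts or 
finishes/is killed.
{code}
// Illustrative, memory-only sketch of an allocation-based hasResources check.
class AllocationTrackerSketch {
  private final long nodeMemoryMB;
  private long allocatedMemoryMB;

  AllocationTrackerSketch(long nodeMemoryMB) {
    this.nodeMemoryMB = nodeMemoryMB;
  }

  boolean hasResources(long requestedMemoryMB) {
    return allocatedMemoryMB + requestedMemoryMB <= nodeMemoryMB;
  }

  // Allocation changes only at these two points, so queued containers are
  // re-examined only on container start and on container finish/kill.
  void onContainerStart(long memoryMB) {
    allocatedMemoryMB += memoryMB;
  }

  void onContainerFinishOrKill(long memoryMB) {
    allocatedMemoryMB -= memoryMB;
  }
}
{code}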

bq. the ResourceUtilizationManager looks like only incorporated some utility 
methods, not sure how we will make this pluggable later.
Following on from my point above, the idea was to have a 
{{ResourceUtilizationManager}} that can provide different implementations of 
{{getCurrentUtilization}}, {{addResource}} and {{subtractResource}}, which are 
used by the ContainerScheduler to calculate the resources to free up. For 
instance, the current default one only takes into account the actual resources 
*allocated* to containers... for YARN-1011, we might replace that with the 
resources *utilized* by running containers and provide a different value for 
{{getCurrentUtilization}}. The timer thread I mentioned in the previous point, 
which can be a part of this new ResourceUtilizationManager, can send events to 
the scheduler to re-process queued containers when utilization has changed.
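A hedged sketch of that pluggability argument (method names taken from the 
comment, signatures simplified to memory-only; the real interface in the patch 
may differ): the ContainerScheduler sees only the abstraction, so the 
allocation-based default could later be swapped for a utilization-based 
implementation.
{code}
// Illustrative, memory-only interface; not the actual class in the patch.
interface ResourceUtilizationManagerSketch {
  long getCurrentUtilizationMB();        // allocated MB today; measured MB later
  void addResource(long memoryMB);       // container started
  void subtractResource(long memoryMB);  // container finished or was killed
}

// Default, allocation-based implementation.
class AllocationBasedManagerSketch implements ResourceUtilizationManagerSketch {
  private long allocatedMB;

  public long getCurrentUtilizationMB() {
    return allocatedMB;
  }

  public void addResource(long memoryMB) {
    allocatedMB += memoryMB;
  }

  public void subtractResource(long memoryMB) {
    allocatedMB -= memoryMB;
  }
}
{code}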

bq. The logic to select opportunisitic container: we may kill more 
opportunistic containers than required. e.g...
Good catch. In {{resourcesToFreeUp}}, I needed to decrement any 
already-marked-for-kill opportunistic containers. It was there earlier; I had 
removed it when I was testing something, but forgot to put it back :)
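A small sketch of the fix being acknowledged (memory-only, hypothetical 
helper): the resources of containers already marked for kill are subtracted so 
that no more containers than necessary get killed.
{code}
import java.util.List;

// Illustrative, memory-only sketch of the over-kill fix described above.
class ResourcesToFreeUpSketch {
  static long resourcesToFreeUpMB(long neededMB, long availableMB,
                                  List<Long> alreadyMarkedForKillMB) {
    long toFree = neededMB - availableMB;
    for (long mb : alreadyMarkedForKillMB) {
      toFree -= mb;  // these resources are already on their way back
    }
    return Math.max(0L, toFree);
  }
}
{code}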

bq. we don't need to synchronize on the currentUtilization object? I don't see 
any other place it's synchronized
Yup, it isn't required. Varun did point out the same.. I thought I had fixed 
it; I think I might have missed 'git add'ing the change.

W.r.t. adding the new transitions, I was seeing some error messages in some 
test cases. Will rerun and see if they are required… but in any case, having 
them there should be harmless, right?
 
The rest of your comments make sense.. will address them shortly.



was (Author: asuresh):
[~jianhe], thanks again for taking a look.

bq. I think there might be some behavior change or bug for scheduling 
guaranteed containers when the oppotunistic-queue is enabled. Previously, when 
launching container, NM will not check for current vmem usage, and cpu usage. 
It assumes what RM allocated can be launched.
Now, NM will check these limits and won't launch the container if hits the 
limit.
Yup, we do a *hasResources* check only at the start of a container and when a 
container is killed. We assumed that resources requested by a container is 
constant, essentially we considered only actual *allocated* resources which we 
assume will not varying during the lifetime of the container... which implies, 
there is no point in checking this at any other time other than start and kill 
of containers.
But like you stated, if we consider container resource *utilization*, based on 
the work [~kasha] is doing in YARN-1011, then yes, we should have a timer 
thread that periodically checks the vmem and cpu usage and starts (and kills) 
containers based on that.

bq. the ResourceUtilizationManager looks like only incorporated some utility 
methods, not sure how we will make this pluggable later.
Following on my point above, the idea was to have a 
{{ResourceUtilizationManager}} that can provide a different value of 
{{getCurrentUtilization}}, {{addResource}} and {{subtractResource}} which is 
used by the ContainerScheduler to calculate the resources to free up. For 
instance, the current default one only takes into account actual resource 
*allocated* to containers...  for YARN-1011, we might replace that with the 
resource *utilized* by running containers, and provide a different value for 
{{getCurrentUtilization}}. The timer thread I mentioned in the previous point, 
which can be apart of this new ResourceUtilizationManager, can send events to 
the scheduler to re-process 

[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-10-12 Thread Arun Suresh (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15568792#comment-15568792
 ] 

Arun Suresh edited comment on YARN-4597 at 10/12/16 1:56 PM:
-

Thanks for taking a look [~jianhe],

bq. Wondering why KillWhileExitingTransition is added..
I had put it in there for debugging something... Left it there since I thought 
it was harmless... but, yeah, it looks like it does override the exit code. 
Will remove it. Good catch.

* W.r.t. {{ContainerState#SCHEDULED}}: Actually, I think we should expose this. 
We currently club NEW, LOCALIZING, LOCALIZED etc. into RUNNING, but the 
container is actually not running, which is misleading. SCHEDULED implies that 
some of the container's dependencies (resources for localization + some 
internal queuing/scheduling policy) have not yet been met.
Prior to this, YARN-2877 had introduced the QUEUED return state, which would be 
visible to applications if queuing was enabled. This patch technically just 
renames QUEUED to SCHEDULED. Also, all containers will go through the SCHEDULED 
state, not just the opportunistic ones (although, for guaranteed containers, 
this will just be a pass-through state).

Another thing I was hoping for some input on: currently, the 
{{ContainerScheduler}} runs in the same thread as the AsyncDispatcher started 
by the ContainerManager, and the Scheduler is triggered only by events. I was 
wondering if there is any merit in pushing these events into a blocking queue 
as they arrive and having a separate thread take care of them. This would 
preserve the serial nature of operation (and thereby keep the code simple by 
not needing synchronized collections) and would not hold up the dispatcher from 
delivering other events while the scheduler is scheduling.
A minor disadvantage is that the NM will probably consume a thread that for the 
most part will be blocked on the queue; this thread could otherwise be used by 
one of the containers.
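A minimal sketch of the alternative being floated (not what the current patch 
does; names are illustrative): scheduler events are parked in a blocking queue 
and drained by one dedicated thread, so processing stays serial while the 
dispatcher thread only pays for an enqueue.
{code}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch of a single-consumer event pump for the scheduler.
class SchedulerEventPumpSketch {
  private final BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
  private final Thread worker = new Thread(this::drain, "container-scheduler");

  void start() {
    worker.setDaemon(true);
    worker.start();
  }

  // Called from the AsyncDispatcher thread; returns immediately.
  void handle(Runnable schedulerEvent) {
    events.add(schedulerEvent);
  }

  // Single consumer: events are processed strictly in arrival order, so no
  // synchronized collections are needed inside the scheduler.
  private void drain() {
    try {
      while (true) {
        events.take().run();
      }
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}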


was (Author: asuresh):
Thanks for taking a look [~jianhe],

bq. Wondering why KillWhileExitingTransition is added..
I had put it in there for debugging something... Left it there since it thought 
its harmless... but, yeah looks like it does over-ride the exitcode. Will 
remove it. Good catch.

* w.r.t {{ContainerState#SCHEDULED}} : Actually, I think we should expose this. 
We currently club NEW, LOCALIZING, LOCALIZED etc. into RUNNING, but the 
container is actually not running, and is thus misleading. SCHEDULED implies 
that some of the containers dependencies (resources for localization + some 
internal queuing/scheduling policy) have not yet been met.
Prior to this, YARN-2877 had introduced the QUEUED return state. This would be 
visible to applications, if Queuing was enabled. This patch technically just 
renames QUEUED to SCHEDULED. Also, all containers will go thru the SCHEDULED 
state, not just the opportunistic ones (although, for guaranteed containers 
this will just be a pass-thru state)

Another thing I was hoping for some input was, currently, the 
{{ContainerScheduler}} runs in the same thread as the ContainerManager's 
AsyncDispatcher started by the ContainerManager. Also, the Scheduler is 
triggered only by events. I was wondering if there is any merit pushing these 
events into a blocking queue as they arrive and have a separate thread take 
care of them. This will preserve the serial nature of operation (and thereby 
keep the code simple by not needing synchronized collections) and will not hold 
up the dispatcher from delivering other events while the scheduler is 
scheduling.
A minor disadvantage, is that the NM will probably consume a thread that for 
the most part will be blocked on the queue. This thread could be used by one of 
the containers.

> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Douglas
>Assignee: Arun Suresh
> Attachments: YARN-4597.001.patch, YARN-4597.002.patch
>
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.






[jira] [Comment Edited] (YARN-4597) Add SCHEDULE to NM container lifecycle

2016-01-16 Thread Chris Douglas (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15103350#comment-15103350
 ] 

Chris Douglas edited comment on YARN-4597 at 1/16/16 6:52 PM:
--

Thanks, Arun. Please feel free to take this over. It's only justified in 
context with these other changes.


was (Author: chris.douglas):
Thanks, Arun. Please feel free

> Add SCHEDULE to NM container lifecycle
> --
>
> Key: YARN-4597
> URL: https://issues.apache.org/jira/browse/YARN-4597
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Chris Douglas
>
> Currently, the NM immediately launches containers after resource 
> localization. Several features could be more cleanly implemented if the NM 
> included a separate stage for reserving resources.


