[jira] [Updated] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-09-05 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1070:
--

Attachment: YARN-1070.4.patch

Update the patch against the latest trunk.

bq. Taking a step back, this approach will work, though the code is hard to 
read for me. A very simple state machine should make this code a lot cleaner.

IMHO, a state machine will not help much here, because the Callable runs on a 
separate thread and proceeds asynchronously with respect to ContainerImpl. The 
container state can change to KILLING at any time: before the Callable starts, 
while it is running, or after it has finished. We could check the state in many 
places, but the important one is the beginning of the Callable. If the 
container is already at KILLING, there is no need to go through the rest of the 
logic. This effectively behaves like canceling the Callable.

bq. Also, as part of ContainerLaunch.cleanupContainer(), we should try to 
cancel the Callable.

It's not necessary if we can terminate the Callable early, and it would cause 
the bug in YARN-906. When cleanupContainer() is invoked, the container state is 
already KILLING, so cancel would only take effect on a Callable that has not 
started. On the other hand, if the Callable has not started while the container 
state is already KILLING, the Callable will terminate at the very beginning and 
emit a CONTAINER_KILLED_ON_REQUEST. If we did cancel the Callable, we would 
still need to check the container state there and decide whether to emit a 
CONTAINER_KILLED_ON_REQUEST there as well, which brings us back to the initial 
problem of this ticket.
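
A minimal sketch of the early check described above, not the code in the 
patch. The State enum, the Launch class and the CONTAINER_LAUNCHED string are 
hypothetical stand-ins for ContainerImpl's state and the real launch Callable; 
only KILLING, LOCALIZED and CONTAINER_KILLED_ON_REQUEST come from the 
discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicReference;

public class EarlyExitLaunchSketch {
    // Hypothetical, simplified subset of the container states.
    enum State { LOCALIZED, RUNNING, KILLING }

    // Hypothetical stand-in for the launch Callable.
    static class Launch implements Callable<Integer> {
        private final AtomicReference<State> state;
        private final List<String> events;

        Launch(AtomicReference<State> state, List<String> events) {
            this.state = state;
            this.events = events;
        }

        @Override
        public Integer call() {
            // Check the container state once, at the very beginning.
            if (state.get() == State.KILLING) {
                // The kill arrived before the launch: report it and skip the rest.
                events.add("CONTAINER_KILLED_ON_REQUEST");
                return -1;
            }
            // Normal path: the launch proceeds even if a kill arrives later.
            events.add("CONTAINER_LAUNCHED");
            return 0;
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<String>();
        AtomicReference<State> killing = new AtomicReference<State>(State.KILLING);
        AtomicReference<State> localized = new AtomicReference<State>(State.LOCALIZED);
        new Launch(killing, events).call();
        new Launch(localized, events).call();
        System.out.println(events);
    }
}
```

Running main prints [CONTAINER_KILLED_ON_REQUEST, CONTAINER_LAUNCHED]: the 
first call returns early because the state is already KILLING, the second 
goes through the normal path.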



 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch, 
 YARN-1070.4.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-09-05 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1070:
--

Attachment: YARN-1070.5.patch

Fix the findbugs warning.

 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch, 
 YARN-1070.4.patch, YARN-1070.5.patch






[jira] [Updated] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-09-04 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1070:
--

Attachment: YARN-1070.3.patch

Thanks Vinod for your review. I've updated the patch accordingly. The important 
change in this patch is that I removed the logic of canceling 
ContainerLaunch.call(). Instead, call() now checks the container state first, 
returns immediately if the container is not at LOCALIZED, and sends 
CONTAINER_KILLED_ON_REQUEST if necessary.

The rationale for checking the container state is that the thread of 
ContainerLaunch.call() is scheduled, and should be executed, after the 
container enters LOCALIZED. As this thread can run in parallel with the thread 
of ContainerImpl, the container is free to move on to another state, which can 
be RUNNING, EXITED_WITH_FAILURE or KILLING. The first two should be triggered 
by events sent from ContainerLaunch.call(), while KILLING is caused by a kill 
event.

Therefore, when ContainerLaunch.call() starts, we check the container state. 
If it is KILLING, ContainerLaunch.call() can stop immediately, which is 
equivalent to the cancel operation that was removed from ContainersLauncher. 
In fact, it should be even better, because Future.cancel does not terminate 
call() immediately.
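
To illustrate the last point, here is a small standalone sketch (plain 
java.util.concurrent, nothing from the NodeManager; the class and variable 
names are made up). Future.cancel(true) only sets the interrupt flag on the 
worker thread, so a task that never checks for interruption keeps running 
after cancel() has returned true.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelDoesNotStopSketch {
    // Returns {cancelReturned, stillRunningAfterCancel, finishedAfterRelease}.
    static boolean[] demo() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch finished = new CountDownLatch(1);
        final AtomicBoolean release = new AtomicBoolean(false);

        Future<Integer> future = pool.submit(() -> {
            started.countDown();
            // Simulates work that never looks at the interrupt flag.
            while (!release.get()) {
                Thread.yield();
            }
            finished.countDown();
            return 0;
        });

        started.await();                           // the task is now running
        boolean cancelled = future.cancel(true);   // interrupts, does not stop
        boolean stillRunning = finished.getCount() == 1;

        release.set(true);                         // let the task end by itself
        boolean done = finished.await(5, TimeUnit.SECONDS);
        pool.shutdown();
        return new boolean[] { cancelled, stillRunning, done };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("cancel() returned " + r[0]
            + ", still running afterwards: " + r[1]
            + ", finished after release: " + r[2]);
    }
}
```

A task that blocks in an interruptible call would react sooner, but that 
depends on what the task is doing when the interrupt arrives.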

On the other hand, if the container state is still LOCALIZED at this point, 
call() proceeds. If the container state then changes to KILLING midway, we 
just ignore it and let call() finish as usual. This does no harm, because by 
the time the container reaches KILLING, CLEANUP_CONTAINER has been scheduled 
or has already started.

 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-1070.1.patch, YARN-1070.2.patch, YARN-1070.3.patch






[jira] [Updated] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-08-26 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1070:
--

Attachment: YARN-1070.2.patch

Revert the change in YARN-906 to TestContainerLaunch to fix the test failure

 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-1070.1.patch, YARN-1070.2.patch






[jira] [Updated] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-08-16 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated YARN-1070:
--

Component/s: nodemanager

 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Bug
  Components: nodemanager
Reporter: Hitesh Shah





[jira] [Updated] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-08-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1070:
--

Issue Type: Sub-task  (was: Bug)
Parent: YARN-676

 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Zhijie Shen





[jira] [Updated] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-08-16 Thread Vinod Kumar Vavilapalli (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vinod Kumar Vavilapalli updated YARN-1070:
--

Target Version/s: 2.1.1-beta

 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Zhijie Shen





[jira] [Updated] (YARN-1070) ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at CONTAINER_CLEANEDUP_AFTER_KILL

2013-08-16 Thread Zhijie Shen (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-1070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhijie Shen updated YARN-1070:
--

Attachment: YARN-1070.1.patch

Created a patch to fix the problem, which is that cancel() returns true 
whether or not call() has started. In fact, the event needs to be emitted from 
ContainersLauncher only when call() has not started.
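
The behaviour of cancel() can be reproduced outside the NodeManager with a 
short sketch. TrackedTask and its started flag are hypothetical names used 
only for illustration; the point is that the return value of cancel() is the 
same in both cases, so a separate flag is needed to tell them apart.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

public class CancelReturnValueSketch {
    // Hypothetical task that records whether call() was ever entered.
    static class TrackedTask implements Callable<Integer> {
        final AtomicBoolean started = new AtomicBoolean(false);
        final CountDownLatch entered = new CountDownLatch(1);

        @Override
        public Integer call() throws InterruptedException {
            started.set(true);
            entered.countDown();
            Thread.sleep(60000);   // placeholder for launch work; interruptible
            return 0;
        }
    }

    // Returns {cancelOfRunning, cancelOfQueued, runningStarted, queuedStarted}.
    static boolean[] demo() throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        TrackedTask running = new TrackedTask();
        TrackedTask queued = new TrackedTask();
        Future<Integer> f1 = pool.submit(running);
        Future<Integer> f2 = pool.submit(queued);   // waits behind the first task

        running.entered.await();                    // the first call() has started
        boolean cancelQueued = f2.cancel(true);     // call() never started
        boolean cancelRunning = f1.cancel(true);    // call() already started
        pool.shutdown();
        return new boolean[] {
            cancelRunning, cancelQueued, running.started.get(), queued.started.get()
        };
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("cancel(running)=" + r[0] + " cancel(queued)=" + r[1]
            + " runningStarted=" + r[2] + " queuedStarted=" + r[3]);
    }
}
```

Both cancel() calls return true, while only the first task has its started 
flag set.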

In addition, this patch fixes the bug below.

{code}
  localResources = container.getLocalizedResources();
  if (localResources == null) {
    // !!! need "throw" here !!!
    RPCUtil.getRemoteException(
        "Unable to get local resources when Container " + containerID +
        " is at " + container.getContainerState());
  }
{code}

Moreover, added a test case that simulates the situation where call() has 
started but !isDone().
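
As a standalone illustration of the missing throw noted above (a sketch with 
hypothetical names; remoteException() merely stands in for a factory that 
builds and returns an exception object): without the throw keyword the 
exception is created and discarded, and execution continues with the null 
value.

```java
import java.util.Map;

public class MissingThrowSketch {
    // Hypothetical stand-in for an exception factory: it only builds the
    // exception object and returns it; it does not throw anything.
    static IllegalStateException remoteException(String message) {
        return new IllegalStateException(message);
    }

    // Buggy variant: the exception is created and silently dropped.
    static Map<String, String> buggy(Map<String, String> localResources,
                                     String containerId, String state) {
        if (localResources == null) {
            remoteException("Unable to get local resources when Container "
                + containerId + " is at " + state);
        }
        return localResources;   // null leaks through to the caller
    }

    // Fixed variant: the returned exception is actually thrown.
    static Map<String, String> fixed(Map<String, String> localResources,
                                     String containerId, String state) {
        if (localResources == null) {
            throw remoteException("Unable to get local resources when Container "
                + containerId + " is at " + state);
        }
        return localResources;
    }

    public static void main(String[] args) {
        System.out.println("buggy returns: " + buggy(null, "container_1", "KILLING"));
        try {
            fixed(null, "container_1", "KILLING");
        } catch (IllegalStateException e) {
            System.out.println("fixed throws: " + e.getMessage());
        }
    }
}
```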

 ContainerImpl State Machine: Invalid event: CONTAINER_KILLED_ON_REQUEST at 
 CONTAINER_CLEANEDUP_AFTER_KILL
 -

 Key: YARN-1070
 URL: https://issues.apache.org/jira/browse/YARN-1070
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: nodemanager
Reporter: Hitesh Shah
Assignee: Zhijie Shen
 Attachments: YARN-1070.1.patch



