[ 
https://issues.apache.org/jira/browse/YARN-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702456#comment-13702456
 ] 

Omkar Vinit Joshi commented on YARN-299:
----------------------------------------

The patch looks good overall, but we need one additional fix for a related
case that can also occur. The root cause is more evident in the YARN-820 logs.
A container requests multiple resources, and RESOURCE_LOCALIZED /
RESOURCE_FAILED events can occur for one or more of those resources between
the time the container receives its first RESOURCE_FAILED event and the time
it deregisters itself from the remaining resources. As a result,
RESOURCE_FAILED / RESOURCE_LOCALIZED events (for different resources) can be
sent to ContainerImpl when the container is already in the DONE state. So,
like RESOURCE_FAILED, we should also ignore the RESOURCE_LOCALIZED event in
that state.
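
To make the proposal concrete, here is a minimal, self-contained sketch of a
state machine that silently drops late localization events once it has
reached DONE. It does not use the real ContainerImpl or StateMachineFactory
code; SketchState, SketchEvent, DoneStateSketch and the OTHER event are
hypothetical names chosen for illustration, and the direct jump from
LOCALIZING to DONE is a simplification.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified stand-ins; not the real NodeManager types.
enum SketchState { LOCALIZING, DONE }

enum SketchEvent { RESOURCE_LOCALIZED, RESOURCE_FAILED, OTHER }

final class DoneStateSketch {
    // Events that may still arrive for other resources after the first failure.
    private static final Set<SketchEvent> IGNORED_AT_DONE =
        EnumSet.of(SketchEvent.RESOURCE_LOCALIZED, SketchEvent.RESOURCE_FAILED);

    private SketchState state = SketchState.LOCALIZING;

    SketchState getState() {
        return state;
    }

    void handle(SketchEvent event) {
        switch (state) {
            case LOCALIZING:
                // Simplification: the first failure moves straight to DONE.
                if (event == SketchEvent.RESOURCE_FAILED) {
                    state = SketchState.DONE;
                }
                break;
            case DONE:
                // Late events for other resources are ignored instead of
                // being treated as invalid transitions.
                if (!IGNORED_AT_DONE.contains(event)) {
                    throw new IllegalStateException(
                        "Invalid event: " + event + " at " + state);
                }
                break;
        }
    }

    public static void main(String[] args) {
        DoneStateSketch container = new DoneStateSketch();
        container.handle(SketchEvent.RESOURCE_FAILED);    // first failure
        container.handle(SketchEvent.RESOURCE_LOCALIZED); // late event, ignored
        container.handle(SketchEvent.RESOURCE_FAILED);    // late event, ignored
        System.out.println(container.getState());
    }
}
```

The point of the sketch is only that both event types get the same treatment
at DONE; any other event is still rejected there.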
I could see one more issue in the logs, and it would be great to fix it as
part of this JIRA since it looks like a quick change. The LOG.info call below
invokes toString on LocalizedResource, which is not thread-safe for ref (a
LinkedList is used internally). I guess grabbing the write lock inside
toString will protect it from such exceptions. We need to check the other
state machines as well.

{code}
            } catch (ExecutionException e) {
              LOG.info("Failed to download rsrc " + assoc.getResource(),
                  e.getCause());
              LocalResourceRequest req = assoc.getResource().getRequest();
              publicRsrc.handle(new ResourceFailedLocalizationEvent(req,
                  e.getMessage()));
              assoc.getResource().unlock();
{code}
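
A rough sketch of the locking idea, assuming a resource class that keeps its
referencing containers in a LinkedList guarded by a ReentrantReadWriteLock.
GuardedResourceSketch and its methods are hypothetical names, not the real
LocalizedResource API. The sketch takes the read lock in toString on the
assumption that every mutation of the list holds the write lock; taking the
write lock there, as suggested above, would also be safe but would block
concurrent readers.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a resource that tracks referencing containers.
final class GuardedResourceSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> ref = new LinkedList<>();

    void register(String containerId) {
        lock.writeLock().lock();
        try {
            ref.add(containerId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    void release(String containerId) {
        lock.writeLock().lock();
        try {
            ref.remove(containerId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    @Override
    public String toString() {
        // Hold the lock for the whole iteration so a concurrent register or
        // release cannot modify the list while it is being traversed.
        lock.readLock().lock();
        try {
            StringBuilder sb = new StringBuilder("{refs=[");
            for (String containerId : ref) {
                sb.append('(').append(containerId).append(')');
            }
            return sb.append("]}").toString();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

With this shape, a logging call that concatenates the resource into a message
goes through the guarded toString and no longer iterates the list unprotected.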

any thoughts?
                
> Node Manager throws 
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> RESOURCE_FAILED at DONE
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-299
>                 URL: https://issues.apache.org/jira/browse/YARN-299
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.0.1-alpha, 2.0.0-alpha
>            Reporter: Devaraj K
>            Assignee: Mayank Bansal
>         Attachments: YARN-299-trunk-1.patch
>
>
> {code:xml}
> 2012-12-31 10:36:27,844 WARN 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Can't handle this event at current state: Current: [DONE], eventType: 
> [RESOURCE_FAILED]
> org.apache.hadoop.yarn.state.InvalidStateTransitonException: Invalid event: 
> RESOURCE_FAILED at DONE
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:301)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:43)
>       at 
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:443)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:819)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerImpl.handle(ContainerImpl.java:71)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:504)
>       at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher.handle(ContainerManagerImpl.java:497)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:126)
>       at 
> org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:75)
>       at java.lang.Thread.run(Thread.java:662)
> 2012-12-31 10:36:27,845 INFO 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
>  Container container_1356792558130_0002_01_000001 transitioned from DONE to 
> null
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
