[ 
https://issues.apache.org/jira/browse/YARN-2264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Li Lu updated YARN-2264:
------------------------

    Attachment: YARN-2264-070814.patch

The DrainDispatcher can reach an inconsistent state because setting drained to 
false and enqueuing the event are not atomic. The patch adds synchronized 
blocks in two places: in the dispatcher thread, around setting the drained 
flag, and in the user thread, around handling events. This enforces an 
ordering between the two operations. 
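
For readers without the patch at hand, here is a minimal, self-contained sketch
of the idea. It is not the patch itself: the class SketchDrainDispatcher, the
mutex field, and the polling await() are assumptions made for illustration.
The point is only that the user thread's "set flag, then enqueue" pair and the
dispatcher thread's "recompute flag" step take the same lock, so the dispatcher
can no longer overwrite the flag between the two user-side steps.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a drain-style dispatcher; not the Hadoop class.
class SketchDrainDispatcher {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final Object mutex = new Object();   // assumed lock object
    private volatile boolean drained = true;
    private volatile boolean stopped = false;
    private final Thread worker = new Thread(this::loop, "sketch-dispatcher");

    void start() {
        worker.setDaemon(true);
        worker.start();
    }

    void stop() {
        stopped = true;
        worker.interrupt();
    }

    // User thread: flag update and enqueue form one critical section.
    void handle(Runnable event) {
        synchronized (mutex) {
            drained = false;
            queue.add(event);
        }
    }

    // Dispatcher thread: the flag is recomputed under the same lock.
    private void loop() {
        while (!stopped) {
            synchronized (mutex) {
                drained = queue.isEmpty();
            }
            Runnable event;
            try {
                event = queue.take();   // blocks outside the lock
            } catch (InterruptedException e) {
                return;
            }
            event.run();
        }
    }

    // Polls until the dispatcher has observed an empty queue.
    // Returns early if the calling thread is interrupted.
    void await() {
        while (!drained) {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        SketchDrainDispatcher d = new SketchDrainDispatcher();
        d.start();
        AtomicInteger processed = new AtomicInteger();
        for (int i = 0; i < 1000; i++) {
            d.handle(processed::incrementAndGet);
        }
        d.await();
        System.out.println("processed=" + processed.get());
        d.stop();
    }
}
```

Note that queue.take() stays outside the synchronized block in this sketch, so
a dispatcher thread blocked on an empty queue does not hold the lock that
handle() needs.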

> Race in DrainDispatcher can cause random test failures
> ------------------------------------------------------
>
>                 Key: YARN-2264
>                 URL: https://issues.apache.org/jira/browse/YARN-2264
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>            Assignee: Li Lu
>         Attachments: YARN-2264-070814.patch
>
>
> This is the potential race.
> DrainDispatcher is started via serviceStart(). As a last step, this starts 
> the actual dispatcher thread (eventHandlingThread.start()) and returns 
> immediately, which means the thread may or may not have started up by the 
> time start returns.
> Event sequence: 
> UserThread: calls dispatcher.getEventHandler().handle()
> This sets drained = false, and a context switch happens.
> DispatcherThread: starts running
> DispatcherThread: drained = queue.isEmpty(); -> This sets drained to true, 
> since UserThread yielded before putting anything into the queue.
> UserThread: actual.handle(event) - which puts the event in the queue for the 
> dispatcher thread to process, and returns control.
> UserThread: dispatcher.await() - Since drained is true, this returns 
> immediately - even though there is a pending event to process.
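
The event sequence quoted above can be replayed deterministically on a single
thread, which makes the bad end state easy to see. This is only an
illustrative sketch: DrainRaceReplay and its local variables are hypothetical
names, and the two threads are simulated by running their steps in the
problematic order rather than by real scheduling.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical single-threaded replay of the interleaving; not Hadoop code.
class DrainRaceReplay {
    static String replay() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        boolean drained = true;

        // UserThread: handle() sets the flag, then is switched out.
        drained = false;

        // DispatcherThread: starts running and recomputes the flag
        // while the queue is still empty.
        drained = queue.isEmpty();

        // UserThread: resumes and enqueues the event.
        queue.add("event");

        // UserThread: await() would now see drained == true and return,
        // although one event is still pending.
        return "drained=" + drained + ", pending=" + queue.size();
    }

    public static void main(String[] args) {
        System.out.println(replay());   // prints "drained=true, pending=1"
    }
}
```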



--
This message was sent by Atlassian JIRA
(v6.2#6252)
