[
https://issues.apache.org/jira/browse/YARN-6376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Haibo Chen reassigned YARN-6376:
--------------------------------
Assignee: Haibo Chen
> Exceptions caused by synchronous putEntities requests can be swallowed in
> TimelineCollector
> -------------------------------------------------------------------------------------------
>
> Key: YARN-6376
> URL: https://issues.apache.org/jira/browse/YARN-6376
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: ATSv2
> Affects Versions: 3.0.0-alpha2
> Reporter: Haibo Chen
> Assignee: Haibo Chen
> Priority: Critical
> Labels: yarn-5355-merge-blocker
>
> TimelineCollector.putEntities() is currently implemented as a call to
> TimelineWriter.write() followed by TimelineWriter.flush(). Because
> HBaseTimelineWriter.write() is an asynchronous operation, TimelineClient can
> send a synchronous putEntities() request for critical data and never get back
> an exception, even though the HBase write request to store the entities may
> have failed.
> This is due to a race condition between the WriterFlushThread in
> TimelineCollectorManager and the web threads handling synchronous putEntities()
> requests. Entities are first put into the buffer by the web thread, but before
> the web thread invokes writer.flush(), the WriterFlushThread may fire and flush
> the writer. If the entities are not successfully written to the backend during
> that flush, the WriterFlushThread simply logs an error, while the web thread
> never gets an exception out of its own writer.flush() invocation. This is bad
> because the whole reason TimelineClient sends putEntities() synchronously is so
> that it can retry upon any exception.
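> To make the race concrete, here is a minimal, self-contained sketch. It is not
> the actual Hadoop code; BufferingWriter and the forced scheduling below are
> illustrative assumptions that mirror TimelineWriter.write()/flush() and the
> WriterFlushThread in TimelineCollectorManager:
> {code:java}
> import java.util.ArrayList;
> import java.util.List;
>
> public class FlushRaceSketch {
>
>   /** Buffers writes; failures only surface when the buffer is flushed. */
>   static class BufferingWriter {
>     private final List<String> buffer = new ArrayList<>();
>
>     synchronized void write(String entity) {
>       buffer.add(entity);            // asynchronous-style write: just buffer
>     }
>
>     synchronized void flush() throws Exception {
>       if (buffer.isEmpty()) {
>         return;                      // nothing to flush, nothing to fail
>       }
>       buffer.clear();
>       throw new Exception("backend rejected the buffered entities");
>     }
>   }
>
>   public static void main(String[] args) throws InterruptedException {
>     BufferingWriter writer = new BufferingWriter();
>
>     // Background flusher, analogous to WriterFlushThread: it only logs errors.
>     Thread flusher = new Thread(() -> {
>       try {
>         writer.flush();
>       } catch (Exception e) {
>         System.err.println("WriterFlushThread: " + e.getMessage()); // swallowed
>       }
>     });
>
>     // Web thread handling a synchronous putEntities(): write, then flush.
>     writer.write("critical-entity");
>
>     // Force the bad interleaving: the background flush runs before the
>     // web thread's own flush (join() is only to make the demo deterministic).
>     flusher.start();
>     flusher.join();
>
>     try {
>       writer.flush();                // buffer already drained: no exception
>       System.out.println("web thread: flush() returned normally,"
>           + " the client never sees the failure");
>     } catch (Exception e) {
>       System.out.println("web thread: got the exception as intended");
>     }
>   }
> }
> {code}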
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)