[ https://issues.apache.org/jira/browse/YARN-3045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14642150#comment-14642150 ]
Naganarasimha G R commented on YARN-3045:
-----------------------------------------

Hi [~djp],

bq. 1. What we want to differentiate here is what kind of events are critical (so writer client in TimelineCollector could flush to backend storage after written them) and what kinds of events are not so critical

I was aware that the priority is meant to differentiate the events of a container rather than the containers themselves, but I thought you had proposed it for better querying rather than for writing. I have not gone through the writer code completely; is there any caching that you want to flush when the event priority is high? I was also wondering whether we need to change the Writer/Collector API to indicate the criticality of the event being published.

bq. From an initiative thinking, some important app/container events include: INIT_APPLICATION, INIT_CONTAINER, FINISH_APPLICATION, APPLICATION_CONTAINER_FINISHED, APPLICATION_LOG_HANDLING_FAILED, while unimportant events could include: APPLICATION_INITED, APPLICATION_RESOURCES_CLEANEDUP, APPLICATION_LOG_HANDLING_INITED, APPLICATION_LOG_HANDLING_FINISHED, etc.

So from the NM side we want to publish events for both ApplicationEntity and ContainerEntity. Based on the title of this JIRA, though, I thought its scope was to handle only ContainerEntities from the NM side. Would it be better to handle the Application entity events that are specific to a given NM in another JIRA? I can still try to ensure that the required foundation is in place on the NM side in this JIRA, as part of your other comments. Thoughts?

Also, an event has just an ID, and NM-related Application events will have the same event ID on different NMs, so would it be something like {{INIT_APPLICATION_<NODE_ID>}}?

bq. 2. We should have some handy method to turn these app/container events to TimelineEvent and publish these events in a consensus way rather than publish one type of event with one method.

bq. 3. We don't need to create new container events but should log existing YARN app/container events that happen in NM. If we really think some important events are missing in YARN, we can have further discussions later after timeline service v2 in good shape.

+1 for this thought. I had the same initial hitch: if we add more events in the future, we would unnecessarily have to create an event and a method in the publisher for each of them. For the initial version I thought we would take an approach similar to the RM and ATSv1, but I feel it is better to handle this now than to refactor later on. I can think of a couple of approaches here:
# The approach you mentioned: publish the event, containing the container/app information, inside the app/container transitions on the NM side. In some cases, such as the creation of an app or container, the caller could publish the event (like Container created, so as to capture the creation time rather than )
# In ContainerEventDispatcher, ApplicationEventDispatcher and rsrcLocalizationSrvc, after handling an event we can by default call different handlers (inner classes) of NMTimelinePublisher to handle the respective events. The specific events that are required can be handled and the others can simply be ignored.
# The source itself can create the entity and the event object, and NMTimelinePublisher can expose a method that takes the timeline objects and adds them to the AsyncDispatcher; the event handler will then just call the client to publish the event/entity.

bq. 4. It looks like NMTimelinePublisher should be used by ContainerManager, Container, ResourceLocalizationService and Log Handler. Move it to NMContext should be convenient to use for other components.

I will take care of this based on the approach we choose in the previous step.

bq. 5. Container Resource Usage event may not be necessary given we already have metrics update and will do aggregation according to metrics update.

I was not clear about this comment. IIRC Zhijie also mentioned in the meeting that I am handling the removal of the threaded model of publishing container metrics statistics as part of this JIRA. Maybe I am missing some other JIRA that you are already working on; could you enlighten me about it?

> [Event producers] Implement NM writing container lifecycle events to ATS
> ------------------------------------------------------------------------
>
>                 Key: YARN-3045
>                 URL: https://issues.apache.org/jira/browse/YARN-3045
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3045-YARN-2928.002.patch, YARN-3045-YARN-2928.003.patch, YARN-3045-YARN-2928.004.patch, YARN-3045-YARN-2928.005.patch, YARN-3045-YARN-2928.006.patch, YARN-3045.20150420-1.patch
>
>
> Per design in YARN-2928, implement NM writing container lifecycle events and container system metrics to ATS.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
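
For illustration only, the third approach in the comment above (the source builds the event object, the publisher queues it, and a handler passes it to the client) might be sketched as follows, combined with the idea from point 1 of flushing after critical events. This is a rough sketch under stated assumptions, not code from any of the attached patches: {{SketchEvent}}, {{SketchClient}}, the {{critical}} flag and the class name are all hypothetical stand-ins, and a plain {{BlockingQueue}} with one worker thread stands in for the AsyncDispatcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Sketch only: every type here is a hypothetical stand-in, not a YARN class. */
class NMTimelinePublisherSketch {

  /** Hypothetical event, built by the source (an app/container transition). */
  static final class SketchEvent {
    final String entityId;
    final String eventId;
    final long timestamp;
    final boolean critical; // assumed flag: flush right after writing

    SketchEvent(String entityId, String eventId, long timestamp, boolean critical) {
      this.entityId = entityId;
      this.eventId = eventId;
      this.timestamp = timestamp;
      this.critical = critical;
    }
  }

  /** Hypothetical writer client; the real client API may look different. */
  interface SketchClient {
    void put(SketchEvent event);
    void flush();
  }

  // Sentinel that tells the worker thread to finish.
  private static final SketchEvent STOP = new SketchEvent("", "", 0L, false);

  private final BlockingQueue<SketchEvent> queue = new LinkedBlockingQueue<>();
  private final SketchClient client;
  private final Thread worker;

  NMTimelinePublisherSketch(SketchClient client) {
    this.client = client;
    this.worker = new Thread(this::drain, "nm-timeline-publisher-sketch");
    this.worker.start();
  }

  /** Called by the source with an already-built event; returns immediately. */
  void publish(SketchEvent event) {
    queue.add(event);
  }

  /** Handles everything queued so far, flushes once more, then stops. */
  void stop() throws InterruptedException {
    queue.add(STOP);
    worker.join();
  }

  // Single handler loop: write each event, flushing only after critical ones.
  private void drain() {
    try {
      while (true) {
        SketchEvent event = queue.take();
        if (event == STOP) {
          client.flush();
          return;
        }
        client.put(event);
        if (event.critical) {
          client.flush();
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    List<String> calls = new ArrayList<>();
    SketchClient recording = new SketchClient() {
      @Override public void put(SketchEvent event) { calls.add("put " + event.eventId); }
      @Override public void flush() { calls.add("flush"); }
    };
    NMTimelinePublisherSketch publisher = new NMTimelinePublisherSketch(recording);
    long now = System.currentTimeMillis();
    // Entity IDs below are made up for the demo.
    publisher.publish(new SketchEvent("container_1", "INIT_CONTAINER", now, true));
    publisher.publish(new SketchEvent("application_1", "APPLICATION_INITED", now, false));
    publisher.stop();
    System.out.println(calls);
  }
}
```

With this shape the publisher needs no per-event method: adding a new event type only changes what the source constructs, which is the refactoring concern raised under point 3.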