Naganarasimha G R commented on YARN-3044:

Thanks [~vinodkv], [~vrushalic], [~sjlee0] & [~zjshen] for reviewing and 
providing your viewpoints:
1> {{"source of life-cycle events of container"}} is a debatable topic. To 
summarize the pros and cons of publishing these events from the NM:

* Even though the load is lower than that of publishing container metrics, 
life-cycle events might still generate considerable load on a large cluster, as 
explained by [~sjlee0]. So I feel it is better to distribute this load across 
the NMs.
* If the start and end times of life-cycle events are logged from the NM, it 
will be easier to analyze the flow of a container, since they reflect the 
actual time at which it was started.
* IMO it would be good to have all the metrics and events raised from the NM 
itself, as there is a possibility of a race condition if container entities 
are raised from the RM while metrics and a few other life-cycle events come 
from the NM, e.g. when the RM is slow to dispatch events and the NM is faster. 
(HBase as storage will be able to handle this well, but I am not sure about 
the other storages we are planning for.)
* The start and end times of life-cycle events might not match what is 
displayed by the RM (web UI, etc.).
* In terms of scheduling, the start and end times of life-cycle events might 
not be as accurate as they would be if published from the RM.
Please correct me on these, and add any points I have missed.
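
To make the ordering concern in the third point concrete, here is a minimal sketch. It assumes a storage that creates a row on the first write from either source and appends afterwards, which is roughly why an upsert-style store would tolerate NM records arriving before the RM's container entity. {{UpsertStoreSketch}} and the record strings are hypothetical names for illustration only, not YARN or HBase APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store for illustration; not a YARN or HBase API.
class UpsertStoreSketch {
    private final Map<String, List<String>> rows = new HashMap<>();

    // Creates the row on the first write from either source, then appends,
    // so the arrival order of RM and NM records does not matter.
    void put(String containerId, String record) {
        rows.computeIfAbsent(containerId, k -> new ArrayList<>()).add(record);
    }

    // Returns a copy of everything recorded so far for the container.
    List<String> get(String containerId) {
        return new ArrayList<>(rows.getOrDefault(containerId, new ArrayList<>()));
    }
}
```

A storage that instead requires the entity to exist before metrics can be attached to it would reject or drop the early NM write in the same scenario.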

2> ??But the life-cycle events of container should definitely originate at the 
RM; NMs don't even know many of them.??
I am not very familiar with this; can you please elaborate on which events 
might be missed?

3> ??Why would that be the case? Can the RM timeline collector not use specific 
subclasses of TimelineEntity??
Well, it is not a limitation of the RM timeline collector that I am trying to 
point out, but of the writer interface: the writer would not be aware of 
whether the client is writing an ApplicationEntity or an AppAttemptEntity. 
IIUC it will just try to write the fields of the TimelineEntity to the 
storage. If it simply stores the entity as a JSON object directly in the 
storage, this might not be an issue, but that will not be the case for HBase 
column storage, right?
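
A minimal sketch of the concern, assuming the writer accepts only the base type. The classes below are simplified, hypothetical stand-ins (including the {{queue}} field and the {{"YARN_APPLICATION"}} type string), not the actual YARN-2928 classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the base entity; not the actual YARN class.
class TimelineEntity {
    final String id;
    final String type;
    final Map<String, Object> info = new LinkedHashMap<>();

    TimelineEntity(String id, String type) {
        this.id = id;
        this.type = type;
    }
}

// Hypothetical subclass carrying one field the base class does not have.
class ApplicationEntity extends TimelineEntity {
    final String queue;

    ApplicationEntity(String id, String queue) {
        super(id, "YARN_APPLICATION"); // assumed type string
        this.queue = queue;
    }
}

class ColumnWriterSketch {
    // Takes the base type only, so it maps just the base-class fields to
    // columns; subclass-only fields such as queue are never seen here.
    static Map<String, Object> toColumns(TimelineEntity entity) {
        Map<String, Object> columns = new LinkedHashMap<>();
        columns.put("id", entity.id);
        columns.put("type", entity.type);
        for (Map.Entry<String, Object> e : entity.info.entrySet()) {
            columns.put("info:" + e.getKey(), e.getValue());
        }
        return columns;
    }
}
```

In this sketch the {{queue}} value reaches the storage only if the caller also copies it into the generic {{info}} map, or if the writer inspects the entity type itself.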

4> ??My suggestion is that we start with reimplementing what we provided in YTS 
v1, and add more timeline data on demand later??
True, this would be sufficient to start with, but in the future I would like 
to capture all the events, since currently, to analyze/debug container issues, 
we usually start by searching the NM and RM logs for the container string to 
find what state the application/container is in. Your opinion?

> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>                 Key: YARN-3044
>                 URL: https://issues.apache.org/jira/browse/YARN-3044
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3044.20150325-1.patch
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.

This message was sent by Atlassian JIRA
