[
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381453#comment-14381453
]
Naganarasimha G R commented on YARN-3044:
-----------------------------------------
Thanks [~vinodkv], [~vrushalic], [~sjlee0] & [~zjshen] for reviewing and
providing your viewpoints:
1> {{"source of life-cycle events of container"}} is a debatable topic. To
summarize the pros and cons of publishing these events from the NM:
Pros
* Even though the load is not too high compared to publishing container
metrics, life-cycle events might add considerable load on a large cluster, as
explained by [~sjlee0]. So I feel it is better to distribute this load.
* If the start and end times of life-cycle events are logged from the NM, it
will be easier to analyze the flow of a container, as the NM records the actual
time at which it was started.
* IMO it would be good to have all the metrics and events raised from the NM
itself, as there is a possibility of a race condition if container entities
are raised from the RM while metrics and a few other life-cycle events are
raised from the NM, e.g. when the RM is slow to dispatch its events and the NM
is faster. (HBase as storage will be able to handle this well, but I am not
sure about the other storages we are planning to support.)
Cons
* Start and end times of life-cycle events might not match what is displayed
by the RM (web UI etc.).
* In terms of scheduling, start and end times of life-cycle events might not
be as accurate as they would be if published from the RM.
Please correct me on these, and add any that I have missed.
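A minimal sketch of the race described in the last "Pros" bullet, under stated
assumptions: every class name here is hypothetical and none of this is YARN or
HBase API. It assumes one reading of the HBase remark, namely that a store with
upsert semantics accepts an NM event that arrives before the RM has published
the container entity, while a store that requires the entity to exist first
would drop it.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical event: who published it, for which container, and one field.
class PublishedEvent {
    final String source;
    final String containerId;
    final String field;
    final String value;

    PublishedEvent(String source, String containerId, String field, String value) {
        this.source = source;
        this.containerId = containerId;
        this.field = field;
        this.value = value;
    }
}

// Hypothetical store with upsert semantics: a write creates the row if absent.
class UpsertStoreSketch {
    final Map<String, Map<String, String>> rows = new TreeMap<>();

    void write(PublishedEvent e) {
        rows.computeIfAbsent(e.containerId, k -> new TreeMap<>()).put(e.field, e.value);
    }
}

// Hypothetical store that only accepts updates for rows created by a CREATED event.
class CreateFirstStoreSketch {
    final Map<String, Map<String, String>> rows = new TreeMap<>();
    int dropped = 0;

    void write(PublishedEvent e) {
        if (e.field.equals("CREATED")) {
            rows.computeIfAbsent(e.containerId, k -> new TreeMap<>()).put(e.field, e.value);
        } else if (rows.containsKey(e.containerId)) {
            rows.get(e.containerId).put(e.field, e.value);
        } else {
            dropped++; // the update arrived before the entity existed
        }
    }
}

class RaceSketchDemo {
    public static void main(String[] args) {
        PublishedEvent fromRm = new PublishedEvent("RM", "container_1", "CREATED", "t0");
        PublishedEvent fromNm = new PublishedEvent("NM", "container_1", "CPU_METRIC", "42");

        UpsertStoreSketch upsert = new UpsertStoreSketch();
        CreateFirstStoreSketch createFirst = new CreateFirstStoreSketch();

        // The NM event wins the race and is written before the RM event.
        upsert.write(fromNm);
        upsert.write(fromRm);
        createFirst.write(fromNm);
        createFirst.write(fromRm);

        System.out.println("upsert rows: " + upsert.rows);
        System.out.println("create-first rows: " + createFirst.rows
            + ", dropped: " + createFirst.dropped);
    }
}
```

If all events for a container came from the NM alone, ordering would be decided
in one place and neither store would see an update ahead of the creation event.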
2> ??But the life-cycle events of container should definitely originate at the
RM; NMs don't even know many of them.??
I am not very aware of this; can you please elaborate on what might be missed?
3> ??Why would that be the case? Can the RM timeline collector not use specific
subclasses of TimelineEntity??
Well, it is not a limitation of the RM timeline collector that I am trying to
point out; rather, the writer interface looks like
{{TimelineWriter.write(TimelineEntities)}}
The writer would not be aware of whether the client is writing an
ApplicationEntity or an AppAttemptEntity. IIUC it will just try to write
the fields of the TimelineEntity to the storage. If it just stored the
entity as a JSON object directly to storage it might not be an issue, but that
will not be the case with HBase column storage, right?
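A minimal sketch of the concern in point 3, using hypothetical stand-in classes
(SketchEntity, ColumnWriterSketch and friends are illustrative only, not the
real YARN timeline classes or the actual HBase schema). It assumes a writer
whose method accepts only the base entity type, so anything specific to a
subclass reaches storage only if it is exposed through the generic fields:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a generic timeline entity (not the real YARN class).
class SketchEntity {
    final String type;
    final String id;
    final Map<String, Object> info = new LinkedHashMap<>();

    SketchEntity(String type, String id) {
        this.type = type;
        this.id = id;
    }
}

// Hypothetical subclass: its extra data has to live in the generic info map,
// because that map is all a base-type writer can see.
class SketchApplicationEntity extends SketchEntity {
    SketchApplicationEntity(String id, String queue) {
        super("APPLICATION", id);
        info.put("queue", queue);
    }
}

class SketchAppAttemptEntity extends SketchEntity {
    SketchAppAttemptEntity(String id, String host) {
        super("APP_ATTEMPT", id);
        info.put("host", host);
    }
}

// Hypothetical column-style writer: row key -> (column name -> value).
class ColumnWriterSketch {
    final Map<String, Map<String, Object>> rows = new LinkedHashMap<>();

    // Accepts only the base type, so it never branches on the subclass.
    void write(List<SketchEntity> entities) {
        for (SketchEntity e : entities) {
            Map<String, Object> columns =
                rows.computeIfAbsent(e.type + "!" + e.id, k -> new LinkedHashMap<>());
            for (Map.Entry<String, Object> field : e.info.entrySet()) {
                columns.put("info:" + field.getKey(), field.getValue());
            }
        }
    }
}

class WriterSketchDemo {
    public static void main(String[] args) {
        List<SketchEntity> batch = new ArrayList<>();
        batch.add(new SketchApplicationEntity("app_1", "default"));
        batch.add(new SketchAppAttemptEntity("attempt_1", "host-a"));

        ColumnWriterSketch writer = new ColumnWriterSketch();
        writer.write(batch);
        System.out.println(writer.rows);
    }
}
```

In this sketch the only subclass-related information the writer can use is the
type string and the generic map, which is the situation the question above is
asking about for a column-oriented layout.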
4> ??My suggestion is that we start with reimplementing what we provided in YTS
v1, and add more timeline data on demand later??
True, to start with this would be sufficient, but in the future I would like
to capture all the events, since currently, to analyze/debug issues with a
container, we usually start by searching the NM and RM logs for the container
string to find what state the application/container is in. Your opinion?
> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Naganarasimha G R
> Attachments: YARN-3044.20150325-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.