Sangjin Lee commented on YARN-3044:

bq. But the life-cycle events of containers should definitely originate at the 
RM; NMs don't even know about many of them. And given that these are just the life-cycle 
events for containers, I think we are good w.r.t. scalability. It's more like 5K 
writes per second, so let's include them here.

I'm very much concerned about the volume of writes that the RM collector would 
need to do, and that's why we initially said that NMs (not the RM) would originate the 
container lifecycle events and route them to the per-app collector. If we take 
5,000 containers/sec as the order of magnitude for large, busy clusters, and each 
container emits multiple lifecycle events (started, stopped, ...), the number of 
writes/sec for container lifecycle events could easily be on the order of 10k. 
That would put a lot of pressure on the RM timeline collector, and it might 
affect the I/O of the RM machine. We don't want to start interfering with the 
critical work the RM needs to do. Considering that, I would prefer offloading 
these writes to the distributed collectors.
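The write-rate estimate is just multiplication; the minimal Java sketch below spells it out. It assumes each lifecycle event becomes exactly one write to the collector, and the events-per-container values (2 for started + stopped, 3 with one extra intermediate state) are illustrative assumptions, not measured figures. The class name `WriteRateEstimate` is hypothetical.

```java
// Back-of-envelope estimate of container lifecycle writes per second.
// Assumption: every lifecycle event (e.g. started, stopped) turns into one write.
public final class WriteRateEstimate {

    private WriteRateEstimate() {}

    /** Writes/sec = containers started per second x lifecycle events per container. */
    public static long writesPerSecond(long containersPerSecond, int eventsPerContainer) {
        return containersPerSecond * eventsPerContainer;
    }

    public static void main(String[] args) {
        long containersPerSecond = 5_000; // order of magnitude for a large, busy cluster
        // Hypothetical event counts: 2 = started + stopped, 3 = one extra intermediate state.
        for (int events = 2; events <= 3; events++) {
            System.out.printf("%d events/container -> %d writes/sec%n",
                    events, writesPerSecond(containersPerSecond, events));
        }
    }
}
```

Even the low end of this range lands entirely on a single RM collector, whereas routing the same events through NMs to per-app collectors spreads them across the cluster.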

What are the state transitions that NMs do not know about? Thoughts?

bq. My 2 cents: as for a container, both the RM and the NM maintain its lifecycle, but 
they look at the container from different aspects. To me, the RM's container 
lifecycle sounds more resource-management oriented, while the NM's container 
lifecycle sounds more container-execution oriented. I'm not sure which one is 
more important to users.

I think the key info is really around start time, end time, and maybe the 
container state (initializing, running, complete, etc.).
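To make "start time, end time, and maybe the container state" concrete, here is a minimal Java sketch of such a summary. `ContainerSummary` and its three-value `State` enum are hypothetical illustrations built from the states named above; they are not YARN or ATS types, and the real RM and NM container state machines are finer-grained.

```java
import java.util.Objects;
import java.util.OptionalLong;

// Hypothetical summary of the key container info: start time, end time, coarse state.
public final class ContainerSummary {

    /** Coarse states named in the discussion (not the real RM/NM enums). */
    public enum State { INITIALIZING, RUNNING, COMPLETE }

    private final String containerId;
    private final long startTimeMs;
    private final OptionalLong endTimeMs; // empty until the container completes
    private final State state;

    public ContainerSummary(String containerId, long startTimeMs,
                            OptionalLong endTimeMs, State state) {
        this.containerId = Objects.requireNonNull(containerId);
        this.startTimeMs = startTimeMs;
        this.endTimeMs = Objects.requireNonNull(endTimeMs);
        this.state = Objects.requireNonNull(state);
        // Only a complete container carries an end time.
        if ((state == State.COMPLETE) != endTimeMs.isPresent()) {
            throw new IllegalArgumentException("end time must be set iff state is COMPLETE");
        }
        if (endTimeMs.isPresent() && endTimeMs.getAsLong() < startTimeMs) {
            throw new IllegalArgumentException("end time precedes start time");
        }
    }

    public String containerId() { return containerId; }
    public long startTimeMs() { return startTimeMs; }
    public OptionalLong endTimeMs() { return endTimeMs; }
    public State state() { return state; }

    /** Run time in milliseconds, known only once the container is complete. */
    public OptionalLong durationMs() {
        return endTimeMs.isPresent()
                ? OptionalLong.of(endTimeMs.getAsLong() - startTimeMs)
                : OptionalLong.empty();
    }
}
```

Making the end time optional reflects that an initializing or running container has no end time yet, so a reader of the stored data can tell "still running" apart from "finished".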

bq. IIUC TimelineWriter only understands TimelineEntity, so all 
the fields present in subclasses of TimelineEntity need to fit into 
TimelineEntity (as part of the info/isRelatedToEntities/relatesToEntities fields), 
and similarly, reading needs to do the reverse (maybe in the other jira). 
Correct me if my understanding is wrong. If required, I will try to incorporate it 
as part of this jira.

Why would that be the case? Can the RM timeline collector not use specific 
subclasses of TimelineEntity?
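A minimal Java sketch of the point behind this question: a writer typed against the base `TimelineEntity` accepts any subclass without conversion, and if a subclass backs its specific fields with base-class fields (here, the info map), the writer needs no subclass-specific logic. All classes below are simplified, hypothetical stand-ins; the real ATS v2 `TimelineEntity`, `ApplicationEntity`, and `TimelineWriter` differ, and the `QUEUE` key is an invented example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified stand-ins for the timeline classes (not the Hadoop API).
public final class SubclassWriteSketch {

    /** Base entity: a type, an id, and a free-form info map. */
    static class TimelineEntity {
        private final String type;
        private final String id;
        private final Map<String, Object> info = new HashMap<>();

        TimelineEntity(String type, String id) {
            this.type = type;
            this.id = id;
        }

        String getType() { return type; }
        String getId() { return id; }
        Map<String, Object> getInfo() { return Collections.unmodifiableMap(info); }
        void addInfo(String key, Object value) { info.put(key, value); }
    }

    /** Subclass whose specific field is stored in the base info map. */
    static class ApplicationEntity extends TimelineEntity {
        private static final String QUEUE_KEY = "QUEUE"; // hypothetical key name

        ApplicationEntity(String appId) {
            super("YARN_APPLICATION", appId);
        }

        void setQueue(String queue) { addInfo(QUEUE_KEY, queue); }
        String getQueue() { return (String) getInfo().get(QUEUE_KEY); }
    }

    /** Writer that only knows the base type. */
    interface TimelineWriter {
        void write(TimelineEntity entity);
    }

    /** In-memory writer that records a flat row per entity. */
    static final class InMemoryWriter implements TimelineWriter {
        final List<String> rows = new ArrayList<>();

        @Override
        public void write(TimelineEntity entity) {
            rows.add(entity.getType() + "/" + entity.getId() + " " + entity.getInfo());
        }
    }

    public static void main(String[] args) {
        ApplicationEntity app = new ApplicationEntity("application_1427000000000_0001");
        app.setQueue("default");
        InMemoryWriter writer = new InMemoryWriter();
        writer.write(app); // no conversion: an ApplicationEntity is a TimelineEntity
        writer.rows.forEach(System.out::println);
    }
}
```

In this sketch the collector works with the richer subclass API (`setQueue`), while storage still sees only base-class fields; whether the real classes follow this pattern is an assumption.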

> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>                 Key: YARN-3044
>                 URL: https://issues.apache.org/jira/browse/YARN-3044
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Naganarasimha G R
>         Attachments: YARN-3044.20150325-1.patch
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.

This message was sent by Atlassian JIRA