[
https://issues.apache.org/jira/browse/YARN-3044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14485527#comment-14485527
]
Zhijie Shen commented on YARN-3044:
-----------------------------------
Before reviewing the patch details, I have some high-level comments:
bq. IIUC you meant we will have RMContainerEntity having type as
YARN_RM_CONTAINER and NMContainerEntity having type as YARN_NM_CONTAINER right ?
Can we use a single ContainerEntity instead? The events from the RM would be
RM_XXXX_EVENT, and those from the NM would be NM_XXXX_EVENT.
bq. I'm very much concerned about the volume of writes that the RM collector
would need to do,
bq. I fully understand the concern from Sangjin Lee that RM may not afford tens
of thousands containers in large size cluster.
I also think publishing all container lifecycle events from NM is likely to be
a big cost in total, but I'd like to offer another point of view. Say we have a
big cluster that can run 5,000 concurrent containers. The RM has to maintain
the lifecycle of these 5K containers, and I don't think a less powerful server
can manage that, right? Assuming we have such a powerful server to run the RM
of a big cluster, will publishing lifecycle events be a big deal for that
server? I'm not sure, but I can provide some hints. Today each container writes
2 events per lifecycle; perhaps in the future we will want to record each state
transition, resulting in ~10 events per lifecycle. That gives 10 * 5K = 50K
lifecycle events, and they won't be written at the same moment because
containers' lifecycles are usually async. If we assume each container runs for
1h and lifecycle events are uniformly distributed, there will be only around 14
writes per second (50,000 events / 3,600 s), which is modest for a powerful
server.
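The estimate above can be reproduced with a quick back-of-envelope sketch. The
function name and parameters are made up for illustration, and the inputs are
the assumed figures from this comment (5,000 concurrent containers, 2 or 10
events per lifecycle, 1h runtime), not measurements:

```python
# Back-of-envelope estimate of the RM's timeline write rate.
# All inputs are assumptions taken from the comment, not measured values.

def writes_per_second(concurrent_containers: int,
                      events_per_lifecycle: int,
                      container_runtime_s: float) -> float:
    """Average event writes per second, assuming lifecycle events are
    spread uniformly over each container's runtime."""
    total_events = concurrent_containers * events_per_lifecycle
    return total_events / container_runtime_s


current = writes_per_second(5_000, 2, 3_600)   # 2 events per lifecycle today
future = writes_per_second(5_000, 10, 3_600)   # ~10 if every transition is recorded

print(f"{current:.1f} writes/s today, {future:.1f} writes/s with per-transition events")
# prints "2.8 writes/s today, 13.9 writes/s with per-transition events"
```

Shorter-lived containers raise the rate proportionally; for example, a 6-minute
average runtime would give roughly 10x these numbers.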
I think we may be overestimating the performance impact of writing NM lifecycle events.
Perhaps a more reasonable performance metric is {{cost of writing lifecycle
events per container / cost of managing lifecycle per container * 100%}}. For
example, if it is 2%, I guess it will probably be acceptable.
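As a minimal sketch of that metric (the function name and the sample costs are
hypothetical, and how the two costs would actually be measured is left open):

```python
def publish_overhead_pct(publish_cost: float, manage_cost: float) -> float:
    """Cost of writing lifecycle events per container, as a percentage of the
    cost of managing that container's lifecycle.

    Both costs must use the same unit (e.g. CPU-ms per container).
    """
    if manage_cost <= 0:
        raise ValueError("manage_cost must be positive")
    return publish_cost / manage_cost * 100.0


# Hypothetical costs, chosen only to reproduce the 2% example above.
overhead = publish_overhead_pct(publish_cost=2.0, manage_cost=100.0)
print(f"{overhead:.1f}%")  # prints "2.0%"
```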
bq. all configs will not be set as part of this so was there more planned for
this from the framework side or each application needs to take care of this on
their own to populate configuration information ?
bq. In that sense, how about letting frameworks (namely AMs) write the
configuration instead of RM?
I'm not sure I understand this part correctly, but I'm inclined to think that
system timeline data (RM/NM) should be controlled by cluster-wide config, while
application data is controlled by framework or even per-application config. It
could be a problem if a user were able to change the former. For example, a
user could hide their application's information from the cluster admin.
bq. I have also incorporated the changes to support RMContainer metrics based
on configuration (Junping's comments).
Do you mean we should keep
{{yarn.resourcemanager.system-metrics-publisher.enabled}} to control RM SMP,
and create {{yarn.nodemanager.system-metrics-publisher.enabled}} to control
NM SMP?
> [Event producers] Implement RM writing app lifecycle events to ATS
> ------------------------------------------------------------------
>
> Key: YARN-3044
> URL: https://issues.apache.org/jira/browse/YARN-3044
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: timelineserver
> Reporter: Sangjin Lee
> Assignee: Naganarasimha G R
> Attachments: YARN-3044.20150325-1.patch, YARN-3044.20150406-1.patch
>
>
> Per design in YARN-2928, implement RM writing app lifecycle events to ATS.