[
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385577#comment-14385577
]
Li Lu commented on YARN-3334:
-----------------------------
Hi [~djp], thanks for working on this! I checked the patch and it generally
LGTM. Here are a few comments and questions:
{code}
+ threadPool.shutdown();
+
+ while (!threadPool.isTerminated()) { // wait for all posting thread to
finish
+ try {
+ if (!threadPool.awaitTermination(30, TimeUnit.SECONDS)) {
+ threadPool.shutdownNow(); // send interrupt to hurry them along
+ }
+ } catch (InterruptedException e) {
+ LOG.warn("Timeline client service stop interrupted!");
+ break;
+ }
+ }
{code}
If I understand this correctly, we're granting non-terminating threadPools 30
seconds to wait them finish, and shutdown the thread pool afterwards. I'm not
sure if we need the big while loop here? This design would work but looks a
little bit weird compare to the sample two-phase shutdown discussed in Oracle's
official document
(http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html).
Could you please double check if the sample solution there works for us?
{code}
+ TimelineEntity entity = new TimelineEntity();
+ entity.setId(containerId.toString());
+ entity.setType(TimelineEntityType.YARN_CONTAINER.toString());
{code}
ContainerEntity is a HierarchicalTimelineEntity, which carries parent-children
relationships for timeline entities. I suspect we have some bugs with putting
HierarchicalTimelineEntity, but it's only my hunch. If you have time, it would
be great if you could look into it a little bit (and fix them separately), but
if you have some bandwidth limitations, I can take care of them when testing
the storage layer. Asking for [~zjshen]'s confirmation here.
{code}
+ // TODO move timeline metrics handling to a separate method
+ TimelineMetric cpuMetric = new TimelineMetric();
+ cpuMetric.setId(ContainerMetric.CPU.toString() + pId);
+ cpuMetric.addTimeSeriesData(currentTime,
+ cpuUsageTotalCoresPercentage);
{code}
Are we just adding one time series point to each new TimelineMetric, then add
them to the TimelineEntity? This sounds not quite economical because we're
creating redundant TimelineMetrics for the data points in the same
TimelineSeries. Are there any fundamental challenges to prevent us creating a
full timeline series here? A similar question applies to memory metrics, too.
Thanks!
> [Event Producers] NM TimelineClient life cycle handling and container metrics
> posting to new timeline service.
> --------------------------------------------------------------------------------------------------------------
>
> Key: YARN-3334
> URL: https://issues.apache.org/jira/browse/YARN-3334
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: nodemanager
> Affects Versions: YARN-2928
> Reporter: Junping Du
> Assignee: Junping Du
> Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch,
> YARN-3334-v2.patch, YARN-3334-v3.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector
> service address among collectors, NMs and RM. In this JIRA, we will handle
> service address setting for TimelineClients in NodeManager, and put container
> metrics to the backend storage.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)