[ 
https://issues.apache.org/jira/browse/YARN-3334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14385577#comment-14385577
 ] 

Li Lu commented on YARN-3334:
-----------------------------

Hi [~djp], thanks for working on this! I checked the patch and it generally 
LGTM. Here are a few comments and questions:

{code}
+    threadPool.shutdown();
+    
+    while (!threadPool.isTerminated()) { // wait for all posting thread to 
finish
+      try {
+        if (!threadPool.awaitTermination(30, TimeUnit.SECONDS)) {
+          threadPool.shutdownNow(); // send interrupt to hurry them along
+        }
+      } catch (InterruptedException e) {
+        LOG.warn("Timeline client service stop interrupted!");
+        break;
+      }
+    }
{code}
If I understand this correctly, we're granting non-terminating threadPools 30 
seconds to wait them finish, and shutdown the thread pool afterwards. I'm not 
sure if we need the big while loop here? This design would work but looks a 
little bit weird compare to the sample two-phase shutdown discussed in Oracle's 
official document 
(http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html).
 Could you please double check if the sample solution there works for us? 

{code}
+          TimelineEntity entity = new TimelineEntity();
+          entity.setId(containerId.toString());
+          entity.setType(TimelineEntityType.YARN_CONTAINER.toString());
{code}
ContainerEntity is a HierarchicalTimelineEntity, which carries parent-children 
relationships for timeline entities. I suspect we have some bugs with putting 
HierarchicalTimelineEntity, but it's only my hunch. If you have time, it would 
be great if you could look into it a little bit (and fix them separately), but 
if you have some bandwidth limitations, I can take care of them when testing 
the storage layer. Asking for [~zjshen]'s confirmation here. 

{code}
+            // TODO move timeline metrics handling to a separate method
+            TimelineMetric cpuMetric = new TimelineMetric();
+            cpuMetric.setId(ContainerMetric.CPU.toString() + pId);
+            cpuMetric.addTimeSeriesData(currentTime, 
+                cpuUsageTotalCoresPercentage);
{code}
Are we just adding one time series point to each new TimelineMetric, then add 
them to the TimelineEntity? This sounds not quite economical because we're 
creating redundant TimelineMetrics for the data points in the same 
TimelineSeries. Are there any fundamental challenges to prevent us creating a 
full timeline series here? A similar question applies to memory metrics, too. 

Thanks! 

> [Event Producers] NM TimelineClient life cycle handling and container metrics 
> posting to new timeline service.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-3334
>                 URL: https://issues.apache.org/jira/browse/YARN-3334
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: YARN-2928
>            Reporter: Junping Du
>            Assignee: Junping Du
>         Attachments: YARN-3334-demo.patch, YARN-3334-v1.patch, 
> YARN-3334-v2.patch, YARN-3334-v3.patch
>
>
> After YARN-3039, we have service discovery mechanism to pass app-collector 
> service address among collectors, NMs and RM. In this JIRA, we will handle 
> service address setting for TimelineClients in NodeManager, and put container 
> metrics to the backend storage.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to