[ 
https://issues.apache.org/jira/browse/YARN-3046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14494233#comment-14494233
 ] 

Junping Du commented on YARN-3046:
----------------------------------

Thanks [~zjshen] for review and comments!
bq. I'm not sure if we should have an MR config to determine whether it's the 
new or old timeline service. If this MR config is set to true, but the YARN 
cluster is still set up with the old timeline service, it still won't work.
Theoretically, the cleanest solution is to keep applications (MR, DS, etc.) 
unaware of the timeline service version. However, we have already decided to 
use different methods/structures between v1 and v2 in TimelineClient, so 
applications have to be aware of which version of the timeline service is in 
use.
The next option is to let the application learn the timeline-related info from 
YARN/RM. This could be done through registerApplicationMaster() in 
ApplicationMasterProtocol, with a return value indicating the service is 
"off", "v1_on", or "v2_on".
The last option is what the v1 patch shows: following the existing v1 approach 
of enabling the timeline service through a separate configuration, 
MRJobConfig.MAPREDUCE_JOB_EMIT_TIMELINE_DATA.
Personally, I would prefer the 2nd option. The reason is, as you mentioned, 
that the application owner shouldn't have to be aware of RM/YARN 
infrastructure details. However, this requires a change to the YARN AM 
protocol, changes in different applications (distributed shell, etc.), and 
deprecating the existing MR configuration (otherwise it would conflict, in 
principle, with the similar configuration). I would prefer to file a separate 
JIRA to track this more carefully, as it is important but outside the scope of 
this JIRA. What do you think? 
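To make the 2nd option concrete, here is a minimal, hypothetical sketch in plain Java (not against the real ApplicationMasterProtocol): TimelineServiceState and selectClient are illustrative names I made up for how the AM could branch on what the RM reports back from registerApplicationMaster().

```java
// Hypothetical sketch of option 2: the RM tells the AM which timeline
// service version is enabled via the registerApplicationMaster() response.
// TimelineServiceState and selectClient are illustrative names only, not
// part of the actual ApplicationMasterProtocol.
public class TimelineVersionSketch {

    // Possible states the RM could report back to the AM.
    enum TimelineServiceState { OFF, V1_ON, V2_ON }

    // The AM picks a client path based on what the RM reported,
    // without reading any cluster-side configuration itself.
    static String selectClient(TimelineServiceState state) {
        switch (state) {
            case V1_ON: return "v1 client (REST to timeline server)";
            case V2_ON: return "v2 client (per-app collector)";
            default:    return "no-op (timeline service disabled)";
        }
    }

    public static void main(String[] args) {
        System.out.println(selectClient(TimelineServiceState.V2_ON));
    }
}
```

With this shape, the MR AM never has to read a cluster-side config to learn the service version; it simply branches on the register response.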

bq. No need to have JobHistoryEventUtils; you can move the util method to 
JobHistoryUtils if you want.
I tried to do so before I created JobHistoryEventUtils. However, I found we 
cannot, because JobHistoryUtils is in the hadoop-mapreduce-client-common 
module, while some consumers of the method are in the 
hadoop-mapreduce-client-core module (e.g. ReduceAttemptFinishedEvent, 
TaskAttemptFinishedEvent, etc.). Currently, hadoop-mapreduce-client-common 
depends on hadoop-mapreduce-client-core, so we cannot let those events in 
hadoop-mapreduce-client-core depend on JobHistoryUtils, as that would create a 
circular dependency. The bad news is that we cannot move JobHistoryUtils to 
hadoop-mapreduce-client-core either, because it references other classes (e.g. 
MRApps) that are still in hadoop-mapreduce-client-common. That's why I created 
JobHistoryEventUtils for the shared methods.

bq. In the current way of shutting down the threadpool, is it guaranteed that 
the pending entity is going to be published before shutting down?
There is a delay (60 secs) to wait for pending entities to get posted, and 
this delay is typically much larger than the service discovery time (roughly a 
heartbeat interval, not counting the collector-failover case) and the timeline 
entity REST posting time. It is also larger than the worst-case posting time 
for a single entity with maximum retries on failure (30 * 1 sec). So I think 
it should be safe to do so here.
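The timing argument above can be sketched with a plain ExecutorService. This is a minimal illustration, not the actual patch code; DrainOnShutdownSketch and shutdownAndDrain are hypothetical names, and the 60-second bound is taken from the comment.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class DrainOnShutdownSketch {

    // Stop accepting new publish tasks, then wait up to timeoutSec for
    // the already-queued ones to drain.
    static boolean shutdownAndDrain(ExecutorService pool, long timeoutSec)
            throws InterruptedException {
        pool.shutdown(); // no new tasks accepted; queued tasks still run
        return pool.awaitTermination(timeoutSec, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicBoolean published = new AtomicBoolean(false);

        // Stand-in for posting one pending timeline entity; in the worst
        // case described above this takes ~30s (30 retries * 1 sec),
        // still well under the 60s drain window.
        pool.execute(() -> published.set(true));

        System.out.println("drained=" + shutdownAndDrain(pool, 60)
                + " published=" + published.get());
    }
}
```

As long as the drain window (60s) dominates the worst-case single-entity posting time (~30s), awaitTermination returns true and the pending entity is published before shutdown completes.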

I will address the other comments in a new patch.


> [Event producers] Implement MapReduce AM writing some MR metrics to ATS
> -----------------------------------------------------------------------
>
>                 Key: YARN-3046
>                 URL: https://issues.apache.org/jira/browse/YARN-3046
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Sangjin Lee
>            Assignee: Junping Du
>         Attachments: YARN-3046-no-test-v2.patch, YARN-3046-no-test.patch, 
> YARN-3046-v1-rebase.patch, YARN-3046-v1.patch
>
>
> Per design in YARN-2928, select a handful of MR metrics (e.g. HDFS bytes 
> written) and have the MR AM write the framework-specific metrics to ATS.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
