OK, I fixed this and verified the fix locally. The patch is posted to YARN-4460. It is simple, so a quick review would be much appreciated. Anyone hitting this problem is more than welcome to verify. Thanks!
Li Lu

On Dec 15, 2015, at 11:00, Li Lu <[email protected]> wrote:

Thanks Varun and Naga! I verified locally that the V2 publisher introduced in YARN-4129 caused this problem. I’ll open a JIRA and post a quick fix right away. Thanks for the information!

Li Lu

On Dec 14, 2015, at 21:38, Naganarasimha G R (Naga) <[email protected]> wrote:

Hi Varun & Li,

Yes Varun, the most likely reason is what you mentioned: the registration has to be done in serviceInit, which is taken care of in the V1 Publisher but was missed in the V2 Publisher. The entire logic present in serviceStart of V2Publisher should be moved to serviceInit. But I was wondering, for which event/entity did this happen? Was it in RM recovery mode?

Regards,
+ Naga

________________________________
From: Varun Saxena [[email protected]]
Sent: Tuesday, December 15, 2015 10:48
To: Li Lu
Cc: [email protected]; Sangjin Lee; Junping Du; Vrushali Channapattan; Joep Rottinghuis; Naganarasimha G R (Naga)
Subject: Re: [Timeline V2 branch] Latest timeline v2 and SMP problem

Hi Li,

This is because we are registering the event in serviceStart() instead of serviceInit(). As SMP is the last service in the list, it's started right at the end, i.e. even after all the RPC and UI related stuff. This can cause an app flow to start before the SMP/V2Publisher service has even started. This is what causes the issue.

Do you want to raise a JIRA for this issue, or should I? I can handle it.

Regards,
Varun Saxena.

On Tue, Dec 15, 2015 at 8:35 AM, Li Lu <[email protected]> wrote:

Thanks Sangjin. I’ll keep tracing this. Meanwhile, if anybody has reproduced the problem, please feel free to let me know. Thanks!
Li Lu

On Dec 14, 2015, at 18:16, Sangjin Lee <[email protected]> wrote:

Can you bisect the commits to see if you can isolate which commit introduced the issue?

On Mon, Dec 14, 2015 at 5:39 PM, Li Lu <[email protected]> wrote:

Hi YARN developers working on Timeline v2 (YARN-2928) branch,

I just realized I’ve accidentally turned off SMP for my local Timeline v2 build. After I turned yarn.system-metrics-publisher.enabled back on, the RM fails to start with the following FATAL message:

2015-12-14 17:27:54,125 INFO ipc.Server (Server.java:run(797)) - IPC Server listener on 8033: starting
2015-12-14 17:27:54,127 FATAL event.AsyncDispatcher (AsyncDispatcher.java:dispatch(189)) - Error in dispatcher thread true
java.lang.Exception: No handler for registered for class org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractSystemMetricsPublisher$SystemMetricsEventType
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:185)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:109)
        at java.lang.Thread.run(Thread.java:745)
2015-12-14 17:27:54,127 INFO event.AsyncDispatcher (AsyncDispatcher.java:register(208)) - Registering class org.apache.hadoop.yarn.server.resourcemanager.metrics.AbstractSystemMetricsPublisher$SystemMetricsEventType for class org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV2Publisher$TimelineV2EventHandler

Interestingly, we’re registering this class to the timeline v2 handler in the next line of the log. I’m wondering if this is caused by some of my missing configs, or a newly introduced issue? Has anybody on the feature-YARN-2928 branch noticed this issue?

Thanks!
Li Lu
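
For readers following the thread, the ordering problem Varun describes (handler registered in serviceStart() rather than serviceInit(), with the publisher started last) can be sketched with a toy model. This is only an illustration under stated assumptions: ToyDispatcher, ToyService, EarlyProducer, Publisher and MetricsEvent are hypothetical names invented here, not Hadoop classes, and the dispatcher is synchronous, unlike the real AsyncDispatcher. It is not the YARN-4460 patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Toy model of the init/start ordering issue; none of these types are Hadoop classes. */
class LifecycleOrderDemo {

  /** Hypothetical synchronous dispatcher: maps an event class to one handler. */
  static class ToyDispatcher {
    private final Map<Class<?>, Consumer<Object>> handlers = new HashMap<>();

    void register(Class<?> eventType, Consumer<Object> handler) {
      handlers.put(eventType, handler);
    }

    void dispatch(Object event) {
      Consumer<Object> handler = handlers.get(event.getClass());
      if (handler == null) {
        throw new IllegalStateException("No handler registered for " + event.getClass());
      }
      handler.accept(event);
    }
  }

  /** Hypothetical two-phase lifecycle: every service is initialized before any is started. */
  interface ToyService {
    void init();
    void start();
  }

  /** Hypothetical event type standing in for a system metrics event. */
  static class MetricsEvent {}

  /** Stands in for a service that begins producing events as soon as it starts. */
  static class EarlyProducer implements ToyService {
    private final ToyDispatcher dispatcher;
    EarlyProducer(ToyDispatcher dispatcher) { this.dispatcher = dispatcher; }
    public void init() {}
    public void start() { dispatcher.dispatch(new MetricsEvent()); }
  }

  /** Stands in for the publisher; registers its handler in init() or in start(). */
  static class Publisher implements ToyService {
    private final ToyDispatcher dispatcher;
    private final boolean registerInInit;
    final List<Object> received = new ArrayList<>();

    Publisher(ToyDispatcher dispatcher, boolean registerInInit) {
      this.dispatcher = dispatcher;
      this.registerInInit = registerInInit;
    }

    public void init() { if (registerInInit) register(); }
    public void start() { if (!registerInInit) register(); }

    private void register() {
      dispatcher.register(MetricsEvent.class, received::add);
    }
  }

  /** Runs init on all services, then start on all; returns true if the event was delivered. */
  static boolean run(boolean registerInInit) {
    ToyDispatcher dispatcher = new ToyDispatcher();
    Publisher publisher = new Publisher(dispatcher, registerInInit);
    // The publisher is deliberately last in the list, so it is started last.
    List<ToyService> services = Arrays.asList(new EarlyProducer(dispatcher), publisher);
    try {
      for (ToyService s : services) s.init();
      for (ToyService s : services) s.start();
    } catch (IllegalStateException e) {
      return false; // an event arrived before any handler was registered
    }
    return publisher.received.size() == 1;
  }

  public static void main(String[] args) {
    System.out.println("register in start(): delivered = " + run(false));
    System.out.println("register in init():  delivered = " + run(true));
  }
}
```

In this sketch, registering in start() loses the event because the producer starts first, while registering in init() succeeds because all init() calls finish before any start() call runs. That matches the direction Naga suggests above, moving the registration logic from serviceStart to serviceInit.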
