[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336707#comment-14336707 ] Junping Du commented on YARN-3039: -- Thanks [~Naganarasimha] and [~rkanter] for the review and comments! bq. I feel the AM should be informed of AggregatorAddr as early as registration itself, rather than in ApplicationMasterService.allocate() as currently done. That's a good point. Another idea (from Vinod in an offline discussion) is to add a blocking call in AMRMClient that gets the aggregator address directly from the RM. AMRMClient could then be wrapped inside TimelineClient, so that aggregator address lookup and aggregator failure are handled transparently. Thoughts? bq. For NMs too, would it be better to update during registration itself (maybe recovered during recovery, not sure though)? Thoughts? I think the NM case is slightly different: an NM needs this knowledge only once the first container of an app gets allocated/launched on it, so updating it in the heartbeat sounds good enough, doesn't it? In addition, if adding a new API in AMRMClient is acceptable, the NM can use TimelineClient too and thus handle service discovery automatically. bq. I was not clear about the source of RMAppEventType.AGGREGATOR_UPDATE. Based on YARN-3030 (aggregator collection through the NM's aux service), PerNodeAggregatorServer (aux service) launches AppLevelAggregatorService, so will AppLevelAggregatorService inform the RM about the aggregator for the application, and then the RM informs the NM about the appAggregatorAddr as part of the heartbeat response? If this is the flow, will there be a chance of a race condition where, before the NM gets the appAggregatorAddr from the RM, the NM needs to post some AM container entities/events?
I think we can discuss this flow in two scenarios: the first-time launch of the app aggregator, and the app aggregator failing over to another NM. For the first-time launch, the NM aux service binds the app aggregator to the per-node aggregator when the AM container gets allocated (per YARN-3030). The NM then notifies the RM that this new app aggregator is ready for use in its next heartbeat (missing in this patch). After receiving this message from the NM, the RM updates its aggregator list and sends RMAppEventType.AGGREGATOR_UPDATE to trigger persisting the updated aggregator list in the RMStateStore (for RM failover). For app aggregator failover, the AM or NMs (whoever called putEntities with TimelineClient) notify the RM of the failure; the RM first verifies that this app aggregator is out of service, then kicks off rebinding the app aggregator to another NM's per-node aggregator service when that NM's next heartbeat comes. When it hears back from the new NM, the RM does the same thing as in the first case. One gap today is that we launch the app aggregator service (via the NM's auxiliary service) whenever an AM container gets launched, whether it is the first launch or a reschedule after a failure. As in my earlier comments above, an AM container failing over and being rescheduled to another NM should not necessarily cause a rebind of the aggregator service, just as an out-of-service app aggregator should not necessarily cause the AM container to be killed. So I think the app aggregator service should be launched by the NM automatically only for the first attempt and taken care of by the RM for subsequent attempts. About the race condition between the NM heartbeat and posting entities: I don't think posting entities should block any major logic, especially the NM heartbeat. In addition, if we make TimelineClient handle service discovery automatically, this will never happen. What do you think? bq. Sorry for not commenting earlier. Thanks for taking this up Junping Du. No worry. Thanks! bq. Not using YARN-913 is fine if it's not going to make sense.
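The transparent-discovery idea above (TimelineClient wrapping a blocking AMRMClient-style lookup, re-resolving on failure) could look roughly like the following. This is only an illustrative sketch: DiscoveringTimelineClient, AggregatorResolver and Transport are hypothetical names for this discussion, not existing YARN APIs.

```java
import java.io.IOException;

// Hypothetical sketch: a timeline client that resolves the per-app aggregator
// address through a blocking RM lookup (e.g. wrapped around AMRMClient), and
// re-resolves it transparently when a put fails because the aggregator has
// moved to another NM. All names here are illustrative, not real YARN APIs.
public class DiscoveringTimelineClient {

  /** Blocking lookup against the RM for the current aggregator address. */
  public interface AggregatorResolver {
    String resolveAggregatorAddress() throws IOException;
  }

  /** Raw transport that posts entities to a concrete aggregator address. */
  public interface Transport {
    void putEntities(String aggregatorAddress, String entities) throws IOException;
  }

  private final AggregatorResolver resolver;
  private final Transport transport;
  private volatile String cachedAddress; // refreshed lazily and on failure

  public DiscoveringTimelineClient(AggregatorResolver resolver, Transport transport) {
    this.resolver = resolver;
    this.transport = transport;
  }

  public void putEntities(String entities) throws IOException {
    if (cachedAddress == null) {
      cachedAddress = resolver.resolveAggregatorAddress();
    }
    try {
      transport.putEntities(cachedAddress, entities);
    } catch (IOException aggregatorMovedOrDown) {
      // The aggregator may have failed over to another NM:
      // re-resolve once through the RM and retry.
      cachedAddress = resolver.resolveAggregatorAddress();
      transport.putEntities(cachedAddress, entities);
    }
  }

  public String currentAddress() {
    return cachedAddress;
  }
}
```

With this shape, callers (AM or NM) never see the aggregator address at all, which is what makes the failover case in the second scenario above invisible to them.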
I haven't looked too closely at it either; it just sounded like it might be helpful here. Agree. My feeling now is that service discovery is tightly coupled with service lifecycle management. Our app aggregator service does not live inside a dedicated container but has several deployment options, and its consumers include YARN components, not only the AM. So I think YARN-913 may not be the best fit at this moment. [~ste...@apache.org] is the main author of YARN-913. Steve, do you have any comments here? bq. Given that a particular NM is only interested in the Applications that are running on it, is there some way to have it only receive the aggregator info for those apps? This would decrease the amount of throwaway data that gets sent. In the current patch, the RM only sends an NM the aggregator list for apps active on that node. Please check the code in ResourceTrackerService:
{code}
+ConcurrentMap<ApplicationId, String> liveAppAggregatorsMap =
+    new ConcurrentHashMap<ApplicationId, String>();
+List<ApplicationId> keepAliveApps = remoteNodeStatus.getKeepAliveApplications();
+if (keepAliveApps != null) {
+  ConcurrentMap<ApplicationId, RMApp> rmApps = rmContext.getRMApps();
+  for (ApplicationId appId : keepAliveApps) {
+
{code}
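The loop in the patch excerpt above is cut off; a self-contained sketch of the filtering it appears to perform might look like this. Note this is a guess at the intent, not the actual patch code: RMApp is reduced here to a stand-in interface, application ids are plain Strings for brevity, and getAggregatorAddr() is the accessor the patch introduces, used hypothetically.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical completion of the truncated snippet above: given the apps a
// node reports as alive, look each one up in the RM's app table and collect
// only the aggregator addresses that node actually needs.
public class AggregatorListFilter {

  /** Stand-in for the real RMApp interface; only the field we need here. */
  public interface RMApp {
    String getAggregatorAddr();
  }

  public static ConcurrentMap<String, String> aggregatorsForNode(
      List<String> keepAliveApps, Map<String, RMApp> rmApps) {
    ConcurrentMap<String, String> liveAppAggregatorsMap =
        new ConcurrentHashMap<String, String>();
    if (keepAliveApps == null) {
      return liveAppAggregatorsMap;
    }
    for (String appId : keepAliveApps) {
      RMApp app = rmApps.get(appId);
      // Skip finished apps and apps whose aggregator is not registered yet.
      if (app != null && app.getAggregatorAddr() != null) {
        liveAppAggregatorsMap.put(appId, app.getAggregatorAddr());
      }
    }
    return liveAppAggregatorsMap;
  }
}
```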
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336219#comment-14336219 ] Robert Kanter commented on YARN-3039: - Sorry for not commenting earlier. Thanks for taking this up [~djp]. Not using YARN-913 is fine if it's not going to make sense. I haven't looked too closely at it either; it just sounded like it might be helpful here. One comment on the patch: - Given that a particular NM is only interested in the Applications that are running on it, is there some way to have it only receive the aggregator info for those apps? This would decrease the amount of throwaway data that gets sent. Also, can you update the design doc? Looking at the patch, it seems like some things have changed (e.g. it's using protobufs instead of REST, which I think makes more sense here anyway). [Aggregator wireup] Implement ATS writer service discovery -- Key: YARN-3039 URL: https://issues.apache.org/jira/browse/YARN-3039 Project: Hadoop YARN Issue Type: Sub-task Components: timelineserver Reporter: Sangjin Lee Assignee: Junping Du Attachments: Service Binding for applicationaggregator of ATS (draft).pdf, YARN-3039-no-test.patch Per design in YARN-2928, implement ATS writer service discovery. This is essential for off-node clients to send writes to the right ATS writer. This should also handle the case of AM failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335010#comment-14335010 ] Junping Du commented on YARN-3039: -- Thanks [~zjshen] for the review and comments! bq. I think so, too. RM has its own built-in aggregator, and RM directly writes through it. I have a very basic question here: didn't we want a singleton app aggregator for all app-related events, logs, etc.? Ideally, only this singleton aggregator has the magic to sort out app info during aggregation. If not, we could even give up the current flow of NM(s) -> app aggregator (deployed on one NM) -> backend and let NMs talk to the backend directly, saving a hop of traffic. Can you clarify more on this? bq. in the heartbeat, instead of always sending a snapshot of the aggregator address info, can we send incremental information when a change happens to the aggregator address table? Usually, an aggregator will not change its place often, so we can avoid unnecessary additional traffic in most heartbeats. That's a very good point for discussion. The interesting thing here is that only by comparing against the info from the client (NM) can we know what has changed on the server (RM) since the last heartbeat. Take the token update as an example (populateKeys() in ResourceTrackerService): in our current implementation, we encode the master keys (ContainerTokenMasterKey and NMTokenMasterKey) known by the NM in the request, and in the response we filter out the old keys already known by the NM. IMO, this approach (put everything in the request, and put something/nothing in the response) is no better than putting nothing in the request and everything in the response; it only turns outbound traffic into inbound and moves the comparison logic to the server side. Isn't it? Another optimization we can consider is to let the client express its interested app aggregators in the request (by adding them to a new optional field, e.g.
InterestedApps) when it finds that this info is missing or stale, and have the server loop in only the related app aggregators' info. The NM can maintain an interested-app list, which gets updated when an app's first container is launched or when the app's aggregator info goes stale (as may be reported by the writer/reader's retry logic), and from which items are removed once they are received in a heartbeat response. Thoughts? bq. One additional issue related to the RM state store: calling it in the update transition may break app recovery. The current state, instead of the final state, will be written into the store. If the RM stops and restarts at this moment, this app can't be recovered properly. Thanks for the reminder on this. This is something I am not 100% sure about. However, from recoverApplication() in RMAppManager, I didn't see that we cannot recover an app in RUNNING or other states (except final states, like killed, finished, etc.). Do I miss anything on this? One piece of code indeed missing here is that I forgot to repopulate aggregatorAddr from the store in RMAppImpl.recover(); I will add it back in the next patch.
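The InterestedApps idea above can be sketched end to end as follows. This is an illustrative model only, not YARN code: the NM asks only for aggregator addresses it is missing or believes stale, the RM answers only for those apps, and resolved apps drop out of the set, so steady-state heartbeats carry no aggregator payload at all. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the "InterestedApps" heartbeat optimization:
// request carries only the apps the node still needs addresses for,
// response carries only the addresses the RM currently knows.
public class InterestedAppsProtocol {

  /** NM side: tracks which apps still need an aggregator address. */
  public static class NodeSide {
    private final Set<String> interested = new HashSet<>();
    private final Map<String, String> known = new HashMap<>();

    /** Called when an app's first container launches, or its info goes stale. */
    public void markInterested(String appId) {
      interested.add(appId);
    }

    /** Snapshot that goes into the heartbeat request. */
    public Set<String> interestedApps() {
      return new HashSet<>(interested);
    }

    /** Applies the heartbeat response; resolved apps leave the set. */
    public void onHeartbeatResponse(Map<String, String> aggregators) {
      known.putAll(aggregators);
      interested.removeAll(aggregators.keySet());
    }

    public String aggregatorFor(String appId) {
      return known.get(appId);
    }
  }

  /** RM side: answers only for the apps the node asked about. */
  public static Map<String, String> respond(
      Set<String> interestedApps, Map<String, String> rmAggregatorTable) {
    Map<String, String> reply = new HashMap<>();
    for (String appId : interestedApps) {
      String addr = rmAggregatorTable.get(appId);
      if (addr != null) {
        reply.put(appId, addr);
      }
    }
    return reply;
  }
}
```

An app whose aggregator is not yet registered simply stays in the interested set and is retried on the next heartbeat, which matches the "updated when stale, removed when received" behavior described above.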
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335118#comment-14335118 ] Naganarasimha G R commented on YARN-3039: - Hi [~djp], thanks for the doc, which gives a better understanding of the flow now. A few queries: * I feel the AM should be informed of AggregatorAddr as early as registration itself, rather than in ApplicationMasterService.allocate() as currently done. * For NMs too, would it be better to update during registration itself (maybe recovered during recovery, not sure though)? Thoughts? * I was not clear about the source of RMAppEventType.AGGREGATOR_UPDATE. Based on YARN-3030 (aggregator collection through the NM's aux service), PerNodeAggregatorServer (aux service) launches AppLevelAggregatorService, so will AppLevelAggregatorService inform the RM about the aggregator for the application, and then the RM informs the NM about the appAggregatorAddr as part of the heartbeat response? If this is the flow, will there be a chance of a race condition where, before the NM gets the appAggregatorAddr from the RM, the NM needs to post some AM container entities/events? [~zjshen], * bq. Ideally, only this singleton aggregator has the magic to sort out app info during aggregation. If not, we could even give up the current flow of NM(s) -> app aggregator (deployed on one NM) -> backend and let NMs talk to the backend directly, saving a hop of traffic. Can you clarify more on this? I also want some clarification along similar lines: what is the goal of having one aggregator per app? Is it for simple aggregation of metrics related to an application entity, or for any entity (flow, flow run, app-specific, etc.)? If so, do we need to aggregate for system entities? Maybe based on this it will be clearer to get the complete picture. * In one of your comments (not in this JIRA), you had mentioned that we might need to start a per-app aggregator only if the app requests it.
In that case, how will we capture container entities and their events if the app does not request a per-app aggregator?
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1409#comment-1409 ] Junping Du commented on YARN-3039: -- Thanks [~sjlee0] for the comments! bq. I'm also thinking that option 2 might be more feasible, mostly from the standpoint of limiting the risk. Having said that, I haven't followed YARN-913 closely enough to see how close it is... I was thinking the same. As discussed with [~vinodkv] offline, we prefer to start the work immediately based on currently implemented YARN features. [~rkanter], please let us know if you have different ideas here. bq. The service discovery needs to work across all these different modes: NM aux service, standalone per-node daemon, and standalone per-app daemon. That needs to be one of the primary considerations in this. Agree. What doesn't change here is that there are still three counterparts - AM, NM and RM - that need to know the service info (URL for the REST API), so we put the RM here as a central point for registration. What could differ across the modes you mention is who does the registration and how. I would prefer that some other JIRA, like YARN-3033, address those differences. Thoughts? bq. The RM will likely not use the service discovery. For example, for the RM to write the app-started event, the timeline aggregator may not even be initialized yet. That's a very good point. We need the RM to write some initial app info on its own. However, do we expect the RM to write all app-specific info, or just at the beginning? We have a similar case in launching an app's containers - the first AM container gets launched by the RM, but the following containers get launched by the AM. Do we want to follow this pattern if we want to consolidate all app info into only one app aggregator? bq. If the AM fails and starts on another node, the existing per-app aggregator should be shut down and started on the new node. In fact, in the aux service setup, that comes most naturally.
So I think we should try to keep that as much as possible. As I said in the proposal, we should make a best effort to locate the two together. However, I think we also want to decouple the lifecycles of the two, which would make things more robust. Besides the case of the aggregator staying live while the AM dies, another quick example: the AM container works fine, but the aggregator on that NM cannot be bound/started (for some reason, e.g. the port is blocked). In such cases, we may not want to kill the AM container (or the aggregator service) just for aggregation-locality reasons; given these are rare cases, keeping it simple should be better. bq. We're talking about the aggregator failing as a standalone daemon, correct? Yes and no. Even as an auxiliary service of the NM, the aggregator could fail alone for some reasons, e.g. the port is blocked. Am I missing anything here?
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333709#comment-14333709 ] Hadoop QA commented on YARN-3039: - {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12700214/YARN-3039-no-test.patch against trunk revision fe7a302.
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse.
{color:red}-1 findbugs{color}. The patch appears to introduce 8 new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 core tests{color}.
The patch failed these unit tests in hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestRMNMRPCResponseId
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
org.apache.hadoop.yarn.server.resourcemanager.TestRMHA
org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization
Test results: https://builds.apache.org/job/PreCommit-YARN-Build/6700//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6700//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-YARN-Build/6700//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6700//console
This message is automatically generated.
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334159#comment-14334159 ] Zhijie Shen commented on YARN-3039: --- bq. The RM will likely not use the service discovery. For example, for the RM to write the app-started event, the timeline aggregator may not even be initialized yet. I think so, too. RM has its own built-in aggregator, and RM directly writes through it. Thanks for the patch, Junping! One suggestion: in the heartbeat, instead of always sending a snapshot of the aggregator address info, can we send incremental information when a change happens to the aggregator address table? Usually, an aggregator will not change its place often, so we can avoid unnecessary additional traffic in most heartbeats.
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334206#comment-14334206 ] Zhijie Shen commented on YARN-3039: --- One additional issue related to the RM state store: calling it in the update transition may break app recovery. The current state, instead of the final state, will be written into the store. If the RM stops and restarts at this moment, this app can't be recovered properly.
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329737#comment-14329737 ] Sangjin Lee commented on YARN-3039: --- Thanks [~djp] for the doc! Some high level comments: - I'm also thinking that option 2 might be more feasible, mostly from the standpoint of limiting the risk. Having said that, I haven't followed YARN-913 closely enough to see how close it is... - The service discovery needs to work across all these different modes: NM aux service, standalone per-node daemon, and standalone per-app daemon. That needs to be one of the primary considerations in this. - The failure scenarios need more details in their own right; for this JIRA, I think it is sufficient to see how they may impact the service discovery and design just enough. {quote} We need a per-application logical aggregator for ATS which provides aggregator service in the form of a REST API to RM, AM and NMs. {quote} The RM will likely not use the service discovery. For example, for the RM to write the app-started event, the timeline aggregator may not even be initialized yet. {quote} However, the AM container could be rescheduled to another node for some reason (container failure, etc.), so we cannot guarantee the two are always together. {quote} If the AM fails and starts on another node, the existing per-app aggregator should be shut down and started on the new node. In fact, in the aux service setup, that comes most naturally. So I think we should try to keep that as much as possible. {quote} Failure Cases: 3. Aggregator failed (only): {quote} We're talking about the aggregator failing as a standalone daemon, correct?
[ https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329084#comment-14329084 ] Junping Du commented on YARN-3039: -- Hi [~rkanter], thanks for sharing your thoughts here. I think, as a generic external service for YARN, YARN-913 may not meet our particular requirements here, such as: - the timeline service will serve as a built-in service, so applications should not need to register the service explicitly - the NM also needs this aggregator info to aggregate info related to the containers running on top of it - we have a preference to bind the service to the local node of the AM container - currently, the launching of NM aggregators is not done via a YARN service container (see YARN-3033) Also, I think we may not want this built-in service (as a standalone feature) to depend on another big in-progress feature when unnecessary. Thoughts?