[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-25 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336707#comment-14336707
 ] 

Junping Du commented on YARN-3039:
--

Thanks [~Naganarasimha] and [~rkanter] for the review and comments!

bq. I feel the AM should be informed of the AggregatorAddr as early as when it registers itself, rather than currently being done in ApplicationMasterService.allocate().
That's a good point. Another idea (from Vinod in an offline discussion) is to add 
a blocking call in AMRMClient to get the aggregator address directly from the RM. 
AMRMClient can be wrapped into TimelineClient so that a missing aggregator address 
or an aggregator failure can be handled transparently. Thoughts?
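To make that concrete, here is a rough sketch of what such a wrapper could look like. All names below are hypothetical placeholders, not existing YARN APIs: the client blocks on an RM lookup the first time it writes, and re-resolves once if a write fails because the aggregator moved.
{code}
// Hypothetical sketch only: a TimelineClient-style wrapper that hides
// aggregator discovery behind a blocking RM lookup and re-resolves on failure.
public class DiscoveringTimelineClient {

  /** Stand-in for the proposed blocking AMRMClient call to the RM. */
  interface AggregatorResolver {
    String resolveAggregatorAddress() throws java.io.IOException;
  }

  private final AggregatorResolver resolver;
  private volatile String aggregatorAddr; // cached app-level aggregator address

  public DiscoveringTimelineClient(AggregatorResolver resolver) {
    this.resolver = resolver;
  }

  public void putEntities(String entityJson) throws java.io.IOException {
    if (aggregatorAddr == null) {
      aggregatorAddr = resolver.resolveAggregatorAddress(); // first use: block on RM
    }
    try {
      post(aggregatorAddr, entityJson);
    } catch (java.io.IOException e) {
      // Aggregator may have failed over to another NM: re-resolve once and retry.
      aggregatorAddr = resolver.resolveAggregatorAddress();
      post(aggregatorAddr, entityJson);
    }
  }

  private void post(String addr, String json) throws java.io.IOException {
    // A real implementation would issue the REST/protobuf write here.
    System.out.println("PUT " + json + " -> " + addr);
  }
}
{code}
With something like this, neither the AM nor the NMs would need to track the aggregator address themselves; both the missing-address and the failover cases collapse into the retry path.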

bq. For NMs too, would it be better to update during registration itself (it may 
be recovered during recovery, though I'm not sure)? Thoughts?
I think the NM case is slightly different here: the NM needs this knowledge only 
once the first container of an app gets allocated/launched, so getting it updated 
in the heartbeat sounds good enough, doesn't it? In addition, if adding a new API 
in AMRMClient is acceptable, the NM would use TimelineClient too and so could 
handle service discovery automatically.


bq. Was not clear about the source of RMAppEventType.AGGREGATOR_UPDATE. Based on 
YARN-3030 (aggregator collection through the NM's aux service), 
PerNodeAggregatorServer (the aux service) launches AppLevelAggregatorService, so 
will AppLevelAggregatorService inform the RM about the aggregator for the 
application, and will the RM then inform the NM about the appAggregatorAddr as 
part of the heartbeat response? If this is the flow, is there a chance of a race 
condition where, before the NM gets the appAggregatorAddr from the RM, the NM 
might need to post some AM container entities/events?
I think we can discuss this flow in two scenarios: the first launch of the app 
aggregator, and the app aggregator failing over to another NM.
For the first launch, the NM aux service binds the app aggregator to the per-node 
aggregator when the AM container gets allocated (per YARN-3030). The NM then 
notifies the RM that this new app aggregator is ready for use in its next 
heartbeat (missing in this patch). After receiving this message from the NM, the 
RM updates its aggregator list and sends RMAppEventType.AGGREGATOR_UPDATE to 
trigger persisting the updated aggregator list in the RMStateStore (for RM 
failover).
For app aggregator failover, the AM or NMs (whoever called putEntities with the 
TimelineClient) notify the RM of the failure; the RM first verifies that this app 
aggregator is out of service, then kicks off rebinding the app aggregator to 
another NM's perNodeAggregatorService when that NM's next heartbeat comes. When 
it hears back from this new NM, the RM does the same thing as in the first case.
One gap here today is that we launch the appAggregatorService (via the NM's 
auxiliary service) whenever the AM container gets launched, whether it is the 
first launch or a reschedule after a failure. As in my earlier comments above, an 
AM container failing over and being rescheduled to another NM need not cause a 
rebind of the aggregator service, just as the app's aggregator going out of 
service need not cause the AM container to be killed. So I think the 
appAggregatorService should be launched by the NM automatically only for the 
first attempt, and be taken care of by the RM for subsequent attempts.
About the race condition between the NM heartbeat and posting entities: I don't 
think posting entities should block any major logic, especially the NM heartbeat. 
In addition, if we make TimelineClient handle service discovery automatically, 
this will never happen. What do you think?
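For illustration, here is a minimal model of the RM side of that flow. The types below are placeholders, not the actual RM classes: an NM heartbeat reports a newly bound app aggregator, the RM updates its table and persists the new address, and failover rebinding reduces to the same report once the replacement NM checks in.
{code}
// Toy model of the flow above; all names are placeholders, not RM internals.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class AggregatorRegistry {

  /** Stand-in for the RMStateStore persistence triggered by AGGREGATOR_UPDATE. */
  interface StateStore {
    void persistAggregatorAddr(String appId, String addr);
  }

  private final Map<String, String> appToAggregatorAddr = new ConcurrentHashMap<>();
  private final StateStore store;

  AggregatorRegistry(StateStore store) {
    this.store = store;
  }

  /** An NM heartbeat reports an app aggregator it has just bound (first launch
   *  or rebind after failover); the RM records and persists the address. */
  void onHeartbeatReport(String appId, String aggregatorAddr) {
    String old = appToAggregatorAddr.put(appId, aggregatorAddr);
    if (!aggregatorAddr.equals(old)) {
      // Equivalent of firing RMAppEventType.AGGREGATOR_UPDATE on the RMApp.
      store.persistAggregatorAddr(appId, aggregatorAddr);
    }
  }

  /** An AM/NM reports a write failure; the RM verifies and drops the entry so
   *  the next heartbeat cycle can rebind the aggregator on another node. */
  void onAggregatorFailureReport(String appId, String failedAddr) {
    appToAggregatorAddr.remove(appId, failedAddr); // removes only if still mapped
  }

  /** Used when building heartbeat responses for interested NMs and AMs. */
  String lookup(String appId) {
    return appToAggregatorAddr.get(appId);
  }
}
{code}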

bq. Sorry for not commenting earlier. Thanks for taking this up Junping Du.
No worries. Thanks!

bq. Not using YARN-913 is fine if it's not going to make sense. I haven't 
looked too closely at it either; it just sounded like it might be helpful here.
Agree. My feeling now is that service discovery is tightly coupled with service 
lifecycle management. Our app aggregator service does not live inside a dedicated 
container but has many deployment options, and its consumers include YARN 
components, not only the AM. So I think YARN-913 may not be the best fit at this 
moment.
 [~ste...@apache.org] is the main author of YARN-913. Steve, do you have any 
comments here?

bq. Given that a particular NM is only interested in the applications that are 
running on it, is there some way to have it only receive the aggregator info for 
those apps? This would decrease the amount of throwaway data that gets sent.
In the current patch, the RM only sends an NM the aggregator list for apps active 
on that node. Please check the code in ResourceTrackerService:
{code}
+ConcurrentMap<ApplicationId, String> liveAppAggregatorsMap =
+    new ConcurrentHashMap<ApplicationId, String>();
+List<ApplicationId> keepAliveApps =
+    remoteNodeStatus.getKeepAliveApplications();
+if (keepAliveApps != null) {
+  ConcurrentMap<ApplicationId, RMApp> rmApps = rmContext.getRMApps();
+  for (ApplicationId appId : keepAliveApps) {
...
{code}

[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-25 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336219#comment-14336219
 ] 

Robert Kanter commented on YARN-3039:
-

Sorry for not commenting earlier.  Thanks for taking this up [~djp].

Not using YARN-913 is fine if it's not going to make sense.  I haven't looked 
too closely at it either; it just sounded like it might be helpful here.  

One comment on the patch:
- Given that a particular NM is only interested in the applications that are 
running on it, is there some way to have it only receive the aggregator info 
for those apps? This would decrease the amount of throwaway data that gets 
sent.

Also, can you update the design doc? Looking at the patch, it seems like some 
things have changed (e.g., it's using protobufs instead of REST, which I think 
makes more sense here anyway).

 [Aggregator wireup] Implement ATS writer service discovery
 --

 Key: YARN-3039
 URL: https://issues.apache.org/jira/browse/YARN-3039
 Project: Hadoop YARN
  Issue Type: Sub-task
  Components: timelineserver
Reporter: Sangjin Lee
Assignee: Junping Du
 Attachments: Service Binding for applicationaggregator of ATS 
 (draft).pdf, YARN-3039-no-test.patch


 Per design in YARN-2928, implement ATS writer service discovery. This is 
 essential for off-node clients to send writes to the right ATS writer. This 
 should also handle the case of AM failures.





[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-24 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335010#comment-14335010
 ] 

Junping Du commented on YARN-3039:
--

Thanks [~zjshen] for the review and comments!
bq. I think so, too. RM has its own builtin aggregator, and RM directly writes 
through it.
I have a very basic question here: don't we want a singleton app aggregator for 
all app-related events, logs, etc.? Ideally, only this singleton aggregator has 
the magic to sort out app info in aggregation. If not, we could even give up the 
current flow NM(s) -> app aggregator (deployed on one NM) -> backend and let NMs 
talk to the backend directly to save a hop of traffic. Can you clarify more on 
this?

bq. In the heartbeat, instead of always sending a snapshot of the aggregator 
address info, can we send incremental information when any change happens to the 
aggregator address table? Usually, an aggregator will not change its place often, 
so we can avoid unnecessary additional traffic in most heartbeats.
That's a very good point for discussion.
The interesting thing here is that only by comparing against info from the client 
(NM) can the server (RM) know what has changed since the last heartbeat. Take the 
token update for example (populateKeys() in ResourceTrackerService): our current 
implementation encodes the master keys (ContainerTokenMasterKey and 
NMTokenMasterKey) known by the NM in the request, and in the response we filter 
out the old keys already known by the NM. IMO, this approach (put everything in 
the request, put something/nothing in the response) doesn't offer any 
optimization over putting nothing in the request and everything in the response; 
it only turns outbound traffic into inbound and moves the comparison logic to the 
server side, doesn't it?
Another optimization we can think of here is to let the client express the app 
aggregators it is interested in on the request (by adding them to a new optional 
field, e.g. InterestedApps) when it finds that info missing or stale, and have 
the server loop in only the related app aggregators' info. The NM can maintain an 
interested-app-aggregator list, which gets updated when an app's first container 
is launched or when the app's aggregator info goes stale (as may be reported by 
the writer/reader retry logic), and from which items get removed when they are 
received in the heartbeat response. Thoughts?
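A toy sketch of that InterestedApps idea, assuming a new optional request field of that name (everything below is hypothetical, not the current protocol): the NM sends only the app IDs whose aggregator info it is missing or believes stale, and the RM answers with just those entries.
{code}
// Hypothetical model of the InterestedApps optimization; not real YARN types.
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class NodeManagerSide {
  private final Set<String> interestedApps = new HashSet<>(); // need (re)resolution
  private final Map<String, String> knownAggregators = new HashMap<>();

  void onFirstContainerLaunched(String appId) { interestedApps.add(appId); }

  void onAddressFoundStale(String appId) { interestedApps.add(appId); } // e.g. writer retry failed

  /** The request carries only the interested apps, not the full table. */
  Set<String> buildHeartbeatRequest() { return new HashSet<>(interestedApps); }

  /** Apps leave the interest list once the RM answers for them. */
  void onHeartbeatResponse(Map<String, String> aggregators) {
    knownAggregators.putAll(aggregators);
    interestedApps.removeAll(aggregators.keySet());
  }
}

class ResourceManagerSide {
  private final Map<String, String> liveAggregators = new HashMap<>();

  /** The response loops in only the aggregators the NM asked about. */
  Map<String, String> answer(Set<String> interestedApps) {
    Map<String, String> reply = new HashMap<>();
    for (String appId : interestedApps) {
      String addr = liveAggregators.get(appId);
      if (addr != null) {
        reply.put(appId, addr);
      }
    }
    return reply;
  }
}
{code}
This would keep both directions of the heartbeat near-empty in the steady state, since an app appears on the wire only between going stale and being re-resolved.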

bq. One additional issue related to the RM state store: calling it in the update 
transition may break app recovery. The current state, instead of the final state, 
will be written into the store. If the RM stops and restarts at this moment, the 
app can't be recovered properly.
Thanks for the reminder on this. This is something I am not 100% sure about. 
However, from recoverApplication() in RMAppManager, I didn't see that we cannot 
recover an app in RUNNING or another state (except final states, like killed, 
finished, etc.). Am I missing anything here? One piece of code that is indeed 
missing is that I forgot to repopulate aggregatorAddr from the store in 
RMAppImpl.recover(); I will add it back in the next patch.
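The recovery fix would be small; roughly like the following, with illustrative names rather than the actual RMAppImpl code:
{code}
// Sketch of the missing recovery step: repopulate the transient
// aggregator address from the persisted app state on RM restart.
class RecoverableApp {

  /** Stand-in for the per-app data read back from the RMStateStore. */
  interface PersistedAppState {
    String getAggregatorAddr();
  }

  private String aggregatorAddr;

  /** Analogue of RMAppImpl.recover(RMState). */
  void recover(PersistedAppState state) {
    this.aggregatorAddr = state.getAggregatorAddr();
    // ... recover the rest of the app state (attempts, diagnostics, etc.)
  }
}
{code}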




[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-24 Thread Naganarasimha G R (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335118#comment-14335118
 ] 

Naganarasimha G R commented on YARN-3039:
-

Hi [~djp],
Thanks for the doc, which gives a better understanding of the flow now.
Few queries:
* I feel the AM should be informed of the AggregatorAddr as early as when it 
registers itself, rather than currently being done in 
ApplicationMasterService.allocate().
* For NMs too, would it be better to update during registration itself (it may 
be recovered during recovery, though I'm not sure)? Thoughts?
* Was not clear about the source of RMAppEventType.AGGREGATOR_UPDATE. Based on 
YARN-3030 (aggregator collection through the NM's aux service), 
PerNodeAggregatorServer (the aux service) launches AppLevelAggregatorService, so 
will AppLevelAggregatorService inform the RM about the aggregator for the 
application, and will the RM then inform the NM about the appAggregatorAddr as 
part of the heartbeat response? If this is the flow, is there a chance of a race 
condition where, before the NM gets the appAggregatorAddr from the RM, the NM 
might need to post some AM container entities/events?

[~zjshen], 
*  bq. Ideally, only this singleton aggregator has the magic to sort out app 
info in aggregation. If not, we could even give up the current flow NM(s) -> app 
aggregator (deployed on one NM) -> backend and let NMs talk to the backend 
directly to save a hop of traffic. Can you clarify more on this?
I also want some clarification along similar lines: what is the goal in having 
one aggregator per app? Is it simple aggregation of metrics related to an 
application entity, or to any entity (flow, flow run, app-specific, etc.)? If so, 
do we need to aggregate for system entities? Maybe based on this it will be 
clearer to get the complete picture.
* In one of your comments (not in this JIRA), you mentioned that we might need 
to start a per-app aggregator only if the app requests it. In that case, how 
will we capture container entities and their events if the app does not request 
a per-app aggregator?



[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-23 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1409#comment-1409
 ] 

Junping Du commented on YARN-3039:
--

Thanks [~sjlee0] for the comments! 
bq. I'm also thinking that option 2 might be more feasible, mostly from the 
standpoint of limiting the risk. Having said that, I haven't followed YARN-913 
closely enough to see how close it is...
I was thinking the same. As discussed with [~vinodkv] offline, we prefer to 
start the work immediately based on the currently implemented features in YARN. 
[~rkanter], please let us know if you have different ideas here.

bq. The service discovery needs to work across all these different modes: NM 
aux service, standalone per-node daemon, and standalone per-app daemon. That 
needs to be one of the primary considerations in this.
Agree. I think what doesn't change here is that there are still three 
counterparts (AM, NM and RM) that need to know the service info (the URL for the 
REST API), so we put the RM here as a central point for registration. What could 
differ across the modes you mention above is who does the registration and how. 
I would prefer that some other JIRA, like YARN-3033, address these differences. 
Thoughts?

bq. The RM will likely not use the service discovery. For example, for RM to 
write the app started event, the timeline aggregator may not even be 
initialized yet.
That's a very good point. We need the RM to write some initial app info 
standalone. However, do we expect the RM to write all app-specific info, or just 
the beginning? We have a similar case in launching an app's containers: the 
first AM container gets launched by the RM, but the following containers get 
launched by the AM. Do we want to follow this pattern if we want to consolidate 
all app info into only one app aggregator?

bq. If the AM fails and starts in another node, the existing per-app aggregator 
should be shut down, and started on the new node. In fact, in the aux service 
setup, that comes most naturally. So I think we should try to keep that as much 
as possible.
As I said in the proposal, we should make a best effort to locate the two 
together. However, I think we also want to decouple the lifecycles of these two 
things, which could make things more robust. Besides the case of the aggregator 
living on while the AM dies, another quick example is: the AM container works 
fine, but the aggregator on this NM cannot be bound/started (for some reason, 
e.g. the port is banned). In those cases, we may not want to kill the AM 
container (or the aggregator service) for aggregation-locality reasons; given 
these are rare cases, keeping it simple should be better.

bq. We're talking about the aggregator failing as a standalone daemon, correct?
Yes and no. Even as an auxiliary service of the NM, the aggregator could fail 
alone for some reasons, e.g. the port is blocked. Am I missing anything here?



[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333709#comment-14333709
 ] 

Hadoop QA commented on YARN-3039:
-

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12700214/YARN-3039-no-test.patch
  against trunk revision fe7a302.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:red}-1 findbugs{color}.  The patch appears to introduce 8 new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager:

  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerQueueACLs
  
org.apache.hadoop.yarn.server.resourcemanager.rmapp.TestRMAppTransitions
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCResponseId
  
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.TestRMContainerImpl
  
org.apache.hadoop.yarn.server.resourcemanager.ahs.TestRMApplicationHistoryWriter
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerDynamicBehavior
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestContainerAllocation
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestQueueMappings
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestDelegationTokenRenewer
  
org.apache.hadoop.yarn.server.resourcemanager.TestFifoScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacityScheduler
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRMRPCNodeUpdates
  
org.apache.hadoop.yarn.server.resourcemanager.TestWorkPreservingRMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairSchedulerQueueACLs
  
org.apache.hadoop.yarn.server.resourcemanager.applicationsmanager.TestAMRestart
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestZKRMStateStore
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestLeveldbRMStateStore
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.TestSchedulerUtils
  
org.apache.hadoop.yarn.server.resourcemanager.security.TestClientToAMTokens
  
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.TestCapacitySchedulerNodeLabelUpdate
  
org.apache.hadoop.yarn.server.resourcemanager.TestContainerResourceUsage
  
org.apache.hadoop.yarn.server.resourcemanager.resourcetracker.TestRMNMRPCResponseId
  
org.apache.hadoop.yarn.server.resourcemanager.TestSubmitApplicationWithRMHA
  
org.apache.hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
  org.apache.hadoop.yarn.server.resourcemanager.TestRMHA
  
org.apache.hadoop.yarn.server.resourcemanager.TestAMAuthorization

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/6700//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6700//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-YARN-Build/6700//artifact/patchprocess/newPatchFindbugsWarningshadoop-yarn-server-resourcemanager.html
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/6700//console

This message is automatically generated.


[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334159#comment-14334159
 ] 

Zhijie Shen commented on YARN-3039:
---

bq. The RM will likely not use the service discovery. For example, for RM to 
write the app started event, the timeline aggregator may not even be 
initialized yet.

I think so, too. RM has its own builtin aggregator, and RM directly writes 
through it. 

Thanks for the patch, Junping! One suggestion: in the heartbeat, instead of 
always sending a snapshot of the aggregator address info, can we send 
incremental information when any change happens to the aggregator address table? 
Usually, an aggregator will not change its place often, so we can avoid 
unnecessary additional traffic in most heartbeats.



[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-23 Thread Zhijie Shen (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334206#comment-14334206
 ] 

Zhijie Shen commented on YARN-3039:
---

One additional issue related to the RM state store: calling it in the update 
transition may break app recovery. The current state, instead of the final 
state, will be written into the store. If the RM stops and restarts at this 
moment, the app can't be recovered properly.



[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-20 Thread Sangjin Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329737#comment-14329737
 ] 

Sangjin Lee commented on YARN-3039:
---

Thanks [~djp] for the doc!

Some high level comments:
- I'm also thinking that option 2 might be more feasible, mostly from the 
standpoint of limiting the risk. Having said that, I haven't followed YARN-913 
closely enough to see how close it is...
- The service discovery needs to work across all these different modes: NM aux 
service, standalone per-node daemon, and standalone per-app daemon. That needs 
to be one of the primary considerations in this.
- The failure scenarios need more details in their own right; for this JIRA, I 
think it is sufficient to see how it may impact the service discovery and 
design just enough.

{quote}
We need a per-application logical aggregator for ATS which provides aggregator 
service in form of REST API to: RM, AM and NMs,
{quote}
The RM will likely not use the service discovery. For example, for RM to write 
the app started event, the timeline aggregator may not even be initialized yet.

{quote}
However, the AM container could be rescheduled to another node for some reason 
(container failure, etc.), so we cannot guarantee the two are always together.
{quote}
If the AM fails and starts in another node, the existing per-app aggregator 
should be shut down, and started on the new node. In fact, in the aux service 
setup, that comes most naturally. So I think we should try to keep that as much 
as possible.

{quote}
Failure Cases: 3. Aggregator failed (only):
{quote}
We're talking about the aggregator failing as a standalone daemon, correct?





[jira] [Commented] (YARN-3039) [Aggregator wireup] Implement ATS writer service discovery

2015-02-20 Thread Junping Du (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329084#comment-14329084
 ] 

Junping Du commented on YARN-3039:
--

Hi [~rkanter], thanks for sharing your thoughts here. 
I think as a generic, external service for YARN, YARN-913 may not meet our 
particular requirements here, such as:
- The timeline service will serve as a built-in service; there is no need for an 
application to register the service explicitly.
- The NM also needs the aggregator info, to aggregate info related to the 
containers running on top of it.
- We have a preference to bind the service to the local node of the AM container.
- Currently, the launching of NM aggregators is not done via a YARN service 
container (see YARN-3033).
Also, I think we may not want this built-in service (as a standalone feature) to 
depend on another big feature in progress when unnecessary. Thoughts?
