[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-09-10 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13763470#comment-13763470
 ] 

Karthik Kambatla commented on YARN-1027:


Did some testing with several transitions to Standby and Active back and forth, 
and ran MR jobs when in Active mode.
# The Standby mode (389719 objects worth 46661952 bytes) indeed has fewer 
objects and uses less memory compared to the Active mode (399819 objects worth 
50104584 bytes).
# The applicationId keeps the timestamp from when the RM started, and after each 
transition to Active the RM starts issuing ids from 1 again. The reused ids lead 
to issues ranging from client-side failures due to entries in .staging/ to jobs 
hanging. Once enough jobs are killed, subsequent jobs can be run as usual. To 
address this, I think it is safe to reset the timestamp to when the RM becomes 
Active.
# The WebUI behaves as expected.
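
A minimal sketch of the timestamp reset proposed in item 2, assuming a simplified id generator. `AppIdGenerator` and its methods are hypothetical and are not YARN classes; only the `application_<timestamp>_<sequence>` id layout follows YARN.

```java
// Hypothetical stand-in for the RM's id generation; not a YARN class.
class AppIdGenerator {
  private long clusterTimestamp;
  private int counter;

  AppIdGenerator(long rmStartTime) {
    this.clusterTimestamp = rmStartTime;
  }

  // Called on every transition to Active. With a fresh timestamp, the
  // restarted counter does not repeat ids from an earlier Active period
  // (as long as the clock has advanced since then).
  synchronized void onTransitionToActive(long now) {
    clusterTimestamp = now;
    counter = 0;
  }

  synchronized String newApplicationId() {
    counter++;
    return String.format("application_%d_%04d", clusterTimestamp, counter);
  }
}
```

Without the reset in `onTransitionToActive`, the second Active period would hand out the same first id as the first period.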

Regarding more involved tests, I was thinking of writing a 
MiniYARNCluster-based one that checks if the RPC servers are shut down in 
Standby mode. We can check if a client can request an applicationId etc. Is it 
okay for these tests to live in hadoop-yarn-client? Or would it make sense to 
create a separate module for such end-to-end tests, including future HA tests, 
stress tests etc.?

 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
 Attachments: test-yarn-1027.patch, yarn-1027-1.patch, 
 yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, yarn-1027-5.patch, 
 yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch


 Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
 single point of interaction between the RM and HA clients/services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-09-09 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13762557#comment-13762557
 ] 

Karthik Kambatla commented on YARN-1027:


bq. What happens if we call this method when the RM is in standby mode? I am 
wondering if we may be able to call this during that time and verify that the 
RM is indeed not active.

These particular MockRM methods work on any inited RM, even in standby mode. The 
tests for the Standby mode should be on a MiniYARNCluster. Will try to work 
those in.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-09-08 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13761524#comment-13761524
 ] 

Bikas Saha commented on YARN-1027:
--

This code is confusing. The state shouldn't be INITIALIZING if the service is 
stopped. Can we open a jira to add a meaningful state to HAServiceState and 
refer to that jira in this code so that we can fix it in that jira too?
{code}
+  public void serviceStop() throws Exception {
+    // Stop all services
+    transitionToStandby();
+
+    // Update haState as RM can no longer be active
+    haState = HAServiceState.INITIALIZING;
+    super.serviceStop();
{code}
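
A minimal sketch of what a more meaningful post-stop state could look like. The enum `SketchHaState` and the class `SketchHaService` are hypothetical and defined here only for illustration; this does not claim anything about the values Hadoop's HAServiceState offers.

```java
// Hypothetical state enum; Hadoop's HAServiceState is a separate type.
enum SketchHaState { INITIALIZING, STANDBY, ACTIVE, STOPPED }

// Hypothetical service showing a dedicated state after serviceStop().
class SketchHaService {
  private SketchHaState haState = SketchHaState.INITIALIZING;

  synchronized void transitionToActive() {
    haState = SketchHaState.ACTIVE;
  }

  synchronized void transitionToStandby() {
    haState = SketchHaState.STANDBY;
  }

  synchronized void serviceStop() {
    // Stop all active services first
    transitionToStandby();
    // Report a state that says what actually happened, instead of
    // reusing INITIALIZING for a stopped service
    haState = SketchHaState.STOPPED;
  }

  synchronized SketchHaState getServiceState() {
    return haState;
  }
}
```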

Let's not leave orphan TODOs in the code. Please refer to YARN-1068 or open a 
new jira.
{code}
+// TODO: When automatic failover is enabled, check if transition should be
+// allowed for this request
{code}

In transitionToStandby() we are changing state to STANDBY after stopping all 
services. This is fine for now. We must keep this in mind later on when we 
start having HA-aware alwaysOn services. They need to stop signalling the 
ActiveServices before we stop them. E.g., RPC services would need to start 
rejecting requests before we stop the activeServices.
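
A minimal sketch of that ordering, assuming a simplified always-on front end. `OrderedStandbyTransition` and all of its members are hypothetical; the point is only that the state flips before the active services are stopped, so new requests are rejected while the stop is in progress.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of an always-on RPC front end plus active services.
class OrderedStandbyTransition {
  enum State { STANDBY, ACTIVE }

  private volatile State state = State.ACTIVE;
  final List<String> events = new ArrayList<>();

  // Always-on front end: admits work only while the RM is active.
  synchronized boolean acceptRequest(String request) {
    if (state != State.ACTIVE) {
      events.add("rejected " + request);
      return false;
    }
    events.add("accepted " + request);
    return true;
  }

  synchronized void transitionToStandby() {
    // 1. Flip the state first so the front end stops admitting requests
    state = State.STANDBY;
    // 2. Only then stop the active services (placeholder for the real stop)
    events.add("active services stopped");
  }
}
```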

createAndStartActiveServices() and related methods should be package visibility 
and not protected. Protected would mean that we intend a derived class to see 
these methods too.

Is the commented code going to be uncommented or removed? The code is valid and 
should work in an active state. So it should probably be uncommented. What 
happens if we call this method when the RM is in standby mode? I am wondering 
if we may be able to call this during that time and verify that the RM is 
indeed not active.
{code}
+  private void checkActiveRMisFunctional() {
+    try {
+      rm.getNewAppId();
+//    rm.registerNode(node1, 2048);
+      rm.submitApp(1024);
{code}

Locking on the RMHAServiceProtocol is confusing. Some public methods are 
synchronized while others are not. Will these lead to race conditions in the 
future? How about we make them all public synchronized, since they are not 
expected to be high performance and so heavy locking is fine? A caveat to this 
would be if ZKFC expects getServiceState()/monitorHealth() to work even while 
the service is transitioning to active/standby. Again, that probably doesn't 
matter if these operations happen in a reasonably short time.
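
A minimal sketch of the "everything public is synchronized" option, assuming a simplified service. `SynchronizedHaSketch` is hypothetical; the counters stand in for starting and stopping the active services.

```java
// Hypothetical HA service in which every public method takes the same lock.
class SynchronizedHaSketch {
  enum State { STANDBY, ACTIVE }

  private State state = State.STANDBY;
  private int starts;  // times the active services were started
  private int stops;   // times the active services were stopped

  public synchronized void transitionToActive() {
    if (state == State.ACTIVE) {
      return;
    }
    starts++;  // placeholder for starting the active services
    state = State.ACTIVE;
  }

  public synchronized void transitionToStandby() {
    if (state == State.STANDBY) {
      return;
    }
    stops++;  // placeholder for stopping the active services
    state = State.STANDBY;
  }

  public synchronized State getServiceState() {
    return state;
  }

  // True when the start/stop bookkeeping matches the reported state.
  public synchronized boolean monitorHealth() {
    return starts - stops == (state == State.ACTIVE ? 1 : 0);
  }
}
```

The trade-off is the one raised above: a caller of `getServiceState()` or `monitorHealth()` blocks for as long as a transition holds the lock.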

Depending on your conclusion in YARN-1077 we can keep the approach in 1027-3 or 
1027-4 wrt RMHAServiceProtocol being always present or not.

Patch is close to being ready for commit! The main thing to verify (even if 
manually) is whether the ActiveService objects are being GC'd correctly.

 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
 Attachments: test-yarn-1027.patch, yarn-1027-1.patch, 
 yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-4.patch, 
 yarn-1027-including-yarn-1098-3.patch, yarn-1027-in-rm-poc.patch


 Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
 single point of interaction between the RM and HA clients/services.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-09-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13759973#comment-13759973
 ] 

Bikas Saha commented on YARN-1027:
--

The patch looks clean overall. 

I would suggest keeping haEnabled concept within the HAServiceProtocol service 
instead of mixing it between the ResourceManager and HAServiceProtocol. Thus 
the RM always addService(HAServiceProtocol). HAServiceProtocol is the one that 
checks if haEnabled in serviceStart(). If enabled then it transitions to 
standby and waits for active signal. If not, then it directly transitions to 
active.
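
A minimal sketch of that arrangement, assuming the HA service is always added and decides for itself what to do on start. `HaEnabledSketch` is hypothetical, and the boolean constructor argument stands in for whatever configuration check the real service would perform.

```java
// Hypothetical HA service that owns the haEnabled decision.
class HaEnabledSketch {
  enum State { INITIALIZING, STANDBY, ACTIVE }

  private final boolean haEnabled;  // assumption: read from configuration elsewhere
  private State state = State.INITIALIZING;

  HaEnabledSketch(boolean haEnabled) {
    this.haEnabled = haEnabled;
  }

  synchronized void serviceStart() {
    if (haEnabled) {
      // Wait in standby until something signals a transition to active
      transitionToStandby();
    } else {
      // No HA: behave like a plain RM and become active right away
      transitionToActive();
    }
  }

  synchronized void transitionToActive() {
    state = State.ACTIVE;
  }

  synchronized void transitionToStandby() {
    state = State.STANDBY;
  }

  synchronized State getServiceState() {
    return state;
  }
}
```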

Shouldn't we simply call transitionToStandby() here? That would ensure 
getServiceStatus() returns non active status for anyone that cares to know.
{code}
+  public void serviceStop() throws Exception {
+    if (rm.haState == HAServiceState.ACTIVE) {
+      rm.stopActiveServices();
{code}

This is fine for now but we might have to invest in better health check in a 
different jira. Any ideas?
{code}
public synchronized void monitorHealth() throws HealthCheckFailedException {
+    if (rm.haState == HAServiceState.ACTIVE && !rm.areActiveServicesRunning()) {
{code}

We probably want the log before the if stmt.
Should we change state to standby before we stop services? Assuming that HA 
aware services would need to know about this earlier rather than later so that 
they can stop signaling Active services and allow them to be drained/stopped.
{code}
+    if (rm.haState == HAServiceState.ACTIVE) {
+      rm.stopActiveServices();
+    }
+
+    LOG.info("Transitioning to standby");
+    rm.haState = HAServiceState.STANDBY;
{code}

Didn't quite get this comment. Is this to do with the change being requested by 
user/admin/ZKFC?
{code}
+  public void transitionToActive(StateChangeRequestInfo reqInfo) {
+    // TODO: When automatic failover is enabled, check if transition should
+    // be allowed for this request
{code}

What are the pros of making haState a member of ResourceManager instead of 
HAServiceProtocol? A pro of the latter is that it keeps all HA stuff in one 
place.

Why is there a lock used in ResourceManager.startActive() etc.? Why are these 
methods protected? If for testing, then let's add a @VisibleForTesting annotation.

Is there a way to confirm that the active service objects are all being GC'd?

testStartAndTransitions() - How about calling getServiceStatus() and 
monitorHealth() in addition to checking the internal members, in all places 
where internal members are being checked? That way we test and exercise those 
methods too. How about completing 
Active->Standby->Active->Standby->Active->RM.serviceStop()? This would fully 
simulate multiple full cycles of transitions and also verify the shutdown case.
We can also issue some requests like createApplication() to the RM, when in 
active state, and verify that the RM is really working.
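
A minimal sketch of the shape such a test could take, written against a hypothetical `FakeRm` rather than MockRM or the real ResourceManager. It checks the public status at each step, submits an application while active, and finishes with a stop.

```java
// Hypothetical test driver; FakeRm is a stand-in, not MockRM.
class TransitionCycleSketch {
  enum State { STANDBY, ACTIVE, STOPPED }

  static class FakeRm {
    private State state = State.STANDBY;
    private int appsSubmitted;

    void transitionToActive() { state = State.ACTIVE; }
    void transitionToStandby() { state = State.STANDBY; }
    void serviceStop() { state = State.STOPPED; }
    State getServiceStatus() { return state; }

    // Only an active RM accepts applications.
    int submitApp() {
      if (state != State.ACTIVE) {
        throw new IllegalStateException("RM is " + state);
      }
      return ++appsSubmitted;
    }
  }

  static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  // Active -> Standby, repeated, then Active -> stop.
  // Returns the number of applications submitted along the way.
  static int runCycles(FakeRm rm, int cycles) {
    int submitted = 0;
    for (int i = 0; i < cycles; i++) {
      rm.transitionToActive();
      check(rm.getServiceStatus() == State.ACTIVE, "expected ACTIVE");
      submitted = rm.submitApp();
      rm.transitionToStandby();
      check(rm.getServiceStatus() == State.STANDBY, "expected STANDBY");
    }
    rm.transitionToActive();
    rm.serviceStop();
    check(rm.getServiceStatus() == State.STOPPED, "expected STOPPED");
    return submitted;
  }
}
```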

TestRMHADisabled - It is confusing to read that the RM has started but its 
haState==INITIALIZING. Also, we can probably move this test into TestRMHA.java to 
keep related tests in one place. 


Minor nits

LOG instead of print?
{code}
+} catch (Exception e) {
+  e.printStackTrace();
{code}

RM_HA_PREFIX instead of HA_PREFIX

 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
 Attachments: test-yarn-1027.patch, yarn-1027-1.patch, 
 yarn-1027-2.patch, yarn-1027-3.patch, yarn-1027-including-yarn-1098-3.patch, 
 yarn-1027-in-rm-poc.patch


 Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
 single point of interaction between the RM and HA clients/services.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-09-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760582#comment-13760582
 ] 

Karthik Kambatla commented on YARN-1027:


Thanks for the detailed review, [~bikas]. 

bq. What are the pros of making haState a member of ResourceManager instead of 
HAServiceProtocol? A pro of the latter is that it keeps all HA stuff in one 
place.
In the future, when individual external-facing services need to behave based on 
the HAState, having it in the RM might be useful. However, I think we should 
move it to RMHAProtocolService now, and move it to the RM or RMContext lazily.

bq. Why is there a lock used in ResourceManager.startActive() etc.? Why are 
these methods protected? If for testing, then let's add a @VisibleForTesting 
annotation.
The lock is to protect against concurrent invocations of transitionToActive() 
and transitionToStandby() due to say user input. The methods are protected 
because they are being accessed from outside the RM - in this case, 
RMHAProtocolService.

bq. Is there a way to confirm that the active service objects are all being 
GC'd?
Not sure of a deterministic test. How about using the Runtime memory methods 
(e.g. totalMemory() and freeMemory()) to measure memory usage before and after 
transitioning to Active and subsequently Standby? 
I can jmap a real RM on a pseudo-dist cluster and see if they are being cleaned 
up. 
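
A minimal sketch of the Runtime-based measurement, assuming nothing about the RM itself. `HeapUsageProbe` is hypothetical and the byte arrays only stand in for Active-only state. `System.gc()` is a hint rather than a guarantee, so the readings are approximate and this is not a deterministic test.

```java
// Hypothetical probe; readings are approximate because System.gc() is only a hint.
class HeapUsageProbe {
  // Heap currently in use, in bytes, after asking the JVM to collect.
  static long usedHeapBytes() {
    Runtime rt = Runtime.getRuntime();
    System.gc();
    return rt.totalMemory() - rt.freeMemory();
  }

  public static void main(String[] args) {
    long before = usedHeapBytes();

    // Stand-in for state that exists only while Active (about 16 MB)
    byte[][] activeServices = new byte[16][1 << 20];
    long whileActive = usedHeapBytes();

    // "Transition to Standby": drop the only reference
    activeServices = null;
    long afterStandby = usedHeapBytes();

    System.out.printf("before=%d active=%d standby=%d%n",
        before, whileActive, afterStandby);
  }
}
```

If the Active-only objects are released, the last reading should fall back towards the first one; a heap histogram from jmap gives a more precise picture.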

bq. Didn't quite get this comment. Is this to do with the change being requested 
by user/admin/ZKFC?
If automatic failover is enabled and a user issues a transition command, it 
should take effect only when it is forced.
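
A minimal sketch of that check. `FailoverRequestCheck` is hypothetical; its enum is redefined locally with names modeled on the request sources in Hadoop's HAServiceProtocol so the example stays self-contained. Allowing ZKFC requests whenever automatic failover is enabled is an assumption of this sketch.

```java
// Hypothetical check for the TODO; not code from the patch.
class FailoverRequestCheck {
  // Local enum, names modeled on HAServiceProtocol.RequestSource.
  enum RequestSource { REQUEST_BY_USER, REQUEST_BY_USER_FORCED, REQUEST_BY_ZKFC }

  static boolean isTransitionAllowed(boolean autoFailoverEnabled, RequestSource source) {
    if (!autoFailoverEnabled) {
      // Manual failover: this sketch places no restriction on the source
      return true;
    }
    // Automatic failover: a plain user request is refused unless forced
    return source != RequestSource.REQUEST_BY_USER;
  }
}
```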

Agree with remaining comments. Will fix it in the next version.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-09-06 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760752#comment-13760752
 ] 

Bikas Saha commented on YARN-1027:
--

RMHAProtocolService can be made available via RMContext and thus accessible to 
everyone who has access to RMContext.

In that case we probably mean package and not protected since there is no 
inheritance story here.

I don't think we need a test (although that would be awesome). If we can 
verify manually, that should be sufficient for now I guess.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-09-06 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13760774#comment-13760774
 ] 

Karthik Kambatla commented on YARN-1027:


In yarn-1027-4.patch, the RM always addService(HAServiceProtocol). 
HAServiceProtocol is the one that checks if haEnabled in serviceStart(). If 
enabled then it transitions to standby and waits for active signal. If not, 
then it directly transitions to active.

However, post RM#init(), RM fields are not instantiated (e.g. TokenManagers), 
leading to a bunch of test failures.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-09-03 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13756655#comment-13756655
 ] 

Karthik Kambatla commented on YARN-1027:


Just uploaded a patch (yarn-1027-2.patch) that builds on top of YARN-1098 
patch, and depends on it.

Patch outline:
# Implement RMHAProtocolService
# When HA is enabled, make this HA-service one of the services managed by the 
RM. The RM no longer manages the activeStateServices directly; these are 
managed by the HA-service.
# Tests to check HA enable/disable and transitions when enabled.
# Included another patch (test-yarn-1027.patch) that I used to force 
transitionToActive() after RM starts in the Standby mode. Post 
transitionToActive, the RM behaves normally. I was able to run jobs.

 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
 Attachments: test-yarn-1027.patch, yarn-1027-1.patch, 
 yarn-1027-2.patch, yarn-1027-in-rm-poc.patch


 Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
 single point of interaction between the RM and HA clients/services.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-29 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13754306#comment-13754306
 ] 

Karthik Kambatla commented on YARN-1027:


Discussion with Bikas and Vinod offline:

The HA-in-RM approach doesn't seem to be much more disruptive than the 
wrapper/extension approaches, particularly given the changes in YARN-1098. The 
implementation can be along the lines of the proof-of-concept patch.

 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
 Attachments: yarn-1027-1.patch, yarn-1027-in-rm-poc.patch


 Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
 single point of interaction between the RM and HA clients/services.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750240#comment-13750240
 ] 

Karthik Kambatla commented on YARN-1027:


Thanks for the review, Bikas. Sorry for the delay in response - was on vacation 
last week.

bq. Taking the hybrid approach drops the simplicity of the wrapper while at the 
same time making it complex to interact with the ResourceManager.
I see your point. IMO, the extension approach increases the flexibility of the 
wrapper approach without adding too much complexity. Keeping the HA code 
separate from the RM avoids complicating the RM, particularly during the period 
we are stabilizing the HA portion of the code. Once stable, if we think it is 
appropriate, it is simple enough to merge it all into the RM itself.

bq. Which one is the real ResourceManager? For example, there are many tests 
that use the ResourceManager but now since they don't use HAResourceManager they 
are probably not exercising some possibilities. Should they use 
HAResourceManager?
The HA-specific tests can access the HAResourceManager; it should be okay for 
the remaining tests to access ResourceManager and not the HAResourceManager.

bq. This shows that HA awareness can be added without significant overhaul in 
the RM.
My main fear is that adding HA to the RM directly would lead to a more 
significant overhaul.

Let me draft a patch implementing the same within RM itself instead of 
extending it. Any other ideas in the interim would also greatly help.


 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: Karthik Kambatla
 Attachments: yarn-1027-1.patch


 Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
 single point of interaction between the RM and HA clients/services.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750258#comment-13750258
 ] 

Bikas Saha commented on YARN-1027:
--

It's a good idea to draft a patch in which the HA protocol becomes another 
service within the RM. We should think through various 
startup/transitionToActive()/transitionToStandby() scenarios to determine the 
best approach to code this. 

E.g. repeated transitions from active->standby->active for the same RM without 
bringing the process down. This means that all apps in the RM (i.e. all internal 
stateful objects like appmanager, scheduler, rmappimpl etc.) should be 
completely cleaned up during transitionToStandby(). Currently the RM simply 
shuts down and hence that cleanup is not necessary.

This may also suggest that we logically divide RM internal objects into 2 
groups 1) stuff that can be started once and kept on until RM stops 2) stuff 
that needs to be cleaned every time the RM is standby and re-inited when the RM 
is active. The second group would contain things like the scheduler while the 
first would contain things like the RPC services. The first set would be 
transparent to HA while the second set would need to be aware of HA.
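
A minimal sketch of that split, assuming one object per group. `TwoGroupRmSketch`, `RpcServer` and `Scheduler` are hypothetical stand-ins. The always-on group is started once; the Active-only group is discarded on every transition to standby and rebuilt on the next transition to active, so no state carries over.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical RM with an always-on group and an Active-only group.
class TwoGroupRmSketch {
  static class RpcServer {
    boolean running;
  }

  static class Scheduler {
    final List<String> apps = new ArrayList<>();
  }

  // Group 1: started once, kept until the RM stops
  final RpcServer rpc = new RpcServer();
  // Group 2: exists only while active; null while standby
  private Scheduler scheduler;

  void start() {
    rpc.running = true;
  }

  void transitionToActive() {
    if (scheduler == null) {
      scheduler = new Scheduler();  // fresh instance, no leftover apps
    }
  }

  void transitionToStandby() {
    scheduler = null;  // drop all Active-only state so it can be collected
  }

  Scheduler getScheduler() {
    return scheduler;
  }
}
```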

Perhaps before we tackle this jira to completion, we should open and commit 
another jira that identifies all stateful objects within the RM and adds 
support to clean them up during RM shutdown. Those cleanup methods can be 
re-used during transitionToStandby(). This jira can build on top of that.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-21 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13745841#comment-13745841
 ] 

Bikas Saha commented on YARN-1027:
--

First of all, thanks for writing the patch and testing it. This shows that 
HA awareness can be added without significant overhaul in the RM.

I wish I could say that I like the hybrid approach, but after reading the patch 
unfortunately that's not the case. 

Having a pure wrapper approach that simply does a new ResourceManager() upon 
transitionToActive() has the virtue of being completely separate from the RM 
and being simple. Having HAService built into ResourceManager as a service 
integrates it completely with the ResourceManager flow and allows for features 
like RPC redirect in tandem with other RM services. Taking the hybrid approach 
drops the simplicity of the wrapper while at the same time making it complex to 
interact with the ResourceManager. Which one is the real ResourceManager? 
For example, there are many tests that use the ResourceManager but now since 
they don't use HAResourceManager they are probably not exercising some 
possibilities. Should they use HAResourceManager?

Fundamentally, HA is going to be an integral part of the ResourceManager and to 
me it does not make sense to create a derived impl of the ResourceManager in 
order to add the HA logic. What other derivations are possible for the RM that 
motivate the use of inheritance and sub-classing? Why have 2 impls for 
essentially the same component?

Starting up and stopping services is not super fast and will add time to the 
failover. So unless there is an obstacle to that path, we should be looking at 
starting as many (if not all) services on the RM so that the only thing that's 
blocking failover is populating the state. As discussed earlier, it's not 
necessary for all services to be started in the first cut. We can choose to 
start the HA service only.

I would really encourage attempting to make HAService part of ResourceManager 
itself. I can help with the patch if needed.





[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-18 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13743547#comment-13743547
 ] 

Bikas Saha commented on YARN-1027:
--

[~vinodkv] Do you have any suggestions? Would be great to have them early 
because this approach will be important for the remaining changes. So best to 
spend time on this now and make sure we are in the best position for future 
development.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-16 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13742368#comment-13742368
 ] 

Hadoop QA commented on YARN-1027:
-

{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12598376/yarn-1027-1.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-YARN-Build/1732//testReport/
Console output: https://builds.apache.org/job/PreCommit-YARN-Build/1732//console

This message is automatically generated.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-05 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729294#comment-13729294
 ] 

Karthik Kambatla commented on YARN-1027:


[~nemon], if you haven't started work on this already, do you mind if I take 
this up? I have been discussing this with Bikas on YARN-149 and offline, and 
have started working on it.

 Implement RMHAServiceProtocol
 -

 Key: YARN-1027
 URL: https://issues.apache.org/jira/browse/YARN-1027
 Project: Hadoop YARN
  Issue Type: Sub-task
Reporter: Bikas Saha
Assignee: nemon lou

 Implement existing HAServiceProtocol from Hadoop common. This protocol is the 
 single point of interaction between the RM and HA clients/services.



[jira] [Commented] (YARN-1027) Implement RMHAServiceProtocol

2013-08-05 Thread nemon lou (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-1027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13729348#comment-13729348
 ] 

nemon lou commented on YARN-1027:
-

I have also started working on this since it was unassigned.
It's OK for you to take it up; I will review the patch :)

