Re: [Cluster-devel] [PATCH] rgmanager: Retry when config is out of sync [RHEL5]

2012-02-29 Thread Fabio M. Di Nitto
ACK.

Fabio

On 03/01/2012 12:53 AM, Lon Hohberger wrote:
> [This patch is already in RHEL5]
> 
> If you add a service to rgmanager v1 or v2 and that
> service fails to start on the first node but succeeds
> in its initial stop operation, there is a chance that
> the remote instance of rgmanager has not yet reread
> the configuration, causing the service to be placed
> into the 'recovering' state without further action.
> 
> This patch causes the originator of the request to
> retry the operation.
> 
> Later versions of rgmanager (e.g. STABLE3 branch and
> derivatives) are unlikely to have this problem since
> configuration updates are not polled, but rather
> delivered to clients.
> 
> Update 22-Feb-2012: The above is incorrect; this was
> reproduced on an rgmanager v3 installation.
> 
> Resolves: rhbz#796272
> 
> Signed-off-by: Lon Hohberger 
> ---
>  rgmanager/src/daemons/rg_state.c |   19 +++
>  1 files changed, 19 insertions(+), 0 deletions(-)
> 
> diff --git a/rgmanager/src/daemons/rg_state.c b/rgmanager/src/daemons/rg_state.c
> index 23a4bec..8c5af5b 100644
> --- a/rgmanager/src/daemons/rg_state.c
> +++ b/rgmanager/src/daemons/rg_state.c
> @@ -1801,6 +1801,7 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
>   rg_state_t svcStatus;
>   int target = preferred_target, me = my_id();
>   int ret, x, request = orig_request;
> + int retries;
>   
>   get_rg_state_local(svcName, &svcStatus);
>   if (svcStatus.rs_state == RG_STATE_DISABLED ||
> @@ -1933,6 +1934,8 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
>   if (target == me)
>   goto exhausted;
>  
> + retries = 0;
> +retry:
>   ret = svc_start_remote(svcName, request, target);
>   switch (ret) {
>   case RG_ERUN:
> @@ -1942,6 +1945,22 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
>   *new_owner = svcStatus.rs_owner;
>   free_member_list(allowed_nodes);
>   return 0;
> + case RG_ENOSERVICE:
> + /*
> +  * Configuration update pending on remote node?  Give it
> +  * a few seconds to sync up.  rhbz#568126
> +  *
> +  * Configuration updates are synchronized in later releases
> +  * of rgmanager; this should not be needed.
> +  */
> + if (retries++ < 4) {
> + sleep(3);
> + goto retry;
> + }
> + logt_print(LOG_WARNING, "Member #%d has a different "
> +"configuration than I do; trying next "
> +"member.", target);
> + /* Deliberate */
>   case RG_EDEPEND:
>   case RG_EFAIL:
>   /* Uh oh - we failed to relocate to this node.



[Cluster-devel] [PATCH] rgmanager: Retry when config is out of sync [RHEL5]

2012-02-29 Thread Lon Hohberger
[This patch is already in RHEL5]

If you add a service to rgmanager v1 or v2 and that
service fails to start on the first node but succeeds
in its initial stop operation, there is a chance that
the remote instance of rgmanager has not yet reread
the configuration, causing the service to be placed
into the 'recovering' state without further action.

This patch causes the originator of the request to
retry the operation.

Later versions of rgmanager (e.g. STABLE3 branch and
derivatives) are unlikely to have this problem since
configuration updates are not polled, but rather
delivered to clients.

Update 22-Feb-2012: The above is incorrect; this was
reproduced on an rgmanager v3 installation.

Resolves: rhbz#796272

Signed-off-by: Lon Hohberger 
---
 rgmanager/src/daemons/rg_state.c |   19 +++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/rgmanager/src/daemons/rg_state.c b/rgmanager/src/daemons/rg_state.c
index 23a4bec..8c5af5b 100644
--- a/rgmanager/src/daemons/rg_state.c
+++ b/rgmanager/src/daemons/rg_state.c
@@ -1801,6 +1801,7 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
rg_state_t svcStatus;
int target = preferred_target, me = my_id();
int ret, x, request = orig_request;
+   int retries;

get_rg_state_local(svcName, &svcStatus);
if (svcStatus.rs_state == RG_STATE_DISABLED ||
@@ -1933,6 +1934,8 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
if (target == me)
goto exhausted;
 
+   retries = 0;
+retry:
ret = svc_start_remote(svcName, request, target);
switch (ret) {
case RG_ERUN:
@@ -1942,6 +1945,22 @@ handle_relocate_req(char *svcName, int orig_request, int preferred_target,
*new_owner = svcStatus.rs_owner;
free_member_list(allowed_nodes);
return 0;
+   case RG_ENOSERVICE:
+   /*
+* Configuration update pending on remote node?  Give it
+* a few seconds to sync up.  rhbz#568126
+*
+* Configuration updates are synchronized in later releases
+* of rgmanager; this should not be needed.
+*/
+   if (retries++ < 4) {
+   sleep(3);
+   goto retry;
+   }
+   logt_print(LOG_WARNING, "Member #%d has a different "
+  "configuration than I do; trying next "
+  "member.", target);
+   /* Deliberate */
case RG_EDEPEND:
case RG_EFAIL:
/* Uh oh - we failed to relocate to this node.
-- 
1.7.7.6
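
For readers who want to see the retry behaviour in isolation, here is a
minimal standalone sketch of the same bounded-retry pattern. It is not
rgmanager code: every demo_* name, the DEMO_* return codes and their values,
and the stale_polls counter are hypothetical stand-ins for svc_start_remote()
and the RG_* codes. The limit of four retries mirrors the patch; the delay is
a parameter here so the sketch can be exercised without waiting.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for rgmanager's return codes; values are made up. */
enum { DEMO_OK = 0, DEMO_EFAIL = -1, DEMO_ENOSERVICE = -2 };

#define DEMO_MAX_RETRIES 4	/* same retry limit as the patch */

/*
 * Test double for the remote start call: reports "no such service"
 * until the simulated remote node has reread its configuration,
 * which takes *stale_polls further attempts.
 */
static int demo_start_remote(int *stale_polls)
{
	if (*stale_polls > 0) {
		(*stale_polls)--;
		return DEMO_ENOSERVICE;
	}
	return DEMO_OK;
}

/*
 * Retry the remote start while the target appears to be out of sync.
 * Returns the first result other than DEMO_ENOSERVICE, or DEMO_EFAIL
 * once the retries are used up (the patch falls through to its
 * failure case at that point).  *attempts receives the call count.
 */
static int demo_relocate(int *stale_polls, unsigned int delay_sec,
			 int *attempts)
{
	int retries = 0;
	int ret;

	*attempts = 0;
	for (;;) {
		ret = demo_start_remote(stale_polls);
		(*attempts)++;
		if (ret != DEMO_ENOSERVICE)
			return ret;
		if (retries++ >= DEMO_MAX_RETRIES)
			break;
		sleep(delay_sec);
	}

	fprintf(stderr, "target still out of sync after %d attempts; "
		"trying next member\n", *attempts);
	return DEMO_EFAIL;
}
```

With the patch's values (four retries, three seconds apart) the originator
makes at most five attempts and sleeps for at most 12 seconds before moving
on to the next member.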