Re: [devel] [PATCH 1/1] imm: prioritize to elect IMMND on the active node [#2862]

2018-06-25 Thread Hans Nordeback

Hi Vu,

Ack, review only. (The indentation is still not correct.)

/Thanks HansN


--
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] imm: prioritize to elect IMMND on the active node [#2862]

2018-06-20 Thread Vu Minh Nguyen
The coordinator IMMND on PL-3 crashed. The active IMMD then elected a new
coordinator on the standby node, SC-2, but the election failed because the
IMMND on SC-2 had also restarted. As a result, the active IMMD exited and a
failover happened. SC-2 then took the active role, found no candidate for the
new IMMND coordinator, and the cluster was rebooted.

We can prevent this if the active IMMD prefers to elect the coordinator
located on its own node, provided that IMMND's database is up to date.
---
 src/imm/immd/immd_proc.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/src/imm/immd/immd_proc.c b/src/imm/immd/immd_proc.c
index 1882eef..34c1415 100644
--- a/src/imm/immd/immd_proc.c
+++ b/src/imm/immd/immd_proc.c
@@ -331,24 +331,34 @@ bool immd_proc_elect_coord(IMMD_CB *cb, bool new_active)
          */
     } else {
         /* Try to elect a new coord. */
+        IMMD_IMMND_INFO_NODE *candidate_coord_node = NULL;
         cb->payload_coord_dest = 0LL;
         memset(&key, 0, sizeof(MDS_DEST));
         immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
                                      &immnd_info_node);
+
+        // Election priority:
+        // 1) Coordinator on active node
+        // 2) Coordinator on standby node
+        // 3) Coordinator on PL node if SC absence is allowed.
         while (immnd_info_node) {
             key = immnd_info_node->immnd_dest;
             if ((immnd_info_node->isOnController) &&
                 (immnd_info_node->epoch == cb->mRulingEpoch)) {
-                /*We found a new candidate for cordinator */
+                candidate_coord_node = immnd_info_node;
                 immnd_info_node->isCoord = true;
-                break;
+                if (immnd_info_node->immnd_key == cb->node_id) {
+                    /* Found a new candidate on active SC */
+                    break;
+                }
             }
 
             immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
                                          &immnd_info_node);
         }
 
-        if (!immnd_info_node && cb->mScAbsenceAllowed) {
+        immnd_info_node = candidate_coord_node;
+        if (!immnd_info_node && cb->mScAbsenceAllowed) {
             /* If SC absence is allowed and no SC based IMMND is
                available then elect an IMMND coord at a payload.
                Note this means that an IMMND at a payload may be
-- 
1.9.1
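
For readers outside the code base, the three-step priority listed in the patch
comment can be modelled as a small ranking function. This is only a sketch
under assumptions: struct immnd_node, rank_of and elect_by_rank are
hypothetical names, not OpenSAF APIs; it assumes a payload IMMND must also be
at the ruling epoch; and it clears and sets the coordinator flag itself so that
only the final pick ends up marked. In the diff above, isCoord appears to be
set inside the loop before the scan continues, so marking after the scan may be
worth considering.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for IMMD_IMMND_INFO_NODE. */
struct immnd_node {
    uint32_t node_id;   /* plays the role of immnd_key */
    bool on_controller; /* plays the role of isOnController */
    uint32_t epoch;
    bool is_coord;
};

/* Lower rank wins; -1 means the node may not be elected. */
static int rank_of(const struct immnd_node *nd, uint32_t ruling_epoch,
                   uint32_t active_node_id, bool sc_absence_allowed)
{
    if (nd->epoch != ruling_epoch)
        return -1; /* database not up to date (assumed for payloads too) */
    if (nd->on_controller)
        return nd->node_id == active_node_id ? 0 : 1;
    return sc_absence_allowed ? 2 : -1; /* payload only as a last resort */
}

/* Scan all nodes once and mark only the best-ranked one as coordinator. */
struct immnd_node *elect_by_rank(struct immnd_node *nodes, size_t n,
                                 uint32_t ruling_epoch,
                                 uint32_t active_node_id,
                                 bool sc_absence_allowed)
{
    struct immnd_node *winner = NULL;
    int best = -1;

    for (size_t i = 0; i < n; i++) {
        int r = rank_of(&nodes[i], ruling_epoch, active_node_id,
                        sc_absence_allowed);
        nodes[i].is_coord = false; /* assumption: drop stale flags here */
        if (r >= 0 && (winner == NULL || r < best)) {
            winner = &nodes[i];
            best = r;
        }
    }
    if (winner != NULL)
        winner->is_coord = true;
    return winner;
}
```

With a standby SC listed before the active SC, this model returns the active
one and leaves the standby's flag cleared, regardless of the order in which
the nodes are visited.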




Re: [devel] [PATCH 1/1] imm: prioritize to elect IMMND on the active node [#2862]

2018-06-19 Thread Vu Minh Nguyen
Hi Hans,

Thanks for your comments. See my responses inline.

- the indentation is not correct after the first while stmt.

[Vu] Yes, you are right. I will correct it in the V2 patch.

- why adding a new similar while loop? Instead remove the newly added while loop
and change the first while loop to something like this (not tested though):

[Vu] Good idea. I will go with that in the V2 patch.

Regards, Vu


From: Hans Nordeback  
Sent: Friday, June 15, 2018 8:49 PM
To: Vu Minh Nguyen ; anders.wid...@ericsson.com; 
lennart.l...@ericsson.com; ravisekhar.ko...@oracle.com
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] imm: prioritize to elect IMMND on the active node 
[#2862]

 

Hi Vu, the formatting of the code was not so good, a new attempt below:

} else {
  /* Try to elect a new coord. */
  IMMD_IMMND_INFO_NODE *candidate_coord_node = 0;
  cb->payload_coord_dest = 0LL;
  memset(&key, 0, sizeof(MDS_DEST));
  immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
                               &immnd_info_node);

  // Prioritize to elect the new coordinator which is
  // located at the active node (same site with the active IMMD).
  while (immnd_info_node) {
    key = immnd_info_node->immnd_dest;
    if ((immnd_info_node->isOnController) &&
        (immnd_info_node->epoch == cb->mRulingEpoch)) {
      if (immnd_info_node->immnd_key == cb->node_id) {
        /* We found a new candidate for coordinator */
        candidate_coord_node = immnd_info_node;
        break;
      } else {
        candidate_coord_node = immnd_info_node;
      }
    }

    immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
                                 &immnd_info_node);
  }
  immnd_info_node = candidate_coord_node;
  if (immnd_info_node != 0)
    immnd_info_node->isCoord = true;

/Thanks HansN


Re: [devel] [PATCH 1/1] imm: prioritize to elect IMMND on the active node [#2862]

2018-06-15 Thread Hans Nordeback

Hi Vu, the formatting of the code was not so good, a new attempt below:

} else {
      /* Try to elect a new coord. */
      IMMD_IMMND_INFO_NODE *candidate_coord_node = 0;
      cb->payload_coord_dest = 0LL;
      memset(&key, 0, sizeof(MDS_DEST));
      immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
                                   &immnd_info_node);

      // Prioritize to elect the new coordinator which is
      // located at the active node (same site with the active IMMD).
      while (immnd_info_node) {
          key = immnd_info_node->immnd_dest;
          if ((immnd_info_node->isOnController) &&
              (immnd_info_node->epoch == cb->mRulingEpoch)) {
              if (immnd_info_node->immnd_key == cb->node_id) {
                  /* We found a new candidate for coordinator */
                  candidate_coord_node = immnd_info_node;
                  break;
              } else {
                  candidate_coord_node = immnd_info_node;
              }
          }

          immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
                                       &immnd_info_node);
      }
      immnd_info_node = candidate_coord_node;
      if (immnd_info_node != 0)
          immnd_info_node->isCoord = true;

/Thanks HansN

Re: [devel] [PATCH 1/1] imm: prioritize to elect IMMND on the active node [#2862]

2018-06-15 Thread Hans Nordeback

Hi Vu,

a few comments,

- the indentation is not correct after the first while stmt.

- why adding a new similar while loop? Instead remove the newly added while loop
and change the first while loop to something like this (not tested though):

} else {
  /* Try to elect a new coord. */
  IMMD_IMMND_INFO_NODE *candidate_coord_node = 0;
  cb->payload_coord_dest = 0LL;
  memset(&key, 0, sizeof(MDS_DEST));
  immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
                               &immnd_info_node);

  // Prioritize to elect the new coordinator which is
  // located at the active node (same site with the active IMMD).
  while (immnd_info_node) {
    key = immnd_info_node->immnd_dest;
    if ((immnd_info_node->isOnController) &&
        (immnd_info_node->epoch == cb->mRulingEpoch)) {
      if (immnd_info_node->immnd_key == cb->node_id) {
        /* We found a new candidate for coordinator */
        candidate_coord_node = immnd_info_node;
        break;
      } else {
        candidate_coord_node = immnd_info_node;
      }
    }

    immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
                                 &immnd_info_node);
  }
  immnd_info_node = candidate_coord_node;
  if (immnd_info_node != 0)
    immnd_info_node->isCoord = true;


/Thanks HansN
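
A standalone model of the single loop suggested above may make the idea easier
to try out. It is a sketch only: it walks a plain array instead of the
immnd_tree, and struct immnd_node and elect_single_pass are hypothetical names
used for illustration, not OpenSAF code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for IMMD_IMMND_INFO_NODE. */
struct immnd_node {
    uint32_t node_id;   /* plays the role of immnd_key */
    bool on_controller; /* plays the role of isOnController */
    uint32_t epoch;
    bool is_coord;
};

/* One pass: remember any eligible controller, stop at the active one. */
struct immnd_node *elect_single_pass(struct immnd_node *nodes, size_t n,
                                     uint32_t ruling_epoch,
                                     uint32_t active_node_id)
{
    struct immnd_node *candidate = NULL;

    for (size_t i = 0; i < n; i++) {
        struct immnd_node *nd = &nodes[i];

        if (!nd->on_controller || nd->epoch != ruling_epoch)
            continue;           /* not eligible */
        candidate = nd;         /* eligible controller, keep as fallback */
        if (nd->node_id == active_node_id)
            break;              /* best possible choice, stop scanning */
    }
    if (candidate != NULL)
        candidate->is_coord = true; /* mark only the final pick */
    return candidate;
}
```

Because the flag is set after the loop, a standby that is visited first is
remembered as a fallback but is not marked unless no active-node candidate
turns up.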





[devel] [PATCH 1/1] imm: prioritize to elect IMMND on the active node [#2862]

2018-05-22 Thread Vu Minh Nguyen
Here is the case:
The coordinator IMMND on PL-3 crashed. The active IMMD then elected a new
coordinator on the standby node SC-2, but the election failed because the
IMMND on SC-2 had also restarted. As a result, the active IMMD exited and a
failover happened. SC-2 then took the active role, found no candidate for the
new IMMND coordinator, and the cluster was rebooted.

We can prevent this if the active IMMD prefers to elect the coordinator
located on its own node, provided that IMMND's database is up to date.
---
 src/imm/immd/immd_proc.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)

diff --git a/src/imm/immd/immd_proc.c b/src/imm/immd/immd_proc.c
index 1882eef..e80f2db 100644
--- a/src/imm/immd/immd_proc.c
+++ b/src/imm/immd/immd_proc.c
@@ -331,16 +331,22 @@ bool immd_proc_elect_coord(IMMD_CB *cb, bool new_active)
          */
     } else {
         /* Try to elect a new coord. */
+        bool has_coord_candidate = false;
         cb->payload_coord_dest = 0LL;
         memset(&key, 0, sizeof(MDS_DEST));
         immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
                                      &immnd_info_node);
+
+        // Prioritize to elect the new coordinator which is
+        // located at the active node (same site with the active IMMD).
         while (immnd_info_node) {
             key = immnd_info_node->immnd_dest;
             if ((immnd_info_node->isOnController) &&
+                (immnd_info_node->immnd_key == cb->node_id) &&
                 (immnd_info_node->epoch == cb->mRulingEpoch)) {
                 /*We found a new candidate for cordinator */
                 immnd_info_node->isCoord = true;
+                has_coord_candidate = true;
                 break;
             }
 
@@ -348,7 +354,26 @@ bool immd_proc_elect_coord(IMMD_CB *cb, bool new_active)
                                          &immnd_info_node);
         }
 
-        if (!immnd_info_node && cb->mScAbsenceAllowed) {
+        if (!has_coord_candidate) {
+          memset(&key, 0, sizeof(MDS_DEST));
+          immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
+                                       &immnd_info_node);
+
+          while (immnd_info_node) {
+            key = immnd_info_node->immnd_dest;
+            if ((immnd_info_node->isOnController) &&
+                (immnd_info_node->epoch == cb->mRulingEpoch)) {
+              /*We found a new candidate for cordinator */
+              immnd_info_node->isCoord = true;
+              break;
+            }
+
+            immd_immnd_info_node_getnext(&cb->immnd_tree, &key,
+                                         &immnd_info_node);
+          }
+        }
+
+        if (!immnd_info_node && cb->mScAbsenceAllowed) {
             /* If SC absence is allowed and no SC based IMMND is
                available then elect an IMMND coord at a payload.
                Note this means that an IMMND at a payload may be
-- 
1.9.1
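
The two-loop shape of this first version can be tried outside OpenSAF with a
small model. This is a sketch under assumptions, not the real implementation:
struct immnd_node, scan and elect_two_pass are hypothetical names, and a plain
array stands in for the immnd_tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for IMMD_IMMND_INFO_NODE. */
struct immnd_node {
    uint32_t node_id;   /* plays the role of immnd_key */
    bool on_controller; /* plays the role of isOnController */
    uint32_t epoch;
    bool is_coord;
};

/* Return the first controller-hosted node at the ruling epoch.
 * With local_only set, the node must also be on active_node_id. */
static struct immnd_node *scan(struct immnd_node *nodes, size_t n,
                               uint32_t ruling_epoch,
                               uint32_t active_node_id, bool local_only)
{
    for (size_t i = 0; i < n; i++) {
        struct immnd_node *nd = &nodes[i];

        if (!nd->on_controller || nd->epoch != ruling_epoch)
            continue;
        if (local_only && nd->node_id != active_node_id)
            continue;
        return nd;
    }
    return NULL;
}

/* Pass 1: active node only. Pass 2: any controller, if pass 1 found none. */
struct immnd_node *elect_two_pass(struct immnd_node *nodes, size_t n,
                                  uint32_t ruling_epoch,
                                  uint32_t active_node_id)
{
    struct immnd_node *winner =
        scan(nodes, n, ruling_epoch, active_node_id, true);

    if (winner == NULL)
        winner = scan(nodes, n, ruling_epoch, active_node_id, false);
    if (winner != NULL)
        winner->is_coord = true;
    return winner;
}
```

The second pass repeats the same walk with a looser filter, which is the
duplication the review comments in this thread ask to fold into one loop.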

