Hi Zoran,

When sync process sends A2ND_SYNC_FINALIZE to the coord,
it will enter immnd_evt_proc_sync_finalize().
In that function, the coord generates ND2ND_SYNC_FINALIZE_2 message and sends 
over fevs.
Please note that immnd_evt_proc_sync_finalize() never be called by veterans or 
sync-clients.

All IMMND will later receive that ND2ND_SYNC_FINALIZE_2 message.
(The coordinator will also receive that message.)
All IMMND will enter immnd_evt_proc_finalize_sync().
The coord doesn't call immModel_finalizeSync() in 
immnd_evt_proc_finalize_sync().
You can check the 'if (cb->mIsCoord)' block.

So:
1. Coord calls immModel_finalizeSync() in immnd_evt_proc_sync_finalize().
2. Veterans and sync-clients call immModel_finalizeSync() in 
immnd_evt_proc_finalize_sync().

I don't think we need any code change for veterans and sync-clients.
The sync-finalize message sent out by coord is the same as before applying the 
patch.


BR,


Hung Nguyen - DEK Technologies


--------------------------------------------------------------------------------
From: Zoran Milinkovic [email protected]
Sent: Wednesday, April 20, 2016 3:43PM
To: Hung Nguyen, Neelakanta Reddy
     [email protected], [email protected]
Cc: Opensaf-devel
     [email protected]
Subject: RE: [PATCH 1 of 1] imm: Handle error when sending sync-finalize 
message over fevs [#1735]


Hi Hung,

As I understand, you are talking about immnd_evt_proc_sync_finalize() 
function (1 old call to immModel_finalizeSync() + 1 new call to 
immModel_finalizeSync()).

There are 2 more calls to immModel_finalizeSync() in 
immnd_evt_proc_finalize_sync().

Your implementation with postponed cleaning of dying admin owners is ok 
(with minor changes) in immnd_evt_proc_sync_finalize(), but I don’t see 
that it works in immnd_evt_proc_finalize_sync().

Thanks,

Zoran

*From:*Hung Duc Nguyen
*Sent:* Wednesday, April 20, 2016 6:12 AM
*To:* Zoran Milinkovic; [email protected]
*Cc:* [email protected]
*Subject:* Re: [PATCH 1 of 1] imm: Handle error when sending 
sync-finalize message over fevs [#1735]

Hi Zoran,

Please find my answer inline.

BR,

Hung Nguyen - DEK Technologies

--------------------------------------------------------------------------------

From: Zoran [email protected] 
<mailto:[email protected]>

Sent: Tuesday, April 19, 2016 8:22PM

To: Hung Nguyen, Neelakanta Reddy

     [email protected] 
<mailto:[email protected]>,[email protected] 
<mailto:[email protected]>

Cc: Opensaf-devel

     [email protected] 
<mailto:[email protected]>

Subject: RE: [PATCH 1 of 1] imm: Handle error when sending sync-finalize 
message over fevs [#1735]

Hi Hung,

Find my comment inline.

-----Original Message-----

From: Hung Nguyen [mailto:[email protected]]

Sent: Thursday, April 14, 2016 1:45 PM

To: Zoran Milinkovic;[email protected] 
<mailto:[email protected]>

Cc:[email protected] 
<mailto:[email protected]>

Subject: [PATCH 1 of 1] imm: Handle error when sending sync-finalize message 
over fevs [#1735]

  osaf/services/saf/immsv/immnd/ImmModel.cc |  120 ++++++++++++++++-------------

  osaf/services/saf/immsv/immnd/immnd_evt.c |    6 +-

  2 files changed, 72 insertions(+), 54 deletions(-)

This patch delays setting node state and deleting dead admin-owners until 
sync-finalize message is successfully sent.

If sync server fails to send, sync will be aborted (broadcasting sync-abort 
message).

In sync server and veteran nodes, admin-owners that are in demise will also be 
deleted when sync is aborted.

This will prevent IMM from having zombie admin-owners when sync is aborted.

diff --git a/osaf/services/saf/immsv/immnd/ImmModel.cc 
b/osaf/services/saf/immsv/immnd/ImmModel.cc

--- a/osaf/services/saf/immsv/immnd/ImmModel.cc

+++ b/osaf/services/saf/immsv/immnd/ImmModel.cc

@@ -2731,6 +2731,7 @@ ImmModel::pbePrtoPurgeMutations(unsigned

  void

  ImmModel::abortSync()

  {

+    AdminOwnerVector::iterator i;

      switch(sImmNodeState){

          

          case IMM_NODE_R_AVAILABLE:

@@ -2738,6 +2739,23 @@ ImmModel::abortSync()

              sNodesDeadDuringSync.clear();

              sImplsDeadDuringSync.clear();

  

+            /* Normally, admin-owners which are dead during sync

+             * will be cleared in finalizeSync().

+             * When sync is aborted, we don't enter finalizeSync().

+             * So we have to get rid of them here.

+             * This is done in sync server and veteran nodes. */

+            for (i = sOwnerVector.begin(); i != sOwnerVector.end();) {

+                if ((*i)->mDying) {

+                    LOG_WA("Removing admin owner %u %s which is in demise ",

+                           (*i)->mId, (*i)->mAdminOwnerName.c_str());

+                    osafassert(adminOwnerDelete((*i)->mId, true) == SA_AIS_OK);

+                    /* Above does a lookup of admin owner again. */

+                    i = sOwnerVector.begin(); /* Restart of iteration */

+                } else {

+                    ++i;

+                }

+            }

+

              LOG_NO("NODE STATE-> IMM_NODE_FULLY_AVAILABLE (%u)",

                  __LINE__);

              break;

@@ -18066,48 +18084,58 @@ ImmModel::finalizeSync(ImmsvOmFinalizeSy

      

      if(isCoord) {//Produce the checkpoint

          CcbVector::iterator ccbItr;

-

-        sImmNodeState = IMM_NODE_FULLY_AVAILABLE;

-        LOG_NO("NODE STATE-> IMM_NODE_FULLY_AVAILABLE %u", __LINE__);

-        /*WARNING the controller node here goes to writable state

-          based on a NON FEVS message, directly from the sync-process.

-          Other nodes go to writable based on the reception of the

-          finalizeSync message over FEVS. This means that the controller

-          actually opens up for writes before having received the

-          finalize sync message.

-

-          This should still be safe based on the following reasoning:

-          Any writable operation now allowed at this node will be

-          sent over fevs AFTER the finalizeSync message that this node

-          just sent. This at least is the case for messages arriving at

-          this node. So we know that ALL nodes, including this coord node,

-          will receive any mutating messages AFTER ALL nodes, including this

-          coord node, have received the finalizeSync message. This at least

-          for messages arriving from clients at this node.

-

-          At the other nodes. Any mutating message does a local check on 
writability.

-          If currently not writable, the request is rejected locally.

-          Shift to writability is only by finalize sync or abortSync.

-        */

+        AdminOwnerVector::iterator i;

+

+        /* Use 'req' to check if we enter here after sending sync-finalize 
message successfully. */

+        if (!req) {

+            /* Now we can change the node sate to FULLY_AVAILABLE,

+             * and clear the admin-owners which are in demise */

+

+            sImmNodeState = IMM_NODE_FULLY_AVAILABLE;

+            LOG_NO("NODE STATE-> IMM_NODE_FULLY_AVAILABLE %u", __LINE__);

+            /*WARNING the controller node here goes to writable state

+              based on a NON FEVS message, directly from the sync-process.

+              Other nodes go to writable based on the reception of the

+              finalizeSync message over FEVS. This means that the controller

+              actually opens up for writes before having received the

+              finalize sync message.

+

+              This should still be safe based on the following reasoning:

+              Any writable operation now allowed at this node will be

+              sent over fevs AFTER the finalizeSync message that this node

+              just sent. This at least is the case for messages arriving at

+              this node. So we know that ALL nodes, including this coord node,

+              will receive any mutating messages AFTER ALL nodes, including 
this

+              coord node, have received the finalizeSync message. This at least

+              for messages arriving from clients at this node.

+

+              At the other nodes. Any mutating message does a local check on 
writability.

+              If currently not writable, the request is rejected locally.

+              Shift to writability is only by finalize sync or abortSync.

+            */

+

+            for (i = sOwnerVector.begin(); i != sOwnerVector.end();) {

+                if ((*i)->mDying) {

+                    LOG_WA("Removing admin owner %u %s which is in demise ",

+                           (*i)->mId, (*i)->mAdminOwnerName.c_str());

+                    osafassert(adminOwnerDelete((*i)->mId, true) == SA_AIS_OK);

+                    /* Above does a lookup of admin owner again. */

+                    i = sOwnerVector.begin(); /* Restart of iteration */

+                } else {

+                    ++i;

+                }

+            }

+

+            /* We are not generating sync-finalize message */

+            goto done;

+        }

          

          req->lastContinuationId = sLastContinuationId;

          req->adminOwners = NULL;

-        AdminOwnerVector::iterator i;

-

-        for(i=sOwnerVector.begin(); i!=sOwnerVector.end();) {

-            if((*i)->mDying && !((*i)->mReleaseOnFinalize)) {

-                LOG_WA("Removing admin owner %u %s (ROF==FALSE) which is in 
demise, "

-                       "BEFORE generating finalize sync message", (*i)->mId,

-                    (*i)->mAdminOwnerName.c_str());

-                osafassert(adminOwnerDelete((*i)->mId, true) == SA_AIS_OK);

-                //Above does a lookup of admin owner again.

-                i=sOwnerVector.begin();//Restart of iteration.

-            } else {

-                ++i;

-            }

-        }

-

-        for(i=sOwnerVector.begin(); i!=sOwnerVector.end(); ++i) {

+

+        for (i = sOwnerVector.begin();

+             i != sOwnerVector.end() && (!(*i)->mDying || 
((*i)->mReleaseOnFinalize)); /* Exclude mDying with ROF==FALSE */

[Zoran] The upper code does not exclude ROF == FALSE. It actually stops the 
loop on the first admin owner who is dying and ROF == true;

If the code above 'for' loop is removed, then it's very likely that the 'for' 
loop may exit before the end of sOwnerVector. Without the upper removed code, 
there might be a dying admin owner in sOwnerVector.

[Hung] Yes, that's loop is wrong.

The condition used to exclude the "dying+ROF=TRUE" should be put inside 
the for loop.

Something like this:

for (i = sOwnerVector.begin(); i != sOwnerVector.end(); i++) {

     if ((*i)->mDying && !(*i)->mReleaseOnFinalize) continue; /* Exclude 
mDying with ROF==FALSE */

     ...

}

I also want to explain more what I did in the patch.

Here's what coord IMMND does in the original code:

1. Set node state to FULLY_AVAILABLE

2. Loop through sOwnerVector and remove all "dying+ROF=FALSE" admo.

3. Loop through sOwnerVector and add all to the message (req->adminOwners).

    That means both "healthy" admo and "dying+ROF=TRUE" admo will be 
synced to other nodes.

4. Loop through sOwnerVector and remove all "dying+ROF=TRUE" admo.

5. Add implementers, classes, ccbs to messsage.

6. Send message over fevs.

I changed the code to:

1. Loop through sOwnerVector and only add "healthy" admo and 
"dying+ROF=TRUE" admo to the message.

    All "dying+ROF=FALSE" admo will be excluded from the message.

    We don't actually delete any admo at this moment.

2. Add implementers, classes, ccbs to messsage.

3. Send message over fevs.

4. If it succeed to send, set node state to FULLY_AVAILABLE and remove 
all "dying" admo.

    If not, the state will remain R_AVAILABLE to trigger a sync abort 
broadcasted to all IMMND.

    The "dying" admo will be cleared when abort sync message arrives.

The other IMMNDs (sync-clients and veterans) will receive the same 
sync-finalize message (containing only "healthy" admo and 
"dying+ROF=TRUE" admo).

No code change is needed for them.

The main purpose of the patch is to postpone setting node-sate and 
removing dying admo until successfully sending sync-finalize message.

Cleaning dying admo in ::abortSync() also fixes resource leak problem.

In the current code dying admo only are cleaned up in ::finalizeSync().

If the sync is aborted, the admo marked as dying will not be removed 
until there's an successful sync.

Sync abort can happen even before finalizing sync (e.g sync process 
crashes).

Even if this ticket doesn't exist, we still need to fix that resource 
leak problem.

I don't want to create another ticket for it. I think it's fine to fix 
it here in this ticket.

With this patch, this function is called from 4 places, and from only one place 
where 'req' is NULL.

[Hung] I don't get your point here.

About 'req' being NULL.

Since setting node-state and removing dying admo are postponed, 
::finalizeSync() needed to be called twice.

The 1st time is for generating message, the 2nd time (after sending) is 
for setting node-state and removing admo.

I don't want to add another argument for ::finalizeSync() or to create 
new function.

That's why I used 'req' to check. When req is NULL, it will just set 
node-state and remove dying admo and then "goto done".

I will resend the patch for review when you and Neel are done reviewing.

Thanks,

Zoran

+             ++i) {

              ImmsvAdmoList* ai = (ImmsvAdmoList *)

                  calloc(1, sizeof(ImmsvAdmoList));

              ai->id = (*i)->mId;

@@ -18156,20 +18184,6 @@ ImmModel::finalizeSync(ImmsvOmFinalizeSy

          LOG_IN("finalizeSync message contains %u admin-owners",

              (unsigned int) sOwnerVector.size());

  

-        for(i=sOwnerVector.begin(); i!=sOwnerVector.end();) {

-            if((*i)->mDying) {

-                osafassert((*i)->mReleaseOnFinalize);

-                LOG_WA("Removing admin owner %u %s (ROF==TRUE) which is in 
demise, "

-                       "AFTER generating finalize sync message", (*i)->mId,

-                    (*i)->mAdminOwnerName.c_str());

-                osafassert(adminOwnerDelete((*i)->mId, true) == SA_AIS_OK);

-                //Above does a lookup of admin owner again.

-                i=sOwnerVector.begin();//Restart of iteration.

-            } else {

-                ++i;

-            }

-        }

-

          /* Done with generate Admo */

  

          req->implementers = NULL;

diff --git a/osaf/services/saf/immsv/immnd/immnd_evt.c 
b/osaf/services/saf/immsv/immnd/immnd_evt.c

--- a/osaf/services/saf/immsv/immnd/immnd_evt.c

+++ b/osaf/services/saf/immsv/immnd/immnd_evt.c

@@ -5763,10 +5763,14 @@ static uint32_t immnd_evt_proc_sync_fina

                        if (proc_rc != NCSCC_RC_SUCCESS) {

                                TRACE_2("Failed send fevs message");  /*Error 
already logged in fevs_fo */

                                err = SA_AIS_ERR_NO_RESOURCES;

+                              /* Sync will be aborted in immnd_proc_server() */

+                      } else {

+                              /* Change the node sate to FULLY_AVAILABLE and 
clear the admin-owners which are in demise */

+                              immModel_finalizeSync(cb, NULL, SA_TRUE, 
SA_FALSE);

+                              cb->mSyncFinalizing = 0x1;

                        }

  

                        free(tmpData);

-                      cb->mSyncFinalizing = 0x1;

                }

         } else {

                LOG_ER("Will not allow sync messages from any process except 
sync process");


------------------------------------------------------------------------------
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
_______________________________________________
Opensaf-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-devel

Reply via email to