Re: [devel] [PATCH 1/1] ntf: fix ntfd remove client in standby node while not finalize in active node [#2705]

2017-12-07 Thread Minh Hon Chau

Hi Canh,

I have an idea, but I am not sure whether you will like it.

When we receive NTFA_DOWN inside mds_svc_event(), if we can access the
client database, we could find all client ids that have this mds_dest and
attach the range of client ids to be deleted to the event that is about
to be sent to the mailbox. Then, when we process
proc_ntfa_updn_mds_msg(), we delete only the client instances that have
both the matching mds_dest and a client id that falls within the range of
to-be-deleted client ids. Otherwise we know this updn_mds_msg is
lingering, so the client from the next successful initialize() will not
be deleted. This works because the client id is unique and incremental
for each new client.
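
The idea above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not ntfd code: MdsDest, Client, DownEvent, MakeDownEvent and ProcessDownEvent are hypothetical names, and the client database is modelled as a plain std::map keyed by client id.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical stand-ins for illustration; not the real ntfd types.
using MdsDest = std::uint64_t;

struct Client {
  MdsDest mds_dest;
};

using ClientMap = std::map<unsigned int, Client>;

// Snapshot taken when the down event is created, before it is queued.
struct DownEvent {
  MdsDest mds_dest;
  unsigned int first_id;  // lowest client id owned by mds_dest at that time
  unsigned int last_id;   // highest client id owned by mds_dest at that time
  bool has_clients;       // false if the agent owned no client at that time
};

// Record the range of client ids that belong to the agent going down.
DownEvent MakeDownEvent(const ClientMap &clients, MdsDest dest) {
  DownEvent evt{dest, 0, 0, false};
  for (const auto &entry : clients) {
    if (entry.second.mds_dest != dest) continue;
    if (!evt.has_clients) {
      evt.first_id = entry.first;
      evt.has_clients = true;
    }
    evt.last_id = entry.first;  // std::map iterates in ascending key order
  }
  return evt;
}

// Delete only clients that match both the mds_dest and the recorded range.
// A client created by a later initialize() has a larger id and is kept.
std::size_t ProcessDownEvent(ClientMap &clients, const DownEvent &evt) {
  std::size_t removed = 0;
  if (!evt.has_clients) return removed;
  assert(evt.first_id <= evt.last_id);
  for (auto it = clients.begin(); it != clients.end();) {
    const bool in_range =
        it->first >= evt.first_id && it->first <= evt.last_id;
    if (in_range && it->second.mds_dest == evt.mds_dest) {
      it = clients.erase(it);
      ++removed;
    } else {
      ++it;
    }
  }
  return removed;
}
```

If the old clients have already been removed by checkpoints and a new client with a higher id has initialized from the same mds_dest before the event is processed, ProcessDownEvent() removes nothing, which is the behaviour the proposal relies on.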


Thanks,
Minh

Re: [devel] [PATCH 1/1] ntf: fix ntfd remove client in standby node while not finalize in active node [#2705]

2017-12-06 Thread Minh Hon Chau

Hi Canh,

Please see my comment marked with [Minh]

Thanks,

Minh


On 06/12/17 21:21, Canh Van Truong wrote:


Hi aMinh,

Please see my comment.

Regards

Canh

*From:*Minh Hon Chau [mailto:minh.c...@dektech.com.au]
*Sent:* Wednesday, December 6, 2017 7:20 AM
*To:* Canh Van Truong 
*Cc:* opensaf-devel@lists.sourceforge.net
*Subject:* Re: [PATCH 1/1] ntf: fix ntfd remove client in standby node 
while not finalize in active node [#2705]


Hi Canh,

I got a build error on a 32-bit build:

src/ntf/ntfd/ntfs_com.c: In function ‘sendNtfaDownUpdate’:
src/ntf/ntfd/ntfs_com.c:562:2: error: format ‘%ld’ expects argument of type 
‘long int’, but argument 6 has type ‘MDS_DEST’ [-Werror=format=]
   TRACE_ENTER2("mdsDest: %ld", mdsDest);
   ^
cc1: all warnings being treated as errors
make[2]: *** [src/ntf/ntfd/bin_osafntfd-ntfs_com.o] Error 1
Another question: with this patch, the standby NTFD no longer deletes the
client in proc_ntfa_updn_mds_msg(); instead it records the client in
down_list. So in the scenario of this ticket, the standby sees the
following sequence of messages:
1 -> receive checkpoint of finalize
2 -> receive checkpoint of client_down
3 -> receive checkpoint of initialize
4 -> process proc_ntfa_updn_mds_msg, which came from MDS and was sent to
the mailbox before the checkpoint of finalize. So we just add the client to
down_list. That avoids the error in this ticket, but are we not adding a
client that has already gone down?
[Canh] Yes, in this case the agent that went down is added to the list
and is never removed from it. But the clients belonging to agents that
went down are always removed from the database (clientMap…) in the
right way. The down_list is only useful in case of failover. I will
increase the priority of MDS_DOWN to VERY_HIGH before putting it into
the mailbox, and I will keep the standby ntfd removing the client at
the checkpoint of client_down, not while processing
proc_ntfa_updn_mds_msg.

What do you think about it?
[Minh] Since you introduce the down_list, when a client instance is
removed its item in down_list must no longer exist. Consider the
sequence above: if the next step is a failover, the patch will remove
the client (this is the new client from the next initialize(), but with
the same mds_dest) because its mds_dest is present in down_list. The
result is that the client is removed unexpectedly.
If you increase the priority of MDS_DOWN before putting it into the
mailbox and it works, proc_ntfa_updn_mds_msg should be processed before
the checkpoint of finalize(). So with just the priority increase, would
the original problem be solved without this patch?
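
The failover concern above can be reproduced with a small model. This is a sketch under assumptions, not the patched ntfd: StandbyModel and its methods are hypothetical names, the down list holds values rather than pointers, and it is assumed that the checkpoint of client_down also clears the matching down_list entry.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <map>

using MdsDest = std::uint64_t;  // hypothetical stand-in for MDS_DEST

// Toy model of the standby's client database and down list.
class StandbyModel {
 public:
  void CheckpointInitialize(unsigned int id, MdsDest dest) {
    clients_[id] = dest;
  }
  void CheckpointFinalize(unsigned int id) { clients_.erase(id); }
  void CheckpointClientDown(MdsDest dest) {
    RemoveClientsOf(dest);
    down_list_.remove(dest);  // assumption: checkpoint clears the entry
  }
  // The MDS down event, processed late: remember the agent only if it
  // still appears to own a client, as is_valid_ntf_agent() does.
  void LateMdsDown(MdsDest dest) {
    if (HasClientOf(dest)) down_list_.push_back(dest);
  }
  // On failover every remembered agent has its clients removed.
  void Failover() {
    for (MdsDest dest : down_list_) RemoveClientsOf(dest);
    down_list_.clear();
  }
  bool HasClient(unsigned int id) const { return clients_.count(id) != 0; }

 private:
  bool HasClientOf(MdsDest dest) const {
    for (const auto &entry : clients_) {
      if (entry.second == dest) return true;
    }
    return false;
  }
  void RemoveClientsOf(MdsDest dest) {
    for (auto it = clients_.begin(); it != clients_.end();) {
      if (it->second == dest) {
        it = clients_.erase(it);
      } else {
        ++it;
      }
    }
  }
  std::map<unsigned int, MdsDest> clients_;  // client id -> owning agent
  std::list<MdsDest> down_list_;
};

// Replays steps 1 to 4 from the mail, followed by a failover.
bool NewClientSurvivesFailover() {
  const MdsDest agent = 0x2010f;  // arbitrary example value
  StandbyModel standby;
  standby.CheckpointInitialize(1, agent);  // old client
  standby.CheckpointFinalize(1);           // step 1
  standby.CheckpointClientDown(agent);     // step 2
  standby.CheckpointInitialize(2, agent);  // step 3: new client, same dest
  standby.LateMdsDown(agent);              // step 4: stale event is recorded
  assert(standby.HasClient(2));            // still present before failover
  standby.Failover();
  return standby.HasClient(2);
}
```

In this model NewClientSurvivesFailover() returns false: the stale entry added in step 4 causes the new client to be removed at failover.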

Thanks,
Minh


Re: [devel] [PATCH 1/1] ntf: fix ntfd remove client in standby node while not finalize in active node [#2705]

2017-12-06 Thread Canh Van Truong
Hi aMinh,

 

Please see my comment.

 

Regards

Canh

 

From: Minh Hon Chau [mailto:minh.c...@dektech.com.au] 
Sent: Wednesday, December 6, 2017 7:20 AM
To: Canh Van Truong 
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] ntf: fix ntfd remove client in standby node while not 
finalize in active node [#2705]

 

Hi Canh,

I got a build error on a 32-bit build:

src/ntf/ntfd/ntfs_com.c: In function ‘sendNtfaDownUpdate’:
src/ntf/ntfd/ntfs_com.c:562:2: error: format ‘%ld’ expects argument of type 
‘long int’, but argument 6 has type ‘MDS_DEST’ [-Werror=format=]
  TRACE_ENTER2("mdsDest: %ld", mdsDest);
  ^
cc1: all warnings being treated as errors
make[2]: *** [src/ntf/ntfd/bin_osafntfd-ntfs_com.o] Error 1
 
Another question: with this patch, the standby NTFD no longer deletes the
client in proc_ntfa_updn_mds_msg(); instead it records the client in
down_list. So in the scenario of this ticket, the standby sees the
following sequence of messages:

1 -> receive checkpoint of finalize
2 -> receive checkpoint of client_down
3 -> receive checkpoint of initialize
4 -> process proc_ntfa_updn_mds_msg, which came from MDS and was sent to
the mailbox before the checkpoint of finalize. So we just add the client to
down_list. That avoids the error in this ticket, but are we not adding a
client that has already gone down?
 
[Canh] Yes, in this case the agent that went down is added to the list and
is never removed from it. But the clients belonging to agents that went down
are always removed from the database (clientMap…) in the right way. The
down_list is only useful in case of failover. I will increase the priority of
MDS_DOWN to VERY_HIGH before putting it into the mailbox, and I will keep the
standby ntfd removing the client at the checkpoint of client_down, not while
processing proc_ntfa_updn_mds_msg.
What do you think about it?
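
The effect of raising a message's priority can be illustrated with a generic priority mailbox. This is a sketch only: Priority, Message and Mailbox are hypothetical names and do not represent the OpenSAF mailbox API. Higher priority is delivered first, and messages of equal priority keep their arrival order.

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical priority levels for the sketch.
enum class Priority { kLow = 0, kNormal, kHigh, kVeryHigh };

struct Message {
  Priority priority;
  std::uint64_t seq;  // arrival order, used to keep FIFO within a priority
  std::string name;
};

// Orders the queue so that top() is the highest priority, oldest message.
struct DeliverLater {
  bool operator()(const Message &a, const Message &b) const {
    if (a.priority != b.priority) return a.priority < b.priority;
    return a.seq > b.seq;
  }
};

class Mailbox {
 public:
  void Send(Priority priority, const std::string &name) {
    queue_.push(Message{priority, next_seq_++, name});
  }
  // Returns false when the mailbox is empty.
  bool Receive(std::string *name) {
    assert(name != nullptr);
    if (queue_.empty()) return false;
    *name = queue_.top().name;
    queue_.pop();
    return true;
  }

 private:
  std::priority_queue<Message, std::vector<Message>, DeliverLater> queue_;
  std::uint64_t next_seq_ = 0;
};
```

In a mailbox like this one, priority only reorders messages that are waiting in the queue at the same time; a message that was already received before the high-priority one was sent is not affected.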
 
 
Thanks,
Minh 

 


Re: [devel] [PATCH 1/1] ntf: fix ntfd remove client in standby node while not finalize in active node [#2705]

2017-12-05 Thread Minh Hon Chau

Hi Canh,

I got a build error on a 32-bit build:

src/ntf/ntfd/ntfs_com.c: In function ‘sendNtfaDownUpdate’:
src/ntf/ntfd/ntfs_com.c:562:2: error: format ‘%ld’ expects argument of type 
‘long int’, but argument 6 has type ‘MDS_DEST’ [-Werror=format=]
  TRACE_ENTER2("mdsDest: %ld", mdsDest);
  ^
cc1: all warnings being treated as errors
make[2]: *** [src/ntf/ntfd/bin_osafntfd-ntfs_com.o] Error 1
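
The warning is consistent with MDS_DEST being a 64-bit integer while `long int` is only 32 bits wide on a 32-bit target. A portable way to print such a value is the PRIu64 macro from <cinttypes> (<inttypes.h> in C). The sketch below assumes MDS_DEST is a typedef of uint64_t; FormatMdsDest is a hypothetical helper used only for illustration.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Assumption for this sketch: MDS_DEST is a 64-bit unsigned integer.
typedef std::uint64_t MDS_DEST;

// Formats an MDS_DEST with a specifier that matches the type on both
// 32-bit and 64-bit targets.
std::string FormatMdsDest(MDS_DEST dest) {
  char buf[32];  // 20 digits are enough for any 64-bit unsigned value
  const int n = std::snprintf(buf, sizeof(buf), "%" PRIu64, dest);
  assert(n > 0 && static_cast<std::size_t>(n) < sizeof(buf));
  return std::string(buf);
}
```

Applied to the trace call in question, the same approach would read TRACE_ENTER2("mdsDest: %" PRIu64, mdsDest), assuming the trace macros forward their arguments to a printf-style formatter.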

Another question: with this patch, the standby NTFD no longer deletes the
client in proc_ntfa_updn_mds_msg(); instead it records the client in
down_list. So in the scenario of this ticket, the standby sees the
following sequence of messages:

1 -> receive checkpoint of finalize
2 -> receive checkpoint of client_down
3 -> receive checkpoint of initialize
4 -> process proc_ntfa_updn_mds_msg, which came from MDS and was sent to
the mailbox before the checkpoint of finalize. So we just add the client to
down_list. That avoids the error in this ticket, but are we not adding a
client that has already gone down?

Thanks,
Minh



[devel] [PATCH 1/1] ntf: fix ntfd remove client in standby node while not finalize in active node [#2705]

2017-11-30 Thread Canh Van Truong
The issue happens because clients are removed on both the active and the
standby node when the NCSMDS_DOWN event is received. On the standby node,
ntfd may receive the NCSMDS_DOWN event later than the next initialize
request. ntfd then removes all of the agent's clients from the database,
including the new client created by the next initialize. Any subsequent
action related to this client fails.

The fix is that, on receiving the NCSMDS_DOWN event, ntfd removes the client
on the active node but not on the standby node. On the standby node, ntfd
removes the client when it processes the checkpoint of the NCSMDS_DOWN event.
---
 src/ntf/ntfd/NtfAdmin.cc  | 89 +++
 src/ntf/ntfd/NtfAdmin.h   | 11 --
 src/ntf/ntfd/ntfs_com.c   | 15 
 src/ntf/ntfd/ntfs_com.h   |  4 +++
 src/ntf/ntfd/ntfs_evt.c   | 17 +++--
 src/ntf/ntfd/ntfs_mbcsv.c | 39 +++--
 6 files changed, 169 insertions(+), 6 deletions(-)

diff --git a/src/ntf/ntfd/NtfAdmin.cc b/src/ntf/ntfd/NtfAdmin.cc
index dad00383d..aa616d7ee 100644
--- a/src/ntf/ntfd/NtfAdmin.cc
+++ b/src/ntf/ntfd/NtfAdmin.cc
@@ -467,6 +467,80 @@ void NtfAdmin::clientRemoveMDS(MDS_DEST mds_dest) {
 }
 
 /**
+ * Checking if the ntf agent with MDS_DEST is valid
+ *
+ * @param agent_dest
+ */
+bool NtfAdmin::is_valid_ntf_agent(MDS_DEST agent_dest) {
+  TRACE_ENTER();
+  ClientMap::iterator it;
+  bool valid = false;
+  for (it = clientMap.begin(); it != clientMap.end(); it++) {
+    NtfClient *client = it->second;
+    if (client->getMdsDest() == agent_dest) {
+      valid = true;
+      break;
+    }
+  }
+  TRACE_LEAVE2("The validation of ntfa: %d", valid);
+  return valid;
+}
+
+/**
+ * Add the ntfa down to the list. This is helpful to remember the
+ * list of ntfa to process in case failover.
+ *
+ * @param agent_dest
+ */
+void NtfAdmin::AddNtfAgentDown(MDS_DEST agent_dest) {
+  TRACE_ENTER2(" Add ntfa down (%ld) to the list", agent_dest);
+
+  if (is_valid_ntf_agent(agent_dest)) {
+    MDS_DEST *mds_dest = new MDS_DEST;
+    *mds_dest = agent_dest;
+    ntfa_down_list.push_back(mds_dest);
+  }
+}
+
+/**
+ * Remove the ntfa down from the list
+ *
+ * @param agent_dest
+ */
+void NtfAdmin::RemoveNtfAgentDownFromList(MDS_DEST agent_dest) {
+  TRACE_ENTER();
+  std::list<MDS_DEST *>::iterator it;
+  for (it = ntfa_down_list.begin(); it != ntfa_down_list.end(); ++it) {
+    MDS_DEST *mds_dest = *it;
+    if (*mds_dest == agent_dest) {
+      ntfa_down_list.erase(it);
+      TRACE(" Remove ntfa down (%ld) from the list", agent_dest);
+      delete mds_dest;
+      return;
+    }
+  }
+}
+
+/**
+ * Process to clear all ntfa down records in ntfd. The old client of
+ * agent down is removed from database.
+ *
+ * @param agent_dest
+ */
+void NtfAdmin::ProcessNtfAgentDownList() {
+  TRACE_ENTER();
+  std::list<MDS_DEST *>::iterator it = ntfa_down_list.begin();;
+  while(it != ntfa_down_list.end()) {
+    MDS_DEST *mds_dest = *it;
+    it = ntfa_down_list.erase(it);
+    clientRemoveMDS(*mds_dest);
+    delete mds_dest;
+  }
+  TRACE_LEAVE();
+  return;
+}
+
+/**
  * The node object where the client who had the subscription is notified
  * so it can delete the appropriate subscription and filter object.
  *
@@ -992,6 +1066,21 @@ void clientRemoveMDS(MDS_DEST mds_dest) {
   NtfAdmin::theNtfAdmin->clientRemoveMDS(mds_dest);
 }
 
+void AddNtfAgentDown(MDS_DEST agent_dest) {
+  osafassert(NtfAdmin::theNtfAdmin != NULL);
+  NtfAdmin::theNtfAdmin->AddNtfAgentDown(agent_dest);
+}
+
+void RemoveNtfAgentDownFromList(MDS_DEST agent_dest) {
+  osafassert(NtfAdmin::theNtfAdmin != NULL);
+  NtfAdmin::theNtfAdmin->RemoveNtfAgentDownFromList(agent_dest);
+}
+
+void ProcessNtfAgentDownList() {
+  osafassert(NtfAdmin::theNtfAdmin != NULL);
+  NtfAdmin::theNtfAdmin->ProcessNtfAgentDownList();
+}
+
 void subscriptionRemoved(unsigned int clientId,
  SaNtfSubscriptionIdT subscriptionId,
  MDS_SYNC_SND_CTXT *mdsCtxt) {
diff --git a/src/ntf/ntfd/NtfAdmin.h b/src/ntf/ntfd/NtfAdmin.h
index 1d51b3c52..5ca899711 100644
--- a/src/ntf/ntfd/NtfAdmin.h
+++ b/src/ntf/ntfd/NtfAdmin.h
@@ -102,6 +102,10 @@ class NtfAdmin {
   SaClmClusterChangesT cluster_change, NODE_ID node_id);
   bool is_stale_client(unsigned int clientId);
 
+  void AddNtfAgentDown(MDS_DEST agent_dest);
+  void RemoveNtfAgentDownFromList(MDS_DEST agent_dest);
+  void ProcessNtfAgentDownList();
+
  private:
   void processNotification(unsigned int clientId,
SaNtfNotificationTypeT notificationType,
@@ -110,14 +114,17 @@ class NtfAdmin {
SaNtfIdentifierT notificationId);
 
   void updateNotIdCounter(SaNtfIdentifierT notification);
+  bool is_valid_ntf_agent(MDS_DEST agent_dest);
 
   typedef std::map<unsigned int, NtfClient *> ClientMap;
   ClientMap clientMap;
   NotificationMap notificationMap;
   SaNtfIdentifierT notificationIdCounter;
   unsigned int clientIdCounter;
-  std::list
-  member_node_list; /*To maintain NCS node_ids of CLM memeber nodes.*/
+  // To maintain NCS node_ids