Re: [devel] [PATCH 1/1] ntfd: Do not send response to client if client down [#3084]

2019-10-09 Thread Thien Minh Huynh
Hi Minh,

Thanks for your comment.
If the agent goes down before clientAdded() is called, then
SearchAndSetClientsDownFlag() does not work.

Best Regards,
ThienHuynh

-----Original Message-----
From: Minh Hon Chau  
Sent: Thursday, October 10, 2019 1:50 AM
To: Thien Minh Huynh ; 'Nguyen Minh Vu' 
; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] ntfd: Do not send response to client if client down 
[#3084]

Hi all,

My guess from the ticket is that this is a race condition between the mds thread
and the main thread in ntfd. We normally get the NCSDOWN callback from mds and
send an event to the main thread to remove the client. But here the mds callback
comes in the middle of processing Initialize().

We have something similar in ntfd with SearchAndSetClientsDownFlag(),
GetClientDownFlag() and SetClientDownFlag(). Can we try to reuse them?

Thanks,

Minh


[devel] [PATCH 1/1] osaf: return new takeover_request immediately [#3098]

2019-10-09 Thread Gary Lee
If a takeover_request is created just before the active controller
calls 'watch takeover_request', then it's possible that the
active rded instance is not informed of the request.

When 'watch takeover_request' is called, check if there's already
a takeover_request in 'NEW' state and return immediately.
---
 src/osaf/consensus/plugins/etcd3.plugin | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/osaf/consensus/plugins/etcd3.plugin b/src/osaf/consensus/plugins/etcd3.plugin
index d926885..4e09ef6 100644
--- a/src/osaf/consensus/plugins/etcd3.plugin
+++ b/src/osaf/consensus/plugins/etcd3.plugin
@@ -337,13 +337,22 @@ watch() {
   orig_value=$(get "$watch_key")
   result=$?
 
-  if [ "$result" -le "1" ]; then
+  if [ "$result" -le 1 ]; then
+  if [ "$result" -eq 0 ] && [ "$watch_key" == "$takeover_request" ]; then
+state=$(echo $orig_value | awk '{print $4}')
+if [ "$state" == "NEW" ]; then
+  # takeover_request already exists; maybe it was created
+  # while this node was being promoted
+  echo $orig_value
+  return 0
+fi
+  fi
 while true
 do
   sleep $heartbeat_interval
   current_value=$(get "$watch_key")
   result=$?
-  if [ "$result" -gt "1" ]; then
+  if [ "$result" -gt 1 ]; then
 # etcd down?
 if [ "$watch_key" == "$takeover_request" ]; then
   hostname=`cat $node_name_file`
-- 
2.7.4



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for osaf: return new takeover_request immediately [#3098]

2019-10-09 Thread Gary Lee
Summary: osaf: return new takeover_request immediately [#3098]
Review request for Ticket(s): 3098
Peer Reviewer(s): Minh, Thuan, Thang, Hans
Pull request to: *** LIST THE PERSON WITH PUSH ACCESS HERE ***
Affected branch(es): develop
Development branch: ticket-3098
Base revision: cafbc5d02c90b57c7c94a7735ce8e002224b3d6b
Personal repository: git://git.code.sf.net/u/userid-2226215/review


Impacted area          Impact y/n
---------------------------------
 Docs                      n
 Build system              n
 RPM/packaging             n
 Configuration files       n
 Startup scripts           n
 SAF services              n
 OpenSAF services          n
 Core libraries            n
 Samples                   y
 Tests                     n
 Other                     n


Comments (indicate scope for each "y" above):
---------------------------------------------

revision 903ebd435993cce00350c60827e35b15a78ca3c8
Author: Gary Lee 
Date:   Thu, 10 Oct 2019 14:53:41 +1100

osaf: return new takeover_request immediately [#3098]

If a takeover_request is created just before the active controller
calls 'watch takeover_request', then it's possible that the
active rded instance is not informed of the request.

When 'watch takeover_request' is called, check if there's already
a takeover_request in 'NEW' state and return immediately.



Complete diffstat:
------------------
 src/osaf/consensus/plugins/etcd3.plugin | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)


Testing Commands:
-----------------
*** LIST THE COMMAND LINE TOOLS/STEPS TO TEST YOUR CHANGES ***


Testing, Expected Results:
--------------------------
*** PASTE COMMAND OUTPUTS / TEST RESULTS ***


Conditions of Submission:
-------------------------
ack from anyone


Arch        Built   Started   Linux distro
------------------------------------------
mips          n        n
mips64        n        n
x86           n        n
x86_64        y        y
powerpc       n        n
powerpc64     n        n


Reviewer Checklist:
-------------------
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content in a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer has a badly configured date and time, confusing
the threaded patch review.

___ Your changes affect the IPC mechanism, and you don't present any results
for the in-service upgradability test.

___ Your changes affect the user manual and documentation, but your patch series
does not contain the patch that updates the Doxygen manual.





Re: [devel] [PATCH 1/1] ntfd: Do not send response to client if client down [#3084]

2019-10-09 Thread Minh Hon Chau

Hi all,

My guess from the ticket is that this is a race condition between the mds
thread and the main thread in ntfd. We normally get the NCSDOWN callback from
mds and send an event to the main thread to remove the client. But here the
mds callback comes in the middle of processing Initialize().

We have something similar in ntfd with SearchAndSetClientsDownFlag(),
GetClientDownFlag() and SetClientDownFlag(). Can we try to reuse them?


Thanks,

Minh


Re: [devel] [PATCH 1/1] ntfd: Do not send response to client if client down [#3084]

2019-10-09 Thread Thien Minh Huynh
Hi Vu,

Thanks for taking the time to review the patch.

Best Regards,
ThienHuynh

-----Original Message-----
From: Nguyen Minh Vu  
Sent: Wednesday, October 9, 2019 11:15 AM
To: thien.m.huynh ; thuan.t...@dektech.com.au; 
minh.c...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] ntfd: Do not send response to client if client down 
[#3084]

Hi Thien,

I have some comments below.

I do not see this enhancement bringing much value to NTF, as it deals with a
very rare case: a process is terminated before saNtfInitialize() returns. In
reality, if the NTF server is getting overloaded by such a process, there must
be an error in that process.

@Minh: what is your opinion? Is this ticket valid?

Anyway, here are my comments:
1) Only the C source files ntfs_mds.c and ntfs_evt.c access the newly added list
`ntfa_down_list_head`, so why put the new methods in the C++ file and add C
wrapper functions for them?
It would be cleaner to move these functions into new files,
e.g. ntfs_client_down.{h,c}.

2) C++ method names should start with a capital letter (refer to the Google C++
coding rules).

3) The names of the methods that add a down client to the list and remove it
from the list should pair with each other, e.g. Open vs Close, Add vs Remove,
not mark vs remove.

4) The list is accessed from two different threads, the mds thread and the main
thread, so a mutex must be used to prevent race conditions.

5) There should be a check to ensure that a down client is *not* added to the
list if that client has successfully initialized.

Regards, Vu

On 10/9/19 9:36 AM, thien.m.huynh wrote:
> Ntfd will not send a response to a client that is already down.
> This avoids a timeout when ntfd sends via mds.
> ---
>   src/ntf/ntfd/NtfAdmin.cc | 93 
> 
>   src/ntf/ntfd/NtfAdmin.h  |  3 ++
>   src/ntf/ntfd/ntfs_cb.h   |  6 
>   src/ntf/ntfd/ntfs_com.h  |  3 ++
>   src/ntf/ntfd/ntfs_evt.c  |  1 +
>   src/ntf/ntfd/ntfs_mds.c  |  9 -
>   6 files changed, 114 insertions(+), 1 deletion(-)
>
> diff --git a/src/ntf/ntfd/NtfAdmin.cc b/src/ntf/ntfd/NtfAdmin.cc
> index 8bbee69..641171b 100644
> --- a/src/ntf/ntfd/NtfAdmin.cc
> +++ b/src/ntf/ntfd/NtfAdmin.cc
> @@ -560,6 +560,85 @@ void NtfAdmin::SearchAndSetClientsDownFlag(MDS_DEST mds_dest) {
>   }
>   
>   /**
> + * @brief Add mds_dest tag into ntfa down list
> + * @param mds_dest
> + */
> +void NtfAdmin::markAgentDown(MDS_DEST mds_dest) {
> +  TRACE_ENTER();
> +  NTFA_DOWN_LIST *ntfa_down_rec = NULL;
> +  if ((ntfa_down_rec = reinterpret_cast<NTFA_DOWN_LIST *>(
> +           malloc(sizeof(NTFA_DOWN_LIST)))) == NULL) {
> +    LOG_ER("memory allocation for the NTFA_DOWN_LIST failed");
> +    return;
> +  }
> +  memset(ntfa_down_rec, 0, sizeof(NTFA_DOWN_LIST));
> +  ntfa_down_rec->mds_dest = mds_dest;
> +  ntfa_down_rec->next = NULL;
> +
> +  if (ntfs_cb->ntfa_down_list_head == NULL) {
> +    ntfs_cb->ntfa_down_list_head = ntfa_down_rec;
> +  } else {
> +    NTFA_DOWN_LIST *p = ntfs_cb->ntfa_down_list_head;
> +    while (p->next != NULL) {
> +      p = p->next;
> +    }
> +    p->next = ntfa_down_rec;
> +  }
> +  TRACE_1("Added MDS dest: %" PRIx64, ntfa_down_rec->mds_dest);
> +  TRACE_LEAVE();
> +}
> +
> +/**
> + * @brief Find and remove agent from ntfa down list
> + * @param mds_dest
> + */
> +void NtfAdmin::removeAgentFromDownList(MDS_DEST mds_dest) {
> +  NTFA_DOWN_LIST *ntfa_down_rec = ntfs_cb->ntfa_down_list_head;
> +  NTFA_DOWN_LIST *prev = NULL;
> +  TRACE_ENTER();
> +  while (ntfa_down_rec != NULL) {
> +    if (mds_dest == ntfa_down_rec->mds_dest) {
> +      if (ntfa_down_rec == ntfs_cb->ntfa_down_list_head) {
> +        if (ntfa_down_rec->next == NULL) {
> +          ntfs_cb->ntfa_down_list_head = NULL;
> +        } else {
> +          ntfs_cb->ntfa_down_list_head = ntfa_down_rec->next;
> +        }
> +      } else if (prev) {
> +        prev->next = ntfa_down_rec->next;
> +      }
> +      TRACE("Deleted MDS dest: %" PRIx64, ntfa_down_rec->mds_dest);
> +      free(ntfa_down_rec);
> +      ntfa_down_rec = NULL;
> +      break;
> +    }
> +    prev = ntfa_down_rec;
> +    ntfa_down_rec = ntfa_down_rec->next;
> +  }
> +  TRACE_LEAVE();
> +}
> +
> +/**
> + * @brief  Check if agent exists in down list
> + * @param  mds_dest
> + * @return true/false
> + */
> +bool NtfAdmin::isInNtfaDownList(MDS_DEST mds_dest) {
> +  bool found = false;
> +  NTFA_DOWN_LIST *ntfa_down_rec = ntfs_cb->ntfa_down_list_head;
> +  TRACE_ENTER();
> +  while (ntfa_down_rec != NULL) {
> +    if (mds_dest == ntfa_down_rec->mds_dest) {
> +      found = true;
> +      break;
> +    }
> +    ntfa_down_rec = ntfa_down_rec->next;
> +  }
> +  TRACE_LEAVE();
> +  return found;
> +}
> +
> +/**
>   * The node object where the client who had the subscription is notified
>   * so it can delete the appropriate subscription and filter object.
>   *
> @@ -1300,6 +1379,20 @@ uint32_t send_clm_node_status_change(SaClmClusterChangesT cluster_change,
>