Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Minh Hon Chau

Hi Thuan,

Please see comment inline

Thanks

Minh

On 23/10/19 3:32 pm, Tran Thuan wrote:


Hi Minh,

Thanks for comments. See my response inline.

Btw, I am preparing to send out new patch, I think I found an issue in 
current patch.


Best Regards,

ThuanTr

-Original Message-
From: Minh Hon Chau 
Sent: Wednesday, October 23, 2019 5:52 AM
To: Tran Thuan ; 'Nguyen Minh Vu' 
; hans.nordeb...@ericsson.com; 
gary@dektech.com.au

Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already 
down to send response message [#3102]


Hi Thuan,

I wonder the patch would work in the same reproduced steps if the both

adests have subscribed each other more than 2 services. The svc_cnt will

be greater than 1 until it is the last service down event. I think

that's why mds has the database @subtn_results, in which each item is an

adest associated with a service id separately.

[Thuan] We can understand that adest still alive, then go with origin 
flow (wait 1.5s).


But can a process send SNDRSP then mds unregister? I think it cannot, 
because it’s in SNDRSP (blocking)


[M] Not unregister, it can be unsubscribe. Or do you mean a process can 
not send two SNDRSP at the same time on 2 different subscribed services?


The scenario of this ticket happens for the process terminated/crash.

[M] Yes my doubt is in the context of this ticket - terminated/crash - 
you would get 2 service down event I think


[M] I don't think adding a new database at the global scope for this 
specific case is a good idea, if we can reuse the existing database. Can 
you try to use MDS_SUBSCRIPTION_INFO, add a flag or something similar to 
indicate which case mds should wait 1.5 sec. It would isolate the bug 
fix in the scope of this problem.



The problem originally resides at the services code e.g ntf, imm...

where the threads structure between mds receiving thread and main thread

cause a race condition. Thus the service sends a message with a death

adest which is removed from mds database, that confuses mds and hit 1.5

secs wait time.

If I read the code correctly, the 1.5 wait time is for another case, it

gives another chance to wait 1.5 when the subscription result is empty

in @subtn_results because the service up has not arrived yet.

[Thuan] Yes, it will give a chance if adest not yet UP any.

My patch still keep that chance as origin code.

But I think I need reduce timeout for adest down timer, I am verifying 
this change.


mds  subscribe >

mds  sends message A x

mds wait 1.5 sec

mds <--- service up 

mds  sends message A >

So the 1.5 sec time is for early phase of waiting service up.

    } else if (sub_info->tmr_flag != true) {

        if ((MDS_SENDTYPE_RSP == req->i_sendtype) ||

            (MDS_SENDTYPE_RRSP == req->i_sendtype)) {

            time_wait = true;

            m_MDS_LOG_INFO(

                "MDS_SND_RCV:Disc queue: Subscr exists no timer

running: Waiting for some time\n");

-> I think at this line, it means: the SUBSCRIPTION_TMR has timeout, and

mds is sending RSP/RRSP which means mds should have received the

*request* message (?), so mds wants to wait for another 1.5 second for

service_up to create the subscription result in database.

The problem in this ticket hit 1.5 because the service down has already

come and mds removed the subscription result item, now the ntf, imm...

sends msg with a death adest, and mds now it thinks it is waiting for a

service up to come as at the early phase, so it waits. Both two

scenarios can be distinguished themselves to avoid to wait 1.5 secs for

the latter case I think.

Thanks

Minh

On 22/10/19 9:50 pm, Tran Thuan wrote:

> Hi Vu,

>

> Thanks for additional comments.

> I reply your concerns inline below.

>

> Best Regards,

> ThuanTr

>

> -Original Message-

> From: Nguyen Minh Vu >


> Sent: Tuesday, October 22, 2019 5:28 PM

> To: thuan.tran >; 'Minh Chau' 
mailto:minh.c...@dektech.com.au>>; 
hans.nordeb...@ericsson.com ; 
gary@dektech.com.au 


> Cc: opensaf-devel@lists.sourceforge.net 



> Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest 
already down to send response message [#3102]


>

> Hi Thuan,

>

> I have additional comments below.

>

> Regards, Vu

>

> On 10/22/19 7:14 AM, thuan.tran wrote:

>> - When sending response message which Adest not exist (already down)

>> current MDS try to wait for 1.5 seconds before conclude no route to

>> send response message.

>>

>> - There are 2 scenarios may have:

>> UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s

>> UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s

>>

>> - With this change, MDS will not waste for 1.5s which can cause trouble

>> for higher layer services, e.g: 

[devel] [PATCH 0/1] Review Request for nid: Remove the absolute path of tipc command [#3105] V2

2019-10-22 Thread thien.m.huynh
Summary: nid: Remove the absolute path of tipc command [#3105]
Review request for Ticket(s): 3105
Peer Reviewer(s): Thuan, Vu
Pull request to: Vu
Affected branch(es): develop
Development branch: ticket-3105
Base revision: 9b1c588a4f244cb34a304b03c7568e04fa6cd8a7
Personal repository: git://git.code.sf.net/u/thienhuynh/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   y


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision 917e219495f0646744774f3c7486f8f471e4eab4
Author: thien.m.huynh 
Date:   Wed, 23 Oct 2019 11:51:36 +0700

nid: Remove the absolute path of tipc command [#3105]



Complete diffstat:
--
 scripts/tipc-config | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK from reviewers

Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] nid: Remove the absolute path of tipc command [#3105]

2019-10-22 Thread thien.m.huynh
---
 scripts/tipc-config | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/scripts/tipc-config b/scripts/tipc-config
index 34eb9a5..6a20275 100755
--- a/scripts/tipc-config
+++ b/scripts/tipc-config
@@ -36,10 +36,12 @@ EOF
 exit 1
 fi
 
+tipc=$(which tipc 2> /dev/null)
+
 while [ $# -gt 0 ]; do
 case "$1" in
-addr)
-   addr=$(/sbin/tipc node get address)
+   addr=$("$tipc" node get address)
hex_pattern="^[0-9a-fA-F]+$"
if [[ $addr =~ $hex_pattern ]]; then
dec_addr=$((16#$addr))
@@ -53,36 +55,36 @@ while [ $# -gt 0 ]; do
echo "node address: $addr"
;;
-a=*)
-   /sbin/tipc node set address "$(echo "$1" | cut -d= -f2)"
+   "$tipc" node set address "$(echo "$1" | cut -d= -f2)"
;;
-b)
echo "Bearers:"
-   /sbin/tipc bearer list
+   "$tipc" bearer list
;;
-bd=*)
-   /sbin/tipc bearer disable media eth device "$(echo "$1" | cut -d: 
-f2)"
+   "$tipc" bearer disable media eth device "$(echo "$1" | cut -d: -f2)"
;;
-be=*)
-   /sbin/tipc bearer enable media eth device "$(echo "$1" | cut -d: 
-f2)"
+   "$tipc" bearer enable media eth device "$(echo "$1" | cut -d: -f2)"
;;
-l)
echo "Links:"
-   /sbin/tipc link list
+   "$tipc" link list
;;
-lt=*)
tolerance=$(echo "$1" | cut -d= -f2)
link=$(echo "$tolerance" | cut -d/ -f1)
value=$(echo "$tolerance" | cut -d/ -f2)
-   /sbin/tipc link set tolerance "$value" link "$link"
+   "$tipc" link set tolerance "$value" link "$link"
;;
-netid)
-   echo "current network id: $(/sbin/tipc node get netid)"
+   echo "current network id: $("$tipc" node get netid)"
;;
-netid=*)
-   /sbin/tipc node set netid "$(echo "$1" | cut -d= -f2)"
+   "$tipc" node set netid "$(echo "$1" | cut -d= -f2)"
;;
-nt)
-   /sbin/tipc nametable show
+   "$tipc" nametable show
;;
*)
echo "$0: unrecognized option '$1'" 1>&2
-- 
2.7.4



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Tran Thuan
Hi Minh,

 

Thanks for comments. See my response inline.

Btw, I am preparing to send out new patch, I think I found an issue in current 
patch.

 

Best Regards,

ThuanTr

 

-Original Message-
From: Minh Hon Chau  
Sent: Wednesday, October 23, 2019 5:52 AM
To: Tran Thuan ; 'Nguyen Minh Vu' 
; hans.nordeb...@ericsson.com; 
gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

 

Hi Thuan,

 

I wonder the patch would work in the same reproduced steps if the both 

adests have subscribed each other more than 2 services. The svc_cnt will 

be greater than 1 until it is the last service down event. I think 

that's why mds has the database @subtn_results, in which each item is an 

adest associated with a service id separately.

[Thuan] We can understand that adest still alive, then go with origin flow 
(wait 1.5s).

But can a process send SNDRSP then mds unregister? I think it cannot, because 
it’s in SNDRSP (blocking)

The scenario of this ticket happens for the process terminated/crash.

 

The problem originally resides at the services code e.g ntf, imm... 

where the threads structure between mds receiving thread and main thread 

cause a race condition. Thus the service sends a message with a death 

adest which is removed from mds database, that confuses mds and hit 1.5 

secs wait time.

 

If I read the code correctly, the 1.5 wait time is for another case, it 

gives another chance to wait 1.5 when the subscription result is empty 

in @subtn_results because the service up has not arrived yet.

[Thuan] Yes, it will give a chance if adest not yet UP any.

My patch still keep that chance as origin code.

But I think I need reduce timeout for adest down timer, I am verifying this 
change.

 

mds  subscribe >

 

mds  sends message A x

 

mds wait 1.5 sec

 

mds <--- service up 

 

mds  sends message A >

 

So the 1.5 sec time is for early phase of waiting service up.

 

} else if (sub_info->tmr_flag != true) {

if ((MDS_SENDTYPE_RSP == req->i_sendtype) ||

(MDS_SENDTYPE_RRSP == req->i_sendtype)) {

time_wait = true;

m_MDS_LOG_INFO(

"MDS_SND_RCV:Disc queue: Subscr exists no timer 

running: Waiting for some time\n");

 

-> I think at this line, it means: the SUBSCRIPTION_TMR has timeout, and 

mds is sending RSP/RRSP which means mds should have received the 

*request* message (?), so mds wants to wait for another 1.5 second for 

service_up to create the subscription result in database.

 

The problem in this ticket hit 1.5 because the service down has already 

come and mds removed the subscription result item, now the ntf, imm... 

sends msg with a death adest, and mds now it thinks it is waiting for a 

service up to come as at the early phase, so it waits. Both two 

scenarios can be distinguished themselves to avoid to wait 1.5 secs for 

the latter case I think.

 

Thanks

 

Minh

 

On 22/10/19 9:50 pm, Tran Thuan wrote:

> Hi Vu,

> 

> Thanks for additional comments.

> I reply your concerns inline below.

> 

> Best Regards,

> ThuanTr

> 

> -Original Message-

> From: Nguyen Minh Vu <  
> vu.m.ngu...@dektech.com.au>

> Sent: Tuesday, October 22, 2019 5:28 PM

> To: thuan.tran <  
> thuan.t...@dektech.com.au>; 'Minh Chau' <  
> minh.c...@dektech.com.au>;   
> hans.nordeb...@ericsson.com;   
> gary@dektech.com.au

> Cc:   
> opensaf-devel@lists.sourceforge.net

> Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
> send response message [#3102]

> 

> Hi Thuan,

> 

> I have additional comments below.

> 

> Regards, Vu

> 

> On 10/22/19 7:14 AM, thuan.tran wrote:

>> - When sending response message which Adest not exist (already down)

>> current MDS try to wait for 1.5 seconds before conclude no route to

>> send response message.

>> 

>> - There are 2 scenarios may have:

>> UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s

>> UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s

>> 

>> - With this change, MDS will not waste for 1.5s which can cause trouble

>> for higher layer services, e.g: ntf, imm, etc...

>> ---

>>src/mds/mds_c_api.c | 70 -

>>src/mds/mds_c_sndrcv.c  | 52 --

>>src/mds/mds_core.h  | 25 +--

>>src/mds/mds_dt2c.h  |  2 +-

>>src/mds/mds_dt_common.c | 22 -

>>5 files changed, 162 insertions(+), 9 deletions(-)

>> 

>> diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c

>> index 132555b8e..5dd30c536 100644

>> --- a/src/mds/mds_c_api.c

>> 

Re: [devel] [PATCH 1/1] nid: Remove the absolute path of tipc command [#3105]

2019-10-22 Thread Thien Minh Huynh
Hi Vu,

Thanks for your comments.

Best Regards,
ThienHuynh

-Original Message-
From: Nguyen Minh Vu  
Sent: Wednesday, October 23, 2019 9:58 AM
To: thien.m.huynh ; thuan.t...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] nid: Remove the absolute path of tipc command [#3105]

Hi Thien,

Ack with comments.

Regards, Vu

On 10/23/19 9:42 AM, thien.m.huynh wrote:
> ---
>   scripts/tipc-config | 22 --
>   1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/scripts/tipc-config b/scripts/tipc-config index 
> 34eb9a5..b7c6cef 100755
> --- a/scripts/tipc-config
> +++ b/scripts/tipc-config
> @@ -36,10 +36,12 @@ EOF
>   exit 1
>   fi
>   
> +tipc=$(which tipc 2> /dev/null)
> +
>   while [ $# -gt 0 ]; do
>   case "$1" in
>   -addr)
> - addr=$(/sbin/tipc node get address)
> + addr=$($tipc node get address)
[Vu] Use double quotes around $tipc to avoid the case that having
space(s) in the tipc path.
So, use addr=$("$tipc" node get address). This comment is also applicable for 
below changes.
>   hex_pattern="^[0-9a-fA-F]+$"
>   if [[ $addr =~ $hex_pattern ]]; then
>   dec_addr=$((16#$addr))
> @@ -53,36 +55,36 @@ while [ $# -gt 0 ]; do
>   echo "node address: $addr"
>   ;;
>   -a=*)
> - /sbin/tipc node set address "$(echo "$1" | cut -d= -f2)"
> + $tipc node set address "$(echo "$1" | cut -d= -f2)"
>   ;;
>   -b)
>   echo "Bearers:"
> - /sbin/tipc bearer list
> + $tipc bearer list
>   ;;
>   -bd=*)
> - /sbin/tipc bearer disable media eth device "$(echo "$1" | cut -d: 
> -f2)"
> + $tipc bearer disable media eth device "$(echo "$1" | cut -d: -f2)"
>   ;;
>   -be=*)
> - /sbin/tipc bearer enable media eth device "$(echo "$1" | cut -d: 
> -f2)"
> + $tipc bearer enable media eth device "$(echo "$1" | cut -d: -f2)"
>   ;;
>   -l)
>   echo "Links:"
> - /sbin/tipc link list
> + $tipc link list
>   ;;
>   -lt=*)
>   tolerance=$(echo "$1" | cut -d= -f2)
>   link=$(echo "$tolerance" | cut -d/ -f1)
>   value=$(echo "$tolerance" | cut -d/ -f2)
> - /sbin/tipc link set tolerance "$value" link "$link"
> + $tipc link set tolerance "$value" link "$link"
>   ;;
>   -netid)
> - echo "current network id: $(/sbin/tipc node get netid)"
> + echo "current network id: $($tipc node get netid)"
>   ;;
>   -netid=*)
> - /sbin/tipc node set netid "$(echo "$1" | cut -d= -f2)"
> + $tipc node set netid "$(echo "$1" | cut -d= -f2)"
>   ;;
>   -nt)
> - /sbin/tipc nametable show
> + $tipc nametable show
>   ;;
>   *)
>   echo "$0: unrecognized option '$1'" 1>&2




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] nid: Remove the absolute path of tipc command [#3105]

2019-10-22 Thread Nguyen Minh Vu

Hi Thien,

Ack with comments.

Regards, Vu

On 10/23/19 9:42 AM, thien.m.huynh wrote:

---
  scripts/tipc-config | 22 --
  1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/scripts/tipc-config b/scripts/tipc-config
index 34eb9a5..b7c6cef 100755
--- a/scripts/tipc-config
+++ b/scripts/tipc-config
@@ -36,10 +36,12 @@ EOF
  exit 1
  fi
  
+tipc=$(which tipc 2> /dev/null)

+
  while [ $# -gt 0 ]; do
  case "$1" in
-addr)
-   addr=$(/sbin/tipc node get address)
+   addr=$($tipc node get address)
[Vu] Use double quotes around $tipc to avoid the case that having 
space(s) in the tipc path.
So, use addr=$("$tipc" node get address). This comment is also 
applicable for below changes.

hex_pattern="^[0-9a-fA-F]+$"
if [[ $addr =~ $hex_pattern ]]; then
dec_addr=$((16#$addr))
@@ -53,36 +55,36 @@ while [ $# -gt 0 ]; do
echo "node address: $addr"
;;
-a=*)
-   /sbin/tipc node set address "$(echo "$1" | cut -d= -f2)"
+   $tipc node set address "$(echo "$1" | cut -d= -f2)"
;;
-b)
echo "Bearers:"
-   /sbin/tipc bearer list
+   $tipc bearer list
;;
-bd=*)
-   /sbin/tipc bearer disable media eth device "$(echo "$1" | cut -d: 
-f2)"
+   $tipc bearer disable media eth device "$(echo "$1" | cut -d: -f2)"
;;
-be=*)
-   /sbin/tipc bearer enable media eth device "$(echo "$1" | cut -d: 
-f2)"
+   $tipc bearer enable media eth device "$(echo "$1" | cut -d: -f2)"
;;
-l)
echo "Links:"
-   /sbin/tipc link list
+   $tipc link list
;;
-lt=*)
tolerance=$(echo "$1" | cut -d= -f2)
link=$(echo "$tolerance" | cut -d/ -f1)
value=$(echo "$tolerance" | cut -d/ -f2)
-   /sbin/tipc link set tolerance "$value" link "$link"
+   $tipc link set tolerance "$value" link "$link"
;;
-netid)
-   echo "current network id: $(/sbin/tipc node get netid)"
+   echo "current network id: $($tipc node get netid)"
;;
-netid=*)
-   /sbin/tipc node set netid "$(echo "$1" | cut -d= -f2)"
+   $tipc node set netid "$(echo "$1" | cut -d= -f2)"
;;
-nt)
-   /sbin/tipc nametable show
+   $tipc nametable show
;;
*)
echo "$0: unrecognized option '$1'" 1>&2




___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 1/1] nid: Remove the absolute path of tipc command [#3105]

2019-10-22 Thread thien.m.huynh
---
 scripts/tipc-config | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/scripts/tipc-config b/scripts/tipc-config
index 34eb9a5..b7c6cef 100755
--- a/scripts/tipc-config
+++ b/scripts/tipc-config
@@ -36,10 +36,12 @@ EOF
 exit 1
 fi
 
+tipc=$(which tipc 2> /dev/null)
+
 while [ $# -gt 0 ]; do
 case "$1" in
-addr)
-   addr=$(/sbin/tipc node get address)
+   addr=$($tipc node get address)
hex_pattern="^[0-9a-fA-F]+$"
if [[ $addr =~ $hex_pattern ]]; then
dec_addr=$((16#$addr))
@@ -53,36 +55,36 @@ while [ $# -gt 0 ]; do
echo "node address: $addr"
;;
-a=*)
-   /sbin/tipc node set address "$(echo "$1" | cut -d= -f2)"
+   $tipc node set address "$(echo "$1" | cut -d= -f2)"
;;
-b)
echo "Bearers:"
-   /sbin/tipc bearer list
+   $tipc bearer list
;;
-bd=*)
-   /sbin/tipc bearer disable media eth device "$(echo "$1" | cut -d: 
-f2)"
+   $tipc bearer disable media eth device "$(echo "$1" | cut -d: -f2)"
;;
-be=*)
-   /sbin/tipc bearer enable media eth device "$(echo "$1" | cut -d: 
-f2)"
+   $tipc bearer enable media eth device "$(echo "$1" | cut -d: -f2)"
;;
-l)
echo "Links:"
-   /sbin/tipc link list
+   $tipc link list
;;
-lt=*)
tolerance=$(echo "$1" | cut -d= -f2)
link=$(echo "$tolerance" | cut -d/ -f1)
value=$(echo "$tolerance" | cut -d/ -f2)
-   /sbin/tipc link set tolerance "$value" link "$link"
+   $tipc link set tolerance "$value" link "$link"
;;
-netid)
-   echo "current network id: $(/sbin/tipc node get netid)"
+   echo "current network id: $($tipc node get netid)"
;;
-netid=*)
-   /sbin/tipc node set netid "$(echo "$1" | cut -d= -f2)"
+   $tipc node set netid "$(echo "$1" | cut -d= -f2)"
;;
-nt)
-   /sbin/tipc nametable show
+   $tipc nametable show
;;
*)
echo "$0: unrecognized option '$1'" 1>&2
-- 
2.7.4



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


[devel] [PATCH 0/1] Review Request for nid: Remove the absolute path of tipc command [#3105]

2019-10-22 Thread thien.m.huynh
Summary: nid: Remove the absolute path of tipc command [#3105]
Review request for Ticket(s): 3105
Peer Reviewer(s): Thuan, Vu
Pull request to: Vu
Affected branch(es): develop
Development branch: ticket-3105
Base revision: 9b1c588a4f244cb34a304b03c7568e04fa6cd8a7
Personal repository: git://git.code.sf.net/u/thienhuynh/review


Impacted area   Impact y/n

 Docsn
 Build systemn
 RPM/packaging   n
 Configuration files n
 Startup scripts n
 SAF servicesn
 OpenSAF servicesn
 Core libraries  n
 Samples n
 Tests   n
 Other   y


Comments (indicate scope for each "y" above):
-
*** EXPLAIN/COMMENT THE PATCH SERIES HERE ***

revision ea877de5cf8d891b45eb7a26817099df975f09cc
Author: thien.m.huynh 
Date:   Tue, 22 Oct 2019 17:25:36 +0700

nid: Remove the absolute path of tipc command [#3105]



Complete diffstat:
--
 scripts/tipc-config | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)


Testing Commands:
-
N/A

Testing, Expected Results:
--
N/A

Conditions of Submission:
-
ACK from reviewers

Arch  Built StartedLinux distro
---
mipsn  n
mips64  n  n
x86 n  n
x86_64  y  y
powerpc n  n
powerpc64   n  n


Reviewer Checklist:
---
[Submitters: make sure that your review doesn't trigger any checkmarks!]


Your checkin has not passed review because (see checked entries):

___ Your RR template is generally incomplete; it has too many blank entries
that need proper data filled in.

___ You have failed to nominate the proper persons for review and push.

___ Your patches do not have proper short+long header

___ You have grammar/spelling in your header that is unacceptable.

___ You have exceeded a sensible line length in your headers/comments/text.

___ You have failed to put in a proper Trac Ticket # into your commits.

___ You have incorrectly put/left internal data in your comments/files
(i.e. internal bug tracking tool IDs, product names etc)

___ You have not given any evidence of testing beyond basic build tests.
Demonstrate some level of runtime or other sanity testing.

___ You have ^M present in some of your files. These have to be removed.

___ You have needlessly changed whitespace or added whitespace crimes
like trailing spaces, or spaces before tabs.

___ You have mixed real technical changes with whitespace and other
cosmetic code cleanup changes. These have to be separate commits.

___ You need to refactor your submission into logical chunks; there is
too much content into a single commit.

___ You have extraneous garbage in your review (merge commits etc)

___ You have giant attachments which should never have been sent;
Instead you should place your content in a public tree to be pulled.

___ You have too many commits attached to an e-mail; resend as threaded
commits, or place in a public tree for a pull.

___ You have resent this content multiple times without a clear indication
of what has changed between each re-send.

___ You have failed to adequately and individually address all of the
comments and change requests that were proposed in the initial review.

___ You have a misconfigured ~/.gitconfig file (i.e. user.name, user.email etc)

___ Your computer have a badly configured date and time; confusing the
the threaded patch review.

___ Your changes affect IPC mechanism, and you don't present any results
for in-service upgradability test.

___ Your changes affect user manual and documentation, your patch series
do not contain the patch that updates the Doxygen manual.



___
Opensaf-devel mailing list
Opensaf-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-devel


Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Minh Hon Chau

Hi Thuan,

I wonder the patch would work in the same reproduced steps if the both 
adests have subscribed each other more than 2 services. The svc_cnt will 
be greater than 1 until it is the last service down event. I think 
that's why mds has the database @subtn_results, in which each item is an 
adest associated with a service id separately.


The problem originally resides at the services code e.g ntf, imm... 
where the threads structure between mds receiving thread and main thread 
cause a race condition. Thus the service sends a message with a death 
adest which is removed from mds database, that confuses mds and hit 1.5 
secs wait time.


If I read the code correctly, the 1.5 wait time is for another case, it 
gives another chance to wait 1.5 when the subscription result is empty 
in @subtn_results because the service up has not arrived yet.


mds  subscribe >

mds  sends message A x

mds wait 1.5 sec

mds <--- service up 

mds  sends message A >

So the 1.5 sec time is for early phase of waiting service up.

    } else if (sub_info->tmr_flag != true) {
        if ((MDS_SENDTYPE_RSP == req->i_sendtype) ||
            (MDS_SENDTYPE_RRSP == req->i_sendtype)) {
            time_wait = true;
            m_MDS_LOG_INFO(
                "MDS_SND_RCV:Disc queue: Subscr exists no timer 
running: Waiting for some time\n");


-> I think at this line, it means: the SUBSCRIPTION_TMR has timeout, and 
mds is sending RSP/RRSP which means mds should have received the 
*request* message (?), so mds wants to wait for another 1.5 second for 
service_up to create the subscription result in database.


The problem in this ticket hit 1.5 because the service down has already 
come and mds removed the subscription result item, now the ntf, imm... 
sends msg with a death adest, and mds now it thinks it is waiting for a 
service up to come as at the early phase, so it waits. Both two 
scenarios can be distinguished themselves to avoid to wait 1.5 secs for 
the latter case I think.


Thanks

Minh

On 22/10/19 9:50 pm, Tran Thuan wrote:

Hi Vu,

Thanks for additional comments.
I reply your concerns inline below.

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu 
Sent: Tuesday, October 22, 2019 5:28 PM
To: thuan.tran ; 'Minh Chau' 
; hans.nordeb...@ericsson.com; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

Hi Thuan,

I have additional comments below.

Regards, Vu

On 10/22/19 7:14 AM, thuan.tran wrote:

- When sending response message which Adest not exist (already down)
current MDS try to wait for 1.5 seconds before conclude no route to
send response message.

- There are 2 scenarios may have:
UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s
UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s

- With this change, MDS will not waste for 1.5s which can cause trouble
for higher layer services, e.g: ntf, imm, etc...
---
   src/mds/mds_c_api.c | 70 -
   src/mds/mds_c_sndrcv.c  | 52 --
   src/mds/mds_core.h  | 25 +--
   src/mds/mds_dt2c.h  |  2 +-
   src/mds/mds_dt_common.c | 22 -
   5 files changed, 162 insertions(+), 9 deletions(-)

diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
index 132555b8e..5dd30c536 100644
--- a/src/mds/mds_c_api.c
+++ b/src/mds/mds_c_api.c
@@ -1900,6 +1900,32 @@ uint32_t mds_mcm_svc_up(PW_ENV_ID pwe_id, MDS_SVC_ID 
svc_id, V_DEST_RL role,
   
   	/*** Validation for SCOPE **/
   
+	if (adest != m_MDS_GET_ADEST) {

+   MDS_ADEST_INFO *adest_info =
+   (MDS_ADEST_INFO *)ncs_patricia_tree_get(
+   _mds_mcm_cb->adest_list,
+   (uint8_t *));
+   if (!adest_info) {
+   /* Add adest to adest list */
+   adest_info = m_MMGR_ALLOC_ADEST_INFO;
+   memset(adest_info, 0, sizeof(MDS_ADEST_INFO));
+   adest_info->adest = adest;
+   adest_info->node.key_info =
+   (uint8_t *)_info->adest;
+   adest_info->svc_cnt = 1;
+   adest_info->tmr_start = false;
+   ncs_patricia_tree_add(
+   _mds_mcm_cb->adest_list,
+   (NCS_PATRICIA_NODE *)adest_info);
+   m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
+   " svc_cnt=%u", adest, adest_info->svc_cnt);
+   } else {
+   adest_info->svc_cnt++;
+   m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
+   " svc_cnt=%u", adest, adest_info->svc_cnt);
+   }
+   }
+
status = 

Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Tran Thuan
Hi Vu,

Thanks for additional comments.
I reply your concerns inline below.

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu  
Sent: Tuesday, October 22, 2019 5:28 PM
To: thuan.tran ; 'Minh Chau' 
; hans.nordeb...@ericsson.com; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

Hi Thuan,

I have additional comments below.

Regards, Vu

On 10/22/19 7:14 AM, thuan.tran wrote:
> - When sending response message which Adest not exist (already down)
> current MDS try to wait for 1.5 seconds before conclude no route to
> send response message.
>
> - There are 2 scenarios may have:
> UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s
> UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s
>
> - With this change, MDS will not waste for 1.5s which can cause trouble
> for higher layer services, e.g: ntf, imm, etc...
> ---
>   src/mds/mds_c_api.c | 70 -
>   src/mds/mds_c_sndrcv.c  | 52 --
>   src/mds/mds_core.h  | 25 +--
>   src/mds/mds_dt2c.h  |  2 +-
>   src/mds/mds_dt_common.c | 22 -
>   5 files changed, 162 insertions(+), 9 deletions(-)
>
> diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
> index 132555b8e..5dd30c536 100644
> --- a/src/mds/mds_c_api.c
> +++ b/src/mds/mds_c_api.c
> @@ -1900,6 +1900,32 @@ uint32_t mds_mcm_svc_up(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   
>   /*** Validation for SCOPE **/
>   
> + if (adest != m_MDS_GET_ADEST) {
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (!adest_info) {
> + /* Add adest to adest list */
> + adest_info = m_MMGR_ALLOC_ADEST_INFO;
> + memset(adest_info, 0, sizeof(MDS_ADEST_INFO));
> + adest_info->adest = adest;
> + adest_info->node.key_info =
> + (uint8_t *)_info->adest;
> + adest_info->svc_cnt = 1;
> + adest_info->tmr_start = false;
> + ncs_patricia_tree_add(
> + _mds_mcm_cb->adest_list,
> + (NCS_PATRICIA_NODE *)adest_info);
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + } else {
> + adest_info->svc_cnt++;
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + }
> + }
> +
>   status = mds_get_subtn_res_tbl_by_adest(local_svc_hdl, svc_id, vdest_id,
>   adest, _subtn_result_info);
>   
> @@ -3571,6 +3597,24 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   /* Discard : Getting down before getting up */
>   } else { /* Entry exist in subscription result table */
>   
> + /* If adest exist and no sndrsp, start a timer */
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (adest_info) {
> + adest_info->svc_cnt--;
> + if (adest_info->svc_cnt == 0 &&
> + adest_info->sndrsp_cnt == 0) {
> + m_MDS_LOG_INFO("MCM:API: Adest=%" PRIu64
> + " down timer start", adest);
> + if (adest_info->tmr_start == false) {
> + adest_info->tmr_start = true;
> + start_mds_down_tmr(adest, svc_id);
> + }
> + }
> + }
> +
>   if (vdest_id == m_VDEST_ID_FOR_ADEST_ENTRY) {
>   status = mds_subtn_res_tbl_del(
>   local_svc_hdl, svc_id, vdest_id, adest,
> @@ -4956,6 +5000,17 @@ uint32_t mds_mcm_init(void)
>   return NCSCC_RC_FAILURE;
>   }
>   
> + /* ADEST TREE */
> + memset(_tree_params, 0, sizeof(NCS_PATRICIA_PARAMS));
> + pat_tree_params.key_size = sizeof(MDS_DEST);
> + if (NCSCC_RC_SUCCESS !=
> + ncs_patricia_tree_init(_mds_mcm_cb->adest_list,
> +_tree_params)) {
> + m_MDS_LOG_ERR(
> + "MCM:API: patricia_tree_init: adest :failure, L 
> mds_mcm_init");
> + return NCSCC_RC_FAILURE;
> + }
> +

Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Nguyen Minh Vu

Hi Thuan,

I have additional comments below.

Regards, Vu

On 10/22/19 7:14 AM, thuan.tran wrote:

- When sending response message which Adest not exist (already down)
current MDS try to wait for 1.5 seconds before conclude no route to
send response message.

- There are 2 scenarios may have:
UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s
UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s

- With this change, MDS will not waste for 1.5s which can cause trouble
for higher layer services, e.g: ntf, imm, etc...
---
  src/mds/mds_c_api.c | 70 -
  src/mds/mds_c_sndrcv.c  | 52 --
  src/mds/mds_core.h  | 25 +--
  src/mds/mds_dt2c.h  |  2 +-
  src/mds/mds_dt_common.c | 22 -
  5 files changed, 162 insertions(+), 9 deletions(-)

diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
index 132555b8e..5dd30c536 100644
--- a/src/mds/mds_c_api.c
+++ b/src/mds/mds_c_api.c
@@ -1900,6 +1900,32 @@ uint32_t mds_mcm_svc_up(PW_ENV_ID pwe_id, MDS_SVC_ID 
svc_id, V_DEST_RL role,
  
  	/*** Validation for SCOPE **/
  
+	if (adest != m_MDS_GET_ADEST) {

+   MDS_ADEST_INFO *adest_info =
+   (MDS_ADEST_INFO *)ncs_patricia_tree_get(
+   _mds_mcm_cb->adest_list,
+   (uint8_t *));
+   if (!adest_info) {
+   /* Add adest to adest list */
+   adest_info = m_MMGR_ALLOC_ADEST_INFO;
+   memset(adest_info, 0, sizeof(MDS_ADEST_INFO));
+   adest_info->adest = adest;
+   adest_info->node.key_info =
+   (uint8_t *)_info->adest;
+   adest_info->svc_cnt = 1;
+   adest_info->tmr_start = false;
+   ncs_patricia_tree_add(
+   _mds_mcm_cb->adest_list,
+   (NCS_PATRICIA_NODE *)adest_info);
+   m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
+   " svc_cnt=%u", adest, adest_info->svc_cnt);
+   } else {
+   adest_info->svc_cnt++;
+   m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
+   " svc_cnt=%u", adest, adest_info->svc_cnt);
+   }
+   }
+
status = mds_get_subtn_res_tbl_by_adest(local_svc_hdl, svc_id, vdest_id,
adest, _subtn_result_info);
  
@@ -3571,6 +3597,24 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id, V_DEST_RL role,

/* Discard : Getting down before getting up */
} else { /* Entry exist in subscription result table */
  
+		/* If adest exist and no sndrsp, start a timer */

+   MDS_ADEST_INFO *adest_info =
+   (MDS_ADEST_INFO *)ncs_patricia_tree_get(
+   _mds_mcm_cb->adest_list,
+   (uint8_t *));
+   if (adest_info) {
+   adest_info->svc_cnt--;
+   if (adest_info->svc_cnt == 0 &&
+   adest_info->sndrsp_cnt == 0) {
+   m_MDS_LOG_INFO("MCM:API: Adest=%" PRIu64
+   " down timer start", adest);
+   if (adest_info->tmr_start == false) {
+   adest_info->tmr_start = true;
+   start_mds_down_tmr(adest, svc_id);
+   }
+   }
+   }
+
if (vdest_id == m_VDEST_ID_FOR_ADEST_ENTRY) {
status = mds_subtn_res_tbl_del(
local_svc_hdl, svc_id, vdest_id, adest,
@@ -4956,6 +5000,17 @@ uint32_t mds_mcm_init(void)
return NCSCC_RC_FAILURE;
}
  
+	/* ADEST TREE */

+   memset(_tree_params, 0, sizeof(NCS_PATRICIA_PARAMS));
+   pat_tree_params.key_size = sizeof(MDS_DEST);
+   if (NCSCC_RC_SUCCESS !=
+   ncs_patricia_tree_init(_mds_mcm_cb->adest_list,
+  _tree_params)) {
+   m_MDS_LOG_ERR(
+   "MCM:API: patricia_tree_init: adest :failure, L 
mds_mcm_init");
+   return NCSCC_RC_FAILURE;
+   }
+
/* SERVICE TREE */
memset(_tree_params, 0, sizeof(NCS_PATRICIA_PARAMS));
pat_tree_params.key_size = sizeof(MDS_SVC_HDL);
@@ -4966,7 +5021,12 @@ uint32_t mds_mcm_init(void)
if (NCSCC_RC_SUCCESS !=
ncs_patricia_tree_destroy(_mds_mcm_cb->vdest_list)) {
m_MDS_LOG_ERR(
-   "MCM:API: patricia_tree_destroy: service :failure, L 
mds_mcm_init");
+   "MCM:API: patricia_tree_destroy: vdest 

Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Tran Thuan
Hi Vu,

Thanks for comments. I reply my answer inline.

Best Regards,
ThuanTr

-Original Message-
From: Nguyen Minh Vu  
Sent: Tuesday, October 22, 2019 4:42 PM
To: thuan.tran ; 'Minh Chau' 
; hans.nordeb...@ericsson.com; gary@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

Hi Thuan,

I have few comments below.

Regards, Vu

On 10/22/19 7:14 AM, thuan.tran wrote:
> - When sending response message which Adest not exist (already down)
> current MDS try to wait for 1.5 seconds before conclude no route to
> send response message.
>
> - There are 2 scenarios may have:
> UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s
> UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s
>
> - With this change, MDS will not waste for 1.5s which can cause trouble
> for higher layer services, e.g: ntf, imm, etc...
> ---
>   src/mds/mds_c_api.c | 70 -
>   src/mds/mds_c_sndrcv.c  | 52 --
>   src/mds/mds_core.h  | 25 +--
>   src/mds/mds_dt2c.h  |  2 +-
>   src/mds/mds_dt_common.c | 22 -
>   5 files changed, 162 insertions(+), 9 deletions(-)
>
> diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
> index 132555b8e..5dd30c536 100644
> --- a/src/mds/mds_c_api.c
> +++ b/src/mds/mds_c_api.c
> @@ -1900,6 +1900,32 @@ uint32_t mds_mcm_svc_up(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   
>   /*** Validation for SCOPE **/
>   
> + if (adest != m_MDS_GET_ADEST) {
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (!adest_info) {
> + /* Add adest to adest list */
> + adest_info = m_MMGR_ALLOC_ADEST_INFO;
> + memset(adest_info, 0, sizeof(MDS_ADEST_INFO));
> + adest_info->adest = adest;
> + adest_info->node.key_info =
> + (uint8_t *)_info->adest;
> + adest_info->svc_cnt = 1;
> + adest_info->tmr_start = false;
> + ncs_patricia_tree_add(
> + _mds_mcm_cb->adest_list,
> + (NCS_PATRICIA_NODE *)adest_info);
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + } else {
> + adest_info->svc_cnt++;
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + }
> + }
[Vu] This new database, adest_list, is shared b/w internal osaf_mds 
thread and mds's user thread, hence should be protected by mutex.
[Thuan] It's protected by gl_mds_library_mutex
> +
>   status = mds_get_subtn_res_tbl_by_adest(local_svc_hdl, svc_id, vdest_id,
>   adest, _subtn_result_info);
>   
> @@ -3571,6 +3597,24 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   /* Discard : Getting down before getting up */
>   } else { /* Entry exist in subscription result table */
>   
> + /* If adest exist and no sndrsp, start a timer */
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (adest_info) {
> + adest_info->svc_cnt--;
> + if (adest_info->svc_cnt == 0 &&
> + adest_info->sndrsp_cnt == 0) {
> + m_MDS_LOG_INFO("MCM:API: Adest=%" PRIu64
> + " down timer start", adest);
> + if (adest_info->tmr_start == false) {
> + adest_info->tmr_start = true;
> + start_mds_down_tmr(adest, svc_id);
[Vu] Seems mds_down tmr is started twice. The first start is at the 
beginning of the function.
[Thuan] That timer is not same with this timer, that timer for current process. 
This timer for remote adest

[Vu] But what is the reason to start the mds down timer here? What if we 
won't start the tmr?
[Thuan] Timer for handle UP -> DOWN -> receive SNDRSP -> send RSP

 /* potentially clean up the process info database */
 MDS_PROCESS_INFO *info = mds_process_info_get(adest, svc_id);
 if (info != NULL) {
 /* Process info exist, delay the cleanup with a timeout to avoid
  * race */
 start_mds_down_tmr(adest, svc_id);
 }
> +   

Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Nguyen Minh Vu

Hi Thuan,

I have few comments below.

Regards, Vu

On 10/22/19 7:14 AM, thuan.tran wrote:

- When sending response message which Adest not exist (already down)
current MDS try to wait for 1.5 seconds before conclude no route to
send response message.

- There are 2 scenarios may have:
UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s
UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s

- With this change, MDS will not waste for 1.5s which can cause trouble
for higher layer services, e.g: ntf, imm, etc...
---
  src/mds/mds_c_api.c | 70 -
  src/mds/mds_c_sndrcv.c  | 52 --
  src/mds/mds_core.h  | 25 +--
  src/mds/mds_dt2c.h  |  2 +-
  src/mds/mds_dt_common.c | 22 -
  5 files changed, 162 insertions(+), 9 deletions(-)

diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
index 132555b8e..5dd30c536 100644
--- a/src/mds/mds_c_api.c
+++ b/src/mds/mds_c_api.c
@@ -1900,6 +1900,32 @@ uint32_t mds_mcm_svc_up(PW_ENV_ID pwe_id, MDS_SVC_ID 
svc_id, V_DEST_RL role,
  
  	/*** Validation for SCOPE **/
  
+	if (adest != m_MDS_GET_ADEST) {

+   MDS_ADEST_INFO *adest_info =
+   (MDS_ADEST_INFO *)ncs_patricia_tree_get(
+   _mds_mcm_cb->adest_list,
+   (uint8_t *));
+   if (!adest_info) {
+   /* Add adest to adest list */
+   adest_info = m_MMGR_ALLOC_ADEST_INFO;
+   memset(adest_info, 0, sizeof(MDS_ADEST_INFO));
+   adest_info->adest = adest;
+   adest_info->node.key_info =
+   (uint8_t *)_info->adest;
+   adest_info->svc_cnt = 1;
+   adest_info->tmr_start = false;
+   ncs_patricia_tree_add(
+   _mds_mcm_cb->adest_list,
+   (NCS_PATRICIA_NODE *)adest_info);
+   m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
+   " svc_cnt=%u", adest, adest_info->svc_cnt);
+   } else {
+   adest_info->svc_cnt++;
+   m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
+   " svc_cnt=%u", adest, adest_info->svc_cnt);
+   }
+   }
[Vu] This new database, adest_list, is shared b/w internal osaf_mds 
thread and mds's user thread, hence should be protected by mutex.

+
status = mds_get_subtn_res_tbl_by_adest(local_svc_hdl, svc_id, vdest_id,
adest, _subtn_result_info);
  
@@ -3571,6 +3597,24 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, MDS_SVC_ID svc_id, V_DEST_RL role,

/* Discard : Getting down before getting up */
} else { /* Entry exist in subscription result table */
  
+		/* If adest exist and no sndrsp, start a timer */

+   MDS_ADEST_INFO *adest_info =
+   (MDS_ADEST_INFO *)ncs_patricia_tree_get(
+   _mds_mcm_cb->adest_list,
+   (uint8_t *));
+   if (adest_info) {
+   adest_info->svc_cnt--;
+   if (adest_info->svc_cnt == 0 &&
+   adest_info->sndrsp_cnt == 0) {
+   m_MDS_LOG_INFO("MCM:API: Adest=%" PRIu64
+   " down timer start", adest);
+   if (adest_info->tmr_start == false) {
+   adest_info->tmr_start = true;
+   start_mds_down_tmr(adest, svc_id);
[Vu] Seems mds_down tmr is started twice. The first start is at the 
beginning of the function.
But what is the reason to start the mds down timer here? What if we 
won't start the tmr?


    /* potentially clean up the process info database */
    MDS_PROCESS_INFO *info = mds_process_info_get(adest, svc_id);
    if (info != NULL) {
        /* Process info exist, delay the cleanup with a timeout to avoid
         * race */
        start_mds_down_tmr(adest, svc_id);
    }

+   }
+   }
+   }
+
if (vdest_id == m_VDEST_ID_FOR_ADEST_ENTRY) {
status = mds_subtn_res_tbl_del(
local_svc_hdl, svc_id, vdest_id, adest,
@@ -4956,6 +5000,17 @@ uint32_t mds_mcm_init(void)
return NCSCC_RC_FAILURE;
}
  
+	/* ADEST TREE */

+   memset(_tree_params, 0, sizeof(NCS_PATRICIA_PARAMS));
+   pat_tree_params.key_size = sizeof(MDS_DEST);
+   if (NCSCC_RC_SUCCESS !=
+   ncs_patricia_tree_init(_mds_mcm_cb->adest_list,
+  _tree_params)) {
+   m_MDS_LOG_ERR(
+   "MCM:API: patricia_tree_init: 

Re: [devel] [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to send response message [#3102]

2019-10-22 Thread Tran Thuan
Hi Minh,

Thanks for your comments:

1-  The wait time 1.5 in this flow:
mds_mcm_process_disc_queue_checks_redundant()
mds_subtn_tbl_add_disc_queue()
if (true == time_wait) {
timeout_val = 150; /* This may need a tuning */
}

2- We should not touch to current adest db because it is used
everywhere with current logic, if we used it, we have to change logic
of many code places, which become more complex and risky.

Best Regards,
ThuanTr

-Original Message-
From: Minh Hon Chau  
Sent: Tuesday, October 22, 2019 11:31 AM
To: thuan.tran ; hans.nordeb...@ericsson.com; 
gary@dektech.com.au; vu.m.ngu...@dektech.com.au
Cc: opensaf-devel@lists.sourceforge.net
Subject: Re: [PATCH 1/1] mds: not waste 1.5s in waiting Adest already down to 
send response message [#3102]

Hi Thuan,

1- Can you point out where is the mds code that waits for 1.5 seconds, 
is it hard coded within 1.5 secs?

2- Is existing db (mds_c_db.c) in mds not enough so we need to introduce 
adest_list? I think mds must have a memory of adest, perhaps in another 
implicit form, so mds can validate an adest, a svc_id associated with adest.

thanks

Minh

On 22/10/19 11:14 am, thuan.tran wrote:
> - When sending response message which Adest not exist (already down)
> current MDS try to wait for 1.5 seconds before conclude no route to
> send response message.
>
> - There are 2 scenarios may have:
> UP -> DOWN -> receive SNDRSP -> response timeout after 1.5s
> UP -> receive SNDRSP -> DOWN -> response timeout after 1.5s
>
> - With this change, MDS will not waste for 1.5s which can cause trouble
> for higher layer services, e.g: ntf, imm, etc...
> ---
>   src/mds/mds_c_api.c | 70 -
>   src/mds/mds_c_sndrcv.c  | 52 --
>   src/mds/mds_core.h  | 25 +--
>   src/mds/mds_dt2c.h  |  2 +-
>   src/mds/mds_dt_common.c | 22 -
>   5 files changed, 162 insertions(+), 9 deletions(-)
>
> diff --git a/src/mds/mds_c_api.c b/src/mds/mds_c_api.c
> index 132555b8e..5dd30c536 100644
> --- a/src/mds/mds_c_api.c
> +++ b/src/mds/mds_c_api.c
> @@ -1900,6 +1900,32 @@ uint32_t mds_mcm_svc_up(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   
>   /*** Validation for SCOPE **/
>   
> + if (adest != m_MDS_GET_ADEST) {
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (!adest_info) {
> + /* Add adest to adest list */
> + adest_info = m_MMGR_ALLOC_ADEST_INFO;
> + memset(adest_info, 0, sizeof(MDS_ADEST_INFO));
> + adest_info->adest = adest;
> + adest_info->node.key_info =
> + (uint8_t *)_info->adest;
> + adest_info->svc_cnt = 1;
> + adest_info->tmr_start = false;
> + ncs_patricia_tree_add(
> + _mds_mcm_cb->adest_list,
> + (NCS_PATRICIA_NODE *)adest_info);
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + } else {
> + adest_info->svc_cnt++;
> + m_MDS_LOG_DBG("MCM:API: Adest=%" PRIu64
> + " svc_cnt=%u", adest, adest_info->svc_cnt);
> + }
> + }
> +
>   status = mds_get_subtn_res_tbl_by_adest(local_svc_hdl, svc_id, vdest_id,
>   adest, _subtn_result_info);
>   
> @@ -3571,6 +3597,24 @@ uint32_t mds_mcm_svc_down(PW_ENV_ID pwe_id, MDS_SVC_ID 
> svc_id, V_DEST_RL role,
>   /* Discard : Getting down before getting up */
>   } else { /* Entry exist in subscription result table */
>   
> + /* If adest exist and no sndrsp, start a timer */
> + MDS_ADEST_INFO *adest_info =
> + (MDS_ADEST_INFO *)ncs_patricia_tree_get(
> + _mds_mcm_cb->adest_list,
> + (uint8_t *));
> + if (adest_info) {
> + adest_info->svc_cnt--;
> + if (adest_info->svc_cnt == 0 &&
> + adest_info->sndrsp_cnt == 0) {
> + m_MDS_LOG_INFO("MCM:API: Adest=%" PRIu64
> + " down timer start", adest);
> + if (adest_info->tmr_start == false) {
> + adest_info->tmr_start = true;
> + start_mds_down_tmr(adest, svc_id);
> + }
> + }
> + }
> +
>   if (vdest_id == m_VDEST_ID_FOR_ADEST_ENTRY) {
>