Hi,
I think, as Minh has pointed out 10 second is plausbile time and may depend on 
the type of API. If NTFs reads notification from a file(not supported now), 
then a not only a reader API may have to wait for long time but also an API 
which is related to less critical activity. In this way, response time for less 
critical APIs may also increase . In that sense 10 seconds may become some 
average time uniformly for all the APIs. SAF API interface is designed for use 
by both threaded and non-threaded application processes, this can help 
application to optimize their service requests.

Regarding the patch for this ticket, I see both the approaches needs 
refinement. I think termination of IMCN can be avoided by making it role aware 
by way of RDE callback mechanism (Not the #157 way). Upon role change from 
standby to active, IMCN can send the error notification to the user giving 
impression that it has been restarted. If Imm handle goes BAD then IMCN will 
exit and survillence thread will start it in correct role. So in this case NTFS 
wll not be terminaing it. I am still evaluating this.

Thanks,
Praveen



---

** [tickets:#2219] ntfd: circular dependency with osafntfimcnd**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Thu Dec 08, 2016 05:14 AM UTC by Gary Lee
**Last Updated:** Tue Dec 13, 2016 04:40 AM UTC
**Owner:** Praveen


A circular dependency can be seen when performing a si-swap of 
safSi=SC-2N,safApp=OpenSAF:

1. Active NTFD is trying to sync with Standby using MBC
2. Standby NTFD is the process of terminating its local osafntfimcnd. It is 
stuck in timedwait_imcn_exit() and cannot reply to the Active.
3. osafntfimcnd [on standby] is trying to send a notfication to Active NTFD

So we have (1) depending on (2) depending on (3) depending on (1)

This results in a temporary deadlock that dramatically slows down NTFD's 
ability to process its main dispatch loop. The deadlock only lasts for approx. 
1 second, when mbcsv_mds_send_msg() times out. But since there could be lots of 
MBC messages to send, sometimes osafntfimcnd is killed with SIGABRT generating 
a coredump. The si-swap operation will also timeout.

steps to reproduce
- Run loop of ntftest 32
root@SC-1:~# for i in {1..10}; do ntftest 32; done
- On another terminal, keep swapping 2N Opensaf SI, got coredump after couples 
of swaps
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF
...
root@SC-1:~# amf-adm si-swap safSi=SC-2N,safApp=OpenSAF

~~~
SC-2 (active)

There are a lot of send failures. Each taking approx. 1 second to timeout. 
During these 1 second timeouts, NTFD cannot process the main dispatch loop.
    
    Dec  7 11:01:37.531772 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
    Dec  7 11:01:37.531781 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
    Dec  7 11:01:38.537307 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
    Dec  7 11:01:38.537758 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
    Dec  7 11:01:38.537766 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
    Dec  7 11:01:39.543180 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
    Dec  7 11:01:39.543695 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
    Dec  7 11:01:39.543698 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
    Dec  7 11:01:40.545252 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
    Dec  7 11:01:40.545719 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
    Dec  7 11:01:40.545726 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
    Dec  7 11:01:41.551328 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
    Dec  7 11:01:41.551971 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
    Dec  7 11:01:41.551979 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
    Dec  7 11:01:42.557594 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
    Dec  7 11:01:42.558171 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
    Dec  7 11:01:42.558179 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
    Dec  7 11:01:43.564051 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
    Dec  7 11:01:43.564874 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
    Dec  7 11:01:43.564883 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
    Dec  7 11:01:44.572407 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
    Dec  7 11:01:44.573262 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
    Dec  7 11:01:44.573271 osafntfd [452:mbcsv_mds.c:0209] TR send type 
MDS_SENDTYPE_REDRSP:
    Dec  7 11:01:45.575091 osafntfd [452:mbcsv_mds.c:0247] << 
mbcsv_mds_send_msg: failure
    Dec  7 11:01:47.083548 osafntfd [452:mbcsv_mds.c:0185] >> 
mbcsv_mds_send_msg: sending to vdest:e
    
~~~


~~~
SC-1 (standby)

NTFD is trying to terminate osafntfimcnd. While it is doing that, it cannot 
reply to NTFD on SC-2. Meanwhile, osafntfimcnd is sending NTF notifications to 
NTFD on SC-1.
    
    Dec  7 11:01:35.453151 osafntfd [464:ntfs_imcnutil.c:0316] TR 
handle_state_ntfimcn: Terminating osafntfimcnd process
    Dec  7 11:01:45.474313 osafntfd [464:ntfs_imcnutil.c:0124] TR   Termination 
timeout
    Dec  7 11:01:45.474375 osafntfd [464:ntfs_imcnutil.c:0130] << 
wait_imcnproc_termination: rc = -1, retry_cnt = 101
    Dec  7 11:01:45.474387 osafntfd [464:ntfs_imcnutil.c:0168] TR   Normal 
termination failed. Escalate to abort
    Dec  7 11:01:45.574703 osafntfd [464:ntfs_imcnutil.c:0172] TR   Imcn 
successfully aborted
    Dec  7 11:01:45.574712 osafntfd [464:ntfs_imcnutil.c:0187] << 
timedwait_imcn_exit    
~~~



---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most 
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to