Some additional info.
I found out that the users were testing in a lab that had a network delay
between the two SC nodes. The delay had been added for geographical-redundancy
testing. Once the delay was reduced, the timeout error on the OpenSAF swap went away.
In looking through the osafimmnd log file, I see the following:
Mar 14 11:31:48.320965 osafimmnd [21104:ImmModel.cc:12042] T5 Forcing Adm Req
continuation to expire 609885356033
...
Mar 14 11:31:48.601903 osafimmnd [21104:ImmModel.cc:12437] T5 Timeout on
AdministrativeOp continuation 609885356033 tmout:1
Mar 14 11:31:48.601952 osafimmnd [21104:ImmModel.cc:11311] T5 REQ ADM
CONTINUATION 5069295 FOUND FOR 609885356033
Mar 14 11:31:48.601987 osafimmnd [21104:immnd_proc.c:1086] WA Timeout on
syncronous admin operation 1
The code around line 12042 of ImmModel.cc is as follows:
12040    for(ci2=sAdmReqContinuationMap.begin(); ci2!=sAdmReqContinuationMap.end(); ++ci2) {
12041        if((ci2->second.mTimeout) && (ci2->second.mImplId == implHandle)) {
12042            TRACE_5("Forcing Adm Req continuation to expire %llu", ci2->first);
12043            ci2->second.mTimeout = 1; /* one second is minimum timeout. */
12044        }
12045    }
Right after the trace at line 12042 is generated, the timeout value is forced
down to 1 second (line 12043).
Can I increase this to 2 seconds?
If so, would it cause any problems?
Regards,
David
From: David Hoyt
Sent: Tuesday, March 14, 2017 2:45 PM
To: [email protected]
Subject: si-swap opensaf SUs results in error but the action still completes
Hi all,
Setup:
- Opensaf 4.6 running on RHEL 6.6 VMs with TCP
- 2 controllers, 4 payloads
- SC-1's hostname=sb117vm0
- SC-2's hostname=sb117vm1
I'm encountering a timeout when issuing a si-swap of the opensaf SUs.
However, the swap actually completes, so why does the timeout error occur?
Before the swap, SC-1/SC-2 were active/standby:
[root@sb117vm0 ~]# date ; amf-state siass | grep -A 1
"safSi=SC-2N,safApp=OpenSAF"
Tue Mar 14 11:30:58 EDT 2017
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
--
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
[root@sb117vm0 ~]#
The following shows the error response from the si-swap as well as the opensaf
SU status after the swap. The status indicates that the swap did occur.
[root@sb117vm0 ~]# date ; amf-adm si-swap safSi=SC-2N,safApp=OpenSAF; date ;
amf-state su | grep -A 4 "safSg=2N,safApp=OpenSAF"; amf-state siass | grep -A
2 "safSi=SC-2N,safApp=OpenSAF"
Tue Mar 14 11:31:41 EDT 2017
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)
Tue Mar 14 11:31:48 EDT 2017
safSu=SC-1,safSg=2N,safApp=OpenSAF
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
--
safSu=SC-2,safSg=2N,safApp=OpenSAF
saAmfSUAdminState=UNLOCKED(1)
saAmfSUOperState=ENABLED(1)
saAmfSUPresenceState=INSTANTIATED(3)
saAmfSUReadinessState=IN-SERVICE(2)
safSISU=safSu=SC-1\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=STANDBY(2)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
--
safSISU=safSu=SC-2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)
saAmfSISUHAReadinessState=READY_FOR_ASSIGNMENT(1)
[root@sb117vm0 ~]#
Below is a snippet of the /var/log/messages file from both SCs.
Again, the logs show that the swap completed as expected, with the exception
of the osafimmnd timeout.
Also attached are the osafamfnd log files from each SC.
sb117vm0 (SC-1)
Mar 14 11:31:41 sb117vm0 osafamfd[21236]: NO safSi=SC-2N,safApp=OpenSAF Swap
initiated
Mar 14 11:31:41 sb117vm0 osafamfnd[21267]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Mar 14 11:31:41 sb117vm0 osafimmnd[21104]: NO Implementer disconnected 65 <3,
2010f> (safLogService)
Mar 14 11:31:42 sb117vm0 osafimmnd[21104]: NO Implementer disconnected 66 <7,
2010f> (safClmService)
Mar 14 11:31:43 sb117vm0 osafamfnd[21267]: NO Assigned
'safSi=SC-2N,safApp=OpenSAF' QUIESCED to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Mar 14 11:31:43 sb117vm0 osafimmnd[21104]: NO Implementer connected: 69
(safLogService) <0, 2020f>
Mar 14 11:31:43 sb117vm0 osafimmnd[21104]: NO Implementer connected: 70
(safClmService) <0, 2020f>
Mar 14 11:31:43 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
PRTA update Ccb:10000017e/4294967678
Mar 14 11:31:43 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
to start prepare of ccb:10000017e/4294967678
Mar 14 11:31:43 sb117vm0 osafamfnd[21267]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Mar 14 11:31:43 sb117vm0 osafamfnd[21267]: NO Assigned
'safSi=SC-2N,safApp=OpenSAF' STANDBY to 'safSu=SC-1,safSg=2N,safApp=OpenSAF'
Mar 14 11:31:44 sb117vm0 osafamfd[21236]: NO Controller switch over initiated
Mar 14 11:31:44 sb117vm0 osafamfd[21236]: NO ROLE SWITCH Active --> Quiesced
Mar 14 11:31:44 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
PRTA update Ccb:10000017f/4294967679
Mar 14 11:31:44 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
to start prepare of ccb:10000017f/4294967679
Mar 14 11:31:44 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
PRTA update Ccb:100000180/4294967680
Mar 14 11:31:45 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
to start prepare of ccb:100000180/4294967680
Mar 14 11:31:45 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
PRTA update Ccb:100000181/4294967681
Mar 14 11:31:45 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
to start prepare of ccb:100000181/4294967681
Mar 14 11:31:46 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
PRTA update Ccb:100000182/4294967682
Mar 14 11:31:46 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
to start prepare of ccb:100000182/4294967682
Mar 14 11:31:46 sb117vm0 osafimmpbed: IN Starting distributed PBE commit for
PRTA update Ccb:100000183/4294967683
Mar 14 11:31:46 sb117vm0 osafimmpbed: IN Slave PBE replied with OK on attempt
to start prepare of ccb:100000183/4294967683
Mar 14 11:31:47 sb117vm0 osafrded[21057]: NO RDE role set to QUIESCED
Mar 14 11:31:48 sb117vm0 osafimmnd[21104]: NO Implementer disconnected 68
<1161150, 2010f> (safAmfService)
Mar 14 11:31:48 sb117vm0 osafimmnd[21104]: NO Implementer (applier) connected:
71 (@safAmfService2010f) <1161150, 2010f>
Mar 14 11:31:48 sb117vm0 osafimmnd[21104]: WA Timeout on syncronous admin
operation 1
Mar 14 11:31:50 sb117vm0 osafimmnd[21104]: NO Implementer disconnected 67 <0,
2020f> (@safAmfService2020f)
Mar 14 11:31:50 sb117vm0 osafimmnd[21104]: NO Implementer connected: 72
(safAmfService) <0, 2020f>
Mar 14 11:31:50 sb117vm0 osafamfd[21236]: NO Switching Quiesced --> StandBy
Mar 14 11:31:50 sb117vm0 osafrded[21057]: NO RDE role set to STANDBY
Mar 14 11:31:50 sb117vm0 osafamfd[21236]: NO Controller switch over done
sb117vm1 (SC-2)
Mar 14 11:31:41 sb117vm1 osafimmnd[1898]: NO Implementer disconnected 65 <0,
2010f> (safLogService)
Mar 14 11:31:42 sb117vm1 osafimmnd[1898]: NO Implementer disconnected 66 <0,
2010f> (safClmService)
Mar 14 11:31:43 sb117vm1 osafamfnd[1994]: NO Assigning
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Mar 14 11:31:43 sb117vm1 osafimmd[1886]: WA IMMD not re-electing coord for
switch-over (si-swap) coord at (2010f)
Mar 14 11:31:43 sb117vm1 osafimmnd[1898]: NO Implementer connected: 69
(safLogService) <3, 2020f>
Mar 14 11:31:43 sb117vm1 osafimmnd[1898]: NO Implementer connected: 70
(safClmService) <4, 2020f>
Mar 14 11:31:43 sb117vm1 osaflogd[1923]: WA read_logsv_configuration(). All
attributes could not be read
Mar 14 11:31:43 sb117vm1 osafamfnd[1994]: NO Assigned
'safSi=SC-2N,safApp=OpenSAF' ACTIVE to 'safSu=SC-2,safSg=2N,safApp=OpenSAF'
Mar 14 11:31:43 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
primary on PRTA update ccb:10000017e
Mar 14 11:31:43 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
ccbId:10000017e/4294967678 numOps:1
Mar 14 11:31:44 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
primary on PRTA update ccb:10000017f
Mar 14 11:31:44 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
ccbId:10000017f/4294967679 numOps:1
Mar 14 11:31:44 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
primary on PRTA update ccb:100000180
Mar 14 11:31:45 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
ccbId:100000180/4294967680 numOps:1
Mar 14 11:31:45 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
primary on PRTA update ccb:100000181
Mar 14 11:31:45 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
ccbId:100000181/4294967681 numOps:1
Mar 14 11:31:46 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
primary on PRTA update ccb:100000182
Mar 14 11:31:46 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
ccbId:100000182/4294967682 numOps:1
Mar 14 11:31:46 sb117vm1 osafimmpbed: IN PBE slave waiting for prepare from
primary on PRTA update ccb:100000183
Mar 14 11:31:46 sb117vm1 osafimmpbed: IN ccb-prepare received at PBE slave
ccbId:100000183/4294967683 numOps:1
Mar 14 11:31:48 sb117vm1 osafimmnd[1898]: NO Implementer disconnected 68 <0,
2010f> (safAmfService)
Mar 14 11:31:48 sb117vm1 osafimmnd[1898]: NO Implementer (applier) connected:
71 (@safAmfService2010f) <0, 2010f>
Mar 14 11:31:50 sb117vm1 osafamfd[1973]: NO Switching StandBy --> Active State
Mar 14 11:31:50 sb117vm1 osafimmnd[1898]: NO Implementer disconnected 67 <9,
2020f> (@safAmfService2020f)
Mar 14 11:31:50 sb117vm1 osafimmnd[1898]: NO Implementer connected: 72
(safAmfService) <9, 2020f>
Mar 14 11:31:50 sb117vm1 osafrded[1850]: NO RDE role set to ACTIVE
Mar 14 11:31:50 sb117vm1 osafclmd[1950]: NO ACTIVE request
Mar 14 11:31:50 sb117vm1 osafamfd[1973]: NO Controller switch over done