Re: [users] si-swap opensaf SUs results in error but the action still completes

David Hoyt Fri, 17 Mar 2017 06:36:35 -0700

Hi Neel,

The purpose of the test is to see if our system can continue to run “normally” 
when in a geographical configuration.
That is, both SCs are NOT co-located, but reside thousands of km apart.
This is simulated in the lab by adding a delay between the two servers which 
host the SCs.


What we’re seeing is that when the delay is increased to a certain value, the 
si-swap command between the two OpenSAF SUs results in an error.
[root@sb117vm0 ~]# date ; amf-adm si-swap safSi=SC-2N,safApp=OpenSAF;
Tue Mar 14 11:31:41 EDT 2017
error - saImmOmAdminOperationInvoke_2 FAILED: SA_AIS_ERR_TIMEOUT (5)

However, the logs show that the action actually completes about 2 seconds after 
the timeout.
Mar 14 11:31:48 sb117vm0 osafimmnd[21104]: WA Timeout on syncronous admin 
operation 1
Mar 14 11:31:50 sb117vm0 osafimmnd[21104]: NO Implementer disconnected 67 <0, 
2020f> (@safAmfService2020f)
Mar 14 11:31:50 sb117vm0 osafimmnd[21104]: NO Implementer connected: 72 
(safAmfService) <0, 2020f>
Mar 14 11:31:50 sb117vm0 osafamfd[21236]: NO Switching Quiesced --> StandBy
Mar 14 11:31:50 sb117vm0 osafrded[21057]: NO RDE role set to STANDBY
Mar 14 11:31:50 sb117vm0 osafamfd[21236]: NO Controller switch over done

I’m trying to determine if there’s some way to delay the immnd time-out so that 
the si-swap command returns success.
Regards,
David


From: Neelakanta Reddy [mailto:[email protected]]
Sent: Friday, March 17, 2017 7:10 AM
To: David Hoyt <[email protected]>; [email protected]
Subject: Re: [users] si-swap opensaf SUs results in error but the action still 
completes

Hi,

comments inline.

On 2017/03/16 07:33 PM, David Hoyt wrote:
> Some additional info.
>
> I found out that the users were testing in a lab that had a delay between the 
> two SC nodes. The delay was added for geographical redundancy testing.
> Once the delay was reduced, the timeout error for the opensaf swap went away.
>
> In looking through the osafimmnd log file, I see the following:
> Mar 14 11:31:48.320965 osafimmnd [21104:ImmModel.cc:12042] T5 Forcing Adm Req 
> continuation to expire 609885356033
> ...
> Mar 14 11:31:48.601903 osafimmnd [21104:ImmModel.cc:12437] T5 Timeout on 
> AdministrativeOp continuation 609885356033 tmout:1
> Mar 14 11:31:48.601952 osafimmnd [21104:ImmModel.cc:11311] T5 REQ ADM 
> CONTINUATION 5069295 FOUND FOR 609885356033
> Mar 14 11:31:48.601987 osafimmnd [21104:immnd_proc.c:1086] WA Timeout on 
> syncronous admin operation 1
>
>
> The code around line 12042 of file ImmModel.cc is as follows:
>
> 12040 for(ci2=sAdmReqContinuationMap.begin(); ci2!=sAdmReqContinuationMap.end(); ++ci2) {
> 12041     if((ci2->second.mTimeout) && (ci2->second.mImplId == implHandle)) {
> 12042         TRACE_5("Forcing Adm Req continuation to expire %llu", ci2->first);
> 12043         ci2->second.mTimeout = 1; /* one second is minimum timeout. */
> 12044     }
> 12045 }
>
>
> Right after the log at line 12042 is generated, the timeout value is updated 
> to 1 second (line 12043).
From OpenSAF's perspective, the node targeted by the admin operation went
down.
The timeout is then set to the minimum of 1 second.
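
The forced expiry in the quoted loop can be sketched in isolation as
follows. This is a minimal illustration, not the OpenSAF source: the struct
AdmReqContinuation, the alias ContinuationMap, and the function forceExpiry
are hypothetical names, and only the two fields used by the quoted loop
(mTimeout, mImplId) are modelled.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-in for a pending admin-operation continuation.
// Only the fields read or written by the quoted loop are modelled.
struct AdmReqContinuation {
    unsigned int mTimeout;  // seconds; 0 means no timeout is set
    unsigned int mImplId;   // implementer expected to answer the request
};

// Hypothetical alias: invocation id -> pending continuation.
using ContinuationMap = std::map<std::uint64_t, AdmReqContinuation>;

// Mirrors the quoted loop: every pending request that has a timeout and
// belongs to the disconnected implementer has that timeout cut to the
// 1-second minimum. Returns how many continuations were changed.
int forceExpiry(ContinuationMap& pending, unsigned int implHandle) {
    int forced = 0;
    for (auto& entry : pending) {
        AdmReqContinuation& c = entry.second;
        if (c.mTimeout && c.mImplId == implHandle) {
            c.mTimeout = 1;  // one second is the minimum timeout
            ++forced;
        }
    }
    return forced;
}
```

In this sketch, a request waiting on the disconnected implementer expires
about one second later regardless of the timeout it started with, while
requests owned by other implementers keep their original timeout.
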
> Can I increase this to 2 seconds?
OpenSAF has noted the other node as down. What additional benefit would
increasing this to 2 seconds achieve?

> If so, would it cause any badness?
Please explain what end result you are targeting.

Regards,
Neel.
>
> Regards,
> David