Following is the analysis of the logs:

imma trace(Application is started from PL-3):

Adminoperation sync is called(here the timeout is specified is 1 second): 

May  7 19:51:13.918324 imma [16924:imma_om_api.c:3332] >> admin_op_invoke_common
May  7 19:51:14.927406 imma [16924:imma_om_api.c:3536] TR Fevs send RETURNED:5
May  7 19:51:14.927606 imma [16924:imma_om_api.c:3666] << admin_op_invoke_common

immnd traces:

1. Local IMMND (PL-3) received the event and setAdminReqContinuation:

May  7 19:51:13.922355 osafimmnd [16355:immsv_evt.c:5044] T8 Received: 
IMMND_EVT_A2ND_IMM_FEVS (14) from 2030f
May  7 19:51:13.922581 osafimmnd [16355:immnd_evt.c:2457] T2 sender_count: 
128849018881 size: 82
May  7 19:51:13.922706 osafimmnd [16355:immnd_evt.c:2733] >> 
immnd_fevs_local_checks
May  7 19:51:13.922874 osafimmnd [16355:immnd_evt.c:3065] << 
immnd_fevs_local_checks
May  7 19:51:13.923071 osafimmnd [16355:ImmModel.cc:8824] >> 
setAdmReqContinuation
May  7 19:51:13.923188 osafimmnd [16355:ImmModel.cc:8825] T5 
setAdmReqContinuation <128849018881, 54>
May  7 19:51:13.923335 osafimmnd [16355:ImmModel.cc:8827] << 
setAdmReqContinuation

2. Forwarding event to IMMD, in the 

May  7 19:51:13.923622 osafimmnd [16355:immnd_evt.c:2656] T2 SENDING FEVS TO 
IMMD
May  7 19:51:13.923763 osafimmnd [16355:immsv_evt.c:5025] T8 Sending:  
IMMD_EVT_ND2D_FEVS_REQ to 0

3. 

IMMD component is killed for failover on SC-1:

May  7 19:51:18 SC-1 osafamfnd[2315]: NO 
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'

on PL-3 Admin operation gor timed out (since the timeout is 1 second):


May  7 19:51:14.398207 osafimmnd [16355:ImmModel.cc:9819] T5 Did not timeout 
now - start < 12(1)
May  7 19:51:14.927803 osafimmnd [16355:immsv_evt.c:5044] T8 Received: 
IMMND_EVT_A2ND_CL_TIMEOUT (93) from 2030f
May  7 19:51:14.928374 osafimmnd [16355:immnd_evt.c:1664] >> 
immnd_evt_proc_cl_imma_timeout
May  7 19:51:14.928447 osafimmnd [16355:immnd_evt.c:1666] T2 timeout in imma 
library for handle: 360002030f

4. The setAdminReqContinuation is purged (which means req continuation is 
erased)

May  7 19:51:14.928509 osafimmnd [16355:ImmModel.cc:9684] >> purgeSyncRequest
May  7 19:51:14.928588 osafimmnd [16355:ImmModel.cc:9716] T5 Purged syncronous 
Admin-op continuation
May  7 19:51:14.928659 osafimmnd [16355:ImmModel.cc:9791] << purgeSyncRequest
May  7 19:51:14.928728 osafimmnd [16355:immnd_evt.c:1694] << 
immnd_evt_proc_cl_imma_timeout
May  7 19:51:18.420392 osafimmnd [16355:immnd_proc.c:1497] T5 tmout:1000 ste:10 
ME:10 RE:10 crd:0 rim:KEEP_REPO 4.1A:1
May  7 19:51:18.929426 osafimmnd [16355:immsv_evt.c:5044] T8 Received: 
IMMND_EVT_A2ND_IMM_FEVS (14) from 2030f

5. Once the IMMD is up the FEVS came to IMMND and invcation is sent to OI

May  7 19:51:19.774459 osafimmnd [16355:immnd_evt.c:7634] >> 
immnd_evt_proc_fevs_rcv
May  7 19:51:19.774633 osafimmnd [16355:immnd_evt.c:7650] T2 FEVS from myself, 
still pending:1
May  7 19:51:19.774944 osafimmnd [16355:immsv_evt.c:5044] T8 Received: 
IMMND_EVT_A2ND_IMM_ADMOP (12) from 0
May  7 19:51:19.775101 osafimmnd [16355:immnd_evt.c:4247] >> 
immnd_evt_proc_admop
May  7 19:51:19.775195 osafimmnd [16355:immnd_evt.c:4248] T1 SYNC ADMOP
May  7 19:51:19.775301 osafimmnd [16355:ImmModel.cc:8659] >> 
adminOperationInvoke
May  7 19:51:19.775418 osafimmnd [16355:ImmModel.cc:8666] T5 Admin op on 
objectName:Object1=tetConfig
May  7 19:51:19.775548 osafimmnd [16355:ImmModel.cc:8745] T5 IMPLEMENTER FOR 
ADMIN OPERATION INVOKE 82 conn:53 node:2030f name:OIImplementer
May  7 19:51:19.775648 osafimmnd [16355:ImmModel.cc:8753] T5 Updating req 
invocation inv:128849018881 conn:54 timeout:2
May  7 19:51:19.787377 osafimmnd [16355:ImmModel.cc:8757] WA Assuming reply for 
adminOp 128849018881 arrived before request.
May  7 19:51:19.787627 osafimmnd [16355:ImmModel.cc:8788] T5 Storing impl 
invocation 53 for inv: 128849018881
May  7 19:51:19.787761 osafimmnd [16355:ImmModel.cc:8817] << 
adminOperationInvoke
May  7 19:51:19.787880 osafimmnd [16355:immnd_evt.c:4303] T2 IMMND sending 
Agent upcall
May  7 19:51:19.793571 osafimmnd [16355:immnd_evt.c:4309] T2 IMMND UPCALL TO 
AGENT SEND SUCCEEDED
May  7 19:51:19.793769 osafimmnd [16355:immnd_evt.c:4367] T2 Send immediate 
reply to client
May  7 19:51:19.793897 osafimmnd [16355:immnd_evt.c:4421] T2 Delayed reply, 
wait for reply from implementer
May  7 19:51:19.793999 osafimmnd [16355:immnd_evt.c:4425] << 
immnd_evt_proc_admop
May  7 19:51:19.794101 osafimmnd [16355:immnd_evt.c:7576] >> dequeue_outgoing
May  7 19:51:19.794295 osafimmnd [16355:immnd_evt.c:7582] TR Pending replies:1 
space:15 out list?:(nil)
May  7 19:51:19.794399 osafimmnd [16355:immnd_evt.c:7611] << dequeue_outgoing
May  7 19:51:19.794494 osafimmnd [16355:immnd_evt.c:7691] << 
immnd_evt_proc_fevs_rcv


6. OI sent a reply

May  7 19:51:19.794833 osafimmnd [16355:immsv_evt.c:5044] T8 Received: 
IMMND_EVT_A2ND_ADMOP_RSP (21) from 2030f

7. Fetch of CONTIUATION suceeded but fetchAdmReqContinuation will not return 
reqConn(Because setAdminReqContinuation is erased)

May  7 19:51:19.794989 osafimmnd [16355:ImmModel.cc:8881] T5 Fetch implCon for 
invocation:128849018881
May  7 19:51:19.795090 osafimmnd [16355:ImmModel.cc:8891] T5 IMPL ADM 
CONTINUATION 53 FOUND FOR 128849018881
May  7 19:51:19.795198 osafimmnd [16355:immnd_evt.c:4139] T2 
invocation:128849018881, result:1
 impl:53 req:0 dest:566315114766340 me:566315114766340


8. Because of this IMMND tries to forward message to smae ND (i.e PL-3 to PL-3)

May  7 19:51:19.795288 osafimmnd [16355:immnd_evt.c:4197] T2 FORWARDING TO 
OTHER ND!
May  7 19:51:19.795391 osafimmnd [16355:immsv_evt.c:5030] T8 Sending:  
IMMND_EVT_ND2ND_ADMOP_RSP to 2030f

May  7 19:51:19.811010 osafimmnd [16355:immnd_mds.c:0737] WA MDS Send Failed to 
service:IMMND rc:2
May  7 19:51:19.811146 osafimmnd [16355:immnd_evt.c:4202] ER Problem in sending 
to peer IMMND over MDS. Discarding admin op reply.
May  7 19:51:19.811495 osafimmnd [16355:immnd_evt.c:0669] WA Error code 2 
returned for message type 21 - ignoring
May  7 19:51:19.811772 osafimmnd [16355:immsv_evt.c:5044] T8 Received: 
IMMND_EVT_A2ND_IMM_FEVS (14) from 2030f
May  7 19:51:19.811896 osafimmnd [16355:immnd_evt.c:2457] T2 sender_count: 1 
size: 46
May  7 19:51:19.812021 osafimmnd [16355:immnd_evt.c:2733] >> 
immnd_fevs_local_checks
May  7 19:51:19.813525 osafimmnd [16355:immnd_evt.c:3065] << 
immnd_fevs_local_checks
May  7 19:51:19.813708 osafimmnd [16355:immnd_evt.c:2637] TR Replies pending:2
May  7 19:51:19.813860 osafimmnd [16355:immnd_evt.c:2656] T2 SENDING FEVS TO 
IMMD
May  7 19:51:19.814467 osafimmnd [16355:immsv_evt.c:5025] T8 Sending:  
IMMD_EVT_ND2D_FEVS_REQ to 0

9. 

In general as surendar, observation of admin operation callback is not arrived 
at OI looks to be not correct as the reply from OI is observed at point 6 in 
above observation.

The above observation confirms, the IMM is working as expected.


---

** [tickets:#889] unknown: oi poll timeout differs during switchover and 
failover scenarios**

**Status:** unassigned
**Milestone:** future
**Created:** Wed May 07, 2014 03:10 PM UTC by surender khetavath
**Last Updated:** Thu May 08, 2014 10:50 AM UTC
**Owner:** nobody

changeset : 5143.

test:
1) in a thread do oiInit()
2) oiImplSet() & OiObjectImplSet on an object
3) oiselectionObjectGet()
4) poll() on the fd

In the main thread
1) om init, ownerset,
2) invoke controller failover/switchover
3) AdminOp(ONE_SECOND) on the object

If the poll timeout value is 40secs, then OI doesn't receive AdminOp callback 
and poll timesout.
If the poll timeout value is increased to say 80secs, then OI gets AdminOp 
callback.

How does  it differ? 
1) is the imm operation held until failover is completed?
2) is the imm operation held until the failed node re-joins?
3) The time to receive cbk i.e more than 40secs is not acceptable for HA.

The same test using swithover succeeds i.e receives cbk under 20secs of poll 
timeout.


two versions of logs attached


---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
&#149; 3 signs your SCM is hindering your productivity
&#149; Requirements for releasing software faster
&#149; Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to