The following is an analysis of the logs:
imma trace (the application is started from PL-3):
A synchronous admin operation is invoked; the timeout specified here is 1 second (a sketch of the call follows the trace):
May 7 19:51:13.918324 imma [16924:imma_om_api.c:3332] >> admin_op_invoke_common
May 7 19:51:14.927406 imma [16924:imma_om_api.c:3536] TR Fevs send RETURNED:5
May 7 19:51:14.927606 imma [16924:imma_om_api.c:3666] << admin_op_invoke_common
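For reference, a minimal sketch of such an invocation with the SAF IMM OM C API, assuming the admin owner handle has already been set up; the operation id is a hypothetical value and error handling is elided:

```c
#include <saImmOm.h>

/* Invoke a synchronous admin operation with a 1 second timeout, as in the
 * imma trace above. A return of SA_AIS_ERR_TIMEOUT (5) matches the
 * "Fevs send RETURNED:5" line when no reply arrives in time. */
static SaAisErrorT invoke_admin_op(SaImmAdminOwnerHandleT ownerHandle,
                                   const SaNameT *objectName)
{
    SaAisErrorT opReturnValue = SA_AIS_OK;
    const SaImmAdminOperationParamsT_2 *params[] = { NULL };

    return saImmOmAdminOperationInvoke_2(ownerHandle, objectName,
                                         0,  /* continuationId */
                                         1,  /* operationId (hypothetical) */
                                         params,
                                         &opReturnValue,
                                         SA_TIME_ONE_SECOND);
}
```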
immnd traces:
1. The local IMMND (PL-3) received the event and set the admin request continuation via setAdmReqContinuation (a conceptual sketch follows the trace):
May 7 19:51:13.922355 osafimmnd [16355:immsv_evt.c:5044] T8 Received:
IMMND_EVT_A2ND_IMM_FEVS (14) from 2030f
May 7 19:51:13.922581 osafimmnd [16355:immnd_evt.c:2457] T2 sender_count:
128849018881 size: 82
May 7 19:51:13.922706 osafimmnd [16355:immnd_evt.c:2733] >>
immnd_fevs_local_checks
May 7 19:51:13.922874 osafimmnd [16355:immnd_evt.c:3065] <<
immnd_fevs_local_checks
May 7 19:51:13.923071 osafimmnd [16355:ImmModel.cc:8824] >>
setAdmReqContinuation
May 7 19:51:13.923188 osafimmnd [16355:ImmModel.cc:8825] T5
setAdmReqContinuation <128849018881, 54>
May 7 19:51:13.923335 osafimmnd [16355:ImmModel.cc:8827] <<
setAdmReqContinuation
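Conceptually, the continuation binds the invocation id to the invoking client connection. A rough sketch under illustrative names (the real code is C++ in ImmModel.cc):

```c
#include <time.h>

/* Illustrative record of what setAdmReqContinuation stores: the invocation
 * id is bound to the invoking client connection plus a start time used
 * later for the timeout check. */
struct adm_req_continuation {
    unsigned long long invocation;  /* 128849018881 in this trace */
    unsigned int       req_conn;    /* 54: the invoking client connection */
    time_t             start_time;  /* compared against the 1 s timeout */
};
```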
2. The event is forwarded to the IMMD:
May 7 19:51:13.923622 osafimmnd [16355:immnd_evt.c:2656] T2 SENDING FEVS TO
IMMD
May 7 19:51:13.923763 osafimmnd [16355:immsv_evt.c:5025] T8 Sending:
IMMD_EVT_ND2D_FEVS_REQ to 0
3. The IMMD component on SC-1 is killed to trigger the failover:
May 7 19:51:18 SC-1 osafamfnd[2315]: NO
'safComp=IMMD,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'
On PL-3 the admin operation timed out (since the timeout is 1 second):
May 7 19:51:14.398207 osafimmnd [16355:ImmModel.cc:9819] T5 Did not timeout
now - start < 12(1)
May 7 19:51:14.927803 osafimmnd [16355:immsv_evt.c:5044] T8 Received:
IMMND_EVT_A2ND_CL_TIMEOUT (93) from 2030f
May 7 19:51:14.928374 osafimmnd [16355:immnd_evt.c:1664] >>
immnd_evt_proc_cl_imma_timeout
May 7 19:51:14.928447 osafimmnd [16355:immnd_evt.c:1666] T2 timeout in imma
library for handle: 360002030f
4. The admin request continuation is purged, i.e. the continuation recorded in step 1 is erased (a sketch of the effect follows the trace):
May 7 19:51:14.928509 osafimmnd [16355:ImmModel.cc:9684] >> purgeSyncRequest
May 7 19:51:14.928588 osafimmnd [16355:ImmModel.cc:9716] T5 Purged syncronous
Admin-op continuation
May 7 19:51:14.928659 osafimmnd [16355:ImmModel.cc:9791] << purgeSyncRequest
May 7 19:51:14.928728 osafimmnd [16355:immnd_evt.c:1694] <<
immnd_evt_proc_cl_imma_timeout
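A hedged sketch of the purge's effect, with illustrative names (the real logic is purgeSyncRequest in ImmModel.cc):

```c
/* Illustrative only: the connection recorded in step 1 for
 * invocation 128849018881. */
static unsigned int req_conn_for_invocation = 54;

/* On the client-side timeout the continuation is erased; a reply arriving
 * afterwards finds no client connection, hence "req:0" in step 7. */
static void purge_sync_request(void)
{
    req_conn_for_invocation = 0;
}
```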
May 7 19:51:18.420392 osafimmnd [16355:immnd_proc.c:1497] T5 tmout:1000 ste:10
ME:10 RE:10 crd:0 rim:KEEP_REPO 4.1A:1
May 7 19:51:18.929426 osafimmnd [16355:immsv_evt.c:5044] T8 Received:
IMMND_EVT_A2ND_IMM_FEVS (14) from 2030f
5. Once the new IMMD is up, the FEVS message reaches the IMMND and the invocation is sent to the OI:
May 7 19:51:19.774459 osafimmnd [16355:immnd_evt.c:7634] >>
immnd_evt_proc_fevs_rcv
May 7 19:51:19.774633 osafimmnd [16355:immnd_evt.c:7650] T2 FEVS from myself,
still pending:1
May 7 19:51:19.774944 osafimmnd [16355:immsv_evt.c:5044] T8 Received:
IMMND_EVT_A2ND_IMM_ADMOP (12) from 0
May 7 19:51:19.775101 osafimmnd [16355:immnd_evt.c:4247] >>
immnd_evt_proc_admop
May 7 19:51:19.775195 osafimmnd [16355:immnd_evt.c:4248] T1 SYNC ADMOP
May 7 19:51:19.775301 osafimmnd [16355:ImmModel.cc:8659] >>
adminOperationInvoke
May 7 19:51:19.775418 osafimmnd [16355:ImmModel.cc:8666] T5 Admin op on
objectName:Object1=tetConfig
May 7 19:51:19.775548 osafimmnd [16355:ImmModel.cc:8745] T5 IMPLEMENTER FOR
ADMIN OPERATION INVOKE 82 conn:53 node:2030f name:OIImplementer
May 7 19:51:19.775648 osafimmnd [16355:ImmModel.cc:8753] T5 Updating req
invocation inv:128849018881 conn:54 timeout:2
May 7 19:51:19.787377 osafimmnd [16355:ImmModel.cc:8757] WA Assuming reply for
adminOp 128849018881 arrived before request.
May 7 19:51:19.787627 osafimmnd [16355:ImmModel.cc:8788] T5 Storing impl
invocation 53 for inv: 128849018881
May 7 19:51:19.787761 osafimmnd [16355:ImmModel.cc:8817] <<
adminOperationInvoke
May 7 19:51:19.787880 osafimmnd [16355:immnd_evt.c:4303] T2 IMMND sending
Agent upcall
May 7 19:51:19.793571 osafimmnd [16355:immnd_evt.c:4309] T2 IMMND UPCALL TO
AGENT SEND SUCCEEDED
May 7 19:51:19.793769 osafimmnd [16355:immnd_evt.c:4367] T2 Send immediate
reply to client
May 7 19:51:19.793897 osafimmnd [16355:immnd_evt.c:4421] T2 Delayed reply,
wait for reply from implementer
May 7 19:51:19.793999 osafimmnd [16355:immnd_evt.c:4425] <<
immnd_evt_proc_admop
May 7 19:51:19.794101 osafimmnd [16355:immnd_evt.c:7576] >> dequeue_outgoing
May 7 19:51:19.794295 osafimmnd [16355:immnd_evt.c:7582] TR Pending replies:1
space:15 out list?:(nil)
May 7 19:51:19.794399 osafimmnd [16355:immnd_evt.c:7611] << dequeue_outgoing
May 7 19:51:19.794494 osafimmnd [16355:immnd_evt.c:7691] <<
immnd_evt_proc_fevs_rcv
6. The OI sent a reply (a sketch of the OI-side reply call follows the trace):
May 7 19:51:19.794833 osafimmnd [16355:immsv_evt.c:5044] T8 Received:
IMMND_EVT_A2ND_ADMOP_RSP (21) from 2030f
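For context, a minimal sketch of the OI-side reply path with the SAF IMM OI C API (callback registration via saImmOiInitialize_2 is assumed; unused parameters are ignored):

```c
#include <saImmOi.h>

/* Admin operation callback: the implementer answers with
 * saImmOiAdminOperationResult, which produces the
 * IMMND_EVT_A2ND_ADMOP_RSP message seen above. */
static void admin_op_callback(SaImmOiHandleT immOiHandle,
                              SaInvocationT invocation,
                              const SaNameT *objectName,
                              SaImmAdminOperationIdT operationId,
                              const SaImmAdminOperationParamsT_2 **params)
{
    (void)objectName; (void)operationId; (void)params;

    /* Routing this reply back to the invoker is exactly what goes wrong
     * in steps 7 and 8 below. */
    saImmOiAdminOperationResult(immOiHandle, invocation, SA_AIS_OK);
}
```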
7. The fetch of the implementer continuation succeeded, but fetchAdmReqContinuation will not return reqConn (because the request continuation was erased in step 4):
May 7 19:51:19.794989 osafimmnd [16355:ImmModel.cc:8881] T5 Fetch implCon for
invocation:128849018881
May 7 19:51:19.795090 osafimmnd [16355:ImmModel.cc:8891] T5 IMPL ADM
CONTINUATION 53 FOUND FOR 128849018881
May 7 19:51:19.795198 osafimmnd [16355:immnd_evt.c:4139] T2
invocation:128849018881, result:1
impl:53 req:0 dest:566315114766340 me:566315114766340
8. Because of this, the IMMND tries to forward the reply to the same ND (i.e. from PL-3 to PL-3), which fails (see the routing sketch after the trace):
May 7 19:51:19.795288 osafimmnd [16355:immnd_evt.c:4197] T2 FORWARDING TO
OTHER ND!
May 7 19:51:19.795391 osafimmnd [16355:immsv_evt.c:5030] T8 Sending:
IMMND_EVT_ND2ND_ADMOP_RSP to 2030f
May 7 19:51:19.811010 osafimmnd [16355:immnd_mds.c:0737] WA MDS Send Failed to
service:IMMND rc:2
May 7 19:51:19.811146 osafimmnd [16355:immnd_evt.c:4202] ER Problem in sending
to peer IMMND over MDS. Discarding admin op reply.
May 7 19:51:19.811495 osafimmnd [16355:immnd_evt.c:0669] WA Error code 2
returned for message type 21 - ignoring
May 7 19:51:19.811772 osafimmnd [16355:immsv_evt.c:5044] T8 Received:
IMMND_EVT_A2ND_IMM_FEVS (14) from 2030f
May 7 19:51:19.811896 osafimmnd [16355:immnd_evt.c:2457] T2 sender_count: 1
size: 46
May 7 19:51:19.812021 osafimmnd [16355:immnd_evt.c:2733] >>
immnd_fevs_local_checks
May 7 19:51:19.813525 osafimmnd [16355:immnd_evt.c:3065] <<
immnd_fevs_local_checks
May 7 19:51:19.813708 osafimmnd [16355:immnd_evt.c:2637] TR Replies pending:2
May 7 19:51:19.813860 osafimmnd [16355:immnd_evt.c:2656] T2 SENDING FEVS TO
IMMD
May 7 19:51:19.814467 osafimmnd [16355:immsv_evt.c:5025] T8 Sending:
IMMD_EVT_ND2D_FEVS_REQ to 0
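A hedged reconstruction of the reply-routing decision, simplified from immnd_evt.c with hypothetical helper names: since reqConn is 0 after the purge, the IMMND assumes the invoking client sits on another node and takes the forwarding branch, which here means PL-3 sending to itself over MDS and failing:

```c
#include <stdio.h>

/* Hypothetical stand-ins for the two delivery paths. */
static void send_reply_to_local_client(unsigned int req_conn)
{
    printf("reply delivered to local client conn:%u\n", req_conn);
}

static void forward_reply_to_peer_immnd(unsigned long long dest)
{
    printf("FORWARDING TO OTHER ND! dest:%llu\n", dest);
}

static void route_admop_reply(unsigned int req_conn, unsigned long long dest)
{
    if (req_conn != 0)
        send_reply_to_local_client(req_conn);  /* normal case */
    else
        forward_reply_to_peer_immnd(dest);     /* req:0 after purge -> misroute */
}
```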
9. In general, Surender's observation that the admin operation callback never arrives at the OI does not appear to be correct, since the reply from the OI is seen at point 6 above.
The above observations confirm that IMM is working as expected.
---
** [tickets:#889] unknown: oi poll timeout differs during switchover and
failover scenarios**
**Status:** unassigned
**Milestone:** future
**Created:** Wed May 07, 2014 03:10 PM UTC by surender khetavath
**Last Updated:** Thu May 08, 2014 10:50 AM UTC
**Owner:** nobody
changeset: 5143.
test:
In an OI thread:
1) oiInit()
2) oiImplSet() & OiObjectImplSet() on an object
3) oiSelectionObjectGet()
4) poll() on the fd
In the main thread:
1) OM init, owner set
2) invoke controller failover/switchover
3) AdminOp(ONE_SECOND) on the object
If the poll timeout value is 40 secs, the OI doesn't receive the AdminOp callback and poll() times out.
If the poll timeout value is increased to, say, 80 secs, the OI gets the AdminOp callback (a sketch of this poll loop follows below).
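A minimal sketch of the OI-thread poll loop under test, assuming the ticket's oiInit()/oiSelectionObjectGet() wrappers map onto the SAF C API; POLL_TIMEOUT_MS stands for the 40 s or 80 s value being compared:

```c
#include <poll.h>
#include <saImmOi.h>

#define POLL_TIMEOUT_MS (40 * 1000)  /* 40 s times out; 80 s sees the callback */

static void oi_wait_for_callback(SaImmOiHandleT oiHandle,
                                 SaSelectionObjectT selObj)
{
    struct pollfd fds = { .fd = (int)selObj, .events = POLLIN };

    /* During failover the AdminOp is delayed until the new IMMD is up,
     * so a 40 s poll() can expire before the callback is delivered. */
    int rc = poll(&fds, 1, POLL_TIMEOUT_MS);
    if (rc > 0 && (fds.revents & POLLIN))
        saImmOiDispatch(oiHandle, SA_DISPATCH_ONE);
}
```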
How does it differ?
1) Is the IMM operation held until the failover is completed?
2) Is the IMM operation held until the failed node rejoins?
3) The time to receive the callback, i.e. more than 40 secs, is not acceptable for HA.
The same test using switchover succeeds, i.e. the OI receives the callback within a 20 secs poll timeout.
Two versions of the logs are attached.