- **Type**: defect --> enhancement


---

**[tickets:#216] CLMA: bad handle at saClmFinalize**

**Status:** unassigned
**Milestone:** future
**Created:** Wed May 15, 2013 07:45 AM UTC by Mathi Naickan
**Last Updated:** Fri Aug 15, 2014 11:23 AM UTC
**Owner:** nobody

We have seen one case where SMF got a BAD_HANDLE response from saClmFinalize, 
and this was after calling saClmClusterNodeGet with the exact same handle 
without any error. Finalize was not called twice for this handle.
Since saClmClusterNodeGet worked, I assume it is a problem with the 
client_list, since that is the other case in which you can get BAD_HANDLE.
After looking in the CLMA code I can see one potential problem: when 
clma_hdl_rec_del is called from saClmFinalize, the clma_cb lock is not taken, 
so the client_list is modified without holding the lock. In clma_hdl_rec_add, 
however, the lock is taken when adding an entry to the list.
I doubt this is the problem in our case, since we only have one thread using 
CLM in our process (unless some other SAF library is using it).
Another minor issue I noticed is that in clma_hdl_rec_add the version is not 
freed in case of an IPC error.
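The locking asymmetry described above can be illustrated with a minimal sketch. All names here are illustrative stand-ins, not the real OpenSAF declarations; the point is simply that the delete path should take the same cb lock the add path already takes.

```c
#include <pthread.h>
#include <stdlib.h>

/* Minimal stand-in for the CLMA control block and its client list. */
struct clma_client {
    unsigned handle;
    struct clma_client *next;
};

struct clma_cb {
    pthread_mutex_t lock;          /* protects client_list */
    struct clma_client *client_list;
};

/* Add path: lock taken, as clma_hdl_rec_add already does. */
static void hdl_rec_add(struct clma_cb *cb, unsigned handle)
{
    struct clma_client *rec = malloc(sizeof(*rec));
    rec->handle = handle;
    pthread_mutex_lock(&cb->lock);
    rec->next = cb->client_list;
    cb->client_list = rec;
    pthread_mutex_unlock(&cb->lock);
}

/* Delete path: the reported issue is that clma_hdl_rec_del mutates the
 * list WITHOUT this lock; taking it symmetrically closes that window. */
static int hdl_rec_del(struct clma_cb *cb, unsigned handle)
{
    int found = 0;
    pthread_mutex_lock(&cb->lock);
    for (struct clma_client **pp = &cb->client_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->handle == handle) {
            struct clma_client *victim = *pp;
            *pp = victim->next;
            free(victim);
            found = 1;
            break;
        }
    }
    pthread_mutex_unlock(&cb->lock);
    return found;
}
```

With a single CLM-using thread this race cannot fire, which matches the observation that the lock asymmetry is probably not the cause here.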



Changed 8 months ago by bertil


OK, this has happened before (see #1163), where an included CLMS log indicates 
that CLMS receives the CLMA down event before the finalize message, and 
therefore finalize fails with BAD_HANDLE. So how can the CLMA down be received 
before the finalize?
This ordering is controlled by saClmFinalize in the CLMA library, where the 
finalize message is sent and the response is (or should be) received before 
the CLMA port is removed.
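The ordering invariant the library is supposed to guarantee can be sketched as follows (hypothetical function names; the real logic lives in the saClmFinalize implementation in the CLMA library): the finalize request is sent and its response awaited synchronously, and only afterwards is the MDS endpoint deregistered, so CLMS should always observe FINALIZE_REQ before the agent-down event.

```c
/* Events as observed by CLMS, recorded in arrival order. */
enum clms_event { EV_FINALIZE_REQ, EV_AGENT_DOWN };

static enum clms_event trace[4];
static int ntrace;

static void clms_observe(enum clms_event ev) { trace[ntrace++] = ev; }

/* Hypothetical sketch of the client-side finalize path. */
static int clma_send_finalize_and_wait(void)
{
    clms_observe(EV_FINALIZE_REQ); /* request reaches CLMS ...           */
    return 0;                      /* ... and the response comes back    */
}

static void clma_mds_deregister(void)
{
    clms_observe(EV_AGENT_DOWN);   /* only now should CLMS see agent down */
}

static int saClmFinalize_sketch(void)
{
    int rc = clma_send_finalize_and_wait(); /* synchronous round trip */
    clma_mds_deregister();                  /* port removed afterwards */
    return rc;
}
```

The log in #1163 shows the opposite order arriving at CLMS, which is what this ticket is trying to explain.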
Changed 8 months ago by mathi


Well, I had previously checked the code; things look fine, and as such this 
situation should not occur at all, because an invocation of finalize() is a 
synchronous call, and the CLMA shutdown is done only after that call succeeds, 
so they should come in some order. 'This must be some special case', involving 
a reboot or something?
Anything different in the sequence of events or in that setup?
Changed 8 months ago by mathi


Replying to mathi:

> Well, I had previously checked the code; things look fine, and as such this 
> situation should not occur at all, because an invocation of finalize() is a 
> synchronous call, and the CLMA shutdown is done only after that call 
> succeeds, so they should come in some order. 'This must be some special 
> case', involving a reboot or something?

Oops, typo there. I meant 'same' order! :-)

> Anything different in the sequence of events or in that setup?


  Changed 8 months ago by bertil


One thing that might be special is that SMF calls saClmInitialize, 
saClmClusterNodeGet, and saClmFinalize in one sequence (see smfnd_up in 
smfd_smfnd.c).
Another special thing is that this happens during startup, and it is the own 
node that fails. What SMF does is that during startup, when an SMFND is 
started, it uses CLM to find the CLM node name from the node id received in 
the MDS_UP event for the SMFND. So when we get the MDS_UP for our own node 
(SMFND), it is quite early in the startup. But CLM has been started before 
that, so I guess it should work. We are not yet handling TRY_AGAIN in this 
SMF code, which we will fix.
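A TRY_AGAIN fix for that SMF code path could look like the bounded retry wrapper below. This is a sketch, not the actual SMF patch: `retry_on_try_again` and `flaky_initialize` are made-up names, with `op()` standing in for a call such as saClmInitialize or saClmClusterNodeGet. The error-code values shown follow the AIS specification (SA_AIS_ERR_TRY_AGAIN is 6; the BAD_HANDLE seen in the log is 9).

```c
/* SA Forum AIS error codes (subset), values per the AIS specification. */
typedef enum { SA_AIS_OK = 1, SA_AIS_ERR_TRY_AGAIN = 6 } SaAisErrorT;

/* Bounded retry wrapper for the init/get/finalize sequence SMF runs
 * at startup. op() stands in for e.g. saClmInitialize. */
static SaAisErrorT retry_on_try_again(SaAisErrorT (*op)(void *), void *arg,
                                      int max_attempts)
{
    SaAisErrorT rc = SA_AIS_ERR_TRY_AGAIN;
    for (int i = 0; i < max_attempts && rc == SA_AIS_ERR_TRY_AGAIN; i++) {
        rc = op(arg);
        /* A real caller would sleep briefly between attempts. */
    }
    return rc;
}

/* Stub that fails with TRY_AGAIN twice before succeeding, to show why
 * early-startup calls (CLM barely up) need the retry. */
static SaAisErrorT flaky_initialize(void *arg)
{
    int *calls = arg;
    return (++*calls < 3) ? SA_AIS_ERR_TRY_AGAIN : SA_AIS_OK;
}
```

Note that retrying only addresses the early-startup TRY_AGAIN window; it would not by itself explain or fix the BAD_HANDLE on finalize.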


Ticket #1163: the CLMA down from osafsmfd arrives before the finalize, 
causing the following error log.

/var/log/messages:
Apr 23 23:44:45 linux-yluj osafsmfd[21301]: Admin Op Timeout = 600000000000
Apr 23 23:44:45 linux-yluj osafsmfd[21301]: Cli Timeout = 600000000000
Apr 23 23:44:45 linux-yluj osafsmfd[21301]: Reboot Timeout = 600000000000
Apr 23 23:44:45 linux-yluj osafsmfd[21301]: SMF will use the STEP standard set 
of actions.
Apr 23 23:44:45 linux-yluj osafsmfd[21301]: saClmFinalize failed 9
clmd log :
Apr 24 13:17:05.491317 osafclmd [7550:clms_evt.c:0526] >> proc_clma_updn_mds_msg
Apr 24 13:17:05.491329 osafclmd [7550:clms_evt.c:0155] >> 
clms_client_delete_by_mds_dest: mds_dest 2020f98c67fc3
Apr 24 13:17:05.491346 osafclmd [7550:clms_evt.c:0183] >> clms_client_delete: 
client_id 23
Apr 24 13:17:05.491357 osafclmd [7550:clms_evt.c:0199] << clms_client_delete 
Apr 24 13:17:05.491369 osafclmd [7550:clms_evt.c:0167] << 
clms_client_delete_by_mds_dest
Apr 24 13:17:05.491393 osafclmd [7550:clms_mbcsv.c:0726] >> 
clms_send_async_update
Apr 24 13:17:05.491410 osafclmd [7550:clms_mbcsv.c:0657] >> mbcsv_callback 
Apr 24 13:17:05.491421 osafclmd [7550:clms_mbcsv.c:0770] >> 
ckpt_encode_cbk_handler
Apr 24 13:17:05.491440 osafclmd [7550:clms_mbcsv.c:0779] TR 
cbk_arg->info.encode.io_msg_type type 1
Apr 24 13:17:05.491452 osafclmd [7550:clms_mbcsv.c:1186] >> 
ckpt_encode_async_update 
Apr 24 13:17:05.491464 osafclmd [7550:clms_mbcsv.c:1203] TR data->header.type 8
Apr 24 13:17:05.491476 osafclmd [7550:clms_mbcsv.c:1328] TR Async Update 
CLMS_CKPT_AGENT_DOWN_REC
Apr 24 13:17:05.491487 osafclmd [7550:clms_mbcsv.c:1484] >> 
enc_mbcsv_agent_down_msg 
Apr 24 13:17:05.491498 osafclmd [7550:clms_mbcsv.c:1501] << 
enc_mbcsv_agent_down_msg
Apr 24 13:17:05.491509 osafclmd [7550:clms_mbcsv.c:1378] << 
ckpt_encode_async_update
Apr 24 13:17:05.491519 osafclmd [7550:clms_mbcsv.c:0820] << 
ckpt_encode_cbk_handler
Apr 24 13:17:05.491530 osafclmd [7550:clms_mbcsv.c:0697] << mbcsv_callback 
Apr 24 13:17:05.492097 osafclmd [7552:clms_mds.c:0790] >> clms_mds_dec 
Apr 24 13:17:05.492126 osafclmd [7552:clms_mds.c:0813] TR 
evt->info.msg.evt_type 0
Apr 24 13:17:05.492137 osafclmd [7552:clms_mds.c:0821] TR 
evt->info.msg.info.api_info.type 1
Apr 24 13:17:05.492149 osafclmd [7552:clms_mds.c:0089] T8 CLMSV_FINALIZE_REQ
Apr 24 13:17:05.492160 osafclmd [7552:clms_mds.c:0862] << clms_mds_dec 
Apr 24 13:17:05.492172 osafclmd [7552:clms_mds.c:0989] >> clms_mds_rcv: Event 
type 0
Apr 24 13:17:05.492184 osafclmd [7552:clms_mds.c:1003] << clms_mds_rcv 
Apr 24 13:17:05.493001 osafclmd [7550:clms_mbcsv.c:0748] << 
clms_send_async_update
Apr 24 13:17:05.493042 osafclmd [7550:clms_evt.c:0547] T4 ASYNC UPDATE SEND 
SUCCESS for CLMA_DOWN event..
Apr 24 13:17:05.493054 osafclmd [7550:clms_evt.c:0577] << 
proc_clma_updn_mds_msg 
Apr 24 13:17:05.493066 osafclmd [7550:clms_evt.c:1195] << clms_process_mbx
Apr 24 13:17:05.493083 osafclmd [7550:clms_evt.c:1149] >> clms_process_mbx 
Apr 24 13:17:05.493097 osafclmd [7550:clms_evt.c:1116] >> process_api_evt 
Apr 24 13:17:05.493108 osafclmd [7550:clms_evt.c:1060] >> proc_finalize_msg: 
finalize for client: client_id 23
Apr 24 13:17:05.493120 osafclmd [7550:clms_evt.c:0183] >> clms_client_delete: 
client_id 23
Apr 24 13:17:05.493132 osafclmd [7550:clms_evt.c:0115] TR client_id: 23 lookup 
failed
Apr 24 13:17:05.493143 osafclmd [7550:clms_evt.c:0199] << clms_client_delete 
Apr 24 13:17:05.493153 osafclmd [7550:clms_evt.c:1064] TR clms_client_delete 
FAILED: 1
Apr 24 13:17:05.493167 osafclmd [7550:clms_util.c:0147] TR Node found 131343
Apr 24 13:17:05.493210 osafclmd [7550:clms_util.c:0147] TR Node found 131599
Apr 24 13:17:05.493225 osafclmd [7550:clms_mds.c:1409] >> clms_mds_msg_send
Apr 24 13:17:05.493255 osafclmd [7550:clms_mds.c:1442] << clms_mds_msg_send 
Apr 24 13:17:05.493268 osafclmd [7550:clms_evt.c:1099] << proc_finalize_msg: 
finalize for client:client_id 23

This has now been seen again (see #2823). SMF is only using the CLM API to do 
saClmFinalize, so if there is a problem somewhere it must be in CLMA.
How SMF could affect the order between the CLMA down and the finalize message 
beats me.
In the logging included here, the timestamps differ considerably between CLMS 
and SMF. I assume this is because they are on different nodes whose clocks 
are not synchronized.


