- **Type**: defect --> enhancement
---
**[tickets:#216] CLMA: bad handle at saClmFinalize**
**Status:** unassigned
**Milestone:** future
**Created:** Wed May 15, 2013 07:45 AM UTC by Mathi Naickan
**Last Updated:** Fri Aug 15, 2014 11:23 AM UTC
**Owner:** nobody
We have seen, once, SMF get a BAD_HANDLE response from saClmFinalize, and this
was after calling saClmClusterNodeGet with the exact same handle without any
error. Finalize was not called twice for this handle.
Since saClmClusterNodeGet worked, I assume it is a problem with the client_list,
since that is the other case in which you can get BAD_HANDLE.
After looking at the CLMA code I can see one potential problem. When
clma_hdl_rec_del is called from saClmFinalize, the clma_cb lock is not taken, so
the client_list is changed without the lock held. In clma_hdl_rec_add, however,
the lock is taken when adding an entry to the list.
I doubt this is the problem in our case since we only have one thread using CLM
in our process (unless some other SAF lib is using it).
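For illustration, something along these lines would make clma_hdl_rec_del take the same lock as clma_hdl_rec_add. Only a sketch: the types, field names and the plain pthread lock are placeholders, not the real CLMA definitions.

```c
/* Hypothetical sketch: guard client_list removal with the same cb lock
 * that clma_hdl_rec_add already takes.  Types, field names and the
 * pthread lock are illustrative, not the actual CLMA code. */
#include <pthread.h>
#include <stdlib.h>

typedef struct clma_client_hdl_rec {
	unsigned int clm_handle_id;
	struct clma_client_hdl_rec *next;
} clma_client_hdl_rec_t;

typedef struct {
	pthread_mutex_t cb_lock;          /* protects client_list */
	clma_client_hdl_rec_t *client_list;
} clma_cb_t;

static void clma_hdl_rec_del(clma_cb_t *cb, clma_client_hdl_rec_t *rec)
{
	pthread_mutex_lock(&cb->cb_lock); /* this is what is missing today */
	clma_client_hdl_rec_t **pp = &cb->client_list;
	while (*pp != NULL && *pp != rec)
		pp = &(*pp)->next;
	if (*pp == rec) {
		*pp = rec->next;          /* unlink from client_list */
		free(rec);
	}
	pthread_mutex_unlock(&cb->cb_lock);
}
```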
Another minor issue I noticed is that in clma_hdl_rec_add the version is not
freed in case of an IPC error.
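Roughly like this (again only a sketch; the allocation scheme, helper name and return codes are assumptions, not the actual clma_hdl_rec_add code):

```c
/* Hypothetical error path in clma_hdl_rec_add: the copied version must be
 * released when the IPC send fails.  Only SaVersionT is a real SAF type. */
#include <stdlib.h>
#include <saAis.h>

/* stand-in for the MDS send that may fail; 0 == success */
static int send_initialize_msg(const SaVersionT *ver) { (void)ver; return 0; }

static SaVersionT *hdl_rec_add_version(const SaVersionT *client_version)
{
	SaVersionT *copy = malloc(sizeof(*copy));
	if (copy == NULL)
		return NULL;
	*copy = *client_version;

	if (send_initialize_msg(copy) != 0) {
		free(copy);   /* the free reported missing on the IPC-error path */
		return NULL;
	}
	return copy;
}
```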
Changed 8 months ago by bertil
OK, this has happened before (see #1163), where the included CLMS log indicates
that CLMS receives the CLMA down before the finalize msg, and therefore the
finalize fails with BAD_HANDLE. So how can the CLMA down be received before the
finalize?
This is controlled by saClmFinalize in the CLMA library, where the finalize msg
is sent and the response is (or should be) received before the CLMA port is
removed.
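To spell out the ordering I would expect (only a sketch; apart from the SA_AIS_* names everything here is hypothetical, not the actual CLMA code):

```c
#include <saClm.h>

/* Hypothetical helpers; the real CLMA library has its own names. */
static SaAisErrorT send_finalize_and_wait_for_resp(SaClmHandleT hdl) { (void)hdl; return SA_AIS_OK; }
static void mds_unregister_agent(void) {}
static void hdl_rec_del(SaClmHandleT hdl) { (void)hdl; }

/* Expected ordering: get the finalize response first, drop the MDS
 * registration (the "CLMA port") only afterwards. */
static SaAisErrorT clma_finalize_sketch(SaClmHandleT clmHandle)
{
	SaAisErrorT rc = send_finalize_and_wait_for_resp(clmHandle);
	if (rc != SA_AIS_OK)
		return rc;            /* CLMS rejected or timed out */

	mds_unregister_agent();       /* only now (e.g. for the last client)
	                               * should CLMS see CLMA down */
	hdl_rec_del(clmHandle);       /* drop the record from client_list */
	return SA_AIS_OK;
}
```

If that is the order the library really follows, CLMS should never see the CLMA down before the finalize msg.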
follow-up: comment 3. Changed 8 months ago by mathi
Well, I had previously checked the code; things look fine, and as such this
situation should not occur at all, because an invocation of finalize() is a
synchronous call and the clma shutdown is done only after that call succeeds,
so they should come in some order. This must be some special case, involving a
reboot or something?
Anything different in the sequence of events or in that setup?
in reply to comment 2. Changed 8 months ago by mathi
Replying to mathi:
Well, I had previously checked the code; things look fine, and as such this
situation should not occur at all, because an invocation of finalize() is a
synchronous call and the clma shutdown is done only after that call succeeds,
so they should come in some order. This must be some special case, involving a
reboot or something?
Oops, typo there. I meant 'same' order! :-)
Anything different in the sequence of events or in that setup?
Changed 8 months ago by bertil
One thing that might be special is that SMF makes saClmInitialize,
saClmClusterNodeGet and saClmFinalize in one sequence (see smfnd_up in
smfd_smfnd.c).
Another special thing is that this happens during startup and it is our own node
that fails. During startup, when an SMFND is started, SMF uses CLM to find out
the CLM node name from the node id we got in the MDS_UP event for the SMFND. So
when we get the MDS_UP for our own node (SMFND) it is quite early in the
startup. But CLM has been started before that, so I guess it should work. We are
not yet handling TRY_AGAIN in this SMF code, which we will fix; see the sketch
below.
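For reference, roughly the pattern SMF uses, including the kind of TRY_AGAIN retry we still have to add. This is not the actual smfnd_up()/smfd_smfnd.c code; the function name, retry policy and timeout are illustrative.

```c
#include <unistd.h>
#include <saClm.h>

/* Minimal sketch of the SMF usage pattern: initialize, look up the CLM
 * node name for a node id, finalize.  The SA_AIS_ERR_TRY_AGAIN retry
 * loop is the part still missing in the real SMF code. */
static SaAisErrorT get_clm_node_name(SaClmNodeIdT node_id, SaNameT *node_name)
{
	const SaTimeT timeout = 10 * 1000000000LL; /* 10 s, in nanoseconds */
	SaVersionT version;
	SaClmHandleT hdl;
	SaClmClusterNodeT node;
	SaAisErrorT rc;
	int retries = 10;                          /* illustrative policy */

	do {
		version = (SaVersionT){ 'B', 1, 1 }; /* re-set; call may modify it */
		rc = saClmInitialize(&hdl, NULL, &version);
		if (rc == SA_AIS_ERR_TRY_AGAIN)
			usleep(500 * 1000);
	} while (rc == SA_AIS_ERR_TRY_AGAIN && --retries > 0);
	if (rc != SA_AIS_OK)
		return rc;

	rc = saClmClusterNodeGet(hdl, node_id, timeout, &node);
	if (rc == SA_AIS_OK)
		*node_name = node.nodeName;

	/* The call discussed in this ticket: it unexpectedly returned
	 * SA_AIS_ERR_BAD_HANDLE (9) even though the handle was valid. */
	(void)saClmFinalize(hdl);
	return rc;
}
```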
Ticket #1163: clma down from osafsmfd is coming before the finalize, causing
the following error log.
/var/log/messages:
Apr 23 23:44:45 linux-yluj osafsmfd[21301]: Admin Op Timeout = 600000000000
Apr 23 23:44:45 linux-yluj osafsmfd[21301]: Cli Timeout = 600000000000
Apr 23 23:44:45 linux-yluj osafsmfd[21301]: Reboot Timeout = 600000000000
Apr 23 23:44:45 linux-yluj osafsmfd[21301]: SMF will use the STEP standard set of actions.
Apr 23 23:44:45 linux-yluj osafsmfd[21301]: saClmFinalize failed 9
clmd log:
Apr 24 13:17:05.491317 osafclmd [7550:clms_evt.c:0526] >> proc_clma_updn_mds_msg
Apr 24 13:17:05.491329 osafclmd [7550:clms_evt.c:0155] >> clms_client_delete_by_mds_dest: mds_dest 2020f98c67fc3
Apr 24 13:17:05.491346 osafclmd [7550:clms_evt.c:0183] >> clms_client_delete: client_id 23
Apr 24 13:17:05.491357 osafclmd [7550:clms_evt.c:0199] << clms_client_delete
Apr 24 13:17:05.491369 osafclmd [7550:clms_evt.c:0167] << clms_client_delete_by_mds_dest
Apr 24 13:17:05.491393 osafclmd [7550:clms_mbcsv.c:0726] >> clms_send_async_update
Apr 24 13:17:05.491410 osafclmd [7550:clms_mbcsv.c:0657] >> mbcsv_callback
Apr 24 13:17:05.491421 osafclmd [7550:clms_mbcsv.c:0770] >> ckpt_encode_cbk_handler
Apr 24 13:17:05.491440 osafclmd [7550:clms_mbcsv.c:0779] TR cbk_arg->info.encode.io_msg_type type 1
Apr 24 13:17:05.491452 osafclmd [7550:clms_mbcsv.c:1186] >> ckpt_encode_async_update
Apr 24 13:17:05.491464 osafclmd [7550:clms_mbcsv.c:1203] TR data->header.type 8
Apr 24 13:17:05.491476 osafclmd [7550:clms_mbcsv.c:1328] TR Async Update CLMS_CKPT_AGENT_DOWN_REC
Apr 24 13:17:05.491487 osafclmd [7550:clms_mbcsv.c:1484] >> enc_mbcsv_agent_down_msg
Apr 24 13:17:05.491498 osafclmd [7550:clms_mbcsv.c:1501] << enc_mbcsv_agent_down_msg
Apr 24 13:17:05.491509 osafclmd [7550:clms_mbcsv.c:1378] << ckpt_encode_async_update
Apr 24 13:17:05.491519 osafclmd [7550:clms_mbcsv.c:0820] << ckpt_encode_cbk_handler
Apr 24 13:17:05.491530 osafclmd [7550:clms_mbcsv.c:0697] << mbcsv_callback
Apr 24 13:17:05.492097 osafclmd [7552:clms_mds.c:0790] >> clms_mds_dec
Apr 24 13:17:05.492126 osafclmd [7552:clms_mds.c:0813] TR evt->info.msg.evt_type 0
Apr 24 13:17:05.492137 osafclmd [7552:clms_mds.c:0821] TR evt->info.msg.info.api_info.type 1
Apr 24 13:17:05.492149 osafclmd [7552:clms_mds.c:0089] T8 CLMSV_FINALIZE_REQ
Apr 24 13:17:05.492160 osafclmd [7552:clms_mds.c:0862] << clms_mds_dec
Apr 24 13:17:05.492172 osafclmd [7552:clms_mds.c:0989] >> clms_mds_rcv: Event type 0
Apr 24 13:17:05.492184 osafclmd [7552:clms_mds.c:1003] << clms_mds_rcv
Apr 24 13:17:05.493001 osafclmd [7550:clms_mbcsv.c:0748] << clms_send_async_update
Apr 24 13:17:05.493042 osafclmd [7550:clms_evt.c:0547] T4 ASYNC UPDATE SEND SUCCESS for CLMA_DOWN event..
Apr 24 13:17:05.493054 osafclmd [7550:clms_evt.c:0577] << proc_clma_updn_mds_msg
Apr 24 13:17:05.493066 osafclmd [7550:clms_evt.c:1195] << clms_process_mbx
Apr 24 13:17:05.493083 osafclmd [7550:clms_evt.c:1149] >> clms_process_mbx
Apr 24 13:17:05.493097 osafclmd [7550:clms_evt.c:1116] >> process_api_evt
Apr 24 13:17:05.493108 osafclmd [7550:clms_evt.c:1060] >> proc_finalize_msg: finalize for client: client_id 23
Apr 24 13:17:05.493120 osafclmd [7550:clms_evt.c:0183] >> clms_client_delete: client_id 23
Apr 24 13:17:05.493132 osafclmd [7550:clms_evt.c:0115] TR client_id: 23 lookup failed
Apr 24 13:17:05.493143 osafclmd [7550:clms_evt.c:0199] << clms_client_delete
Apr 24 13:17:05.493153 osafclmd [7550:clms_evt.c:1064] TR clms_client_delete FAILED: 1
Apr 24 13:17:05.493167 osafclmd [7550:clms_util.c:0147] TR Node found 131343
Apr 24 13:17:05.493210 osafclmd [7550:clms_util.c:0147] TR Node found 131599
Apr 24 13:17:05.493225 osafclmd [7550:clms_mds.c:1409] >> clms_mds_msg_send
Apr 24 13:17:05.493255 osafclmd [7550:clms_mds.c:1442] << clms_mds_msg_send
Apr 24 13:17:05.493268 osafclmd [7550:clms_evt.c:1099] << proc_finalize_msg: finalize for client: client_id 23
This has now been seen again (see #2823). SMF is only using the CLM API to do
the saClmFinalize, so if there is a problem somewhere it must be in CLMA.
How SMF could affect the order between the CLMA down and the finalize msg beats
me.
In the logging included here the timestamps differ a lot between CLMS and SMF.
I assume this is because they are on different nodes and the nodes' clocks are
not synchronized.