- **Comment**:
The analysis is not entirely accurate, however the unexpected error code of
BAD_OPER opened up one path (admin state update) in CLM that did not check the
impl_set flag.
---
** [tickets:#1067] 2PBE: clm exits on BAD_HANDLE on class implementer set**
**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Fri Sep 12, 2014 07:53 AM UTC by Sirisha Alla
**Last Updated:** Fri Sep 12, 2014 07:53 AM UTC
**Owner:** nobody
The issue is seen on SLES X86 VMs. Opensaf is running with changeset 5697 with
2PBE holding 50k objects.
This issue is initially thought to be realted to #946. On further analysis
opening a ticket on clm.
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: NO Implementer locally
disconnected. Marking it as doomed 26 <5, 2020f> (safClmService)
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmpbed: IN Delaying class delete
at slave PBE due to ongoing commit of ccb:15/21
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: NO Implementer
disconnected 26 <5, 2020f> (safClmService)
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: NO Implementer
connected: 30 (safClmService) <324, 2020f>
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafclmd[9606]: ER
saImmOiClassImplementerSet failed for class SaClmNode rc:9, exiting
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafamfnd[9639]: NO
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to
'avaDown' : Recovery is 'nodeFailfast'
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafamfnd[9639]: ER
safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown
Recovery is:nodeFailfast
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafamfnd[9639]: Rebooting OpenSAF
NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node
failfast, OwnNodeId = 131599, SupervisionTime = 60
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: NO Implementer locally
disconnected. Marking it as doomed 30 <324, 2020f> (safClmService)
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: WA IMMND - Client went
down so no response
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: WA IMMND - Client went
down so no response
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: NO Implementer
disconnected 30 <324, 2020f> (safClmService)
Sep 11 14:58:24 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node;
timeout=60
clmd traces on that node
Sep 11 14:58:24.715267 osafclmd [9606:clms_imm.c:0530] TR Implementer
not yet set: Switching on the tryagain flag
Sep 11 14:58:24.715296 osafclmd [9606:clms_imm.c:0533] <<
clms_node_update_rattr
Sep 11 14:58:24.715319 osafclmd [9606:clms_imm.c:0449] >>
clms_admin_state_update_rattr: Admin state 1 update for node
safNode=SC-1,safCluster=myClmCluster
Sep 11 14:58:24.715343 osafclmd [9606:clms_util.c:0036] >>
clms_node_get_by_name: name input safNode=SC-1,safCluster=myClmCluster
length 36
Sep 11 14:58:24.715369 osafclmd [9606:clms_util.c:0044] TR nodename
after patricia tree get safNode=SC-1,safCluster=myClmCluster
Sep 11 14:58:24.715393 osafclmd [9606:clms_util.c:0047] <<
clms_node_get_by_name
Sep 11 14:58:24.715421 osafclmd [9606:imma_oi_api.c:2275] T2
ERR_BAD_HANDLE: No initialized handle exists!
Sep 11 14:58:24.715452 osafclmd [9606:clms_imm.c:0488] IN
saImmOiRtObjectUpdate for safNode=SC-1,safCluster=myClmCluster failed
with rc = 9. Reinit with IMM
Sep 11 14:58:24.715480 osafclmd [9606:imma_oi_api.c:0622] >> saImmOiFinalize
Sep 11 14:58:24.715505 osafclmd [9606:imma_oi_api.c:0626] T2
ERR_BAD_HANDLE: No initialized handle exists!
....
Sep 11 14:58:24.719788 osafclmd [9606:clms_imm.c:0721] TR Implementer is
not set. Switching on flag in clms_cluster_update_rattr
Sep 11 14:58:24.719801 osafclmd [9606:clms_imm.c:0724] <<
clms_cluster_update_rattr
Sep 11 14:58:24.719815 osafclmd [9606:clms_imm.c:0633] <<
clms_retry_pending_rtupdates
Sep 11 14:58:24.719828 osafclmd [9606:clms_main.c:0435] TR There is an
IMM task to be tried again. setting poll time out to 500
.....
Sep 11 14:58:24.740335 osafclmd [9606:clms_imm.c:0620] <<
clms_retry_pending_rtupdates: Implementerset yet to happen, try later
Sep 11 14:58:24.740345 osafclmd [9606:clms_main.c:0435] TR There is an
IMM task to be tried again. setting poll time out to 500
Sep 11 14:58:24.744013 osafclmd [9606:imma_oi_api.c:1636] T2
ERR_BAD_HANDLE: No implementer is set for this handle
Sep 11 14:58:24.744055 osafclmd [9606:clms_imm.c:2291] ER
saImmOiClassImplementerSet failed for class SaClmNode rc:9, exiting
syslog and clmd traces attached.
Following is neels analysis:
Analyses:
1. Rtobjectupdate is failed CLM tries to do Reinit with IMM
Sep 11 14:58:21.425773 osafclmd [9606:imma_oi_api.c:2279] >>
saImmOiRtObjectUpdate_2
Sep 11 14:58:24.709408 osafclmd [9606:imma_oi_api.c:2554] <<
saImmOiRtObjectUpdate_2
Sep 11 14:58:24.709606 osafclmd [9606:clms_imm.c:0488] IN saImmOiRtObjectUpdate
for safNode=PL-4,safCluster=myClmCluster failed with rc = 20. Reinit with IMM
Sep 11 14:58:24.709700 osafclmd [9606:imma_oi_api.c:0622] >> saImmOiFinalize
Sep 11 14:58:24.711075 osafclmd [9606:imma_proc.c:0147] >>
imma_callback_ipc_destroy
Sep 11 14:58:24.711160 osafclmd [9606:imma_proc.c:0206] TR Deleting client node
Sep 11 14:58:24.711213 osafclmd [9606:imma_init.c:0323] >> imma_shutdown: use
count 1
Sep 11 14:58:24.711247 osafclmd [9606:imma_init.c:0208] >> imma_destroy
Sep 11 14:58:24.711605 osafclmd [9606:imma_db.c:1318] >> imma_db_destroy
Sep 11 14:58:24.711643 osafclmd [9606:imma_db.c:0129] >>
imma_client_tree_destroy
Sep 11 14:58:24.711672 osafclmd [9606:imma_db.c:0152] >>
imma_client_tree_cleanup
Sep 11 14:58:24.711701 osafclmd [9606:imma_db.c:0167] <<
imma_client_tree_cleanup
Sep 11 14:58:24.711732 osafclmd [9606:imma_db.c:0135] <<
imma_client_tree_destroy
Sep 11 14:58:24.711762 osafclmd [9606:imma_db.c:1327] << imma_db_destroy
Sep 11 14:58:24.711795 osafclmd [9606:imma_init.c:0334] << imma_shutdown: use
count 0
Sep 11 14:58:24.711827 osafclmd [9606:imma_oi_api.c:0790] << saImmOiFinalize
Sep 11 14:58:24.711860 osafclmd [9606:clms_imm.c:2319] >> clm_imm_reinit_bg
Sep 11 14:58:24.711952 osafclmd [9606:clms_imm.c:2326] << clm_imm_reinit_bg
2. Again Rtobjectupdate is failed CLM tries to do Reinit with IMM
sep 11 14:58:24.715452 osafclmd [9606:clms_imm.c:0488] IN saImmOiRtObjectUpdate
for safNode=SC-1,safCluster=myClmCluster failed with rc = 9. Reinit with IMM
Sep 11 14:58:24.715480 osafclmd [9606:imma_oi_api.c:0622] >> saImmOiFinalize
Sep 11 14:58:24.715505 osafclmd [9606:imma_oi_api.c:0626] T2 ERR_BAD_HANDLE: No
initialized handle exists!
Sep 11 14:58:24.715530 osafclmd [9606:clms_imm.c:2319] >> clm_imm_reinit_bg
Sep 11 14:58:24.715638 osafclmd [9606:clms_imm.c:2326] << clm_imm_reinit_bg
3.Rtobjectupdate is failed CLM tries to do Reinit with IMM
Sep 11 14:58:24.719538 osafclmd [9606:imma_oi_api.c:2275] T2 ERR_BAD_HANDLE: No
initialized handle exists!
Sep 11 14:58:24.719557 osafclmd [9606:clms_imm.c:0488] IN saImmOiRtObjectUpdate
for safNode=SC-2,safCluster=myClmCluster failed with rc = 9. Reinit with IMM
Sep 11 14:58:24.719571 osafclmd [9606:imma_oi_api.c:0622] >> saImmOiFinalize
Sep 11 14:58:24.719584 osafclmd [9606:imma_oi_api.c:0626] T2 ERR_BAD_HANDLE: No
initialized handle exists!
Sep 11 14:58:24.719599 osafclmd [9606:clms_imm.c:2319] >> clm_imm_reinit_bg
Sep 11 14:58:24.719690 osafclmd [9606:clms_imm.c:2326] << clm_imm_reinit_bg
4. CLM does Oiinitialize three times
Sep 11 14:58:24.718336 osafclmd [9606:clms_imm.c:2272] >> clm_imm_reinit_thread
Sep 11 14:58:24.720346 osafclmd [9606:imma_init.c:0263] >> imma_startup: use
count 0
Sep 11 14:58:24.727340 osafclmd [9606:imma_mds.c:0421] T3 IMMND UP
Sep 11 14:58:24.727378 osafclmd [9606:osaf_secutil.c:0270] >>
osaf_auth_server_connect
Sep 11 14:58:24.727730 osafclmd [9606:osaf_secutil.c:0322] <<
osaf_auth_server_connect
Sep 11 14:58:24.727750 osafclmd [9606:mds_main.c:0249] TR received type:78,
status:0
Sep 11 14:58:24.727765 osafclmd [9606:imma_proc.c:0419] >>
imma_determine_clients_to_resurrect
Sep 11 14:58:24.727781 osafclmd [9606:imma_proc.c:0501] <<
imma_determine_clients_to_resurrect
Sep 11 14:58:24.727808 osafclmd [9606:imma_init.c:0063] >> imma_sync_with_immnd
Sep 11 14:58:24.727829 osafclmd [9606:imma_init.c:0173] TR Client agent
successfully initialized
Sep 11 14:58:24.727841 osafclmd [9606:imma_init.c:0286] << imma_startup: use
count 1
Sep 11 14:58:24.727860 osafclmd [9606:imma_oi_api.c:0122] T2 OI client version
A.2.1
Sep 11 14:58:24.729020 osafclmd [9606:imma_oi_api.c:0250] T1 Trying to add OI
client id:324 node:2020f handle:1440002020f
Sep 11 14:58:24.729384 osafclmd [9606:imma_oi_api.c:0341] << saImmOiInitialize_2
Sep 11 14:58:24.729456 osafclmd [9606:imma_oi_api.c:0364] >>
saImmOiSelectionObjectGet
Sep 11 14:58:24.729482 osafclmd [9606:imma_oi_api.c:0440] <<
saImmOiSelectionObjectGet
Sep 11 14:58:24.734381 osafimmnd [9565:immsv_evt.c:5382] T8 Received:
IMMND_EVT_A2ND_IMM_OI_INIT (4) from 2020f
Sep 11 14:58:24.734549 osafimmnd [9565:ImmModel.cc:3465] >> accessControlMode
Sep 11 14:58:24.734595 osafimmnd [9565:ImmModel.cc:3482] << accessControlMode: 0
Sep 11 14:58:24.734619 osafimmnd [9565:immnd_evt.c:0818] T2 Added client with
id: 1450002020f <node:2020f, count:325>
Sep 11 14:58:24.735833 osafimmnd [9565:immsv_evt.c:5382] T8 Received:
IMMND_EVT_A2ND_OI_IMPL_SET (40) from 2020f
Sep 11 14:58:24.735921 osafimmnd [9565:immnd_evt.c:2429] TR Messages pending:2
Sep 11 14:58:24.735937 osafimmnd [9565:immsv_evt.c:5363] T8 Sending:
IMMD_EVT_ND2D_IMPLSET_REQ to 0
Sep 11 14:58:24.738250 osafimmnd [9565:immsv_evt.c:5382] T8 Received:
IMMND_EVT_A2ND_IMM_OI_INIT (4) from 2020f
Sep 11 14:58:24.738290 osafimmnd [9565:ImmModel.cc:3465] >> accessControlMode
Sep 11 14:58:24.738313 osafimmnd [9565:ImmModel.cc:3482] << accessControlMode: 0
Sep 11 14:58:24.738329 osafimmnd [9565:immnd_evt.c:0818] T2 Added client with
id: 1460002020f <node:2020f, count:326>
5. CLM does implementerset the first OI initialize should have reached
Sep 11 14:58:24.735833 osafimmnd [9565:immsv_evt.c:5382] T8 Received:
IMMND_EVT_A2ND_OI_IMPL_SET (40) from 2020f
Sep 11 14:58:24.735921 osafimmnd [9565:immnd_evt.c:2429] TR Messages pending:2
Sep 11 14:58:24.735937 osafimmnd [9565:immsv_evt.c:5363] T8 Sending:
IMMD_EVT_ND2D_IMPLSET_REQ to 0
Sep 11 14:58:24.743689 osafimmnd [9565:immnd_evt.c:8502] >>
immnd_evt_proc_fevs_rcv
Sep 11 14:58:24.743729 osafimmnd [9565:immnd_evt.c:8518] T2 FEVS from myself,
still pending:2
Sep 11 14:58:24.743750 osafimmnd [9565:immsv_evt.c:5382] T8 Received:
IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
Sep 11 14:58:24.743765 osafimmnd [9565:immnd_evt.c:9138] T2 originated here?:1
nodeId:2020f conn: 324
Sep 11 14:58:24.743776 osafimmnd [9565:ImmModel.cc:12116] >> implementerSet
Sep 11 14:58:24.743797 osafimmnd [9565:ImmModel.cc:12157] T7 Re-using
implementer for safClmService
Sep 11 14:58:24.743842 osafimmnd [9565:ImmModel.cc:12238] NO Implementer
connected: 30 (safClmService) <324, 2020f>
Sep 11 14:58:24.743883 osafimmnd [9565:ImmModel.cc:12302] << implementerSet
6.CLM tries to do saImmOiClassImplementerSet
Sep 11 14:58:24.744013 osafclmd [9606:imma_oi_api.c:1636] T2 ERR_BAD_HANDLE: No
implementer is set for this handle
Sep 11 14:58:24.744055 osafclmd [9606:clms_imm.c:2291] ER
saImmOiClassImplementerSet failed for class SaClmNode rc:9, exiting
Conclusion:
Each time RTobjectupdate is failed CLM is trying to re-init with IMM. In this
case it tries to re-init for three times and three threads are created for
each re-init. Oihandle is shared among three threads. since CLMD
clm_imm_reinit_thread is using shared variables for (cb->immOiHandle) for CLM
handle, the handle got assigned to latest OIinitialize (which is Added client
with id: 1460002020f <node:2020f, count:326>) . After succeding implementerset
in the first thread it tries to set saImmOiClassImplementerSet with OIhandle of
latest thread (thread3). so, ERR_BAD_HANDLE is returned.
CLM Before retrying another RTobjectupdate it has to wait untillthe IMM-reinit
is succededs(instead of spawning multiple threads).
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
To unsubscribe from further messages, a project admin can change settings at
https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets