---

** [tickets:#1067] 2PBE: clm exits on BAD_HANDLE on class implementer set**

**Status:** unassigned
**Milestone:** 4.3.3
**Created:** Fri Sep 12, 2014 07:53 AM UTC by Sirisha Alla
**Last Updated:** Fri Sep 12, 2014 07:53 AM UTC
**Owner:** nobody

The issue is seen on SLES X86 VMs. Opensaf is running with changeset 5697 with 
2PBE holding 50k objects.

This issue is initially thought to be realted to #946. On further analysis 
opening a ticket on clm.

Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: NO Implementer locally
disconnected. Marking it as doomed 26 <5, 2020f> (safClmService)
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmpbed: IN Delaying class delete
at slave PBE due to ongoing commit of ccb:15/21
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: NO Implementer
disconnected 26 <5, 2020f> (safClmService)
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: NO Implementer
connected: 30 (safClmService) <324, 2020f>
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafclmd[9606]: ER
saImmOiClassImplementerSet failed for class SaClmNode rc:9, exiting
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafamfnd[9639]: NO
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to
'avaDown' : Recovery is 'nodeFailfast'
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafamfnd[9639]: ER
safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown
Recovery is:nodeFailfast
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafamfnd[9639]: Rebooting OpenSAF
NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node
failfast, OwnNodeId = 131599, SupervisionTime = 60
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: NO Implementer locally
disconnected. Marking it as doomed 30 <324, 2020f> (safClmService)
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: WA IMMND - Client went
down so no response
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: WA IMMND - Client went
down so no response
Sep 11 14:58:24 SLES-64BIT-SLOT2 osafimmnd[9565]: NO Implementer
disconnected 30 <324, 2020f> (safClmService)
Sep 11 14:58:24 SLES-64BIT-SLOT2 opensaf_reboot: Rebooting local node;
timeout=60

clmd traces on that node

Sep 11 14:58:24.715267 osafclmd [9606:clms_imm.c:0530] TR Implementer
not yet set: Switching on the tryagain flag
Sep 11 14:58:24.715296 osafclmd [9606:clms_imm.c:0533] <<
clms_node_update_rattr
Sep 11 14:58:24.715319 osafclmd [9606:clms_imm.c:0449] >>
clms_admin_state_update_rattr: Admin state 1 update for node
safNode=SC-1,safCluster=myClmCluster
Sep 11 14:58:24.715343 osafclmd [9606:clms_util.c:0036] >>
clms_node_get_by_name: name input safNode=SC-1,safCluster=myClmCluster
length 36
Sep 11 14:58:24.715369 osafclmd [9606:clms_util.c:0044] TR nodename
after patricia tree get safNode=SC-1,safCluster=myClmCluster
Sep 11 14:58:24.715393 osafclmd [9606:clms_util.c:0047] <<
clms_node_get_by_name
Sep 11 14:58:24.715421 osafclmd [9606:imma_oi_api.c:2275] T2
ERR_BAD_HANDLE: No initialized handle exists!
Sep 11 14:58:24.715452 osafclmd [9606:clms_imm.c:0488] IN
saImmOiRtObjectUpdate for safNode=SC-1,safCluster=myClmCluster failed
with rc = 9. Reinit with IMM
Sep 11 14:58:24.715480 osafclmd [9606:imma_oi_api.c:0622] >> saImmOiFinalize
Sep 11 14:58:24.715505 osafclmd [9606:imma_oi_api.c:0626] T2
ERR_BAD_HANDLE: No initialized handle exists!

....

Sep 11 14:58:24.719788 osafclmd [9606:clms_imm.c:0721] TR Implementer is
not set. Switching on flag in clms_cluster_update_rattr
Sep 11 14:58:24.719801 osafclmd [9606:clms_imm.c:0724] <<
clms_cluster_update_rattr
Sep 11 14:58:24.719815 osafclmd [9606:clms_imm.c:0633] <<
clms_retry_pending_rtupdates
Sep 11 14:58:24.719828 osafclmd [9606:clms_main.c:0435] TR There is an
IMM task to be tried again. setting poll time out to 500

.....
Sep 11 14:58:24.740335 osafclmd [9606:clms_imm.c:0620] <<
clms_retry_pending_rtupdates: Implementerset yet to happen, try later
Sep 11 14:58:24.740345 osafclmd [9606:clms_main.c:0435] TR There is an
IMM task to be tried again. setting poll time out to 500
Sep 11 14:58:24.744013 osafclmd [9606:imma_oi_api.c:1636] T2
ERR_BAD_HANDLE: No implementer is set for this handle
Sep 11 14:58:24.744055 osafclmd [9606:clms_imm.c:2291] ER
saImmOiClassImplementerSet failed for class SaClmNode rc:9, exiting 

syslog and clmd traces attached.

Following is neels analysis:

Analyses:

1. Rtobjectupdate is failed CLM tries to do Reinit with IMM
Sep 11 14:58:21.425773 osafclmd [9606:imma_oi_api.c:2279] >> 
saImmOiRtObjectUpdate_2
Sep 11 14:58:24.709408 osafclmd [9606:imma_oi_api.c:2554] << 
saImmOiRtObjectUpdate_2
Sep 11 14:58:24.709606 osafclmd [9606:clms_imm.c:0488] IN saImmOiRtObjectUpdate 
for safNode=PL-4,safCluster=myClmCluster failed with rc = 20. Reinit with IMM
Sep 11 14:58:24.709700 osafclmd [9606:imma_oi_api.c:0622] >> saImmOiFinalize
Sep 11 14:58:24.711075 osafclmd [9606:imma_proc.c:0147] >> 
imma_callback_ipc_destroy
Sep 11 14:58:24.711160 osafclmd [9606:imma_proc.c:0206] TR Deleting client node
Sep 11 14:58:24.711213 osafclmd [9606:imma_init.c:0323] >> imma_shutdown: use 
count 1
Sep 11 14:58:24.711247 osafclmd [9606:imma_init.c:0208] >> imma_destroy
Sep 11 14:58:24.711605 osafclmd [9606:imma_db.c:1318] >> imma_db_destroy
Sep 11 14:58:24.711643 osafclmd [9606:imma_db.c:0129] >> 
imma_client_tree_destroy
Sep 11 14:58:24.711672 osafclmd [9606:imma_db.c:0152] >> 
imma_client_tree_cleanup
Sep 11 14:58:24.711701 osafclmd [9606:imma_db.c:0167] << 
imma_client_tree_cleanup
Sep 11 14:58:24.711732 osafclmd [9606:imma_db.c:0135] << 
imma_client_tree_destroy
Sep 11 14:58:24.711762 osafclmd [9606:imma_db.c:1327] << imma_db_destroy
Sep 11 14:58:24.711795 osafclmd [9606:imma_init.c:0334] << imma_shutdown: use 
count 0
Sep 11 14:58:24.711827 osafclmd [9606:imma_oi_api.c:0790] << saImmOiFinalize
Sep 11 14:58:24.711860 osafclmd [9606:clms_imm.c:2319] >> clm_imm_reinit_bg
Sep 11 14:58:24.711952 osafclmd [9606:clms_imm.c:2326] << clm_imm_reinit_bg

2. Again Rtobjectupdate is failed CLM tries to do Reinit with IMM
sep 11 14:58:24.715452 osafclmd [9606:clms_imm.c:0488] IN saImmOiRtObjectUpdate 
for safNode=SC-1,safCluster=myClmCluster failed with rc = 9. Reinit with IMM
Sep 11 14:58:24.715480 osafclmd [9606:imma_oi_api.c:0622] >> saImmOiFinalize
Sep 11 14:58:24.715505 osafclmd [9606:imma_oi_api.c:0626] T2 ERR_BAD_HANDLE: No 
initialized handle exists!
Sep 11 14:58:24.715530 osafclmd [9606:clms_imm.c:2319] >> clm_imm_reinit_bg
Sep 11 14:58:24.715638 osafclmd [9606:clms_imm.c:2326] << clm_imm_reinit_bg

3.Rtobjectupdate is failed CLM tries to do Reinit with IMM
Sep 11 14:58:24.719538 osafclmd [9606:imma_oi_api.c:2275] T2 ERR_BAD_HANDLE: No 
initialized handle exists!
Sep 11 14:58:24.719557 osafclmd [9606:clms_imm.c:0488] IN saImmOiRtObjectUpdate 
for safNode=SC-2,safCluster=myClmCluster failed with rc = 9. Reinit with IMM
Sep 11 14:58:24.719571 osafclmd [9606:imma_oi_api.c:0622] >> saImmOiFinalize
Sep 11 14:58:24.719584 osafclmd [9606:imma_oi_api.c:0626] T2 ERR_BAD_HANDLE: No 
initialized handle exists!
Sep 11 14:58:24.719599 osafclmd [9606:clms_imm.c:2319] >> clm_imm_reinit_bg
Sep 11 14:58:24.719690 osafclmd [9606:clms_imm.c:2326] << clm_imm_reinit_bg

4. CLM does Oiinitialize three times

Sep 11 14:58:24.718336 osafclmd [9606:clms_imm.c:2272] >> clm_imm_reinit_thread
Sep 11 14:58:24.720346 osafclmd [9606:imma_init.c:0263] >> imma_startup: use 
count 0
Sep 11 14:58:24.727340 osafclmd [9606:imma_mds.c:0421] T3 IMMND UP
Sep 11 14:58:24.727378 osafclmd [9606:osaf_secutil.c:0270] >> 
osaf_auth_server_connect
Sep 11 14:58:24.727730 osafclmd [9606:osaf_secutil.c:0322] << 
osaf_auth_server_connect
Sep 11 14:58:24.727750 osafclmd [9606:mds_main.c:0249] TR received type:78, 
status:0
Sep 11 14:58:24.727765 osafclmd [9606:imma_proc.c:0419] >> 
imma_determine_clients_to_resurrect
Sep 11 14:58:24.727781 osafclmd [9606:imma_proc.c:0501] << 
imma_determine_clients_to_resurrect
Sep 11 14:58:24.727808 osafclmd [9606:imma_init.c:0063] >> imma_sync_with_immnd
Sep 11 14:58:24.727829 osafclmd [9606:imma_init.c:0173] TR Client agent 
successfully initialized
Sep 11 14:58:24.727841 osafclmd [9606:imma_init.c:0286] << imma_startup: use 
count 1
Sep 11 14:58:24.727860 osafclmd [9606:imma_oi_api.c:0122] T2 OI client version 
A.2.1
Sep 11 14:58:24.729020 osafclmd [9606:imma_oi_api.c:0250] T1 Trying to add OI 
client id:324 node:2020f handle:1440002020f
Sep 11 14:58:24.729384 osafclmd [9606:imma_oi_api.c:0341] << saImmOiInitialize_2
Sep 11 14:58:24.729456 osafclmd [9606:imma_oi_api.c:0364] >> 
saImmOiSelectionObjectGet
Sep 11 14:58:24.729482 osafclmd [9606:imma_oi_api.c:0440] << 
saImmOiSelectionObjectGet


Sep 11 14:58:24.734381 osafimmnd [9565:immsv_evt.c:5382] T8 Received: 
IMMND_EVT_A2ND_IMM_OI_INIT (4) from 2020f
Sep 11 14:58:24.734549 osafimmnd [9565:ImmModel.cc:3465] >> accessControlMode
Sep 11 14:58:24.734595 osafimmnd [9565:ImmModel.cc:3482] << accessControlMode: 0
Sep 11 14:58:24.734619 osafimmnd [9565:immnd_evt.c:0818] T2 Added client with 
id: 1450002020f <node:2020f, count:325>
Sep 11 14:58:24.735833 osafimmnd [9565:immsv_evt.c:5382] T8 Received: 
IMMND_EVT_A2ND_OI_IMPL_SET (40) from 2020f
Sep 11 14:58:24.735921 osafimmnd [9565:immnd_evt.c:2429] TR Messages pending:2
Sep 11 14:58:24.735937 osafimmnd [9565:immsv_evt.c:5363] T8 Sending:  
IMMD_EVT_ND2D_IMPLSET_REQ to 0
Sep 11 14:58:24.738250 osafimmnd [9565:immsv_evt.c:5382] T8 Received: 
IMMND_EVT_A2ND_IMM_OI_INIT (4) from 2020f
Sep 11 14:58:24.738290 osafimmnd [9565:ImmModel.cc:3465] >> accessControlMode
Sep 11 14:58:24.738313 osafimmnd [9565:ImmModel.cc:3482] << accessControlMode: 0
Sep 11 14:58:24.738329 osafimmnd [9565:immnd_evt.c:0818] T2 Added client with 
id: 1460002020f <node:2020f, count:326>


5. CLM does implementerset the first OI initialize should have reached

Sep 11 14:58:24.735833 osafimmnd [9565:immsv_evt.c:5382] T8 Received: 
IMMND_EVT_A2ND_OI_IMPL_SET (40) from 2020f
Sep 11 14:58:24.735921 osafimmnd [9565:immnd_evt.c:2429] TR Messages pending:2
Sep 11 14:58:24.735937 osafimmnd [9565:immsv_evt.c:5363] T8 Sending:  
IMMD_EVT_ND2D_IMPLSET_REQ to 0

Sep 11 14:58:24.743689 osafimmnd [9565:immnd_evt.c:8502] >> 
immnd_evt_proc_fevs_rcv
Sep 11 14:58:24.743729 osafimmnd [9565:immnd_evt.c:8518] T2 FEVS from myself, 
still pending:2
Sep 11 14:58:24.743750 osafimmnd [9565:immsv_evt.c:5382] T8 Received: 
IMMND_EVT_D2ND_IMPLSET_RSP (60) from 0
Sep 11 14:58:24.743765 osafimmnd [9565:immnd_evt.c:9138] T2 originated here?:1 
nodeId:2020f conn: 324
Sep 11 14:58:24.743776 osafimmnd [9565:ImmModel.cc:12116] >> implementerSet
Sep 11 14:58:24.743797 osafimmnd [9565:ImmModel.cc:12157] T7 Re-using 
implementer for safClmService
Sep 11 14:58:24.743842 osafimmnd [9565:ImmModel.cc:12238] NO Implementer 
connected: 30 (safClmService) <324, 2020f>
Sep 11 14:58:24.743883 osafimmnd [9565:ImmModel.cc:12302] << implementerSet

6.CLM tries to do saImmOiClassImplementerSet

Sep 11 14:58:24.744013 osafclmd [9606:imma_oi_api.c:1636] T2 ERR_BAD_HANDLE: No 
implementer is set for this handle
Sep 11 14:58:24.744055 osafclmd [9606:clms_imm.c:2291] ER 
saImmOiClassImplementerSet failed for class SaClmNode rc:9, exiting


Conclusion:

Each time RTobjectupdate is failed CLM is trying to re-init with IMM. In this 
case it tries to re-init for three times and  three threads are created for 
each re-init. Oihandle is shared among three threads. since CLMD 
clm_imm_reinit_thread is using shared variables for (cb->immOiHandle) for CLM 
handle, the handle got assigned to latest OIinitialize (which is Added client 
with id: 1460002020f <node:2020f, count:326>) . After succeding implementerset 
in the first thread it tries to set saImmOiClassImplementerSet with OIhandle of 
latest thread (thread3). so, ERR_BAD_HANDLE is returned.

CLM Before retrying another RTobjectupdate it has to wait untillthe IMM-reinit 
is succededs(instead of spawning multiple threads). 


---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Want excitement?
Manually upgrade your production database.
When you want reliability, choose Perforce
Perforce version control. Predictably reliable.
http://pubads.g.doubleclick.net/gampad/clk?id=157508191&iu=/4140/ostg.clktrk
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to