- **status**: assigned --> accepted
- **Milestone**: 5.0.2 --> next
- **Comment**:

Analysis:
There seems problem with atleast one CLM clinet which resides on PL-3. CLM is 
unable to send any message to this client.
1) CLMD creates this client as standby:
2) Mar 15 13:47:47.080883 osafclmd [2763:src/clm/clmd/clms_evt.c:0140] TR 
client_id: 63 lookup failed
Mar 15 13:47:47.080886 osafclmd [2763:src/clm/clmd/clms_evt.c:0250] >> 
clms_client_new: MDS dest 2030feebec01a
Mar 15 13:47:47.080888 osafclmd [2763:src/clm/clmd/clms_evt.c:0277] << 
clms_client_new: client_id 63

2) When user performs admin operation, CLMD tries to send track callback for 
complete step to this client but mds returns failure:
3) Mar 15 14:32:10.759655 osafclmd [2763:src/clm/clmd/clms_util.c:1095] TR 
Client ID 63 ,track_flags=3
Mar 15 14:32:10.759658 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> 
clms_prep_and_send_track
Mar 15 14:32:10.759661 osafclmd [2763:src/clm/clmd/clms_util.c:0352] >> 
clms_nodedb_lookup
Mar 15 14:32:10.759664 osafclmd [2763:src/clm/clmd/clms_util.c:0354] TR 
patricia tree size 4
Mar 15 14:32:10.759667 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node 
found 131343
Mar 15 14:32:10.759670 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node 
found 131599
Mar 15 14:32:10.759673 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node 
found 131855
Mar 15 14:32:10.759676 osafclmd [2763:src/clm/clmd/clms_util.c:0149] TR Node 
found 132111
Mar 15 14:32:10.759687 osafclmd [2763:src/clm/clmd/clms_util.c:0375] TR 
num_nd_changes 4
Mar 15 14:32:10.759689 osafclmd [2763:src/clm/clmd/clms_util.c:0376] << 
clms_nodedb_lookup
Mar 15 14:32:10.759693 osafclmd [2763:src/clm/clmd/clms_mds.c:1494] >> 
clms_mds_msg_send
Mar 15 14:32:10.759728 osafclmd [2763:src/clm/clmd/clms_mds.c:1525] IN mds send 
returned: 2
Mar 15 14:32:10.759732 osafclmd [2763:src/clm/clmd/clms_mds.c:1527] << 
clms_mds_msg_send
Mar 15 14:32:10.759735 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback 
msg send to clma  failed

4) Before admin operation on PL-3, this node was restarted. There is no 
evidence of this client going down in clmd traces.
5) When unlock operation was performed, CLMD again could not send membeship 
status to this client  and did not reply to IMM. Also admin op params are not 
reset.. Since admin operation params are not reset, no further admin operation 
are not allowed and getting timed out.







---

** [tickets:#2381] clmd: clm admin operation returns BAD_OP after rebooting 
node**

**Status:** accepted
**Milestone:** next
**Created:** Thu Mar 16, 2017 07:30 AM UTC by Ritu Raj
**Last Updated:** Thu Mar 16, 2017 09:01 AM UTC
**Owner:** Praveen
**Attachments:**

- 
[active_clmd.tgz](https://sourceforge.net/p/opensaf/tickets/2381/attachment/active_clmd.tgz)
 (1.3 MB; application/x-compressed-tar)
- 
[messages](https://sourceforge.net/p/opensaf/tickets/2381/attachment/messages) 
(1.9 MB; application/octet-stream)


###Environment details
OS : Suse 64bit
Changeset : 8701 ( 5.2.RC1)
4 nodes setup(2 controller and 2 payload)

###Summary
clm admin operation returns SA_AIS_ERR_BAD_OPERATION after rebooting node 

###Steps followed & Observed behaviour
1. Initially performed clm_lock operation on Payload (PL-3) and immediately 
restarted the same payload(PL-3)
    > init 6; exit
2. Later, performed clm_unlock operation on PL-3, and got message unlock 
operation got timed out but  still node joined the cluster  

> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster 
> Mar 15 14:35:20 SLES-SLOT1 osafclmd[2763]: ER clms_imm_node_unlock failed
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Received node_up from 2030f: 
> msg_id 1
> Mar 15 14:35:20 SLES-SLOT1 osafamfd[2773]: NO Node 'PL-3' joined the cluster
> Mar 15 14:35:20 SLES-SLOT1 osafimmnd[2733]: NO Implementer connected: 197 
> (MsgQueueService131855) <0, 2030f>
> error - command timed out (alarm)

3. After, that if clm_lock or unlock opeartion performed it returns 
'SA_AIS_ERR_BAD_OPERATION'

SLES-SLOT1:~ # amf-adm lock safNode=PL-3,safCluster=myClmCluster
error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
SA_AIS_ERR_BAD_OPERATION (20)
> 
> SLES-SLOT1:~ # amf-adm unlock safNode=PL-3,safCluster=myClmCluster
> error - saImmOmAdminOperationInvoke_2 admin-op RETURNED: 
> SA_AIS_ERR_BAD_OPERATION (20)


Traces:
>From the traces:
Node PL-3 joined the cluster 
~~~
Mar 15 14:35:20.373997 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> 
clms_imm_admin_op_callback: Admin callback for 
nodename:safNode=PL-3,safCluster=myClmCluster, opId:1
Mar 15 14:35:20.374002 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374006 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374009 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << 
clms_node_get_by_name
Mar 15 14:35:20.374012 osafclmd [2763:src/clm/clmd/clms_imm.c:2223] >> 
clms_imm_node_unlock: Node name safNode=PL-3,safCluster=myClmCluster to unlock
Mar 15 14:35:20.374015 osafclmd [2763:src/clm/clmd/clms_imm.c:0579] >> 
clms_admin_state_update_rattr: Admin state 1 update for node 
safNode=PL-3,safCluster=myClmCluster
Mar 15 14:35:20.374018 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:35:20.374021 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
~~~
......
......
*but Sending track callback failed for SA_CLM_CHANGE_COMPLETED*
~~~
Mar 15 14:35:20.380860 osafclmd [2763:src/clm/clmd/clms_imm.c:1439] TR callback 
msg send to clma  failed
Mar 15 14:35:20.380869 osafclmd [2763:src/clm/clmd/clms_imm.c:1447] << 
clms_prep_and_send_track
Mar 15 14:35:20.380872 osafclmd [2763:src/clm/clmd/clms_imm.c:1220] TR Sending 
track callback failed for SA_CLM_CHANGE_COMPLETED
Mar 15 14:35:20.380875 osafclmd [2763:src/clm/clmd/clms_imm.c:1380] >> 
clms_prep_and_send_track
~~~
------

and later performed admin operation got failed as 'Another Admin operation 
already in progress'
~~~
Mar 15 14:51:21.878688 osafclmd [2763:src/clm/clmd/clms_imm.c:0939] >> 
clms_imm_admin_op_callback: Admin callback for 
nodename:safNode=PL-3,safCluster=myClmCluster, opId:2
Mar 15 14:51:21.878700 osafclmd [2763:src/clm/clmd/clms_util.c:0038] >> 
clms_node_get_by_name: name input safNode=PL-3,safCluster=myClmCluster length 36
Mar 15 14:51:21.878712 osafclmd [2763:src/clm/clmd/clms_util.c:0046] TR 
nodename after patricia tree get safNode=PL-3,safCluster=myClmCluster
Mar 15 14:51:21.878720 osafclmd [2763:src/clm/clmd/clms_util.c:0049] << 
clms_node_get_by_name
Mar 15 14:51:21.878726 osafclmd [2763:src/clm/clmd/clms_imm.c:0982] TR Another 
Admin operation already in progress: 4
~~~


Notes:
1. Syslog of Active controller attached
2. osafclmd of Active controller attached


---

Sent from sourceforge.net because [email protected] is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets

Reply via email to