Hi,
The syslogs in systemlogs.tgz indicate that the cluster was coming up with SC-1,
SC-2 and PL-3, and that some CCB operations were initiated while SC-2 and PL-3
were still joining.
If the CCB operations are related to scale-out, there is only a very thin window
of time in which this issue can occur. I widened this window by adding some
sleeps and reproduced the issue. Logs, traces and the configuration file used to
add a payload to the CLM cluster are attached in clm_issue.tgz. I have not used
the scale-out script, but I think the issue can be reproduced with it as well.

Steps to reproduce:
1) Bring the first controller up with the attached imm.xml. It contains all the
MW objects for PL-3 except the CLM node object.
2) While the standby controller is coming up and its CLMS is reading cluster
information from IMM, add the PL-3 configuration with immcfg -f pl_3.xml. The
active CLMS will not checkpoint this node to the standby CLMS because the
standby is still not visible via MBCSV.
3) While the standby is encoding the MBCSV request for cold sync, modify an
attribute of CLM node PL-3 with the command:
immcfg -a saClmNodeLockCallbackTimeout=50000 safNode=PL-3,safCluster=myClmCluster
4) Since the standby CLMS is now visible, the active will try to send this
runtime information to the standby. PL-3 was added at runtime after the standby
had read the configuration from IMM, so the standby asserts because it cannot
find PL-3.
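
The race in the steps above can be modelled roughly as follows. This is only a
sketch: node_db, db_add, db_has and replay_race are hypothetical names, not the
real clms data structures, and the initial node set (SC-1, SC-2) is an
assumption about what the attached imm.xml contains.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 8

/* Hypothetical stand-in for a CLMS node database (not the real clms types). */
struct node_db {
    const char *names[MAX_NODES];
    size_t count;
};

void db_add(struct node_db *db, const char *name)
{
    assert(db->count < MAX_NODES);
    db->names[db->count++] = name;
}

bool db_has(const struct node_db *db, const char *name)
{
    for (size_t i = 0; i < db->count; i++) {
        if (strcmp(db->names[i], name) == 0)
            return true;
    }
    return false;
}

/*
 * Replays steps 1-4 and reports whether the standby can find PL-3 when the
 * async update for it arrives.
 */
bool replay_race(void)
{
    struct node_db active = {0};
    struct node_db standby = {0};

    /* Step 1: assumed initial CLM nodes, PL-3 not yet configured. */
    db_add(&active, "SC-1");
    db_add(&active, "SC-2");

    /* Step 2: standby takes its snapshot from IMM, then PL-3 is added.
     * The addition is not checkpointed, so only the active learns of it. */
    standby = active;
    db_add(&active, "PL-3");

    /* Steps 3-4: the attribute change triggers an async update for PL-3;
     * the standby looks the node up in its stale snapshot. */
    return db_has(&standby, "PL-3");
}
```

In this model replay_race() returns false, which corresponds to the failed
lookup that precedes the assert on the standby.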

Solution: One option is that the active CLMS does not send async updates until
cold sync has completed. Another option is that the standby CLMS ignores async
update requests until cold sync has completed; it will receive the updated
state in the cold sync messages anyway. This needs to be evaluated.
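
A minimal sketch of the two options, assuming a per-peer flag that records
whether cold sync has completed. All names here (standby_state,
standby_on_async_update, active_should_send_async) are hypothetical and are not
taken from the clms or MBCSV code; the real change would have to hook into the
existing checkpoint handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum update_result { UPDATE_APPLIED, UPDATE_IGNORED };

/* Hypothetical standby-side state; not the real clms control block. */
struct standby_state {
    bool cold_sync_done;   /* set once the cold sync response is decoded */
    unsigned applied;      /* number of async updates applied so far */
};

/*
 * Option 2: the standby drops async updates that arrive before cold sync
 * completes. The cold sync data carries the current state, so the dropped
 * update is not lost.
 */
enum update_result standby_on_async_update(struct standby_state *sb)
{
    assert(sb != NULL);
    if (!sb->cold_sync_done)
        return UPDATE_IGNORED;
    sb->applied++;
    return UPDATE_APPLIED;
}

/*
 * Option 1: the active checks the peer's sync status and only sends async
 * updates once the peer has completed cold sync.
 */
bool active_should_send_async(bool peer_cold_sync_done)
{
    return peer_cold_sync_done;
}
```

Either guard alone would avoid the lookup of a node that the standby has not
yet learned about; which side should own the check is the part that needs
evaluation.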

Thanks,
Praveen



Attachments:

- [clm_issue.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/b86ccaf0/4c32/attachment/clm_issue.tgz) (66.5 kB; application/x-compressed)


---

**[tickets:#2265] clm: clmd coredump**

**Status:** assigned
**Milestone:** 5.0.2
**Created:** Mon Jan 16, 2017 08:51 AM UTC by Hung Nguyen
**Last Updated:** Mon Jan 23, 2017 12:12 PM UTC
**Owner:** Praveen


Jan 11 10:36:23 SC-2 osafclmd[14467]: ER Node is NULL,problem with the database.
**Jan 11 10:36:23 SC-2 osafclmd[14467]: ../../../../../../../opensaf/osaf/services/saf/clmsv/clms/clms_mbcsv.c:467: ckpt_proc_node_rec: Assertion '0' failed.**
Jan 11 10:36:23 SC-2 osafamfnd[14497]: NO 'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Jan 11 10:36:23 SC-2 osafamfnd[14497]: ER safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Jan 11 10:36:23 SC-2 osafamfnd[14497]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Jan 11 10:36:23 SC-2 opensaf_reboot: Rebooting local node; timeout=60

