Hi,
The syslogs in systemlogs.tgz indicate that the cluster was coming up with SC-1,
SC-2 and PL-3, and that some CCB operations were initiated while SC-2 and PL-3
were still joining.
If the CCB operations are related to scale-out, there is only a very narrow
window of time in which this issue can occur. I widened this window by adding
some sleeps and reproduced the issue. The logs, traces and the configuration file
used to add a payload to the CLM cluster are attached in clm_issue.tgz. I did not
use the scale-out script, but I think the issue can be reproduced with it as well.
Steps to reproduce:
1) Bring up the first controller with the attached imm.xml. It contains all the
MW objects for PL-3 except the CLM node object.
2) While the standby controller is coming up, and once the standby CLMS has read
the cluster information from IMM, add the PL-3 configuration with command:
immcfg -f pl_3.xml
The active CLMS will not checkpoint this node to the standby CLMS because the
standby is not yet visible via MBCSV.
3) While the standby is encoding the MBCSV request for cold sync, modify an
attribute of CLM node PL-3 with command:
immcfg -a saClmNodeLockCallbackTimeout=50000 safNode=PL-3,safCluster=myClmCluster
4) Since the standby CLMS is now visible, the active will send this runtime
information to the standby. Because PL-3 was added at runtime after the standby
had read the configuration from IMM, the standby cannot find PL-3 in its node
database and asserts.
Solution: One option is that the active CLMS should not send async updates until
cold sync has completed. Another option is that the standby CLMS should ignore
async updates until cold sync has completed, since it will receive the
up-to-date state in the cold sync messages. Both options need to be evaluated.
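The two options can be illustrated with a small, self-contained C model. This is only a hypothetical sketch: standby_db_t, active_may_send_async() and standby_handle_async_node_update() are invented names that do not exist in OpenSAF, and the real checkpoint handling in clms_mbcsv.c (ckpt_proc_node_rec) is not modeled. The UPDATE_NODE_MISSING result marks the point where the current code hits Assertion '0'.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, simplified model of the standby CLMS node database.
 * None of these names exist in OpenSAF; they only illustrate the race. */
#define MAX_NODES 8

typedef struct {
  const char *nodes[MAX_NODES]; /* node DNs known to the standby */
  int count;
  bool cold_sync_done;          /* set once cold sync has completed */
} standby_db_t;

typedef enum {
  UPDATE_APPLIED,      /* node found, update applied */
  UPDATE_IGNORED,      /* dropped because cold sync is not complete */
  UPDATE_NODE_MISSING  /* where the current code asserts */
} update_result_t;

static bool db_has_node(const standby_db_t *db, const char *dn) {
  for (int i = 0; i < db->count; i++)
    if (strcmp(db->nodes[i], dn) == 0) return true;
  return false;
}

/* Adds a node DN, e.g. from the IMM read or from a cold sync message. */
static void db_add_node(standby_db_t *db, const char *dn) {
  assert(db->count < MAX_NODES);
  db->nodes[db->count++] = dn;
}

/* Option 1 (active side): only send async updates to a standby
 * whose cold sync has completed. */
static bool active_may_send_async(bool peer_cold_sync_done) {
  return peer_cold_sync_done;
}

/* Option 2 (standby side): drop async updates that arrive before cold
 * sync completes; the cold sync messages carry the current state anyway. */
static update_result_t standby_handle_async_node_update(const standby_db_t *db,
                                                        const char *dn) {
  if (!db->cold_sync_done) return UPDATE_IGNORED;
  if (!db_has_node(db, dn)) return UPDATE_NODE_MISSING;
  return UPDATE_APPLIED;
}
```

In this model the standby starts with only the controllers it read from IMM; an async update for safNode=PL-3,safCluster=myClmCluster is then either never sent (option 1) or ignored (option 2) until cold sync has delivered PL-3. Option 2 keeps the fix local to the standby, while option 1 also avoids sending messages that would be discarded.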
Thanks,
Praveen
Attachments:
-
[clm_issue.tgz](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/b86ccaf0/4c32/attachment/clm_issue.tgz)
(66.5 kB; application/x-compressed)
---
** [tickets:#2265] clm: clmd coredump**
**Status:** assigned
**Milestone:** 5.0.2
**Created:** Mon Jan 16, 2017 08:51 AM UTC by Hung Nguyen
**Last Updated:** Mon Jan 23, 2017 12:12 PM UTC
**Owner:** Praveen
Jan 11 10:36:23 SC-2 osafclmd[14467]: ER Node is NULL,problem with the database.
**Jan 11 10:36:23 SC-2 osafclmd[14467]: ../../../../../../../opensaf/osaf/services/saf/clmsv/clms/clms_mbcsv.c:467: ckpt_proc_node_rec: Assertion '0' failed.**
Jan 11 10:36:23 SC-2 osafamfnd[14497]: NO 'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Jan 11 10:36:23 SC-2 osafamfnd[14497]: ER safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Jan 11 10:36:23 SC-2 osafamfnd[14497]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
Jan 11 10:36:23 SC-2 opensaf_reboot: Rebooting local node; timeout=60
---
Sent from sourceforge.net because [email protected] is
subscribed to https://sourceforge.net/p/opensaf/tickets/
_______________________________________________
Opensaf-tickets mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets