- **status**: unassigned --> accepted
- **assigned_to**: Mathi Naickan
- **Part**: - --> d
- **Priority**: critical --> major
- **Milestone**: future --> 4.4.1
- **Comment**:
Hi,
I have attached a patch (I will send it out separately when I get my hands on
the latest review tool). In the meantime, please find my comments below:
1) w.r.t the log string and its relevance to /etc/opensaf/node_name:
The /etc/opensaf/node_name file is a user-exposed configuration file.
The node_name file contains the RDN value of the CLM node name.
(a) When the OpenSAF cluster configuration is pre-provisioned using the OpenSAF
IMM tools:
/etc/opensaf/node_name should contain one of the values specified
in nodes.cfg when generating imm.xml.
(b) When OpenSAF cluster nodes are dynamically added at runtime:
/etc/opensaf/node_name should contain the RDN value.
So, to the end user, the log messages should convey the relationship with the
node_name file in some form. I have, however, changed the log level to notice
and reworded the message based on your input in the attached patch.
2) w.r.t. the use case:
I think it is primarily a case of non-existent configuration, and also a case
of invalid configuration.
In the cluster expansion case, I think the expansion logic should first update
the cluster configuration, because otherwise the node startup will still be
seen as a failed attempt by an "unconfigured" node.
Note: Since there is a way around the situation, I have changed the priority
to major. I will send a formal review request later, but please use the
attached patch (your patch, modified with my impressions).
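
The relationship described in point 1 can be illustrated with a minimal sketch. This is not OpenSAF code: `parse_node_name` and `check_node_name` are hypothetical helpers, and the set of configured RDN values is assumed to be supplied by the caller (for example, taken from whatever was used to generate imm.xml).

```python
def parse_node_name(contents: str) -> str:
    """Return the RDN value from the text of a node_name-style file.

    Assumption: the file holds a single value, possibly followed by a newline.
    """
    return contents.strip()


def check_node_name(rdn: str, configured_rdns: set) -> tuple:
    """Report whether the RDN value matches a configured cluster node.

    Returns (ok, message). The message for the failing case mirrors the
    wording proposed in the ticket; the success wording is made up here.
    """
    if rdn in configured_rdns:
        return True, f"'{rdn}' is a configured cluster node"
    return False, f"'{rdn}' is not a configured cluster node"


# Hypothetical usage: SC-1 and SC-2 are configured, PL-6 is not (yet).
configured = {"SC-1", "SC-2"}
ok, message = check_node_name(parse_node_name("PL-6\n"), configured)
```

Under this sketch, a cluster expansion would first add `PL-6` to the configured set and only then start the node, which matches the ordering suggested in point 2.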
---
**[tickets:#816] CLM causes cluster restart when unknown node tries to join**
**Status:** accepted
**Milestone:** 4.4.1
**Created:** Fri Mar 21, 2014 01:15 PM UTC by Hans Feldt
**Last Updated:** Fri Mar 21, 2014 01:15 PM UTC
**Owner:** Mathi Naickan
When an unconfigured node tries to join an existing 4.4 CLM cluster, the
osafclmd process segfaults. After failover, the new active osafclmd also
segfaults and we get a cluster restart.
Mar 21 14:06:12 SC-1 local0.err osafclmd[418]: ER CLM NodeName: 'PL-6' doesn't
match entry in imm.xml. Specify a correct node name in/etc/opensaf/node_name
Mar 21 14:06:12 SC-1 local0.notice osafamfnd[441]: NO
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'
Mar 21 14:06:12 SC-1 local0.err osafamfnd[441]: ER
safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery
is:nodeFailfast
Mar 21 14:06:12 SC-1 local0.crit osafamfnd[441]: Rebooting OpenSAF NodeId =
131343 EE Name = , Reason: Component faulted: recovery is node failfast,
OwnNodeId = 131343, SupervisionTime = 60
Mar 21 14:06:35 SC-2 local0.notice osafamfd[431]: NO Node 'SC-1' left the
cluster
Mar 21 14:06:37 SC-2 local0.err osafclmd[415]: ER CLM NodeName: 'PL-6' doesn't
match entry in imm.xml. Specify a correct node name in/etc/opensaf/node_name
Mar 21 14:06:37 SC-2 local0.notice osafamfnd[439]: NO
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' :
Recovery is 'nodeFailfast'
Mar 21 14:06:37 SC-2 local0.err osafamfnd[439]: ER
safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery
is:nodeFailfast
Mar 21 14:06:37 SC-2 local0.crit osafamfnd[439]: Rebooting OpenSAF NodeId =
131599 EE Name = , Reason: Component faulted: recovery is node failfast,
OwnNodeId = 131599, SupervisionTime = 60
The log entry is also wrong. It has the wrong level, ER. This does not have to
be an error, since it would happen during scale-out when adding a new node. It
should be a notice. The text itself is also incorrect, since the situation is
normally not related to imm.xml or the contents of node_name.
I suggest the following log instead: "NO '<RDN value>' is not a configured
cluster node"
This is a regression; it works with 4.3.
Patch attached with proposed solution.
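
As an illustration only (the attached patch is not reproduced here, and the real osafclmd is written in C), the intended behaviour can be sketched as follows: a join request from an unknown node is rejected with a notice-level message instead of bringing down the process. `Node` and `handle_join_request` are hypothetical names, and the "joined the cluster" message is made up for the example; only the rejection text comes from the suggestion above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical stand-in for a configured CLM node."""
    rdn: str
    member: bool = False


def handle_join_request(rdn: str, configured: dict) -> tuple:
    """Handle a join request; return (accepted, log_line).

    An unknown RDN is an expected situation (for example during scale-out),
    so it is rejected and reported as a notice ("NO"), not as an error.
    """
    node = configured.get(rdn)
    if node is None:
        # Reject the request without touching any node state.
        return False, f"NO '{rdn}' is not a configured cluster node"
    node.member = True
    return True, f"NO '{rdn}' joined the cluster"


# Hypothetical usage: PL-6 is not configured, so the request is rejected.
cluster = {"SC-1": Node("SC-1"), "SC-2": Node("SC-2")}
accepted, line = handle_join_request("PL-6", cluster)
```

The point of the sketch is that the lookup result is checked before use, so an unconfigured node name leads to a rejected request rather than a crash followed by failover.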