- **status**: unassigned --> accepted
- **assigned_to**: Mathi Naickan
- **Part**: - --> d
- **Priority**: critical --> major
- **Milestone**: future --> 4.4.1
- **Comment**:

Hi,

I have attached a patch (I will send it out separately once I get my hands on 
the latest review tool). In the meantime, please find my comments below.

1) W.r.t. the log string and its relevance to /etc/opensaf/node_name:

/etc/opensaf/node_name is a user-exposed configuration file.
It contains the RDN value of the CLM node name.
(a) When the OpenSAF cluster configuration is pre-provisioned using the OpenSAF 
IMM tools:
/etc/opensaf/node_name should contain one of the values specified
in nodes.cfg when the imm.xml was generated.
(b) When OpenSAF cluster nodes are added dynamically at runtime:
/etc/opensaf/node_name should contain the RDN value.

So, to the end user, the log message should convey its relationship to the 
node_name file in some form. That said, I have changed the log level to notice 
and reworded the message based on your input in the attached patch.

2) W.r.t. the use case:

I think it is primarily a case of non-existent configuration, and also a case 
of invalid configuration. 
In the cluster expansion case, I think the expansion logic should first update 
the cluster configuration, because otherwise the node startup will still be 
seen as a failed attempt by an "unconfigured" node. 
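
The ordering argument in 2) can be sketched as follows. This is only an illustration, not the code in the attached patch: `cluster_config`, `config_add_node`, `config_has_node` and `handle_join_request` are hypothetical names, and the fixed-size table is a stand-in for the real CLM configuration in IMM. It shows a join request from an unconfigured node being rejected cleanly, and the same request succeeding once the configuration has been updated first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_NODES 8
#define MAX_NAME_LEN 64

/* Hypothetical stand-in for the set of configured CLM nodes. */
struct cluster_config {
    char names[MAX_NODES][MAX_NAME_LEN];
    size_t count;
};

/* Add a node name to the configuration.
 * Returns false if the table is full or the name is too long. */
bool config_add_node(struct cluster_config *cfg, const char *name)
{
    if (cfg->count >= MAX_NODES || strlen(name) >= MAX_NAME_LEN)
        return false;
    strcpy(cfg->names[cfg->count++], name);
    return true;
}

/* Return true if the given node name is part of the configuration. */
bool config_has_node(const struct cluster_config *cfg, const char *name)
{
    for (size_t i = 0; i < cfg->count; i++) {
        if (strcmp(cfg->names[i], name) == 0)
            return true;
    }
    return false;
}

enum join_result { JOIN_ACCEPTED, JOIN_REJECTED_UNCONFIGURED };

/* Decide on a join request without ever touching a missing entry:
 * an unknown (or NULL) name is an ordinary rejection, not a fault. */
enum join_result handle_join_request(const struct cluster_config *cfg,
                                     const char *name)
{
    if (name == NULL || !config_has_node(cfg, name))
        return JOIN_REJECTED_UNCONFIGURED;
    return JOIN_ACCEPTED;
}
```

With only SC-1 configured, `handle_join_request(&cfg, "PL-6")` returns `JOIN_REJECTED_UNCONFIGURED`; after `config_add_node(&cfg, "PL-6")` the same call returns `JOIN_ACCEPTED`, which is why the expansion logic would need to update the configuration before the node starts.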


Note: Since there is a way around the situation, I have changed the priority 
to major. I will send a formal review request later, but please use the 
attached patch (your patch, modified with my impressions).





---

**[tickets:#816] CLM causes cluster restart when unknown node tries to join**

**Status:** accepted
**Milestone:** 4.4.1
**Created:** Fri Mar 21, 2014 01:15 PM UTC by Hans Feldt
**Last Updated:** Fri Mar 21, 2014 01:15 PM UTC
**Owner:** Mathi Naickan

When an unconfigured node tries to join an existing 4.4 CLM cluster, the 
osafclmd process segfaults. After failover, the new active osafclmd segfaults 
as well, and we get a cluster restart.

Mar 21 14:06:12 SC-1 local0.err osafclmd[418]: ER CLM NodeName: 'PL-6' doesn't 
match entry in imm.xml. Specify a correct node name in/etc/opensaf/node_name
Mar 21 14:06:12 SC-1 local0.notice osafamfnd[441]: NO 
'safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Mar 21 14:06:12 SC-1 local0.err osafamfnd[441]: ER 
safComp=CLM,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Mar 21 14:06:12 SC-1 local0.crit osafamfnd[441]: Rebooting OpenSAF NodeId = 
131343 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131343, SupervisionTime = 60

Mar 21 14:06:35 SC-2 local0.notice osafamfd[431]: NO Node 'SC-1' left the 
cluster
Mar 21 14:06:37 SC-2 local0.err osafclmd[415]: ER CLM NodeName: 'PL-6' doesn't 
match entry in imm.xml. Specify a correct node name in/etc/opensaf/node_name
Mar 21 14:06:37 SC-2 local0.notice osafamfnd[439]: NO 
'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
Mar 21 14:06:37 SC-2 local0.err osafamfnd[439]: ER 
safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
Mar 21 14:06:37 SC-2 local0.crit osafamfnd[439]: Rebooting OpenSAF NodeId = 
131599 EE Name = , Reason: Component faulted: recovery is node failfast, 
OwnNodeId = 131599, SupervisionTime = 60

The log entry is also wrong. It has the wrong level, ER. This does not have to 
be an error; it would also happen during scale-out, when adding a new node. It 
should be a notice. The text itself is also incorrect, since the condition is 
normally not related to imm.xml or the contents of node_name.

I suggest the following log instead; "NO '<RDN value>' is not a configured 
cluster node"
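
As a rough sketch of the suggested text, the message body could be formatted as below. `format_unconfigured_node_msg` is a hypothetical helper, not an OpenSAF function. The leading "NO" in the suggestion is read here as the notice-level tag that also appears in the osafamfnd lines above, so the helper produces only the text after that tag and leaves the level to the logging call.

```c
#include <stdio.h>

/* Format the proposed notice text for a node that is not configured.
 * Returns the value of snprintf: the length the full message needs,
 * excluding the terminating NUL, or a negative value on error. */
int format_unconfigured_node_msg(char *buf, size_t len, const char *rdn_value)
{
    return snprintf(buf, len, "'%s' is not a configured cluster node",
                    rdn_value);
}
```

For the node in the log above, calling the helper with "PL-6" produces `'PL-6' is not a configured cluster node`.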

This is a regression; it works with 4.3.

Patch attached with proposed solution.


