[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used
Zoran,

Node reboot recovery should be used when the system cannot recover from the observed fault. For a fault such as amfd crashing, a node reboot is appropriate. In the current scenario, however, the same configuration still exists after the reboot, and the node reboots again because opensafd is enabled in the runlevel by default. If the system has the same environment after the reboot, rebooting does not help the user or the system recover from a misconfiguration, or even from a fault.

My expectation is that the node should not reboot; opensafd should either keep running in a suspended state or be stopped. This issue mainly affects new users. Rebooting a node because of a misconfiguration when starting OpenSAF does not look good.

---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Nov 01, 2016 07:26 AM UTC
**Owner:** nobody

# Environment details
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)

# Summary
Controller able to join with invalid node_name

# Steps followed & Observed behaviour
1. Mistakenly configured the controller node_name as PL-3; the remaining configuration files are properly installed and updated, apart from /etc/opensaf/node_name.
2. Brought up OpenSAF; OpenSAF is still able to come up with the misconfigured node_name.

Opensaf status:

    fos1:/opt/goahead/tetware/opensaffire/suites/avsv/api/suites # /etc/init.d/opensafd status
    safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)

# Expected
OpenSAF should come up only with SC-1 / SC-2, as the immxml was generated with:

    ./immxml-clustersize -s 2 -p 2
    ./immxml-configure

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used
I think the discussion got sidetracked by the use of the PL string in nodes.cfg.

On the first node in the OpenSAF cluster, the following information is filled in the OpenSAF cfg files.

    cat /usr/share/opensaf/immxml/nodes.cfg
    SC node-1 node-1
    SC node-2 node-2
    PL node-3 node-3
    PL node-4 node-4
    PL node-5 node-5
    PL node-6 node-6

    cat /etc/opensaf/slot_id
    1

    cat /etc/opensaf/node_name
    node-3

    cat /etc/opensaf/node_type
    controller

-> Opensafd starts successfully, but with the following output:

    safSISU=safSu=node-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
        saAmfSISUHAState=ACTIVE(1)

-> After a gap of 5 minutes, the node rebooted with the following output:

    Nov 1 12:31:22 CONTROLLER-1 osaffmd[3945]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Activation timer supervision expired: no ACTIVE assignment received within the time limit, OwnNodeId = 131343, SupervisionTime = 60
    Nov 1 12:31:22 CONTROLLER-1 opensaf_reboot: Rebooting local node; timeout=60

Observed behavior: If the user mistakenly populates node_name with a payload's node name and starts the opensafd script, the user is not informed about the misconfiguration. The node reboots continuously because opensafd is enabled in the runlevel by default during RPM installation.

Expected behavior: Either fms, imm or amf should detect that the node_name used for bringing up the node is intended for a payload, not for a controller. More importantly, the node should not reboot.

---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 05:49 PM UTC
**Owner:** nobody
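The consistency check requested in the message above (noticing that node_name belongs to a payload while node_type says controller) could look roughly like the following Python sketch. It is not part of OpenSAF. The function name `check_node_identity`, the `payload` value for node_type, and matching node_name against either name column of nodes.cfg are all assumptions made for illustration.

```python
# Hypothetical pre-start check; not an OpenSAF API.
# Assumption: node_type holds "controller" or "payload".
ROLE_BY_NODE_TYPE = {"controller": "SC", "payload": "PL"}


def check_node_identity(nodes_cfg_text, node_name, node_type):
    """Return a list of problem descriptions; an empty list means consistent."""
    expected_role = ROLE_BY_NODE_TYPE.get(node_type)
    if expected_role is None:
        return [f"unknown node_type {node_type!r}"]

    for line in nodes_cfg_text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        role, amf_name, clm_name = fields
        # Assumption: node_name may match either name column.
        if node_name in (amf_name, clm_name):
            if role != expected_role:
                return [
                    f"node_name {node_name!r} is listed as {role} in nodes.cfg "
                    f"but node_type is {node_type!r}"
                ]
            return []
    return [f"node_name {node_name!r} is not listed in nodes.cfg"]


# The configuration quoted in the message above.
NODES_CFG = """\
SC node-1 node-1
SC node-2 node-2
PL node-3 node-3
PL node-4 node-4
PL node-5 node-5
PL node-6 node-6
"""

problems = check_node_identity(NODES_CFG, "node-3", "controller")
for problem in problems:
    print("misconfiguration:", problem)
```

A start-up script could run such a check before launching the services and refuse to start, instead of rebooting, when the list is non-empty.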
[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used
- **Milestone**: 4.7.2 --> 5.0.2

---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 02:00 PM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used
Hi,

I haven't worked much with nodes.cfg, but as far as I know, the first column tells whether a node is a system controller or a payload. Based on the first column, the immxml tools know which template to use. The second column is the AMF node name. The third column is the CLM node name. The AMF and CLM node names don't need to be the same.

If you set a system controller's node name to PL-3, then the node named PL-3 is a system controller. Node names don't need to start with SC or PL. They can be any name.

---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 12:20 PM UTC
**Owner:** nobody
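The three-column nodes.cfg layout described in the message above (node type, AMF node name, CLM node name) can be sketched in Python as follows. This is only an illustration of the file layout, not the parser the immxml tools use. `NodeEntry` and `parse_nodes_cfg` are hypothetical names, and rejecting lines that do not have exactly three fields is an assumption.

```python
from typing import List, NamedTuple


class NodeEntry(NamedTuple):
    node_type: str  # first column: "SC" (system controller) or "PL" (payload)
    amf_name: str   # second column: AMF node name
    clm_name: str   # third column: CLM node name


def parse_nodes_cfg(text: str) -> List[NodeEntry]:
    """Parse nodes.cfg-style text into one NodeEntry per non-empty line."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        # Assumption: every entry has exactly three whitespace-separated fields.
        if len(fields) != 3 or fields[0] not in ("SC", "PL"):
            raise ValueError(f"unexpected nodes.cfg line: {line!r}")
        entries.append(NodeEntry(*fields))
    return entries


# A controller may carry any name, including one that looks like a payload.
SAMPLE = """\
SC PL-3 PL-3
SC SC-2 SC-2
PL PL-4 PL-4
"""

nodes = parse_nodes_cfg(SAMPLE)
controllers = [n.amf_name for n in nodes if n.node_type == "SC"]
print(controllers)
```

The sample shows the point made above: only the first column determines the role, so a node named PL-3 is a system controller when its line starts with SC.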
[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used
I want to add one more observation: if we start the second node, SC-2, it fails to join the cluster and both nodes reboot. Finally, after the reboot, when the nodes join back, SC-2 joins with the "ACTIVE" role and the first node (PL-3) joins as "QUIESCED".

Syslog of SC-2:

    Sep 20 17:27:18 TestBed-R2 osafimmd[27361]: ER Failed to find candidate for new IMMND coordinator (ScAbsenceAllowed:0 RulingEpoch:0
    Sep 20 17:27:18 TestBed-R2 osafimmd[27361]: ER Active IMMD has to restart the IMMSv. All IMMNDs will restart
    Sep 20 17:27:18 TestBed-R2 osafimmd[27361]: NO Cluster failed to load => IMMDs will not exit.
    Sep 20 17:27:18 TestBed-R2 osafimmd[27361]: NO MDS event from svc_id 25 (change:4, dest:564114851160080)
    Sep 20 17:27:18 TestBed-R2 osafimmd[27361]: IN Added IMMND node with dest 564114851160080
    Sep 20 17:27:18 TestBed-R2 osafimmd[27361]: IN Added IMMND node with dest 565216431636496
    Sep 20 17:27:18 TestBed-R2 osafimmd[27361]: WA Error returned from processing message err:0 msg-type:14
    Sep 20 17:27:18 TestBed-R2 osafimmnd[27372]: ER IMMND forced to restart on order from IMMD, exiting
    Sep 20 17:27:18 TestBed-R2 osafimmd[27361]: NO MDS event from svc_id 25 (change:4, dest:565216431636496)
    Sep 20 17:27:18 TestBed-R2 osafamfnd[27422]: NO 'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 600 ns)
    Sep 20 17:27:18 TestBed-R2 osafamfnd[27422]: NO Restarting a component of 'safSu=SC-2,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
    Sep 20 17:27:18 TestBed-R2 osafamfnd[27422]: NO 'safComp=IMMND,safSu=SC-2,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart
    .
    Sep 20 17:27:23 TestBed-R2 osafclmd[27402]: NO ERR_INVALID_PARAM: Implementer safClmService already set for this handle when trying to set safClmService
    Sep 20 17:27:23 TestBed-R2 osafclmd[27402]: ER saImmOiImplementerSet failed, rc = 7
    Sep 20 17:27:23 TestBed-R2 osafamfnd[27422]: NO 'safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
    Sep 20 17:27:23 TestBed-R2 osafamfnd[27422]: ER safComp=CLM,safSu=SC-2,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
    Sep 20 17:27:23 TestBed-R2 osafamfnd[27422]: Rebooting OpenSAF NodeId = 131599 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131599, SupervisionTime = 60
    Sep 20 17:27:23 TestBed-R2 opensaf_reboot: Rebooting local node; timeout=60

Syslog of the first node:

    Sep 20 17:28:10 TestBed-R1 osafimmnd[31481]: ER No IMMD service => cluster restart, exiting
    Sep 20 17:28:10 TestBed-R1 osafamfnd[30949]: NO Restarting a component of 'safSu=PL-3,safSg=NoRed,safApp=OpenSAF' (comp restart count: 2)
    Sep 20 17:28:10 TestBed-R1 osafamfnd[30949]: NO 'safComp=IMMND,safSu=PL-3,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'
    Sep 20 17:28:10 TestBed-R1 osafntfimcnd[31487]: NO saImmOiDispatch() Fail SA_AIS_ERR_BAD_HANDLE (9)
    Sep 20 17:28:10 TestBed-R1 osafamfd[30935]: NO Node 'SC-2' left the cluster
    Sep 20 17:28:10 TestBed-R1 osafamfd[30935]: safSu=SC-2,safSg=2N,safApp=OpenSAF OperState ENABLED => DISABLED
    Sep 20 17:28:10 TestBed-R1 opensaf_reboot: Rebooting local node; timeout=60
    Sep 20 17:28:10 TestBed-R1 osafamfd[30935]: ER sendStateChangeNotificationAvd: saNtfNotificationSend Failed (6)
    Sep 20 17:28:10 TestBed-R1 osafamfd[30935]: safSu=SC-2,safSg=2N,safApp=OpenSAF PresenceState INSTANTIATED => UNINSTANTIATED
    Sep 20 17:28:10 TestBed-R1 osafamfd[30935]: ER sendStateChangeNotificationAvd: saNtfNotificationSend Failed (6)
    Sep 20 17:28:10 TestBed-R1 osafamfd[30935]: safSu=SC-2,safSg=2N,safApp=OpenSAF ReadinessState IN_SERVICE => OUT_OF_SERVICE

---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 11:58 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used
- **summary**: Controller able to join with invalid node_name --> immtools: SC/PL field in nodes.cfg is not used
- **Type**: defect --> discussion
- **Comment**:

Had a discussion with Ritu; tagging this ticket as a discussion topic and assigning it to immtools.

The issue can be reproduced as follows. Generate imm.xml for 4 nodes with the names set to SC-1, SC-2, PL-3 and PL-4 in nodes.cfg:

    SC SC-1 SC-1
    SC SC-2 SC-2
    PL PL-3 PL-3
    PL PL-4 PL-4

Now start the first node with node_name set to PL-4. OpenSAF comes up fine.

Since nodes.cfg is exposed to the end user, I guess Ritu is questioning the need for the first column in nodes.cfg, i.e. the differentiation based on 'SC' versus 'PL'. This could be discussed further.

---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 4.7.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 09:41 AM UTC
**Owner:** nobody