[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync
- **status**: review --> fixed
- **assigned_to**: Minh Hon Chau --> nobody
- **Comment**:

changeset: 8587:fe8019855613
tag: tip
user: Minh Hon Chau
date: Fri Feb 17 14:07:55 2017 +1100
summary: AMFD: Fix SC failover during headless sync at INIT_DONE state [#2162]

changeset: 8584:56a4eb3607aa
user: Minh Hon Chau
date: Thu Feb 16 17:12:45 2017 +1100
summary: AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]

---

** [tickets:#2162] AMF: Headless recovery failed if SC failover during headless sync**

**Status:** fixed
**Milestone:** 5.2.FC
**Labels:** headless recovery
**Created:** Thu Nov 03, 2016 11:01 AM UTC by Minh Hon Chau
**Last Updated:** Tue Feb 07, 2017 03:48 AM UTC
**Owner:** nobody
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2162/attachment/log.tgz) (1.4 MB; application/x-compressed)

Test steps:

- Set up a 2N assignment: PL4 hosts SU4 (active assignment), PL5 hosts SU5 (standby assignment)
- Stop SCs
- Stop PL4
- Restart SC1
- Restart SC2
- Since PL4 is stopped, headless sync will time out in 10 seconds. During these 10 seconds, reboot SC1 to trigger SC failover

Observation:

SC2 becomes the active controller and cold sync completes, but SU5 still has the standby assignment. When SC2 becomes the active controller, the code that performs headless recovery (function failover_absent_assignment()) is not executed. Therefore, the transient assignments remain after SC failover. Log/trace are attached.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
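The observation in the ticket can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the OpenSAF code or the actual fix: `Assignment`, `Director`, `scenario`, `on_sync_timeout`, `on_become_active` and `recover_absent_assignments` are hypothetical names, and the last one merely stands in for the recovery step that the ticket calls failover_absent_assignment(), whose real signature and behavior are not shown in this thread.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    su: str        # service unit name, e.g. "SU5"
    node: str      # hosting payload node, e.g. "PL5"
    ha_state: str  # "active" or "standby"


def recover_absent_assignments(assignments, present_nodes):
    """Hypothetical stand-in for headless recovery: drop assignments hosted on
    absent nodes and promote a surviving standby if no active one remains."""
    remaining = [a for a in assignments if a.node in present_nodes]
    if remaining and not any(a.ha_state == "active" for a in remaining):
        remaining[0].ha_state = "active"
    return remaining


@dataclass
class Director:
    """Toy model of a controller during headless sync (all names assumed)."""
    assignments: list
    present_nodes: set
    headless_sync_done: bool = False

    def on_sync_timeout(self):
        # Normal path: the sync window expires and recovery runs.
        self.assignments = recover_absent_assignments(
            self.assignments, self.present_nodes)
        self.headless_sync_done = True

    def on_become_active(self, run_recovery_if_pending):
        # Failover path: if the new active controller skips this check,
        # the standby assignment is left untouched, as the ticket observes.
        if run_recovery_if_pending and not self.headless_sync_done:
            self.on_sync_timeout()


def scenario():
    # 2N assignment from the test steps: SU4 active on PL4, SU5 standby on PL5.
    return [Assignment("SU4", "PL4", "active"),
            Assignment("SU5", "PL5", "standby")]
```

With PL4 stopped, `Director(scenario(), {"PL5"})` leaves SU5 as standby after `on_become_active(run_recovery_if_pending=False)`, whereas passing `True` promotes SU5 to active in this model.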
[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync
Hi Nagu,

There were response emails about the provided log; you can also view them at the links below:

https://sourceforge.net/p/opensaf/mailman/message/35621982/
https://sourceforge.net/p/opensaf/mailman/message/35636549/

Thanks,
Minh

---

**Status:** review
**Last Updated:** Mon Jan 23, 2017 06:42 AM UTC
**Owner:** Minh Hon Chau
[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync
Please find attached the logs for the test case (TC) mentioned in the email.

Attachments:

- [Logs-tc.rar](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/f093e418/d8dc/attachment/Logs-tc.rar) (477.7 kB; application/octet-stream)

---

**Status:** review
**Last Updated:** Mon Jan 09, 2017 11:24 AM UTC
**Owner:** Minh Hon Chau
[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync
I am reviewing it.

Thanks
-Nagu

---

**Status:** review
**Last Updated:** Tue Nov 08, 2016 03:29 AM UTC
**Owner:** Minh Hon Chau
[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync
- **status**: assigned --> review

---

**Status:** review
**Last Updated:** Mon Nov 07, 2016 05:11 AM UTC
**Owner:** Minh Hon Chau
[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync
A simple startup sequence after headless is shown in 2162.seq. There are three major cases in which SC1 can go down between startup and headless recovery:

1. SC1 goes down before SC2 is successfully indicated as standby controller, which happens before the standby assignment for the 2N OpenSAF SU.
2. SC2 is the standby controller, and SC1 goes down before SC2 completes cold sync.
3. SC2 completes cold sync, and SC1 goes down before cluster initiation is done or times out.

If case 2 happens, SC2 is rebooted, which is the expected result today.

Case 1 has been reproduced (see the attached file c1.tgz): SC2 gets stuck receiving node_up messages from all nodes, because the 2N OpenSAF SU on SC2 could not be assigned active.

Case 3 is the one primarily reported in this ticket.

Attachments:

- [2162.seq](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/f093e418/493c/attachment/2162.seq) (1.9 kB; application/octet-stream)

---

**Status:** assigned
**Last Updated:** Thu Nov 03, 2016 11:01 AM UTC
**Owner:** Minh Hon Chau
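The three cases in the message above can be written down as a small classifier. This is a sketch for illustration only: the enum, the boolean inputs and the `classify` function are hypothetical names, not OpenSAF identifiers, and the outcome strings simply restate what the message reports.

```python
from enum import Enum
from typing import Optional


class Sc1DownCase(Enum):
    BEFORE_SC2_STANDBY = 1        # case 1 in the message
    BEFORE_COLD_SYNC_DONE = 2     # case 2
    BEFORE_CLUSTER_INIT_DONE = 3  # case 3


# Outcomes as reported in the message (not derived by this code).
OBSERVED = {
    Sc1DownCase.BEFORE_SC2_STANDBY:
        "SC2 gets stuck receiving node_up messages (reproduced, c1.tgz)",
    Sc1DownCase.BEFORE_COLD_SYNC_DONE:
        "SC2 is rebooted (expected result today)",
    Sc1DownCase.BEFORE_CLUSTER_INIT_DONE:
        "headless recovery is not executed (this ticket)",
}


def classify(sc2_is_standby: bool,
             cold_sync_done: bool,
             cluster_init_done: bool) -> Optional[Sc1DownCase]:
    """Map the state at the moment SC1 goes down to one of the three cases.

    Returns None once cluster initiation has finished or timed out,
    since the message does not cover that window.
    """
    if cluster_init_done:
        return None
    if not sc2_is_standby:
        return Sc1DownCase.BEFORE_SC2_STANDBY
    if not cold_sync_done:
        return Sc1DownCase.BEFORE_COLD_SYNC_DONE
    return Sc1DownCase.BEFORE_CLUSTER_INIT_DONE
```

For example, `classify(True, True, False)` yields `Sc1DownCase.BEFORE_CLUSTER_INIT_DONE`, the window in which the failover described in this ticket occurs.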