[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync

2017-02-16 Thread Minh Hon Chau
- **status**: review --> fixed
- **assigned_to**: Minh Hon Chau -->  nobody 
- **Comment**:

changeset:   8587:fe8019855613
tag:         tip
user:        Minh Hon Chau
date:        Fri Feb 17 14:07:55 2017 +1100
summary:     AMFD: Fix SC failover during headless sync at INIT_DONE state [#2162]

changeset:   8584:56a4eb3607aa
user:        Minh Hon Chau
date:        Thu Feb 16 17:12:45 2017 +1100
summary:     AMFND: Fix SC failover during headless sync before standby AMFD comes up [#2162]




---

**[tickets:#2162] AMF: Headless recovery failed if SC failover during headless sync**

**Status:** fixed
**Milestone:** 5.2.FC
**Labels:** headless recovery 
**Created:** Thu Nov 03, 2016 11:01 AM UTC by Minh Hon Chau
**Last Updated:** Tue Feb 07, 2017 03:48 AM UTC
**Owner:** nobody
**Attachments:**

- [log.tgz](https://sourceforge.net/p/opensaf/tickets/2162/attachment/log.tgz) 
(1.4 MB; application/x-compressed)


Test steps:
- Set up a 2N assignment: PL4 hosts SU4 (active assignment), PL5 hosts SU5 
(standby assignment)
- Stop the SCs
- Stop PL4
- Restart SC1
- Restart SC2
- Since PL4 is stopped, headless sync will time out after 10 seconds. During 
these 10 seconds, reboot SC1 to trigger an SC failover

Observation: SC2 becomes the active controller and cold sync completes, but SU5 
still has the standby assignment.

When SC2 becomes the active controller, the code that performs headless 
recovery (function failover_absent_assignment()) is not executed. Therefore, 
the transient assignments remain after the SC failover.

Logs and traces are attached.
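
The failure mode described in this ticket can be illustrated with a small toy model. This is only a sketch under stated assumptions, not OpenSAF code and not the actual fix: the `Cluster` class, `on_become_active()`, the `recover_if_pending` flag and `scenario()` are hypothetical names, and the body of `failover_absent_assignment()` here is a simplified stand-in that only borrows the name of the real function.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical view of the cluster as seen by a controller."""
    present_nodes: set    # nodes that answered during headless sync
    assignments: dict     # SU name -> (hosting node, HA state)
    recovery_done: bool = False


def failover_absent_assignment(cluster: Cluster) -> None:
    """Toy stand-in for headless recovery: drop assignments hosted on
    absent nodes and promote the surviving standby if the active is lost."""
    lost_active = any(
        state == "active" and node not in cluster.present_nodes
        for node, state in cluster.assignments.values()
    )
    survivors = {}
    for su, (node, state) in cluster.assignments.items():
        if node not in cluster.present_nodes:
            continue  # assignment on an absent node is removed
        if lost_active and state == "standby":
            state = "active"  # the standby takes over
        survivors[su] = (node, state)
    cluster.assignments = survivors
    cluster.recovery_done = True


def on_become_active(cluster: Cluster, recover_if_pending: bool) -> None:
    """Hypothetical hook run when a standby controller takes over."""
    if recover_if_pending and not cluster.recovery_done:
        failover_absent_assignment(cluster)


def scenario(recover_if_pending: bool) -> dict:
    # PL4 (hosting active SU4) is stopped; PL5 (hosting standby SU5) is up.
    cluster = Cluster(
        present_nodes={"PL5"},
        assignments={"SU4": ("PL4", "active"), "SU5": ("PL5", "standby")},
    )
    # SC1 is rebooted before the headless sync timeout, so recovery
    # has not yet run when SC2 becomes the active controller.
    on_become_active(cluster, recover_if_pending)
    return cluster.assignments
```

In this model, `scenario(False)` leaves SU5 with the standby assignment, which mirrors the observation in the test steps, while `scenario(True)` promotes SU5 to active once the new active controller runs the pending recovery.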


---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync

2017-02-06 Thread Minh Hon Chau
Hi Nagu,

There were email responses to the provided logs; you can also view them at the 
links below:

https://sourceforge.net/p/opensaf/mailman/message/35621982/
https://sourceforge.net/p/opensaf/mailman/message/35636549/

Thanks,
Minh


---

**[tickets:#2162] AMF: Headless recovery failed if SC failover during headless sync**

**Status:** review
**Milestone:** 5.2.FC
**Labels:** headless recovery 
**Created:** Thu Nov 03, 2016 11:01 AM UTC by Minh Hon Chau
**Last Updated:** Mon Jan 23, 2017 06:42 AM UTC
**Owner:** Minh Hon Chau


[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync

2017-01-22 Thread Nagendra Kumar
Please find attached the logs for the TC mentioned in the email.


Attachments:

- [Logs-tc.rar](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/f093e418/d8dc/attachment/Logs-tc.rar) 
(477.7 kB; application/octet-stream)


---

**[tickets:#2162] AMF: Headless recovery failed if SC failover during headless sync**

**Status:** review
**Milestone:** 5.2.FC
**Labels:** headless recovery 
**Created:** Thu Nov 03, 2016 11:01 AM UTC by Minh Hon Chau
**Last Updated:** Mon Jan 09, 2017 11:24 AM UTC
**Owner:** Minh Hon Chau


[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync

2017-01-09 Thread Nagendra Kumar
I am reviewing it.
Thanks
-Nagu


---

**[tickets:#2162] AMF: Headless recovery failed if SC failover during headless sync**

**Status:** review
**Milestone:** 5.2.FC
**Labels:** headless recovery 
**Created:** Thu Nov 03, 2016 11:01 AM UTC by Minh Hon Chau
**Last Updated:** Tue Nov 08, 2016 03:29 AM UTC
**Owner:** Minh Hon Chau


[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync

2016-11-07 Thread Minh Hon Chau
- **status**: assigned --> review



---

**[tickets:#2162] AMF: Headless recovery failed if SC failover during headless sync**

**Status:** review
**Milestone:** 5.2.FC
**Labels:** headless recovery 
**Created:** Thu Nov 03, 2016 11:01 AM UTC by Minh Hon Chau
**Last Updated:** Mon Nov 07, 2016 05:11 AM UTC
**Owner:** Minh Hon Chau


[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync

2016-11-06 Thread Minh Hon Chau
A simple start-up sequence after headless is shown in 2162.seq. There are 3 
major cases in which SC1 can go down between start-up and headless recovery:
1- SC1 goes down before SC2 is successfully indicated as standby controller, 
which happens before the standby assignment for the 2N OpenSAF SU
2- SC2 is standby controller, and SC1 goes down before SC2 completes cold sync
3- SC2 completes cold sync, and SC1 goes down before cluster initiation is 
done or times out

If case 2 happens, SC2 is rebooted, which is the expected result today.
Case 1 is reproduced, see attached file c1.tgz. The result is that SC2 gets 
stuck receiving node_up msg from all nodes; the cause is that the 2N OpenSAF SU 
on SC2 could not be assigned active.
Case 3 is the one primarily reported in this ticket.
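
As a compact restatement of the three cases, the sketch below maps the point SC2 has reached when SC1 goes down to the case number and the outcome reported in this thread. The `Sc2Stage` enum, its member names and `classify_sc1_down()` are hypothetical and exist only for illustration; they are not OpenSAF identifiers.

```python
from enum import Enum


class Sc2Stage(Enum):
    """Hypothetical stages SC2 passes through after headless start-up."""
    NOT_YET_STANDBY = 1   # SC2 not yet indicated as standby controller
    STANDBY_SYNCING = 2   # SC2 is standby, cold sync not complete
    COLD_SYNC_DONE = 3    # cold sync done, cluster init not done/timed out


# Outcomes as reported in this thread, keyed by the stage SC2 was in
# at the moment SC1 went down.
OUTCOMES = {
    Sc2Stage.NOT_YET_STANDBY: (
        1, "SC2 stuck receiving node_up msg; 2N OpenSAF SU not assigned active"),
    Sc2Stage.STANDBY_SYNCING: (
        2, "SC2 is rebooted (expected result today)"),
    Sc2Stage.COLD_SYNC_DONE: (
        3, "headless recovery not executed; transient assignments remain"),
}


def classify_sc1_down(stage: Sc2Stage) -> tuple:
    """Return (case number, reported outcome) for an SC1 failure."""
    return OUTCOMES[stage]
```

For example, `classify_sc1_down(Sc2Stage.COLD_SYNC_DONE)` returns case 3, the scenario this ticket primarily reports.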


Attachments:

- [2162.seq](https://sourceforge.net/p/opensaf/tickets/_discuss/thread/f093e418/493c/attachment/2162.seq) 
(1.9 kB; application/octet-stream)


---

**[tickets:#2162] AMF: Headless recovery failed if SC failover during headless sync**

**Status:** assigned
**Milestone:** 5.2.FC
**Labels:** headless recovery 
**Created:** Thu Nov 03, 2016 11:01 AM UTC by Minh Hon Chau
**Last Updated:** Thu Nov 03, 2016 11:01 AM UTC
**Owner:** Minh Hon Chau


[tickets] [opensaf:tickets] #2162 AMF: Headless recovery failed if SC failover during headless sync

2016-11-03 Thread Minh Hon Chau



---

**[tickets:#2162] AMF: Headless recovery failed if SC failover during headless sync**

**Status:** assigned
**Milestone:** 5.2.FC
**Labels:** headless recovery 
**Created:** Thu Nov 03, 2016 11:01 AM UTC by Minh Hon Chau
**Last Updated:** Thu Nov 03, 2016 11:01 AM UTC
**Owner:** Minh Hon Chau