[Bug 1318441] Re: Precise corosync dies if failed_to_recv is set
This bug was fixed in the package corosync - 1.4.2-2ubuntu0.2

---
corosync (1.4.2-2ubuntu0.2) precise; urgency=medium

  * Fixed consensus being empty in case failed_to_recv is set (LP: #1318441)
 -- Rafael David Tinoco rafael.tin...@canonical.com Mon, 12 May 2014 09:37:06 -0500

** Changed in: corosync (Ubuntu Precise)
       Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Ubuntu Server Team, which is subscribed to corosync in Ubuntu.
https://bugs.launchpad.net/bugs/1318441

Title:
  Precise corosync dies if failed_to_recv is set

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1318441/+subscriptions

--
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs
[Bug 1318441] Re: Precise corosync dies if failed_to_recv is set
** Changed in: corosync (Ubuntu Precise)
     Assignee: Rafael David Tinoco (inaddy) => (unassigned)
[Bug 1318441] Re: Precise corosync dies if failed_to_recv is set
Brian, I've run several tests on this and everything works as expected. Changing tag. Thanks

** Tags removed: verification-needed
** Tags added: verification-done
[Bug 1318441] Re: Precise corosync dies if failed_to_recv is set
Hello Rafael, or anyone else affected,

Accepted corosync into precise-proposed. The package will build now and be available at http://launchpad.net/ubuntu/+source/corosync/1.4.2-2ubuntu0.2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

** Changed in: corosync (Ubuntu Precise)
       Status: In Progress => Fix Committed

** Tags added: verification-needed
[Bug 1318441] Re: Precise corosync dies if failed_to_recv is set
** Branch linked: lp:ubuntu/precise-proposed/corosync
[Bug 1318441] Re: Precise corosync dies if failed_to_recv is set
Attaching patch.

** Patch added: corosync_1.4.2-2ubuntu0.2.diff
   https://bugs.launchpad.net/ubuntu/+source/corosync/+bug/1318441/+attachment/4110673/+files/corosync_1.4.2-2ubuntu0.2.diff

** Description changed:

  [Impact]

-  * On certain conditions corosync daemon may quit if it detects itself as not
- being able to receive messages. The logic asserts the existence of at least
- one functional node but the node is marking itself as a failed node (not
- following the specification). It is safe not to assert this if failed_to_recv
- is set.
+  * On certain conditions *precise* corosync daemon may quit if it detects itself
+ as not being able to receive messages. The logic asserts the existence of
+ at least one functional node but the node is marking itself as a failed node
+ (not following the specification). It is safe not to assert this if
+ failed_to_recv is set.

  [Test Case]

-  * Using corosync test suite on precise-test machine:
+  * Using corosync test suite on precise-test machine:

- - Make sure to set ssh keys so precise-test can access precise-cluster-{01,02}.
- - Make sure only failed-to-receive-crash.sh is executable on tests dir.
- - Make sure precise-cluster-{01,02} nodes have build-dep for corosync installed.
- - sudo ./run-tests.sh -c flatiron -n precise-cluster-01 precise-cluster-02
- - Check corosync log messages to see precise-cluster-01 corosync dieing.
+  - Make sure to set ssh keys so precise-test can access precise-cluster-{01,02}.
+  - Make sure only failed-to-receive-crash.sh is executable on tests dir.
+  - Make sure precise-cluster-{01,02} nodes have build-dep for corosync installed.
+  - sudo ./run-tests.sh -c flatiron -n precise-cluster-01 precise-cluster-02
+  - Check corosync log messages to see precise-cluster-01 corosync dieing.

  [Regression Potential]

-  * We are not asserting the existence of at least 1 node in corosync cluster
- anymore. Since there is always 1 node in the cluster (the node itself) it
- is very unlikely this change alters corosync logic for membership. If it
- does it is likely corosync will recover from the error and reestablish new
- membership (with 1 or more nodes).
+  * We are not asserting the existence of at least 1 node in corosync cluster
+  anymore. Since there is always 1 node in the cluster (the node itself) it
+  is very unlikely this change alters corosync logic for membership. If it
+  does it is likely corosync will recover from the error and reestablish new
+  membership (with 1 or more nodes).

  [Other Info]

-  * n/a
+  * n/a
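The failure described under [Impact] can be illustrated with a small model. This is a hypothetical Python sketch, not corosync's actual C code: the names `consensus_agreed`, `proc_list`, `failed_list`, `consensus` and the `patched` switch are assumptions made purely for illustration. It shows how the set of functional nodes becomes empty once a node has put itself on its own failed list, and how skipping the assertion when `failed_to_recv` is set avoids the abort.

```python
def consensus_agreed(proc_list, failed_list, consensus, failed_to_recv, patched=True):
    """Toy model of a membership consensus check (all names hypothetical).

    proc_list:      ids of every node this node knows about
    failed_list:    ids this node currently considers failed
    consensus:      dict mapping node id -> True if that node has agreed
    failed_to_recv: True if this node detected it cannot receive messages
    patched:        False models the behaviour described before the fix
    """
    # Nodes still considered functional: known nodes minus failed ones.
    functional = [n for n in proc_list if n not in failed_list]

    # Agreement requires every functional node to have reached consensus.
    # With an empty list this is trivially True.
    agreed = all(consensus.get(n, False) for n in functional)

    if patched and agreed and failed_to_recv:
        # The node has marked itself as failed, so `functional` may be
        # empty; in this model the assertion is skipped for that case.
        return agreed

    # Modelled pre-fix behaviour: insist on at least one functional node.
    assert len(functional) >= 1, "no functional node left"
    return agreed


if __name__ == "__main__":
    # Node 1 has failed to receive and considers every node, itself included, failed.
    print(consensus_agreed([1, 2], [1, 2], {}, failed_to_recv=True))
    try:
        consensus_agreed([1, 2], [1, 2], {}, failed_to_recv=True, patched=False)
    except AssertionError as exc:
        print("unpatched model aborts:", exc)
```

In the model, as in the [Regression Potential] note, the normal path (at least one functional node) is unaffected by the early return; only the `failed_to_recv` case bypasses the assertion.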
[Bug 1318441] Re: Precise corosync dies if failed_to_recv is set
Tests before the patch:

#
# NODE 1
#

--- MARKER --- ./failed-to-receive-crash.sh at 2014-05-09-17:33:04 --- MARKER ---
May 09 17:33:04 corosync [MAIN]: ] Corosync Cluster Engine ('1.4.2'): started and ready to provide service.
May 09 17:33:04 corosync [MAIN]: ] Corosync built-in features: nss
May 09 17:33:04 corosync [MAIN]: ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
May 09 17:33:04 corosync [TOTEM]: ] Initializing transport (UDP/IP Multicast).
May 09 17:33:04 corosync [TOTEM]: ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
May 09 17:33:04 corosync [TOTEM]: ] The network interface [192.168.168.1] is now up.
May 09 17:33:04 corosync [SERV]: ] Service engine loaded: openais checkpoint service B.01.01
May 09 17:33:04 corosync [SERV]: ] Service engine loaded: corosync extended virtual synchrony service
May 09 17:33:04 corosync [SERV]: ] Service engine loaded: corosync configuration service
May 09 17:33:04 corosync [SERV]: ] Service engine loaded: corosync cluster closed process group service v1.01
May 09 17:33:04 corosync [SERV]: ] Service engine loaded: corosync cluster config database access v1.01
May 09 17:33:04 corosync [SERV]: ] Service engine loaded: corosync profile loading service
May 09 17:33:04 corosync [SERV]: ] Service engine loaded: corosync cluster quorum service v0.1
May 09 17:33:04 corosync [MAIN]: ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
May 09 17:33:04 corosync [TOTEM]: ] A processor joined or left the membership and a new membership was formed.
May 09 17:33:04 corosync [CPG]: ] chosen downlist: sender r(0) ip(192.168.168.1) ; members(old:0 left:0)
May 09 17:33:04 corosync [MAIN]: ] Completed service synchronization, ready to provide service.
May 09 17:33:05 corosync [TOTEM]: ] A processor joined or left the membership and a new membership was formed.
May 09 17:33:05 corosync [CPG]: ] chosen downlist: sender r(0) ip(192.168.168.1) ; members(old:1 left:0)
May 09 17:33:05 corosync [MAIN]: ] Completed service synchronization, ready to provide service.
May 09 17:33:10 corosync [TOTEM]: ] FAILED TO RECEIVE

# COROSYNC HAS DIED BEFORE TEST CASE TRIES TO STOP IT

root@precise-cluster-01:~# ps -ef | grep corosync
root 1414 1306 0 17:31 pts/0 00:00:00 tail -f /var/log/cluster/corosync.log
root 4712 1306 0 17:33 pts/0 00:00:00 grep --color=auto corosync

Tests after the patch:

May 11 22:27:48 corosync [MAIN]: ] Corosync Cluster Engine ('1.4.2'): started and ready to provide service.
May 11 22:27:48 corosync [MAIN]: ] Corosync built-in features: nss
May 11 22:27:48 corosync [MAIN]: ] Successfully read main configuration file '/etc/corosync/corosync.conf'.
May 11 22:27:48 corosync [TOTEM]: ] Initializing transport (UDP/IP Multicast).
May 11 22:27:48 corosync [TOTEM]: ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
May 11 22:27:48 corosync [TOTEM]: ] The network interface [192.168.168.1] is now up.
May 11 22:27:48 corosync [SERV]: ] Service engine loaded: openais checkpoint service B.01.01
May 11 22:27:48 corosync [SERV]: ] Service engine loaded: corosync extended virtual synchrony service
May 11 22:27:48 corosync [SERV]: ] Service engine loaded: corosync configuration service
May 11 22:27:48 corosync [SERV]: ] Service engine loaded: corosync cluster closed process group service v1.01
May 11 22:27:48 corosync [SERV]: ] Service engine loaded: corosync cluster config database access v1.01
May 11 22:27:49 corosync [SERV]: ] Service engine loaded: corosync profile loading service
May 11 22:27:49 corosync [SERV]: ] Service engine loaded: corosync cluster quorum service v0.1
May 11 22:27:49 corosync [MAIN]: ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
May 11 22:27:49 corosync [TOTEM]: ] A processor joined or left the membership and a new membership was formed.
May 11 22:27:49 corosync [CPG]: ] chosen downlist: sender r(0) ip(192.168.168.1) ; members(old:0 left:0)
May 11 22:27:49 corosync [MAIN]: ] Completed service synchronization, ready to provide service.
May 11 22:27:49 corosync [TOTEM]: ] A processor joined or left the membership and a new membership was formed.
May 11 22:27:49 corosync [CPG]: ] chosen downlist: sender r(0) ip(192.168.168.1) ; members(old:1 left:0)
May 11 22:27:49 corosync [MAIN]: ] Completed service synchronization, ready to provide service.
May 11 22:27:54 corosync [TOTEM]: ] FAILED TO RECEIVE
May 11 22:27:55 corosync [TOTEM]: ] A processor joined or left the membership and a new membership was formed.
May 11 22:27:55 corosync [CPG]: ] chosen downlist: sender r(0) ip(192.168.168.1) ; members(old:2 left:1)
May 11 22:27:55 corosync [MAIN]: ] Completed service synchronization, ready to provide service.
May 11 22:27:57 corosync [TOTEM]: ] A
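The difference between the two runs comes down to whether corosync logs a new membership after "FAILED TO RECEIVE". A minimal Python sketch of that check follows. The function name `recovered_after_failed_to_receive` is hypothetical and the check is only a heuristic over the message strings seen in these logs; it is not part of the corosync test suite.

```python
def recovered_after_failed_to_receive(log_lines):
    """Heuristic check over corosync log lines (hypothetical helper).

    Returns None if the log never reports FAILED TO RECEIVE,
    True if a new membership is logged after that message,
    False if the log ends without forming a new membership.
    """
    seen_failure = False
    for line in log_lines:
        if "FAILED TO RECEIVE" in line:
            seen_failure = True
        elif seen_failure and "a new membership was formed" in line:
            return True
    return False if seen_failure else None


if __name__ == "__main__":
    # Excerpts taken from the logs in this message.
    before = [
        "May 09 17:33:05 corosync [MAIN]: ] Completed service synchronization, ready to provide service.",
        "May 09 17:33:10 corosync [TOTEM]: ] FAILED TO RECEIVE",
    ]
    after = [
        "May 11 22:27:54 corosync [TOTEM]: ] FAILED TO RECEIVE",
        "May 11 22:27:55 corosync [TOTEM]: ] A processor joined or left the membership and a new membership was formed.",
    ]
    print("before patch recovered:", recovered_after_failed_to_receive(before))
    print("after patch recovered:", recovered_after_failed_to_receive(after))
```

A log that simply stops after the failure message is consistent with the daemon having died, but confirming that still requires checking the process list as done with `ps -ef` above.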
[Bug 1318441] Re: Precise corosync dies if failed_to_recv is set
** Also affects: corosync (Ubuntu Precise)
   Importance: Undecided
       Status: New

** Changed in: corosync (Ubuntu Precise)
     Assignee: (unassigned) => Rafael David Tinoco (inaddy)

** Changed in: corosync (Ubuntu)
       Status: In Progress => Fix Released

** Changed in: corosync (Ubuntu Precise)
       Status: New => In Progress

** Changed in: corosync (Ubuntu Precise)
   Importance: Undecided => Medium

** Changed in: corosync (Ubuntu)
     Assignee: Rafael David Tinoco (inaddy) => (unassigned)
[Bug 1318441] Re: Precise corosync dies if failed_to_recv is set
Sponsored for Precise.
[Bug 1318441] Re: Precise corosync dies if failed_to_recv is set
** Description changed:

- If node detects itself not able to receive message it asserts the number of failed members considering itself and dies.
- I'll write more information (and the fix) in a few minutes.
+ If node detects itself not able to receive message it asserts the number
+ of failed members considering itself and dies.
+
+ - Testing bugfix. To be released soon.