Re: [openstack-dev] [Fuel] Rabbitmq 3.4.0 upgrade for the 6.1 release, is it worth it?
-1 for upgrading it in 6.1. Known devil is better than unknown angel :) In 7.0 we can try 3.5.0 with updated Erlang.

~thanks

--
Best regards,
Sergii Golovatiuk,
Skype #golserge
IRC #holser

On Wed, Apr 29, 2015 at 12:20 PM, Davanum Srinivas dsrini...@mirantis.com wrote:

Bogdan,

Pacemaker, corosync etc, we picked vivid packages, right? So don't we test what's in vivid for this too? Apparently it's 3.4.3-2 per [0].

I agree, we should not do this in 6.1. However, we should start testing this ASAP.

Another data point: Alexander Nevenchannyy pointed out to me that 3.5.0 came with an updated Erlang that has the following fix:

OTP-11497 To prevent a race condition if there is a short communication problem when node-down and node-up events are received. They are now stored and later checked if the node came up just before mnesia flagged the node as down. (Thanks to Jonas Falkevik)

which seems interesting as well.

thanks,
dims

[0] https://launchpad.net/ubuntu/vivid/+package/rabbitmq-server
[1] http://www.erlang.org/download/otp_src_17.0.readme

On Wed, Apr 29, 2015 at 4:04 AM, Bogdan Dobrelya bdobre...@mirantis.com wrote:

Hello.

There are several concerns why we have to upgrade RabbitMQ to 3.4.0 [0]:

1) At least two bugfixes related to the current high-load issue with MQ [1]:
   - 26404 prevent queue synchronisation from hanging if there is a very short partition just as it starts (since 3.1.0)
   - 26368 prevent autoheal from hanging when loser shuts down before the winner learns it is the winner (since 3.1.0)
2) We should also check how the new 'pause-if-all-down' option works for split brain recovery.
3) We should address the 'force_boot' recommendations from this mail thread [2] to speed up the MQ cluster assembly time.

The question is - is it worth it to do this in the 6.1 release scope? I vote to postpone this for the 7.0 dev cycle as the impact of such changes might be unpredictable.

[0] https://www.rabbitmq.com/release-notes/README-3.4.0.txt
[1] https://bugs.launchpad.net/fuel/+bug/1447619
[2] http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg51625.html

--
Best regards,
Bogdan Dobrelya,
Skype #bogdando_at_yahoo.com
Irc #bogdando

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Fuel] Rabbitmq 3.4.0 upgrade for the 6.1 release, is it worth it?
I’m -1 for it. Considering how much time we needed to troubleshoot the problems already, I don’t think we have time to properly test the upgrade.

On 29 Apr 2015, at 12:37, Sergii Golovatiuk sgolovat...@mirantis.com wrote:

-1 for upgrading it in 6.1. Known devil is better than unknown angel :) In 7.0 we can try 3.5.0 with updated Erlang.

--
Tomasz 'Zen' Napierala
Product Engineering - Poland
Re: [openstack-dev] [Fuel] Rabbitmq 3.4.0 upgrade for the 6.1 release, is it worth it?
Alexei,

Actually, we do not insist that this should be done for MOS 6.1. It was a question to the audience, in case someone has another idea. All these discussions have their roots in the bug https://bugs.launchpad.net/fuel/+bug/1447619 - we found an issue with RabbitMQ cluster behaviour under networking load, and Bogdan, Alexei Khivin and Alex Nevenchannyy found that it might possibly be fixed by upgrading RabbitMQ (trying it now). And actually this RabbitMQ release contains lots of crucial bug fixes even beyond those mentioned by Bogdan.

For 6.1, the workaround we found might be used, a section written in the docs, etc. The question to the company is whether that will be enough and what we are going to do about it in future.

Cheers,
Dina

On Wed, Apr 29, 2015 at 11:24 AM, Alexei Sheplyakov asheplya...@mirantis.com wrote:

Hi,

Given that
- MOS 6.1 should be released in a few weeks
- rabbitmq is kind of a heart of OpenStack

upgrading rabbitmq in MOS 6.1 seems to be an extremely bad idea. There will always be some bugs (both known and unknown), but we can't keep updating various components forever and should stop at some moment (which is presumably called `soft code freeze').

Best regards,
Alexei

--
Best regards,
Dina Belova
Software Engineer
Mirantis Inc.
[openstack-dev] [Fuel] Rabbitmq 3.4.0 upgrade for the 6.1 release, is it worth it?
Hello.

There are several concerns why we have to upgrade RabbitMQ to 3.4.0 [0]:

1) At least two bugfixes related to the current high-load issue with MQ [1]:
   - 26404 prevent queue synchronisation from hanging if there is a very short partition just as it starts (since 3.1.0)
   - 26368 prevent autoheal from hanging when loser shuts down before the winner learns it is the winner (since 3.1.0)
2) We should also check how the new 'pause-if-all-down' option works for split brain recovery.
3) We should address the 'force_boot' recommendations from this mail thread [2] to speed up the MQ cluster assembly time.

The question is - is it worth it to do this in the 6.1 release scope? I vote to postpone this for the 7.0 dev cycle as the impact of such changes might be unpredictable.

[0] https://www.rabbitmq.com/release-notes/README-3.4.0.txt
[1] https://bugs.launchpad.net/fuel/+bug/1447619
[2] http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg51625.html

--
Best regards,
Bogdan Dobrelya,
Skype #bogdando_at_yahoo.com
Irc #bogdando
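For anyone checking a particular package against the two bugfixes listed above, the "since 3.1.0, fixed in 3.4.0" range can be sketched as a small Python check. This is only an illustration under stated assumptions: the helper names (parse_version, is_affected) are made up here, the version string 3.3.5 is just a sample input, and Debian epoch prefixes such as "1:" are deliberately not handled.

```python
import re

# Range taken from the release-note entries quoted in this thread:
# bugs 26404 and 26368 are marked "since 3.1.0" and are fixed in 3.4.0.
AFFECTED_SINCE = (3, 1, 0)
FIXED_IN = (3, 4, 0)


def parse_version(text):
    """Turn '3.4.3-2' or '3.3.5' into a tuple of three ints.

    Anything after the third number (e.g. a packaging suffix like '-2')
    is ignored. Hypothetical helper, not part of any RabbitMQ tooling.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version_text):
    """Return True if the version falls in [3.1.0, 3.4.0)."""
    return AFFECTED_SINCE <= parse_version(version_text) < FIXED_IN


if __name__ == "__main__":
    # '3.3.5' is an illustrative sample; '3.4.3-2' is the vivid package
    # version mentioned elsewhere in this thread.
    for sample in ("3.3.5", "3.4.3-2"):
        print(sample, "affected" if is_affected(sample) else "not affected")
```

Tuple comparison keeps the check free of string-ordering pitfalls (for example "3.10.0" sorting before "3.4.0" as text).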
Re: [openstack-dev] [Fuel] Rabbitmq 3.4.0 upgrade for the 6.1 release, is it worth it?
I am 100% against the upgrade, folks. We need to ensure that users can use a different network for GRE segmentation in high-load cases and mention this in the Installation and Operations Guides - there is no time for the upgrade.

On Wed, Apr 29, 2015 at 2:33 PM, Tomasz Napierala tnapier...@mirantis.com wrote:

I’m -1 for it. Considering how much time we needed to troubleshoot the problems already, I don’t think we have time to properly test the upgrade.

--
Yours Faithfully,
Vladimir Kuklin,
Fuel Library Tech Lead,
Mirantis, Inc.
+7 (495) 640-49-04
+7 (926) 702-39-68
Skype kuklinvv
35bk3, Vorontsovskaya Str.
Moscow, Russia,
www.mirantis.com
www.mirantis.ru
vkuk...@mirantis.com
Re: [openstack-dev] [Fuel] Rabbitmq 3.4.0 upgrade for the 6.1 release, is it worth it?
Agree with Vova and Tomasz. It's too late and risky for 6.1, in my opinion.

Best,
-jay

On 04/29/2015 07:55 AM, Vladimir Kuklin wrote:

I am 100% against the upgrade, folks. We need to ensure that user can use different network for GRE segmentation for high-load cases and mention this in Installation and Operations Guides - there is no time for it.