Re: [openstack-dev] [neutron][stable] - exception for stable/kilo DVR back-ports
So I was doing some testing, and this looks like it will happen very quickly after an agent restarts. The rule priorities are stored as a set and are allocated by calling .pop(). In local testing, that pretty consistently pulls the first priority off of the range:

http://paste.openstack.org/show/496168/

This means that after restarting an l3 agent, a newly associated floating IP will likely get the same priority as another one already on the agent. Then, when either floating IP is disassociated, it will break the other one.

On Wed, May 4, 2016 at 3:20 PM, Armando M. wrote:
> Bear in mind that we don't test DVR on kilo [1] and taking into
> consideration also the likelihood of the issue, I would advise against the
> backport.
>
> [1] https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L2261

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
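The collision described in the message above (priorities kept in a set, handed out with .pop(), and the set rebuilt from scratch when the agent restarts) can be sketched in a few lines of Python. This is only a toy model, not the Neutron code: the PriorityAllocator class, the simulate_restart helper, and the range bounds are all hypothetical names and values chosen for illustration. It also relies on CPython popping the same element from identically built sets of integers, which matches what was observed in local testing but is not guaranteed by the language.

```python
class PriorityAllocator:
    """Toy stand-in for an in-memory pool of rule priorities (hypothetical)."""

    def __init__(self, start, end):
        # The pool is rebuilt from the full range on every construction,
        # so it has no memory of priorities handed out before a restart.
        self.pool = set(range(start, end))

    def allocate(self):
        # set.pop() removes and returns an arbitrary element. In CPython,
        # two sets of ints built the same way yield the same element first.
        return self.pool.pop()

    def release(self, priority):
        self.pool.add(priority)


def simulate_restart(start=1000, end=2000):
    """Allocate once, 'restart' the agent, allocate again.

    The bounds are illustrative only. Returns both priorities so the
    caller can compare them.
    """
    agent = PriorityAllocator(start, end)
    before_restart = agent.allocate()   # priority of an existing floating IP

    # Restart: the old allocator (and its knowledge of what is in use)
    # is discarded, while the rule it installed would still be in place.
    agent = PriorityAllocator(start, end)
    after_restart = agent.allocate()    # priority of a new floating IP

    return before_restart, after_restart


if __name__ == "__main__":
    old, new = simulate_restart()
    print("priority before restart:", old)
    print("priority after restart: ", new)
    print("collision:", old == new)
```

Within a single allocator lifetime the two priorities always differ, because .pop() removes what it returns. The duplicate only appears once the pool is rebuilt without accounting for priorities that are already in use.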
Re: [openstack-dev] [neutron][stable] - exception for stable/kilo DVR back-ports
On 4 May 2016 at 14:26, Assaf Muller wrote:
> That's not something I could seriously suggest to users, meaning that
> said users will just cherry pick these patches anyway. Might as well
> prevent the pain proactively and merge it to stable/kilo.

Bear in mind that we don't test DVR on kilo [1] and taking into consideration also the likelihood of the issue, I would advise against the backport.

[1] https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L2261
Re: [openstack-dev] [neutron][stable] - exception for stable/kilo DVR back-ports
On Wed, May 4, 2016 at 4:54 PM, Kevin Benton wrote:
> If we decide against the back-port a workaround could be to advise all
> distros/operators to call the namespace cleanup script every time the l3
> agent is restarted, which would prevent this issue, but at the cost of
> disrupting traffic on the agent restart.

That's not something I could seriously suggest to users, meaning that said users will just cherry pick these patches anyway. Might as well prevent the pain proactively and merge it to stable/kilo.
[openstack-dev] [neutron][stable] - exception for stable/kilo DVR back-ports
Hello,

I would like to propose a freeze exception for https://review.openstack.org/#/c/312253/ and https://review.openstack.org/#/c/312254/ . They address a bug in DVR that causes floating IPs to eventually break after an L3 agent has been restarted. It's a serious bug but it's very subtle because it takes a busy system and bad luck to trigger it.

If we decide against the back-port a workaround could be to advise all distros/operators to call the namespace cleanup script every time the l3 agent is restarted, which would prevent this issue, but at the cost of disrupting traffic on the agent restart.

Cheers,
Kevin Benton