Re: [openstack-dev] [neutron][stable] - exception for stable/kilo DVR back-ports

2016-05-04 Thread Kevin Benton
So I was doing some testing and this looks like it will happen very quickly
after an agent restarts.

The rule priorities are stored as a set, and a new priority is allocated by
calling .pop() on it. In local testing, that pretty consistently pulls the
first priority off of the range: http://paste.openstack.org/show/496168/

This means that after restarting an l3 agent, a newly associated floating
IP will likely have the same priority as another one already on the agent.
Then when either floating IP is disassociated, it will break the other one.
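
A minimal sketch of the collision described above, as a toy model rather than
the actual neutron code. The class name, function name, and the 100-110 range
are hypothetical, and it assumes the pool is rebuilt from the full range when
the agent restarts. Note that set.pop() is documented as removing an arbitrary
element, so getting the same value back is a CPython implementation detail,
not a guarantee.

```python
# Toy model only: PriorityAllocator and simulate_restart are hypothetical
# names for illustration and are not taken from the neutron source.


class PriorityAllocator:
    """Hands out rule priorities from a set by calling .pop()."""

    def __init__(self, start, end):
        # Assumption: a freshly started agent begins with the whole range
        # available, with no record of priorities already in use.
        self.pool = set(range(start, end))

    def allocate(self):
        # set.pop() removes an arbitrary element. Two sets of ints built
        # the same way yield the same element first in CPython.
        return self.pool.pop()


def simulate_restart(start=100, end=110):
    # Floating IP A is associated before the restart.
    before = PriorityAllocator(start, end)
    priority_a = before.allocate()

    # The restart discards the in-memory pool and builds a new one.
    after = PriorityAllocator(start, end)

    # Floating IP B is associated after the restart.
    priority_b = after.allocate()
    return priority_a, priority_b


if __name__ == "__main__":
    a, b = simulate_restart()
    print("priority for floating IP A:", a)
    print("priority for floating IP B:", b)
    print("collision:", a == b)
```

With both floating IPs sharing one priority, removing the rule for either of
them by priority can take the other's rule with it, which matches the breakage
described above.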

On Wed, May 4, 2016 at 3:20 PM, Armando M. wrote:

>
>
> On 4 May 2016 at 14:26, Assaf Muller wrote:
>
>> On Wed, May 4, 2016 at 4:54 PM, Kevin Benton wrote:
>> > Hello,
>> >
>> > I would like to propose a freeze exception for
>> > https://review.openstack.org/#/c/312253/ and
>> > https://review.openstack.org/#/c/312254/ . They address a bug in DVR that
>> > causes floating IPs to eventually break after an L3 agent has been
>> > restarted. It's a serious bug but it's very subtle because it takes a busy
>> > system and bad luck to trigger it.
>> >
>> > If we decide against the back-port a workaround could be to advise all
>> > distros/operators to call the namespace cleanup script every time the l3
>> > agent is restarted, which would prevent this issue, but at the cost of
>> > disrupting traffic on the agent restart.
>>
>> That's not something I could seriously suggest to users, meaning that
>> said users will just cherry pick these patches anyway. Might as well
>> prevent the pain proactively and merge it to stable/kilo.
>>
>
> Bear in mind that we don't test DVR on kilo [1] and taking into
> consideration also the likelihood of the issue, I would advise against the
> backport.
>
> [1]
> https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L2261
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron][stable] - exception for stable/kilo DVR back-ports

2016-05-04 Thread Armando M.
On 4 May 2016 at 14:26, Assaf Muller wrote:

> On Wed, May 4, 2016 at 4:54 PM, Kevin Benton wrote:
> > Hello,
> >
> > I would like to propose a freeze exception for
> > https://review.openstack.org/#/c/312253/ and
> > https://review.openstack.org/#/c/312254/ . They address a bug in DVR that
> > causes floating IPs to eventually break after an L3 agent has been
> > restarted. It's a serious bug but it's very subtle because it takes a busy
> > system and bad luck to trigger it.
> >
> > If we decide against the back-port a workaround could be to advise all
> > distros/operators to call the namespace cleanup script every time the l3
> > agent is restarted, which would prevent this issue, but at the cost of
> > disrupting traffic on the agent restart.
>
> That's not something I could seriously suggest to users, meaning that
> said users will just cherry pick these patches anyway. Might as well
> prevent the pain proactively and merge it to stable/kilo.
>

Bear in mind that we don't test DVR on kilo [1] and taking into
consideration also the likelihood of the issue, I would advise against the
backport.

[1]
https://github.com/openstack-infra/project-config/blob/master/zuul/layout.yaml#L2261




Re: [openstack-dev] [neutron][stable] - exception for stable/kilo DVR back-ports

2016-05-04 Thread Assaf Muller
On Wed, May 4, 2016 at 4:54 PM, Kevin Benton wrote:
> Hello,
>
> I would like to propose a freeze exception for
> https://review.openstack.org/#/c/312253/ and
> https://review.openstack.org/#/c/312254/ . They address a bug in DVR that
> causes floating IPs to eventually break after an L3 agent has been
> restarted. It's a serious bug but it's very subtle because it takes a busy
> system and bad luck to trigger it.
>
> If we decide against the back-port a workaround could be to advise all
> distros/operators to call the namespace cleanup script every time the l3
> agent is restarted, which would prevent this issue, but at the cost of
> disrupting traffic on the agent restart.

That's not something I could seriously suggest to users, meaning that
said users will just cherry pick these patches anyway. Might as well
prevent the pain proactively and merge it to stable/kilo.



[openstack-dev] [neutron][stable] - exception for stable/kilo DVR back-ports

2016-05-04 Thread Kevin Benton
Hello,

I would like to propose a freeze exception for
https://review.openstack.org/#/c/312253/ and
https://review.openstack.org/#/c/312254/ . They address a bug in DVR that
causes floating IPs to eventually break after an L3 agent has been
restarted. It's a serious bug but it's very subtle because it takes a busy
system and bad luck to trigger it.

If we decide against the back-port a workaround could be to advise all
distros/operators to call the namespace cleanup script every time the l3
agent is restarted, which would prevent this issue, but at the cost of
disrupting traffic on the agent restart.


Cheers,
Kevin Benton