Re: [Openstack-operators] Liberty and OVS Agent restarts

2016-02-13 Thread Jian Wen
Good to see the priority of the bug has been upgraded.

When we hit the bug, everyone was under huge pressure.

On Fri, Feb 12, 2016 at 11:01 PM, Clayton O'Neill wrote:

> I’ve tried it with both a blank value and the specific value.  It
> doesn’t appear to make a difference.
>
> In other news, Assaf Muller has upgraded the priority of the bug from
> low to high.



-- 
Best,

Jian
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Liberty and OVS Agent restarts

2016-02-12 Thread Matt Kassawara
Out of curiosity, what do you have for the "external_network_bridge" option
in the L3 agent config?



Re: [Openstack-operators] Liberty and OVS Agent restarts

2016-02-12 Thread Clayton O'Neill
I’ve tried it with both a blank value and the specific value.  It
doesn’t appear to make a difference.

In other news, Assaf Muller has upgraded the priority of the bug from
low to high.

On Fri, Feb 12, 2016 at 9:27 AM, Matt Kassawara wrote:
> Out of curiosity, what do you have for the "external_network_bridge" option
> in the L3 agent config?


Re: [Openstack-operators] Liberty and OVS Agent restarts

2016-02-10 Thread Bajin, Joseph
Clayton, 

This is really good information. 

I’m wondering how we can help support you and get the necessary dev support to
get this resolved sooner rather than later. I totally agree with you that this
should be backported to at least Liberty.

Please let me know how others and I can help!

—Joe









On 2/10/16, 8:55 AM, "Clayton O'Neill" wrote:

>Summary: Liberty OVS agent restarts are better, but still need work.
>See: https://bugs.launchpad.net/neutron/+bug/1514056
>
>As many of you know, Liberty has a fix so that the OVS agent no longer
>dumps all flows when it starts, which previously caused a loss of
>traffic.  Unfortunately, Liberty neutron still has issues with OVS
>agent restarts.  The fix that went into Liberty prevents it from
>dropping flows on the br-tun and br-int bridges, and that helps
>greatly, but the br-ex bridge still has its flows cleared on startup.
>
>You may be thinking: Wait, br-ex only has like 3 flows on it, how can
>that be a problem?  The issue appears to be that the br-ex flows are
>cleared early and not set up again until late in the process.  This
>means that routers on the node where the OVS agent is running lose
>network connectivity for the majority of the restart time.
>
>I did some testing with this yesterday, comparing a few scenarios with
>100 floating IPs (FIPs), 100 instances and various scenarios for
>routers.  You can find the complete data here:
>https://docs.google.com/spreadsheets/d/1ZGra_MszBlL0fNsFqd4nOvh1PsgWu58-GxEeh1m1BPw/edit?usp=sharing
>
>The summary looks like this:
>100 routers, 100 networks, 100 floating IPs, 100 instances, single node test:
>Kilo average outage time: 47 seconds
>Liberty average outage time: 37 seconds
>
>1 router, 1 network, 100 floating IPs, 100 instances, single node test:
>Kilo average outage time: 46 seconds
>Liberty average outage time: 13 seconds
>
>1 router, 1 network, 100 floating IPs, 100 instances, router on a
>separate node, all instances on a single node, OVS restart on compute
>node:
>Kilo average outage time: 25 seconds
>Liberty average outage time: 0 to 1 second
>
>I tested by pinging all of the floating IPs with fping at 1-second
>intervals.  In the last test, it frequently lost no packets, so I
>could not really measure that scenario beyond qualifying it as
>good.
>
>This is a huge operational issue for us and I suspect for many of the
>rest of you using OVS.  I’d encourage everyone who is using OVS to
>register interest in having this fixed in the LP bug
>(https://bugs.launchpad.net/neutron/+bug/1514056).  Right now this bug
>is marked as low priority.
>
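
For anyone who wants to reproduce the measurement described above, here is
a minimal sketch of turning per-second ping results into an outage figure.
The thread does not say how the outage times were derived from the fping
output, so this rests on an assumption: the outage for one floating IP is
taken to be its longest run of consecutive lost probes, and the reported
number is the mean across all IPs. The function names and the sample data
are hypothetical and are not taken from the original test.

```python
from typing import Dict, List


def outage_seconds(replies: List[bool], interval: float = 1.0) -> float:
    """Longest run of consecutive lost probes, converted to seconds.

    `replies` holds one entry per probe: True if a reply came back,
    False if the probe was lost. `interval` is the probe spacing.
    """
    longest = current = 0
    for ok in replies:
        # A reply ends the current loss run; a lost probe extends it.
        current = 0 if ok else current + 1
        longest = max(longest, current)
    return longest * interval


def average_outage(results: Dict[str, List[bool]], interval: float = 1.0) -> float:
    """Mean outage across all probed addresses (0.0 if there are none)."""
    if not results:
        return 0.0
    total = sum(outage_seconds(r, interval) for r in results.values())
    return total / len(results)


if __name__ == "__main__":
    # Made-up probe results for two addresses from the documentation range.
    sample = {
        "203.0.113.10": [True, False, False, False, True],
        "203.0.113.11": [True, True, False, True, True],
    }
    print(average_outage(sample))
```

Using the longest loss run rather than the total number of lost probes keeps
a stray dropped packet outside the restart window from inflating the figure;
counting total losses instead would be an equally defensible choice.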