[Openstack-operators] Liberty and OVS Agent restarts

2016-02-10 Thread Clayton O'Neill
Summary: Liberty OVS agent restarts are better, but still need work.
See: https://bugs.launchpad.net/neutron/+bug/1514056

As many of you know, Liberty has a fix for OVS agent restarts so
that the agent no longer dumps all flows when it starts, which used to
cause a loss of traffic.  Unfortunately, Liberty neutron still has
issues with OVS agent restarts.  The fix that went into Liberty
prevents the agent from dropping flows on the br-tun and br-int
bridges, and that helps greatly, but the br-ex bridge still has its
flows cleared on startup.

You may be thinking: Wait, br-ex only has like 3 flows on it, how can
that be a problem?  The issue appears to be that the br-ex flows are
cleared early and not set up again until late in the process.  This
means that routers on the node where the OVS agent is restarting lose
network connectivity for the majority of the restart time.
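To make the br-ex problem concrete, here is a toy Python model comparing two restart strategies. It is not Neutron code: the `Bridge` class, the flow names, the cookie values and the step counts are all hypothetical. The first strategy clears a bridge's flows at startup and reinstalls them late (the br-ex behaviour described above). The second leaves the old flows in place, installs replacements tagged with a new cookie and then prunes flows still carrying an old cookie. The second is only an illustration of how a no-gap restart can work, not a description of exactly what the Liberty change does.

```python
from dataclasses import dataclass, field


@dataclass
class Bridge:
    """Hypothetical stand-in for an OVS bridge: flow match -> cookie of the run that installed it."""
    name: str
    flows: dict = field(default_factory=dict)


def has_all(bridge: Bridge, desired: list) -> bool:
    """Traffic is assumed to pass only while every desired flow is present."""
    return all(match in bridge.flows for match in desired)


def clear_then_reinstall(bridge: Bridge, desired: list, cookie: int, work_before: int) -> list:
    """Flows cleared early, restored late.

    `work_before` is the number of time steps the agent spends on other
    resync work before it gets back to this bridge. Returns one bool per
    step (True = traffic flows).
    """
    timeline = []
    bridge.flows.clear()  # flows dropped at startup
    for _ in range(work_before):  # agent busy elsewhere; bridge stays empty
        timeline.append(has_all(bridge, desired))
    for match in desired:  # flows reinstalled late in the process
        bridge.flows[match] = cookie
    timeline.append(has_all(bridge, desired))
    return timeline


def reinstall_then_prune(bridge: Bridge, desired: list, cookie: int, work_before: int) -> list:
    """Keep old flows, install new ones with this run's cookie, then delete stale ones."""
    timeline = []
    for _ in range(work_before):  # old flows keep forwarding meanwhile
        timeline.append(has_all(bridge, desired))
    for match in desired:
        bridge.flows[match] = cookie  # replaced in place, so there is no gap
    for match, flow_cookie in list(bridge.flows.items()):
        if flow_cookie != cookie:  # leftover from the previous run
            del bridge.flows[match]
    timeline.append(has_all(bridge, desired))
    return timeline


if __name__ == "__main__":
    desired = ["flow-a", "flow-b", "flow-c"]  # hypothetical flow matches
    before = {m: 1 for m in desired}
    gap = clear_then_reinstall(Bridge("br-ex", dict(before)), desired, cookie=2, work_before=5)
    no_gap = reinstall_then_prune(Bridge("br-ex", dict(before)), desired, cookie=2, work_before=5)
    print(gap.count(False), no_gap.count(False))  # 5 0
```

The point of the model is that the outage length is set by how long the agent works on other things between clearing and restoring, not by how few flows the bridge holds.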

I did some testing with this yesterday, comparing a few scenarios with
100 FIPs, 100 instances and various router configurations.  You can
find the complete data here:
https://docs.google.com/spreadsheets/d/1ZGra_MszBlL0fNsFqd4nOvh1PsgWu58-GxEeh1m1BPw/edit?usp=sharing

The summary looks like this:
100 routers, 100 networks, 100 floating IPs, 100 instances, single node test:
Kilo average outage time: 47 seconds
Liberty average outage time: 37 seconds

1 router, 1 network, 100 floating IPs, 100 instances, single node test:
Kilo average outage time: 46 seconds
Liberty average outage time: 13 seconds

1 router, 1 network, 100 floating IPs, 100 instances, router on a
separate node, all instances on a single node, OVS restart on compute
node:
Kilo average outage time: 25 seconds
Liberty average outage time: 0 to 1 seconds
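A quick Python calculation of how much each scenario improved between Kilo and Liberty, using only the averages listed above. The last Liberty result was reported as "0 to 1 seconds", so 1 second is assumed here as an upper bound, which makes that row's reduction a lower bound.

```python
# Average outage times in seconds, copied from the summary above.
# The last Liberty result was reported as "0 to 1 seconds"; 1 is assumed
# as an upper bound, so that row's reduction is a lower bound.
RESULTS = {
    "100 routers, single node": (47, 37),
    "1 router, single node": (46, 13),
    "router on separate node, OVS restart on compute": (25, 1),
}


def reduction_pct(kilo: float, liberty: float) -> float:
    """Percentage by which the outage shrank going from Kilo to Liberty."""
    return 100.0 * (kilo - liberty) / kilo


if __name__ == "__main__":
    for scenario, (kilo, liberty) in RESULTS.items():
        print(f"{scenario}: {reduction_pct(kilo, liberty):.0f}% shorter outage")
```

This works out to roughly 21%, 72% and at least 96%, so the single-node case with 100 routers benefits least from the Liberty fix.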

I did my testing using fping, sending 1 second pings to all of the
floating IPs.  In the last test it frequently lost no packets at all,
so I was not really able to measure that scenario other than to
qualify it as good.
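One way to turn 1-second ping results into an average outage time is sketched below in Python. It assumes the fping results have already been collected as one True/False value per probe for each floating IP (parsing fping's output is left out), counts each lost probe as one interval of outage, and averages across IPs. How the spreadsheet linked above actually computed its averages is not stated, so treat this as an illustration of the method rather than a reproduction of it.

```python
from statistics import mean


def outage_seconds(replies: list, interval: float = 1.0) -> float:
    """Seconds without connectivity for one floating IP.

    `replies` holds one bool per probe (True = reply received), with probes
    sent every `interval` seconds; each lost probe counts as one interval.
    """
    return sum(1 for ok in replies if not ok) * interval


def average_outage(per_ip: dict, interval: float = 1.0) -> float:
    """Mean outage across all probed floating IPs."""
    return mean(outage_seconds(replies, interval) for replies in per_ip.values())


if __name__ == "__main__":
    samples = {
        "203.0.113.10": [True, False, False, False, True],  # hypothetical data
        "203.0.113.11": [True, True, False, True, True],
    }
    print(average_outage(samples))  # 2.0
```

With 1-second probes, the resolution is one second, which is why an outage under a second often shows up as no lost packets at all.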

This is a huge operational issue for us and I suspect for many of the
rest of you using OVS.  I’d encourage everyone that is using OVS to
register interest in having this fixed in the LP bug
(https://bugs.launchpad.net/neutron/+bug/1514056).  Right now this bug
is marked as low priority.

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Anyone using Project Calico for tenant networking?

2016-02-10 Thread Ned Rhudy (BLOOMBERG/ 731 LEX)
Thanks Neil, very helpful.



Re: [Openstack-operators] Liberty and OVS Agent restarts

2016-02-10 Thread Bajin, Joseph
Clayton, 

This is really good information. 

I’m wondering how we can help support you and get the necessary dev support to
get this resolved sooner rather than later. I totally agree with you that this should
be backported to at least Liberty.

Please let me know how I and others can help!

—Joe



Re: [Openstack-operators] Anyone using Project Calico for tenant networking?

2016-02-10 Thread Carl Baldwin
On Wed, Feb 10, 2016 at 5:06 AM, Neil Jerram wrote:
> In terms of how floating IPs are represented in the Neutron data model:
> currently they require a relationship between an external Network, a
> Router and a tenant Network.  The floating IP pool is defined as a
> subnet on the external Network; each allocated floating IP maps onto one
> of the fixed IPs of the tenant network; and the agent that implements
> the Router does the inbound DNAT between those two.
>
> As you've written, floating IPs are interesting for external or provider
> networks too, so we'd be interested in an enhancement to the Neutron
> model to allow that, and I believe there are other interested parties
> too.  But that will take time to agree, and it isn't one of my own
> priorities at the moment.

This very thing has been on my mind for a while now.  I have had it on
my list to write a spec for this.  My latest thinking is that it would
be good to be able to mark certain subnets on the network as
'floating' and keep them apart from other subnets on the network.
Then, there are certain cases where floating IPs should work without
the router and the tenant network.

One big complication with doing this is whether the instance will
somehow know that it has the floating IP, or whether we will need to
do some sort of NAT at the port.  I've talked with GoDaddy about how
they do it.  Their instances all understand what a floating IP is and
know when they have one, so they can accept traffic to the floating
IP without any kind of address translation.

Another benefit to separating the floating IP pool from the other
addresses on the network is that we could use private addresses for
the non-floating pool so that routers connecting to a
"router:external" network could use them and not consume public IP
addresses.  This has been requested countless times.  This would also
solve the problem that DVR has where it is consuming public IP
addresses for no good reason.

It might be as simple as adding a flag to the subnet called
"is_floating" and then looking through the places where IP allocations
are done to make sure that they come from (or at least *prefer* in
some cases) the appropriate pool.
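A minimal Python sketch of the pool separation described above, assuming a hypothetical per-subnet `is_floating` flag. It is not Neutron's IPAM code; the `Subnet` class, `allocate_ip()` and the fallback rule are illustrative assumptions. Floating IPs come only from floating subnets, while other ports on the external network (router gateways, for example) prefer the non-floating (possibly private) subnets and only fall back to the floating pool once those are exhausted.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subnet:
    """A subnet on the external network, with the hypothetical is_floating flag."""
    cidr: str
    is_floating: bool
    used: set = field(default_factory=set)

    def allocate(self) -> Optional[ipaddress.IPv4Address]:
        """Return the first free host address, or None if the subnet is full."""
        for host in ipaddress.ip_network(self.cidr).hosts():
            if host not in self.used:
                self.used.add(host)
                return host
        return None


def allocate_ip(subnets: list, want_floating: bool) -> ipaddress.IPv4Address:
    """Pick an address from the pool that matches the request.

    Floating IPs come only from is_floating subnets. Everything else (router
    gateway ports, for example) prefers non-floating subnets and falls back
    to the floating pool only when those are exhausted.
    """
    preferred = [s for s in subnets if s.is_floating == want_floating]
    fallback = [] if want_floating else [s for s in subnets if s.is_floating]
    for subnet in preferred + fallback:
        address = subnet.allocate()
        if address is not None:
            return address
    raise RuntimeError("no free addresses in any eligible subnet")
```

For example, with `Subnet("203.0.113.0/29", True)` as the public floating pool and `Subnet("10.0.0.0/30", False)` as a private pool on the same network, router gateways draw 10.0.0.1 and 10.0.0.2 before touching the public range. Whether gateways should ever fall back to public addresses is a policy choice; dropping the `fallback` list makes the separation strict.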

Carl



[Openstack-operators] [NFV][Telco] Telco Working Group Meeting for February 10th 2016 - CANCELLED

2016-02-10 Thread Steve Gordon
Hi all,

Unfortunately, today's meeting of the Telco Working Group [1] is canceled; my
apologies for the late notice!

Thanks,

Steve

[1] https://wiki.openstack.org/wiki/TelcoWorkingGroup



Re: [Openstack-operators] Anyone using Project Calico for tenant networking?

2016-02-10 Thread Neil Jerram
Hi Ned,

Sorry for the delay in following up here.

On 06/02/16 14:40, Ned Rhudy (BLOOMBERG/ 731 LEX) wrote:
> Thanks. Having read the documentation, I have one question about the
> network design. Basically, our use case specifies that instances be able
> to have a stable IP across terminations; effectively what we'd like to
> do is have a setup where both the fixed and floating IPs are routable
> outside the cluster. Any given instance should get a routable IP when it
> launches, but additionally be able to take a floating IP that would act
> as a stable endpoint for other things to reference.
>
> The Calico docs specify that you can create public/private IPv4 networks
> in Neutron, both with DHCP enabled. Is it possible to accomplish what
> I'm talking about by creating two public IPv4 subnets, one with
> DHCP enabled and one with DHCP disabled that would be used as the float
> pool? Or is this not possible?

For the fixed IPs, yes.  For the float pool, no, I'm afraid we don't 
have that in Calico yet, and I'm not sure if it will take precisely that 
form when we do have floating IP support.

There is work in progress on Calico support for floating IPs, and the 
code for this can be seen at https://review.openstack.org/#/c/253634/ 
and https://github.com/projectcalico/calico/pull/848.  I can't yet say 
when this will land, though.

In terms of how floating IPs are represented in the Neutron data model: 
currently they require a relationship between an external Network, a 
Router and a tenant Network.  The floating IP pool is defined as a 
subnet on the external Network; each allocated floating IP maps onto one 
of the fixed IPs of the tenant network; and the agent that implements 
the Router does the inbound DNAT between those two.
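The relationship described above can be modelled in a few lines of Python: a floating IP pool that is a subnet of the external network, a tenant subnet holding the fixed IPs, and a router that maps each floating IP onto one fixed IP and rewrites inbound destinations (DNAT). The `Router` class and its methods are hypothetical; Neutron's real floating IP, router and port resources carry many more fields.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy router joining an external network and one tenant network."""
    fip_pool: ipaddress.IPv4Network       # subnet on the external network
    tenant_subnet: ipaddress.IPv4Network  # where the fixed IPs live
    fip_to_fixed: dict = field(default_factory=dict)

    def associate(self, fip: str, fixed_ip: str) -> None:
        """Map one floating IP onto one fixed IP of the tenant network."""
        fip_addr = ipaddress.ip_address(fip)
        fixed_addr = ipaddress.ip_address(fixed_ip)
        if fip_addr not in self.fip_pool:
            raise ValueError(f"{fip} is not in the floating IP pool")
        if fixed_addr not in self.tenant_subnet:
            raise ValueError(f"{fixed_ip} is not on the tenant network")
        self.fip_to_fixed[fip_addr] = fixed_addr

    def dnat(self, destination: str) -> str:
        """Inbound DNAT: rewrite a floating IP destination to its fixed IP.

        Destinations with no association are returned unchanged.
        """
        addr = ipaddress.ip_address(destination)
        return str(self.fip_to_fixed.get(addr, addr))
```

Because the mapping lives on the router, this model has no way to express a floating IP on a provider network with no router, which is the gap the enhancement discussed below would have to fill.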

As you've written, floating IPs are interesting for external or provider 
networks too, so we'd be interested in an enhancement to the Neutron 
model to allow that, and I believe there are other interested parties 
too.  But that will take time to agree, and it isn't one of my own 
priorities at the moment.

Hope that's useful.  Best wishes,

Neil

>
> - Original Message -
> From: Neil Jerram
> To: EDMUND RHUDY, openstack-operators@lists.openstack.org
> At: 05-Feb-2016 14:11:34
>
> On 05/02/16 19:03, Ned Rhudy (BLOOMBERG/ 731 LEX) wrote:
> > I meant in a general sense of the networking technology that you're
> > using for instance networking, not in the sense of per-tenant networks,
> > though my wording was ambiguous. Part of our larger question centers
> > around the viability of tying instances directly to a provider network.
> > Being that we only operate a private cloud for internal consumption,
> > doing so would have some attractive upsides; tenants clamor for the IP
> > inside their instance to be the same as the floating IP that the outside
> > world sees, but nobody's ever asked us about the ability to roll their
> > own network topology, so we think we could probably do without that.
>
> Cool, IMO that's a good match for what Calico provides.
>




Re: [Openstack-operators] DVR and public IP consumption

2016-02-10 Thread Carl Baldwin
On Thu, Feb 4, 2016 at 5:41 AM, Tomas Vondra wrote:
> Hi Carl,
> sorry for the late reply, but these links of yours expanded to about 12 tabs
> in my browser, most with several pages of text. "Given lots of thought" may
> be an understatement.
>
> Both the specs sound very reasonable to me. The second one is exactly what I
> was saying here before. (Evidently I was not the first.) Why was it not
> accepted? It seems quite easy to implement in contrast to full routed 
> networks.

All of those links are out of date.  As I mentioned to Neil in another
thread just now, I'm going to write a new spec for this based on the
current direction Neutron is taking.

> The work on routed networks will be beneficial mainly for large deployments,
> whose needs exceed the capacity of a few L2 domains. Small public deployers
> are working on the scale of tens of boxes, but hundreds of tenants. Each
> tenant gets a virtual router, which eats an IP. I only have 1024 IPs from
> RIPE and will probably get no more. If most of the tenants are small and
> only use one or two VMs, I'm wasting up to 50% of my addresses, and it is
> severely limiting my growth potential.

Understood.  I think it is about time we solved this.  Let's see what
we can get going in the rfe / spec process for Newton.
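The address arithmetic in the quoted paragraph above can be checked with a short Python sketch. It assumes one router (and so one public gateway IP) per tenant and that every other public address is a floating IP used by that tenant's VMs; it ignores network, broadcast and upstream gateway addresses. The 1024-address pool comes from the message; the function names are hypothetical.

```python
def router_share(fips_per_tenant: int) -> float:
    """Fraction of public addresses spent on router gateways.

    Assumes one router (one public gateway IP) per tenant and that every
    other public address is a floating IP used by that tenant's VMs.
    """
    return 1 / (1 + fips_per_tenant)


def tenants_supported(pool_size: int, fips_per_tenant: int,
                      gateway_uses_public_ip: bool = True) -> int:
    """How many such tenants fit in a pool of public addresses."""
    per_tenant = fips_per_tenant + (1 if gateway_uses_public_ip else 0)
    return pool_size // per_tenant


if __name__ == "__main__":
    print(router_share(1), tenants_supported(1024, 1))  # 0.5 512
    print(tenants_supported(1024, 1, gateway_uses_public_ip=False))  # 1024
```

With one VM per tenant, half of the pool goes to router gateways, matching the "up to 50%" figure; moving gateways to a private, NATted subnet would double the number of such tenants the same 1024 addresses can serve.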

> I do not really understand why routed networks would be a prerequisite to
> using private IPs for router interfaces. I'm aiming at the last point from
> the Etherpad - Carrier grade NAT. Do you think that I could use the "Allow
> setting a tenant router's external IP" function and disable any checks if
> the specified IP is in the network defined as external? I already have a
> private subnet on the same L2 segment, that is NATted by the datacenter
> routers. The API is admin-only, so it would not create a risk. I would
> pre-create a router for each tenant and everyone would be happy. Floating
> IPs are taken care of at the compute nodes in DVR.

It isn't necessarily a prerequisite.  It has just been given more
priority and the work for routed networks will include a solution (at
least in part) for this.

I'm not sure that setting the router's external IP will work.  If you
decide to experiment, I'd be very interested in your results.  I think
we need a way to distinguish between two pools on the same network.
Find the post where I just replied to Neil and read that.  Hopefully
it makes sense.  This is exactly what I have in mind currently, and I
hope to propose it as a spec or RFE soon.

Carl
