Re: [openstack-dev] [all] The future of the integrated release

2014-08-13 Thread Ian Wells
On 13 August 2014 06:01, Kyle Mestery  wrote:

> On Wed, Aug 13, 2014 at 5:15 AM, Daniel P. Berrange 
> wrote:
> > This idea of fixed slots is not really very appealing to me. It sounds
> > like we're adding a significant amount of bureaucratic overhead to our
> > development process that is going to make us increasingly inefficient.
> > I don't want to waste time waiting for a stalled blueprint to time out
> > before we give the slot to another blueprint.
> >
> I agree with all of Daniel's comments here, and these are the same
> reason I'm not in favor of "fixed slots" or "runways." As ttx has
> stated in this thread, we have done a really poor job as a project of
> understanding what are the priority items for a release, and sticking
> to those. Trying to solve that to put focus on the priority items,
> while allowing for smaller, low-overhead code and reviews should be
> the priority here.
>

It seems to me that we're addressing the symptom and not the cause of the
problem.  We've set ourselves up as more of a cathedral and less of a
bazaar in one important respect: core reviewers are inevitably going to be
a bottleneck.  The slots proposal is simply saying 'we can't think of a way of
scaling beyond what we have, and so let's restrict the inflow of changes to
a manageable level' - it doesn't increase capacity at all, it simply
improves the efficiency of using the current capacity and leaves us with a
hard limit that's fractionally higher than we're currently managing - but
we still have a capacity ceiling.

In Linux, to take another large project with significant feature velocity,
there's a degree of decentralisation.  The ultimate cores review code, but
getting code in depends more on a wider network of trusted associates.  We
don't have the same setup: even *proposed* changes have to be reviewed by
two cores before it's necessarily worth writing anything to make the change
in question.  Everything goes through Gerrit, which is one centralised
location for everyone to put in their code.

I have no great answer to this, but is there a way - perhaps via team
sponsorship from cores to ensure that the general direction is right, and
cloned repositories for purpose-specific changes, as one example - that we
can get an audience of people to check, try and test proposed changes long
before they need reviewing for final inclusion?
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] What does NASA not using OpenStack mean to OS's future

2014-08-25 Thread Ian Wells
On 25 August 2014 10:34, Aryeh Friedman  wrote:

> Do you call Martin Meckos having no clue... he is the one that leveled the
> second-worst criticism after mine... or is Eucalyptus not one of the founding
> members of OpenStack (after all many of the glance commands still use its
> name)
>

You appear to be trolling, and throwing around amazingly easy-to-disprove
'factoids', in an inappropriate forum, in order to drum up support for your
own competing open source cloud platform.  Please stop.

Your time would be much better spent improving your platform rather than
coming up with frankly bizarre criticism of the competitors.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][NFV] VIF_VHOSTUSER

2014-08-30 Thread Ian Wells
The problem here is that you've removed the vif_driver option and now
you're preventing the inclusion of named VIF types into the generic driver,
which means that rather than adding a package to an installation to add
support for a VIF driver it's now necessary to change the Nova code (and
repackage it, or - ew - patch it in place after installation).  I
understand where you're coming from but unfortunately the two changes
together make things very awkward.  Granted that vif_driver needed to go
away - it was the wrong level of code and the actual value was coming from
the wrong place anyway (nova config and not Neutron) - but it's been
removed without a suitable substitute.

It's a little late for a feature for Juno, but I think we need to write
something that discovers VIF types installed on the system.  That way you can
add a new VIF type to Nova by deploying a package (and perhaps naming it in
config as an available selection to offer to Neutron) *without* changing
the Nova tree itself.
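
To make the idea concrete, here's the sort of thing I mean - a rough sketch
only, using setuptools entry points with a made-up group name, not code from
any review:

# Rough sketch only: discover externally-packaged VIF plugging code via
# setuptools entry points.  The group name 'nova.virt.vif_plugins' is
# made up for illustration.
import pkg_resources


def load_vif_plugins():
    # Map VIF type name -> plug callable shipped by whatever add-on
    # packages happen to be installed on this compute node.
    plugins = {}
    for ep in pkg_resources.iter_entry_points(group='nova.virt.vif_plugins'):
        plugins[ep.name] = ep.load()
    return plugins


def plug_vif(vif_type, instance, vif):
    plugins = load_vif_plugins()
    if vif_type not in plugins:
        raise RuntimeError('No VIF plugin installed for type %s' % vif_type)
    return plugins[vif_type](instance, vif)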

In the meantime, I recommend you consult with the Neutron cores and see if
you can get an exception for the VHOSTUSER driver in the current
timescale.
-- 
Ian.



On 27 August 2014 07:30, Daniel P. Berrange  wrote:

> On Wed, Aug 27, 2014 at 04:06:25PM +0200, Luke Gorrie wrote:
> > Howdy!
> >
> > I am writing to ask whether it will be possible to merge VIF_VHOSTUSER
> [1]
> > in Juno?
> >
> > VIF_VHOSTUSER adds support for a QEMU 2.1 feature called vhost-user
> > [2] that allows a guest to do Virtio-net I/O via a userspace vswitch.
> This
> > makes it convenient to deploy new vswitches that are optimized for NFV
> > workloads, of which there are now several both open source and
> proprietary.
> >
> > The complication is that we have no CI coverage for this feature in Juno.
> > Originally we had anticipated merging a Neutron driver that would
> exercise
> > vhost-user but the Neutron core team requested that we develop that
> outside
> > of the Neutron tree for the time being instead [3].
> >
> > We are hoping that the Nova team will be willing to merge the feature
> even
> > so. Within the NFV subgroup it would help us to share more code with each
> > other and also be good for our morale :) particularly as the QEMU work
> was
> > done especially for use with OpenStack.
>
> Our general rule for accepting new VIF drivers in Nova is that Neutron
> should have accepted the corresponding other half of VIF driver, since
> nova does not want to add support for things that are not in-tree for
> Neutron.
>
> In this case adding the new VIF driver involves defining a new VIF type
> and corresponding metadata associated with it. This metadata is part of
> the public API definition, to be passed from Neutron to Nova during VIF
> plugging and so IMHO this has to be agreed upon and defined in tree for
> Neutron & Nova. So even if the VIF driver in Neutron were to live out
> of tree, at a very minimum I'd expect the VIF_VHOSTUSER part to be
> specified in-tree to Neutron, so that Nova has a defined interface it
> can rely on.
>
> So based on this policy, my recommendation would be to keep the Nova VIF
> support out of tree in your own branch of Nova codebase until Neutron team
> are willing to accept their half of the driver.
>
> In cases like this I think Nova & Neutron need to work together to agree
> on acceptance/rejection of the proposed feature. Having one project accept
> it and the other project reject, without them talking to each other is not
> a good position to be in.
>
> Regards,
> Daniel
> --
> |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org -o- http://virt-manager.org :|
> |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Webex Recording of IPv6 / Neutron Sync-up

2013-12-04 Thread Ian Wells
Next time, could you perhaps do it (a) with a bit more notice and (b) at a
slightly more amenable time for us Europeans?


On 4 December 2013 15:27, Richard Woo  wrote:

> Shixiong,
>
> Thank you for the updates, do you mind sharing the slides with the openstack
> mailing list?
>
> Richard
>
>
> On Tue, Dec 3, 2013 at 11:30 PM, Shixiong Shang <
> sparkofwisdom.cl...@gmail.com> wrote:
>
>>  Hi, guys:
>>
>> We had a great discussion tonight with stackers from Comcast, IBM, HP,
>> Cisco and Nephos6! Here is the debrief of what we discussed during this
>> 1-hr session:
>>
>> 1) Sean from Comcast provided clarification of his short-term and
>> mid-term goals in the proposed blueprint.
> >> 2) Da Zhao, Yu Yang, and Xu Han from IBM went through the patches and
>> bug fixes they submitted.
>> 3) Brian from HP shared his view to support IPv6 and HA in the near
>> future.
>> 4) Shixiong from Nephos6 and Randy from Cisco presented a slide to
>> summarize the issues they encountered during POC together with the
>> solutions.
> >> 5) We reached consensus to leverage the work Sean and Da Zhao have done
>> previously and integrate it with the L3 agent efforts brought by Shixiong
>> and Randy.
>>
>>
>> Please see below for Webex recording.
>>
>>
>> https://cisco.webex.com/ciscosales/lsr.php?AT=pb&SP=MC&rID=73520027&rKey=8e508b63604bb9d0
>>
>> IPv6 / Neutron synch-up-20131204 0204-1
>> Tuesday, December 3, 2013 9:04 pm New York Time
>> 1 Hour 4 Minutes
>>
>> Thanks!
>>
>> Shixiong
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ceilometer] [marconi] Notifications brainstorming session tomorrow @ 1500 UTC

2013-12-04 Thread Ian Wells
How frequently do you imagine these notifications being sent?  There's a wide
variation here between the 'blue moon' case where disk space is low, frequent
notifications of things like OS performance (which you might want to display
in Horizon or another monitoring tool on an every-few-seconds basis), and
instance state changes, which are usually driven by polling at present.

I'm not saying that we should necessarily design notifications for the
latter cases, because they introduce potentially quite a lot of
user-driven load on the Openstack components - I'm just asking for a
statement of intent.
-- 
Ian.


On 4 December 2013 16:09, Kurt Griffiths wrote:

> Thanks! We touched on this briefly during the chat yesterday, and I will
> make sure it gets further attention.
>
> On 12/3/13, 3:54 AM, "Julien Danjou"  wrote:
>
> >On Mon, Dec 02 2013, Kurt Griffiths wrote:
> >
> >> Following up on some conversations we had at the summit, I'd like to get
> >> folks together on IRC tomorrow to crystalize the design for a
> >>notifications
> >> project under the Marconi program. The project's goal is to create a
> >>service
> >> for surfacing events to end users (where a user can be a cloud app
> >> developer, or a customer using one of those apps). For example, a
> >>developer
> >> may want to be notified when one of their servers is low on disk space.
> >> Alternatively, a user of MyHipsterApp may want to get a text when one of
> >> their friends invites them to listen to That Band You've Never Heard Of.
> >>
> >> Interested? Please join me and other members of the Marconi team
> >>tomorrow,
> >> Dec. 3rd, for a brainstorming session in #openstack-marconi at 1500
> >> UTC <http://www.timeanddate.com/worldclock/fixedtime.html?hour=15&min=0&sec=0>.
> >> Your contributions are crucial to making this project awesome.
> >>
> >> I¹ve seeded an etherpad for the discussion:
> >>
> >> https://etherpad.openstack.org/p/marconi-notifications-brainstorm
> >
> >This might (partially) overlap with what Ceilometer is doing with its
> >alarming feature, and one of the blueprint our roadmap for Icehouse:
> >
> >  https://blueprints.launchpad.net/ceilometer/+spec/alarm-on-notification
> >
> >While it doesn't solve the use case at the same level, the technical
> >mechanism is likely to be similar.
> >
> >--
> >Julien Danjou
> ># Free Software hacker # independent consultant
> ># http://julien.danjou.info
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Interfaces file format, was [Tempest] Need to prepare the IPv6 environment for static IPv6 injection test case

2013-12-04 Thread Ian Wells
We seem to have bound our config drive file formats to those used by the
operating system we're running, which doesn't seem like the right approach
to take.

Firstly, the above format doesn't actually work even for Debian-based
systems - if you have a network without ipv6, ipv6 ND will be enabled on
the ipv4-only interfaces, which strikes me as wrong.  (This is a feature of
Linux - ipv4 is enabled on interfaces which are specifically configured
with ipv4, but ipv6 is enabled on all interfaces that are brought up.)

But more importantly, the above file template only works for Debian-based
machines - not Redhat, not Windows, not anything else - and we seem to have
made that a feature of Openstack from the relatively early days of file
injection.  That's not an ipv6 only thing but a general statement.  It
seems wrong to have to extend Openstack's config drive injection for every
OS that might come along, so is there a way we can make this work without
tying the two things together?  Are we expecting the cloud-init code in
whatever OS to parse and understand this file format, or is it supposed
to use other information?  In general, what would the recommendation be for
someone using a VM where this config format is not native?
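
To make the question concrete, this is roughly the sort of OS-neutral
description I have in mind - purely illustrative, not a proposed schema -
which cloud-init or its equivalent would then render into whatever the
guest's native format is:

# Purely illustrative, not a proposed schema: describe the desired state
# and let the guest agent render it natively (ifupdown, NetworkManager,
# Windows netsh, ...).
network_description = {
    'interfaces': [
        {'name': 'eth0',
         'mac': 'fa:16:3e:00:00:01',
         'ipv4': {'address': '10.0.0.5', 'netmask': '255.255.255.0',
                  'gateway': '10.0.0.1'},
         'ipv6': None},   # explicitly absent, so the guest can turn off ND
        {'name': 'eth1',
         'mac': 'fa:16:3e:00:00:02',
         'ipv4': None,
         'ipv6': {'address': '2001:db8::5', 'prefixlen': 64,
                  'gateway': '2001:db8::1'}},
    ],
    'dns': ['10.0.0.2'],
}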

-- 
Ian.


On 2 December 2013 03:01, Yang XY Yu  wrote:

> Hi all stackers,
>
> Currently Neutron/Nova code has supported the static IPv6 injection, but
> there is no tempest scenario coverage to support IPv6 injection test case.
> So I finished the test case and ran it successfully in my local
> environment, and already submitted the code-review in community:
> *https://review.openstack.org/#/c/58721/*,
> but the community Jenkins env has not supported IPv6 and there are still a
> few pre-requisites setup below if running the test case correctly,
>
> 1. A special image is needed to support IPv6 by using cloud-init; currently the
> cirros image used by tempest does not have cloud-init installed.
>
> 2. Prepare interfaces.template file below on compute node.
> edit  /usr/share/nova/interfaces.template
>
> # Injected by Nova on instance boot
> #
> # This file describes the network interfaces available on your system
> # and how to activate them. For more information, see interfaces(5).
>
> # The loopback network interface
> auto lo
> iface lo inet loopback
>
> {% for ifc in interfaces -%}
> auto {{ ifc.name }}
> {% if use_ipv6 -%}
> iface {{ ifc.name }} inet6 static
> address {{ ifc.address_v6 }}
> netmask {{ ifc.netmask_v6 }}
> {%- if ifc.gateway_v6 %}
> gateway {{ ifc.gateway_v6 }}
> {%- endif %}
> {%- endif %}
>
> {%- endfor %}
>
>
> So considering these two pre-requisites, what should be done to enable
> this patch for IPv6 injection? Should I open a bug for cirros to enable
> cloud-init?   Or skip the test case because of this bug ?
> Any comments are appreciated!
>
> Thanks & Best Regards,
> 
> Yang Yu(于杨)
> Cloud Solutions and OpenStack Development
> China Systems & Technology Laboratory Beijing
> E-mail: yuyan...@cn.ibm.com
> Tel: 86-10-82452757
> Address: Ring Bldg. No.28 Building, Zhong Guan Cun Software Park,
> No. 8 Dong Bei Wang West Road, ShangDi, Haidian District, Beijing 100193,
> P.R.China
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] New API requirements, review of GCE

2013-12-04 Thread Ian Wells
On 20 November 2013 00:22, Robert Collins  wrote:

> On 20 November 2013 13:00, Sean Dague  wrote:
> > So we recently moved devstack gate to do config drive instead of
> metadata
> > service, and life was good (no one really noticed). In what ways is
> > configdrive insufficient compared to metadata service? And is that
> something
> > that we should be tackling?
>
> * The metadata service can be trivially updated live - and Heat wants
> to use this to get rid of its own metadata service... whereas config
> drive requires unplugging the device, updating the data and replugging
> - and thats a bit more invasive.
>
> * Nova baremetal doesn't support config-drive today, and it's an open
> question as to whether we ever will - and if we do we can't do the hot
> unplug thing, so anyone using it would suffer downtime to update data.
>
> * config drive permits no-control-plane-visibility for instances,
> which some secure environments consider to be super important.
>
> So I think we'll have both indefinitely at this point - they serve
> overlapping but differing audiences.
>
> We should be testing both.
>

Since we've drifted off topic:

Metadata doesn't work with ipv6, because there's no well known ipv6 address
to go talk to (unless someone's made one up since I last looked).  Metadata
doesn't work if you have no IP address (!).  Metadata has certain
limitations with Neutron (you need a router on your network or you get no
data).

I think all the above things are fixable, and the only really concerning
one would be ipv6 + baremetal where apparently we have no solution that
works at present.
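
(To illustrate the guest-side difference - this is a sketch of the general
shape, not cloud-init's actual code - the config drive needs nothing but a
block device, while the metadata service needs a configured IPv4 address and
a route:)

# Sketch of the guest-side difference; not cloud-init's actual code.
import json
import subprocess
import urllib2

METADATA_URL = 'http://169.254.169.254/openstack/latest/meta_data.json'


def read_config_drive(mountpoint='/mnt/config'):
    # Needs only a block device labelled 'config-2': no IP, no router.
    subprocess.check_call(['mount', '-L', 'config-2', mountpoint])
    with open(mountpoint + '/openstack/latest/meta_data.json') as f:
        return json.load(f)


def read_metadata_service():
    # Needs a configured IPv4 address and a route to 169.254.169.254;
    # there's no well-known IPv6 equivalent to fall back on.
    return json.load(urllib2.urlopen(METADATA_URL, timeout=10))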
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Neutron Distributed Virtual Router

2013-12-09 Thread Ian Wells
I would imagine that, from the Neutron perspective, you get a single router
whether or not it's distributed.  I think that if a router is distributed -
regardless of whether it's tenant-tenant or tenant-outside - it certainly
*could* have some sort of SLA flag, but I don't think a simple
'distributed' flag is either here or there; it's not telling the tenant
anything meaningful.


On 10 December 2013 00:48, Mike Wilson  wrote:

> I guess the question that immediately comes to mind is, is there anyone
> that doesn't want a distributed router? I guess there could be someone out
> there that hates the idea of traffic flowing in a balanced fashion, but
> can't they just run a single router then? Does there really need to be some
> flag to disable/enable this behavior? Maybe I am oversimplifying things...
> you tell me.
>
> -Mike Wilson
>
>
> On Mon, Dec 9, 2013 at 3:01 PM, Vasudevan, Swaminathan (PNB Roseville) <
> swaminathan.vasude...@hp.com> wrote:
>
>>  Hi Folks,
>>
>> We are in the process of defining the API for the Neutron Distributed
>> Virtual Router, and we have a question.
>>
>>
>>
>> Just wanted to get the feedback from the community before we implement
>> and post for review.
>>
>>
>>
>> We are planning to use the “distributed” flag for the routers that are
>> supposed to be routing traffic locally (both East West and North South).
>>
>> This “distributed” flag is already there in the “neutronclient” API, but
>> currently only utilized by the “Nicira Plugin”.
>>
>> We would like to go ahead and use the same “distributed” flag and add an
>> extension to the router table to accommodate the “distributed flag”.
>>
>>
>>
>> Please let us know your feedback.
>>
>>
>>
>> Thanks.
>>
>>
>>
>> Swaminathan Vasudevan
>>
>> Systems Software Engineer (TC)
>>
>>
>>
>>
>>
>> HP Networking
>>
>> Hewlett-Packard
>>
>> 8000 Foothills Blvd
>>
>> M/S 5541
>>
>> Roseville, CA - 95747
>>
>> tel: 916.785.0937
>>
>> fax: 916.785.1815
>>
>> email: swaminathan.vasude...@hp.com
>>
>>
>>
>>
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-10 Thread Ian Wells
On 10 December 2013 20:55, Clint Byrum  wrote:

> If it is just a network API, it works the same for everybody. This
> makes it simpler, and thus easier to scale out independently of compute
> hosts. It is also something we already support and can very easily expand
> by just adding a tiny bit of functionality to neutron-metadata-agent.
>
> In fact we can even push routes via DHCP to send agent traffic through
> a different neutron-metadata-agent, so I don't see any issue where we
> are piling anything on top of an overstressed single resource. We can
> have neutron route this traffic directly to the Heat API which hosts it,
> and that can be load balanced and etc. etc. What is the exact scenario
> you're trying to avoid?
>

You may be making even this harder than it needs to be.  You can create
multiple networks and attach machines to multiple networks.  Every point so
far has been 'why don't we use <X> as a backdoor into our VM without
affecting the VM in any other way' - why can't that just be one more
network interface set aside for whatever management instructions are
appropriate?  And then what needs pushing into Neutron is nothing more
complex than strong port firewalling to prevent the slaves/minions talking
to each other.  If you absolutely must make the communication come from a
system agent and go to a VM, then that can be done by attaching the system
agent to the administrative network - from within the system agent, which
is the thing that needs this, rather than within Neutron, which doesn't
really care how you use its networks.  I prefer solutions where other tools
don't have to make you a special case.
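
As a rough sketch of what I mean (names and details are illustrative, not a
worked design), the system agent's owner would do something like this with
the normal Neutron API and nothing else:

# Rough sketch, not a worked design: give each instance a second NIC on an
# 'agent-mgmt' network whose security group only allows traffic to and from
# the management host, so slaves/minions can't talk to each other.
from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(username='admin', password='secret',
                                tenant_name='admin',
                                auth_url='http://keystone:5000/v2.0')

net = neutron.create_network({'network': {'name': 'agent-mgmt'}})['network']
neutron.create_subnet({'subnet': {'network_id': net['id'],
                                  'ip_version': 4,
                                  'cidr': '192.168.254.0/24'}})

sg = neutron.create_security_group(
    {'security_group': {'name': 'agent-mgmt-only'}})['security_group']
# Allow traffic only from the management service's address; a real design
# would also strip the default allow-all egress rules.
neutron.create_security_group_rule(
    {'security_group_rule': {'security_group_id': sg['id'],
                             'direction': 'ingress',
                             'protocol': 'tcp',
                             'remote_ip_prefix': '192.168.254.2/32'}})

# The port handed to 'nova boot --nic port-id=...' for each instance:
port = neutron.create_port({'port': {'network_id': net['id'],
                                     'security_groups': [sg['id']]}})['port']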
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Neutron Distributed Virtual Router

2013-12-11 Thread Ian Wells
Are these NSX routers *functionally* different?

What we're talking about here is a router which, whether it's distributed
or not, behaves *exactly the same*.  So as I say, maybe it's an SLA thing,
but 'distributed' isn't really user meaningful if the user can't actually
prove he's received a distributed router by using the APIs or seeing
traffic flow differently.

I think, by the names you're referring to, the NSX routers actually have
different user visible behaviour, and that's a different matter entirely,
obviously you want, as a user, to choose one or the other.
-- 
Ian.


On 10 December 2013 23:21, Vasudevan, Swaminathan (PNB Roseville) <
swaminathan.vasude...@hp.com> wrote:

>  Hi Nachi/Akihiro motoki,
>
> I am not clear.
>
> Today the L3 Service Plugin does not support the “service_type” attribute to 
> define the provider option.
>
>
>
> Are we suggesting that we need to include the service_type for the L3 Service 
> Plugin and then we can make use of the “service_type” attribute to 
> distinguish between the “edge” and “distributed”.
>
>
>
>
>
> So if I understand correctly, a “provider” router will be an Edge router and 
> a non-provider router will be a “distributed router”.
>
>
>
> Thanks
>
> Swami
>
>
>
> >I'm +1 for 'provider'.
>
>
>
> 2013/12/9 Akihiro Motoki :
>
> > Neutron defines "provider" attribute and it is/will be used in advanced
>
> > services (LB, FW, VPN).
>
> > Doesn't it fit for a distributed router case? If we can cover all services
>
> > with one concept, it would be nice.
>
> >
>
> > According to this thread, we assumes at least two types "edge" and
>
> > "distributed".
>
> > Though "edge" and "distributed" is a type of implementations, I think they
>
> > are some kind of "provider".
>
> >
>
> > I just would like to add an option. I am open to "provider" vs "distributed"
>
> > attributes.
>
> >
>
> > Thanks,
>
> > Akihiro
>
> >
>
> > (2013/12/10 7:01), Vasudevan, Swaminathan (PNB Roseville) wrote:
>
> >> Hi Folks,
>
> >>
>
> >> We are in the process of defining the API for the Neutron Distributed
>
> >> Virtual Router, and we have a question.
>
> >>
>
> >> Just wanted to get the feedback from the community before we implement and
>
> >> post for review.
>
> >>
>
> >> We are planning to use the “distributed” flag for the routers that are
>
> >> supposed to be routing traffic locally (both East West and North South).
>
> >> This “distributed” flag is already there in the “neutronclient” API, but
>
> >> currently only utilized by the “Nicira Plugin”.
>
> >> We would like to go ahead and use the same “distributed” flag and add an
>
> >> extension to the router table to accommodate the “distributed flag”.
>
> >>
>
> >> Please let us know your feedback.
>
> >>
>
> >> Thanks.
>
> >>
>
> >> Swaminathan Vasudevan
>
> >> Systems Software Engineer (TC)
>
> >> HP Networking
>
> >> Hewlett-Packard
>
> >> 8000 Foothills Blvd
>
> >> M/S 5541
>
> >> Roseville, CA - 95747
>
> >> tel: 916.785.0937
>
> >> fax: 916.785.1815
>
> >> email: swaminathan.vasude...@hp.com
>
>
>
>
>
> Swaminathan Vasudevan
>
> Systems Software Engineer (TC)
>
>
>
>
>
> HP Networking
>
> Hewlett-Packard
>
> 8000 Foothills Blvd
>
> M/S 5541
>
> Roseville, CA - 95747
>
> tel: 916.785.0937
>
> fax: 916.785.1815
>
> email: swaminathan.vasude...@hp.com
>
>
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Interfaces file format, was [Tempest] Need to prepare the IPv6 environment for static IPv6 injection test case

2013-12-12 Thread Ian Wells
(2) should read 'the data should appear in both the metadata service and the
config drive', I would say.  Vish was making a point that this metadata changes
(e.g. when running nova interface-attach) and it might be nice if the
metadata server updated its information.  It might be, and changing
metadata has been discussed more than once before, but that's a can of
worms that no-one is going to rush to open so it's not going to affect
tests right now.

To run that test case, I don't think you have any option right now but to
use a special interface template - it's that or go reading your addresses
from the Neutron API directly when your machine tries to configure itself, and
no-one would set up a VM like that.
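
(For completeness, 'reading your addresses from the Neutron API directly'
would look something like the sketch below - which should make it obvious why
nobody configures a guest that way: the VM would need credentials and a route
to the API before it had even configured its network.)

# Sketch only, to show why guests don't do this: it needs credentials and
# API reachability that a freshly-booted VM doesn't have.
from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(username='admin', password='secret',
                                tenant_name='admin',
                                auth_url='http://keystone:5000/v2.0')


def addresses_for_instance(instance_uuid):
    ports = neutron.list_ports(device_id=instance_uuid)['ports']
    return [(ip['subnet_id'], ip['ip_address'])
            for port in ports for ip in port['fixed_ips']]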
-- 
Ian.


On 12 December 2013 10:33, Yang XY Yu  wrote:

> Hi Ian  and Vish,
>
> Thanks for your reply. From your response, I still have some questions,
> hope you can help me to clear up them.
>
> <1> From Vish's response, may I believe that static IPv6 injection using
> the interface.template will be dropped in future because it is not a
> correct way to do net config as you said?
>
> <2> If we still use the interface.template to do net config, we'd better
> put it into metadata server, right?
>
> <3> If these two assumptions above are correct, my question is: we still
> need to prepare some special ENV such as images with cloud-init to run this
> tempest case https://review.openstack.org/#/c/58721/,
> is it reasonable to be accepted by community? Is it the correct direction
> to submit case for IPv6? How should I ask community to prepare the this
> special ENV?
>
> Thanks & Best Regards,
> 
> Yang Yu(于杨)
> 
>
>
> Vishvananda Ishaya
> 2013-12-06 02:34
> Please respond to: "OpenStack Development Mailing List (not for usage
> questions)" <openstack-dev@lists.openstack.org>
> To: "OpenStack Development Mailing List (not for usage questions)" <
> openstack-dev@lists.openstack.org>
> Subject: Re: [openstack-dev] [Neutron] Interfaces file format, was [Tempest]
> Need to prepare the IPv6 environment for static IPv6 injection test case
>
>
>
>
> Hi Ian,
>
> The rendered network template was a legacy item that got stuck onto the
> config drive so we could remove file injection. It is not intended that
> this is the correct way to do net config. We have intended in the past to
> put a generic form of network info into the metadata service and config
> drive. Cloud-init can parse this data and have code to set up networking
> config on different operating systems.
>
> We actually discussed doing this during the Havana summit, but no one ever
> made any progress. There was some debate about whether there was an
> existing xml format and someone was going to investigate. Since this has
> not happened, I propose we scrap that idea and just produce the network
> info in json.
>
> Nova should be able to populate this data from its cached network info. It
> might also be nice to stick it in a known location on the metadata server
> so the neutron proxy could potentially overwrite it with more current
> network data if it wanted to.
>
> Vish
>
> On Dec 4, 2013, at 8:26 AM, Ian Wells <ijw.ubu...@cack.org.uk> wrote:
>
> We seem to have bound our config drive file formats to those used by the
> operating system we're running, which doesn't seem like the right approach
> to take.
>
> Firstly, the above format doesn't actually work even for Debian-based
> systems - if you have a network without ipv6, ipv6 ND will be enabled on
> the ipv4-only interfaces, which strikes me as wrong.  (This is a feature of
> Linux - ipv4 is enabled on interfaces which are specifically configured
> with ipv4, but ipv6 is enabled on all interfaces that are brought up.)
>
> But more importantly, the above file template only works for Debian-based
> machines - not Redhat, not Windows, not anything else - and we seem to have
> made that a feature of Openstack from the relatively early days of file
> injection.  That's not an ipv6 only thing but a general statement.  It
> seems wrong to have to extend Openstack's config drive injection for every
> OS that might come along, so is there a way we can make this work without
> tying the two things together?  Are we expecting the cloud-init code in
> whatever OS to parse and understand this file format, or are they supposed
> to use other information?  In general, what would the recommendation be for
> someone using a VM where

Re: [openstack-dev] Unified Guest Agent proposal

2013-12-12 Thread Ian Wells
On 12 December 2013 19:48, Clint Byrum  wrote:

> Excerpts from Jay Pipes's message of 2013-12-12 10:15:13 -0800:
> > On 12/10/2013 03:49 PM, Ian Wells wrote:
> > > On 10 December 2013 20:55, Clint Byrum <cl...@fewbar.com> wrote:
> > I've read through this email thread with quite a bit of curiosity, and I
> > have to say what Ian says above makes a lot of sense to me. If Neutron
> > can handle the creation of a "management vNIC" that has some associated
> > iptables rules governing it that provides a level of security for guest
> > <-> host and guest <-> $OpenStackService, then the transport problem
> > domain is essentially solved, and Neutron can be happily ignorant (as it
> > should be) of any guest agent communication with anything else.
> >
>
> Indeed I think it could work, however I think the NIC is unnecessary.
>
> Seems likely even with a second NIC that said address will be something
> like 169.254.169.254 (or the ipv6 equivalent?).
>

There *is* no ipv6 equivalent, which is one standing problem.  Another is
that (and admittedly you can quibble about this problem's significance) you
need a router on a network to be able to get to 169.254.169.254 - I raise
that because the obvious use case for multiple networks is to have a net
which is *not* attached to the outside world so that you can layer e.g. a
private DB service behind your app servers.

Neither of these are criticisms of your suggestion as much as they are
standing issues with the current architecture.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Agenda for the meeting today

2013-12-12 Thread Ian Wells
Can we go over the use cases for the multiple different address allocation
techniques, per my comment on the blueprint that suggests we expose
different dnsmasq modes?

And perhaps also what we're going to do with routers in terms of equivalent
behaviour for the current floating-ip versus no-floating-ip systems.  One
idea is that we would offer tenants a routeable address regardless of which
network they're on (something which is not what we do in v4, and which
doesn't really fit with the current subnet-create), and rather than NAT we
would have two default firewalling schemes in routers - one for an externally
accessible address and one for an inaccessible one - but I'd really like to
hear what other ideas are out there.
-- 
Ian.


On 12 December 2013 18:22, Collins, Sean wrote:

> The agenda for today is pretty light - if there is anything that people
> would like to discuss, please feel free to add.
>
>
> https://wiki.openstack.org/wiki/Meetings/Neutron-IPv6-Subteam#Agenda_for_Dec_12._2013
>
> --
> Sean M. Collins
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Unified Guest Agent proposal

2013-12-13 Thread Ian Wells
On 13 December 2013 16:13, Alessandro Pilotti <
apilo...@cloudbasesolutions.com> wrote:

> 2) The HTTP metadata service accessible from the guest with its magic
> number is IMO quite far from an optimal solution. Since every hypervisor
> commonly
> used in OpenStack (e.g. KVM, XenServer, Hyper-V, ESXi) provides guest /
> host communication services, we could define a common abstraction layer
> which will
> include a guest side (to be included in cloud-init, cloudbase-init, etc)
> and a hypervisor side, to be implemented for each hypervisor and included
> in the related Nova drivers.
> This has already been proposed / implemented in various third party
> scenarios, but never under the OpenStack umbrella for multiple hypervisors.
>

Firstly, what's wrong with the single anycast IP address mechanism that
makes it 'not an optimal solution'?

While I agree we could, theoretically, make KVM, Xen, Docker, Hyper-V,
VMWare and so on all implement the same backdoor mechanism - unlikely as
that seems - and then implement a userspace mechanism to match in every
cloud-init service in Windows, Linux and *BSD (and we then have a problem with
niche OSes, too, so this mechanism had better be easy to implement, and
it's likely to involve the kernel), it's hard.  And we still come unstuck
when we get to bare metal, because these interfaces just can't be added
there.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Change I5b2313ff: Create a new attribute for subnets, to store v6 dhcp options

2013-12-17 Thread Ian Wells
This is a discussion document for starters -
https://docs.google.com/document/d/1rOBOOu_OwixMStm6XJOb5PKkJA6eFbL_XCE7wlTfaPY.
It's lacking the names you asked for at the moment, but have a comment on it
(frequent commenters get edit rights) and from there we can generate and
tidy up the blueprints.  I think that BP will go forward with changes to
the attribute name and possibly the location in that patch.  In summary, I
think we need only a couple of public options and then to generate dnsmasq
configs from those.

Aside from that, the document is trying to be a comprehensive list of
changes to implement ipv6 properly.  That doesn't mean to say we have to do
every element between now and Icehouse, we'll have succeeded even if we
only get the basic cases to work.  For instance, we don't need DHCPv6
providing SLAAC will allocate addresses, it's a refinement that brings
benefits but we can do without.
-- 
Ian.


On 17 December 2013 21:41, Collins, Sean wrote:

> On Tue, Dec 17, 2013 at 07:39:14PM +0100, Ian Wells wrote:
> > 1. The patch ties Neutron's parameters specifically to dnsmasq.  It would
> > be, I think, impossible to reimplement this for isc-dhcpd, for instance.
>
> While I agree in theory with this point - there are currently no
> active blueprints to add another DHCP server to Neutron. The isc-dhcp
> one has been stalled for quite a long time.
>
> Frankly, if we can think of better names for the modes that we're
> looking to have happen for v6 provisioning, that doesn't rely directly
> on dnsmasq-isms, I'm all ears. Feel free to propose better names for the
> modes and we'll create a map between the modes and what you pass to
> dnsmasq.
>
> --
> Sean M. Collins
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Change I5b2313ff: Create a new attribute for subnets, to store v6 dhcp options

2013-12-17 Thread Ian Wells
On 17 December 2013 18:57, Shixiong Shang wrote:

> Yes, the man page is a little bit confusing. The “slaac” mode requires
> “--enable-ra” since it needs to manipulate the M/O/A/L bits in the RA. As a
> matter of fact, all of the modes available for IPv6 rely on “--enable-ra”.
>
> My understanding is, the ra-names option has nothing to do with RA. It
> resolves the problem of where to find DNS server. It should work with slaac
> mode or ra-stateless mode.
>

I'm going to reiterate what I said in my comment on that patch and its
blueprint.

1. The patch ties Neutron's parameters specifically to dnsmasq.  It would
be, I think, impossible to reimplement this for isc-dhcpd, for instance.
2. The fact that ra-names is under consideration says that we're thinking
in implementation terms, not API design terms.  dnsmasq isn't a DNS server
in Openstack, so ra-names isn't an appropriate choice in any case.  It's
only in the list because the options on offer are the options dnsmasq
allows, which is the tail wagging the dog.

What we should have is the reverse: first, what do we want from the
interface, and second, what does that imply for the implementation?  I
don't think we need all the modes just because dnsmasq offers them.
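
By way of illustration (the attribute and mode names below are placeholders,
not a proposal), the driver is where the dnsmasq-isms belong; a different
DHCP driver would map the same API-level modes to its own options:

# Placeholder names, not a proposal.  The subnet carries an API-level
# address mode; the dnsmasq driver translates it into dnsmasq arguments,
# and an isc-dhcpd driver would do its own translation.
# (In reality the prefix would come from the subnet's CIDR.)
DNSMASQ_ARGS_FOR_MODE = {
    'slaac':            ['--enable-ra', '--dhcp-range=2001:db8::,ra-only'],
    'dhcpv6-stateless': ['--enable-ra', '--dhcp-range=2001:db8::,ra-stateless'],
    'dhcpv6-stateful':  ['--enable-ra', '--dhcp-range=2001:db8::10,2001:db8::ff'],
}


def dnsmasq_args(subnet):
    mode = subnet.get('ipv6_address_mode', 'slaac')   # hypothetical attribute
    return DNSMASQ_ARGS_FOR_MODE[mode]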
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Three SLAAC and DHCPv6 related blueprints

2013-12-18 Thread Ian Wells
Hey Shixiong,

This is intended as a replacement for [1], correct?  Do you have code for
this already, or should we work with Sean's patch?

There's a discussion document at [2], which is intended to be more specific
about the reasoning behind the choices we make, and the interface offered to
the user for these features.  I'd be grateful if you could read and comment
on it.
-- 
Ian.

[1] https://blueprints.launchpad.net/neutron/+spec/dnsmasq-mode-keyword
[2]
https://docs.google.com/document/d/1rOBOOu_OwixMStm6XJOb5PKkJA6eFbL_XCE7wlTfaPY



On 18 December 2013 04:19, Randy Tuttle  wrote:

> Great Shixiong. I can see that we have BPs from Sean / Da Zhao for
> providing the modes via the neutron client cli, but have we seen how those
> modes are provided through the dashboard?
>
> Randy
>
> Sent from my iPhone
>
> On Dec 17, 2013, at 9:07 PM, Shixiong Shang 
> wrote:
>
> > Hi, team:
> >
> > I created a new blueprint to reflect the work we accomplished in the
> previous POC to enable dnsmasq in SLAAC mode. In addition, I took the
> action item two weeks ago from weekly sub-team meeting to explore DHCPv6
> options. The goal was to run dnsmasq as DHCPv6 server and provide both
> optional information and/or IPv6 address to VM in the tenant network. Below
> you can find the link to the new blueprints, which are all related to the
> mid-term goal in Sean’s original proposal.
> >
> > https://blueprints.launchpad.net/neutron/+spec/dnsmasq-ipv6-slaac
> >
> https://blueprints.launchpad.net/neutron/+spec/dnsmasq-ipv6-dhcpv6-stateful
> >
> https://blueprints.launchpad.net/neutron/+spec/dnsmasq-ipv6-dhcpv6-stateless
> >
> > Please let me know if you have any questions. Thanks!
> >
> > Shixiong
> >
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Diversity as a requirement for incubation

2013-12-18 Thread Ian Wells
I think we're all happy that if a project *does* have a broad support base
we're good; the question only arises for projects in one of two situations:
- support is spread so thinly that each company involved in the area has
elected to support a different solution
- the project is just not that interesting to a wider audience

It might be less about 'should we even allow its incubation' - yes, in
exceptional circumstances we probably should, because it serves a useful
purpose in that case of divided loyalties - and more about what checks should
be in place to avoid problems of favouritism over technical merit, and of
spreading our support thinly over areas of low demand.  Neither of these
things is directly a problem of diversity itself (e.g. the 'company gets
bored'/'company goes tits up' problem); instead, the diversity serves as a
warning flag indicating that decisions should not be taken lightly.
-- 
Ian.


On 18 December 2013 11:40, Thierry Carrez  wrote:

> Hi everyone,
>
> The TC meeting yesterday uncovered an interesting question which, so
> far, divided TC members.
>
> We require that projects have a number of different developers involved
> before they apply for incubation, mostly to raise the bus factor. But we
> also currently require some level of diversity in that development team:
> we tend to reject projects where all the development team comes from a
> single company.
>
> There are various reasons for that: we want to make sure the project
> survives the loss of interest of its main corporate sponsor, we want to
> make sure it takes into account more than just one company's use case,
> and we want to make sure there is convergence, collaboration and open
> development at play there, before we start spending common resources in
> helping them integrate with the rest of OpenStack.
>
> That said, it creates a chicken-and-egg issue: other companies are less
> likely to assign resources and converge to a project unless it gets
> blessed as THE future solution. And it's true that in the past a lot of
> projects really ramped up their communities AFTER being incubated.
>
> I guess there are 3 options:
>
> 1. Require diversity for incubation, but find ways to bless or recommend
> projects pre-incubation so that this diversity can actually be achieved
>
> 2. Do not require diversity for incubation, but require it for
> graduation, and remove projects from incubation if they fail to attract
> a diverse community
>
> 3. Do not require diversity at incubation time, but at least judge the
> interest of other companies: are they signed up to join in the future ?
> Be ready to drop the project from incubation if that was a fake support
> and the project fails to attract a diverse community
>
> Personally I'm leaning towards (3) at the moment. Thoughts ?
>
> --
> Thierry Carrez (ttx)
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Re: [Blueprint vlan-aware-vms] VLAN aware VMs

2013-12-18 Thread Ian Wells
A Neutron network is analogous to a wire between ports.  We can do almost
everything with this wire - we can pass both IP and non-IP traffic, I can
even pass MPLS traffic over it (yes, I tried).  For no rational reason, at
least if you live north of the API, I sometimes can't pass VLAN traffic
over it.  You would think this would be in the specification for what a
network is, but as it happens I don't think we have a specification for
what a network is in those terms.

I have a counterproposal that I wrote up yesterday [1].  This is the
absurdly simple approach, taking the position that implementing trunks
*should* be easy.  That's actually not such a bad position to take, because
the problem lies with certain plugins (OVS-based setups, basically) - it's
not a problem with Neutron.

It's very uncompromising, though - it just allows you to request a
VLAN-clean network.  It would work with OVS code because it allows plugins
to decline a request, but it doesn't solve the VLAN problem for you, it
just ensures that you don't run somewhere where your application doesn't
work, and gives plugins with problems an opportunity for special case
code.  You could expand it so that you're requesting either a VLAN-safe
network or a network that passes *specified* VLANs - which is the starting
position of Eric's document, a plugin-specific solution to a
plugin-specific problem.
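
In API terms the proposal really is no more than this (the attribute name and
error handling are illustrative only):

# Illustrative only: the 'vlan-transparent' attribute is the hypothetical
# flag from the proposal above, not an implemented Neutron API.
from neutronclient.common import exceptions as neutron_exc
from neutronclient.v2_0 import client as neutron_client

neutron = neutron_client.Client(username='demo', password='secret',
                                tenant_name='demo',
                                auth_url='http://keystone:5000/v2.0')

try:
    net = neutron.create_network(
        {'network': {'name': 'trunk-net',
                     'vlan-transparent': True}})['network']
except neutron_exc.NeutronClientException:
    # A plugin that can't guarantee VLAN transparency (a VLAN-based OVS
    # deployment, say) declines the request rather than silently eating tags.
    raise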

I accept that, for as long as we use VLAN based infrastructure, we have to
accommodate the fact that VLANs are a special case, but this is very much
an artifact of the plugin implementation - Linux bridge based network
infrastructure simply doesn't have this problem, for instance.

On 17 December 2013 06:17, Isaku Yamahata  wrote:

> - 2 Modeling proposal
>   What's the purpose of trunk network?
>   Can you please add a use case that trunk network can't be optimized away?
>

Even before I read the document I could list three use cases.  Eric's
covered some of them himself.

The reasons you might want to have a trunked network passing VLAN traffic:
1: You're replicating a physical design for simulation purposes [2]

2: There are any number of reasons to use VLANs in a physical design, but
generally it's a port reduction thing.  In Openstack, clearly I can do this
a different way - instead of using 30 VLANs over one network with two
ports, I can use 30 networks with two ports each.  Ports are cheaper when
you're virtual, but they're not free - KVM has a limitation of, from
memory, 254 ports per VM.  So I might well still want to use VLANs.  I
could arbitrarily switch to another encap technology, but this is the tail
wagging the dog - I have to change my design because Neutron's contract is
inconsistent.

3: I want to condense many tenant networks into a single VM or physical box
because I'm using a single VM to offer logically separated services to
multiple tenants.  This has all the points of (2) basically, that VLANs are
not the only encap I could use, but they're the conventional one and widely
supported.  Provider networks do actually offer the functionality you need
for this already - if you're talking physical - but they would, I think, be
awkward to use in practice, and they would eat NIC ports on your hosts.

-- 
Ian.

[1]
https://docs.google.com/document/d/16DDJLYHxMmbCPO5LxW_kp610oj4goiic_oTakJiXjTs
[2] http://blogs.cisco.com/wp-content/uploads/network1-550x334.png - a
network simulator (search for 'Cisco VIRL'). Shameless plug, sorry, but
it's an Openstack based application and I'm rather proud of it.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Three SLAAC and DHCPv6 related blueprints

2013-12-18 Thread Ian Wells
On 18 December 2013 14:10, Shixiong Shang wrote:

> Hi, Ian:
>
> I won’t say the intent here is to replace dnsmasq-mode-keyword BP.
> Instead, I was trying to leverage and enhance those definitions so when
> dnsmasq is launched, it knows which mode it should run in.
>
> That being said, I see the value of your points and I also had lengthy
> discussion with Randy regarding this. We did realize that the keyword
> itself may not be sufficient to properly configure dnsmasq.
>

I think the point is that the attribute, on whatever object (subnet or
router) it lives, should define the behaviour in precisely the terms
you're talking about, and then we should find the
dnsmasq options to suit.  Talking to Sean, he's good with this too, so
we're all working to the same ends and it's just a matter of getting code
in.


> Let us discuss that on Thursday’s IRC meeting.
>

Not sure if I'll be available or not this Thursday, unfortunately.  I'll
try to attend but I can't make promises.

> Randy and I had a quick glance over your document. Much of it parallels the
> work we did on our POC last summer, and is now being addressed across
> multiple BP being implemented by ourselves or with Sean Collins and IBM
> team's work. I will take a closer look and provide my comments.
>

That's great.  I'm not wedded to the details in there, I'm actually more
interested that we've covered everything.

If you have blueprint references, add them as comments - the
ipv6-feature-parity BP could do with work and if we get the links together
in one place we can update it.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread Ian Wells
John:

> At a high level:
>
> Neutron:
> * user wants to connect to a particular neutron network
> * user wants a super-fast SRIOV connection

> Administration:
> * needs to map PCI device to what neutron network they connect to
>
> The big question is:
> * is this a specific SRIOV only (provider) network
> * OR... are other non-SRIOV connections also made to that same network
>
> I feel we have to go for the latter. Imagine a network on VLAN 42,
> you might want some SRIOV into that network, and some OVS connecting
> into the same network. The user might have VMs connected using both
> methods, so wants the same IP address ranges and same network id
> spanning both.
>

> If we go for the latter we either need:
> * some kind of nic-flavor
> ** boot ... -nic nic-id:"public-id:,nic-flavor:"10GBpassthrough"
> ** but neutron could store nic-flavor, and pass it through to VIF
> driver, and user says port-id
> * OR add NIC config into the server flavor
> ** extra spec to say, tell VIF driver it could use on of this list of
> PCI devices: (list pci-flavors)
> * OR do both
>
> I vote for nic-flavor only, because it matches the volume-type we have
> with cinder.
>

I think the issue there is that Nova is managing the supply of PCI devices
(which is limited and limited on a per-machine basis).  Indisputably you
need to select the NIC you want to use as a passthrough rather than a vnic
device, so there's something in the --nic argument, but you have to answer
two questions:

- how many devices do you need (which is now not a flavor property but in
the --nic list, which seems to me an odd place to be defining billable
resources)
- what happens when someone does nova interface-attach?

Cinder's an indirect parallel because the resources it's adding to the
hypervisor are virtual and unlimited, I think, or am I missing something
here?


> However, it does suggest that Nova should leave all the SRIOV work to
> the VIF driver.
> So the VIF driver, as activate by neutron, will understand which PCI
> devices to passthrough.
>
> Similar to the plan with brick, we could have an oslo lib that helps
> you attach SRIOV devices that could be used by the neturon VIF drivers
> and the nova PCI passthrough code.
>

I'm not clear that this is necessary.

At the moment with vNICs, you plug devices in through a co-operation
between Neutron (which configures a way of attaching them to put them on a
certain network) and the hypervisor-specific code (which creates them in
the instance and attaches them as instructed by Neutron).  Why would we not
follow the same pattern with passthrough devices?  In this instance,
neutron would tell nova that when it's plugging this device it should be a
passthrough device, and pass any additional parameters like the VF encap,
and Nova would do as instructed, then Neutron would reconfigure whatever
parts of the network need to be reconfigured in concert with the
hypervisor's settings to make the NIC a part of the specified network.
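
In other words the information flow stays exactly the same as today, only the
content changes - something along these lines, where the keys are
illustrative rather than an agreed schema:

# Keys are illustrative, not an agreed schema.  Neutron describes the port
# as a passthrough attachment; Nova's virt driver plugs it, exactly as it
# does for the existing vnic VIF types.
vif = {
    'id': 'c0ffee00-0000-4000-8000-000000000001',    # Neutron port UUID
    'type': 'sriov-passthrough',                     # hypothetical VIF type
    'details': {
        'physical_network': 'physnet1',
        'vlan': 42,                                  # VF encap, from Neutron
        'pci_slot': '0000:06:10.1',                  # device chosen by Nova
    },
}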
-- 
Ian.


>
> Thanks,
> John
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Re: [Blueprint vlan-aware-vms] VLAN aware VMs

2013-12-19 Thread Ian Wells
On 19 December 2013 06:35, Isaku Yamahata  wrote:

>
> Hi Ian.
>
> I can't see your proposal. Can you please make it public viewable?
>

Crap, sorry - fixed.


> > Even before I read the document I could list three use cases.  Eric's
> > covered some of them himself.
>
> I'm not against trunking.
> I'm trying to understand what requirements need "trunk network" in
> the figure 1 in addition to "L2 gateway" directly connected to VM via
> "trunk port".
>

No problem, just putting the information there for you.

-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Meeting time - change to 1300 UTC or 1500 UTC?

2013-12-19 Thread Ian Wells
I'm easy.


On 20 December 2013 00:47, Randy Tuttle  wrote:

> Any of those times suit me.
>
> Sent from my iPhone
>
> On Dec 19, 2013, at 5:12 PM, "Collins, Sean" <
> sean_colli...@cable.comcast.com> wrote:
>
> > Thoughts? I know we have people who are not able to attend at our
> > current time.
> >
> > --
> > Sean M. Collins
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Three SLAAC and DHCPv6 related blueprints

2013-12-19 Thread Ian Wells
Xuhan, check the other thread - would 1500UTC suit?


On 19 December 2013 01:09, Xuhan Peng  wrote:

> Shixiong and guys,
>
> The sub team meeting is too early for china IBM folks to join although we
> would like to participate the discussion very much. Any chance to rotate
> the time so we can comment?
>
> Thanks, Xuhan
>
>
> On Thursday, December 19, 2013, Shixiong Shang wrote:
>
>> Hi, Ian:
>>
>> I agree with you on the point that the way we implement it should be app
>> agnostic. In addition, it should cover both CLI and Dashboard, so the
>> system behavior should be consistent to end users.
>>
>> The keyword set is just one of many ways to implement the concept. It is
>> based on the reality that dnsmasq is the only driver available today to the
>> community. At the end of the day, the input from the customer should be
>> translated to one of those mode keywords. It doesn't imply the same
>> constants have to be used as part of the CLI or Dashboard.
>>
>> Randy and I had lengthy discussion/debating about this topic today. We
>> have straw-man proposal and will share with the team tomorrow.
>>
>> That being said, what concerned me the most at this moment is, we are not
>> on the same page. I hope tomorrow during sub-team meeting, we can reach
>> consensus. If you cannot make it, then please set up a separate meeting to
>> invite key stakeholders so we have a chance to sort it out.
>>
>> Shixiong
>>
>>
>>
>>
>> On Dec 18, 2013, at 8:25 AM, Ian Wells  wrote:
>>
>> On 18 December 2013 14:10, Shixiong Shang 
>> wrote:
>>
>>> Hi, Ian:
>>>
>>> I won’t say the intent here is to replace dnsmasq-mode-keyword BP.
>>> Instead, I was trying to leverage and enhance those definitions so when
>>> dnsmasq is launched, it knows which mode it should run in.
>>>
>>> That being said, I see the value of your points and I also had lengthy
>>> discussion with Randy regarding this. We did realize that the keyword
>>> itself may not be sufficient to properly configure dnsmasq.
>>>
>>
>> I think the point is that the attribute on whatever object (subnet or
>> router) that defines the behaviour should define the behaviour, in
>> precisely the terms you're talking about, and then we should find the
>> dnsmasq options to suit.  Talking to Sean, he's good with this too, so
>> we're all working to the same ends and it's just a matter of getting code
>> in.
>>
>>
>>> Let us discuss that on Thursday’s IRC meeting.
>>>
>>
>> Not sure if I'll be available or not this Thursday, unfortunately.  I'll
>> try to attend but I can't make promises.
>>
>> Randy and I had a quick glance over your document. Much of it parallels
>>> the work we did on our POC last summer, and is now being addressed across
>>> multiple BP being implemented by ourselves or with Sean Collins and IBM
>>> team's work. I will take a closer look and provide my comments.
>>>
>>
>> That's great.  I'm not wedded to the details in there, I'm actually more
>> interested that we've covered everything.
>>
>> If you have blueprint references, add them as comments - the
>> ipv6-feature-parity BP could do with work and if we get the links together
>> in one place we can update it.
>> --
>> Ian.
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Blueprint Bind dnsmasq in qrouter- namespace

2013-12-19 Thread Ian Wells
Per the discussions this evening, we did identify a reason why you might
need a dhcp namespace for v6 - because networks don't actually have to have
routers.  It's clear you need an agent in the router namespace for RAs and
another one in the DHCP namespace for when the network's not connected to a
router, though.

We've not pinned down all the API details yet, but the plan is to implement
an RA agent first, responding to subnets that router is attached to (which
is very close to what Randy and Shixiong have already done).
-- 
Ian.


On 19 December 2013 14:01, Randy Tuttle  wrote:

> First, dnsmasq is not being "moved". Instead, it's a different instance
> for the attached subnet in the qrouter namespace. If it's not in the
> qrouter namespace, the default gateway (the local router interface) will be
> the interface of qdhcp namespace interface. That will cause blackhole for
> traffic from VM. As you know, routing tables and NAT all occur in qrouter
> namespace. So we want the RA to contain the local interface as default
> gateway in qrouter namespace
>
> Randy
>
> Sent from my iPhone
>
> On Dec 19, 2013, at 4:05 AM, Xuhan Peng  wrote:
>
> I am reading through the blueprint created by Randy to bind dnsmasq into
> qrouter- namespace:
>
>
> https://blueprints.launchpad.net/neutron/+spec/dnsmasq-bind-into-qrouter-namespace
>
> I don't think I can follow the reason that we need to change the namespace
> which contains dnsmasq process and the device it listens to from qdhcp- to
> qrouter-. Why the original namespace design conflicts with the Router
> Advertisement sending from dnsmasq for SLAAC?
>
> From the attached POC result link, the reason is stated as:
>
> "Even if the dnsmasq process could send Router Advertisement, the default
> gateway would bind to its own link-local address in the qdhcp- namespace.
> As a result, traffic leaving tenant network will be drawn to DHCP
> interface, instead of gateway port on router. That is not desirable! "
>
> Can Randy or Shixiong explain this more? Thanks!
>
> Xuhan
>
> ___
>
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread Ian Wells
On 19 December 2013 15:15, John Garbutt  wrote:

> > Note, I don't see the person who boots the server ever seeing the
> pci-flavor, only understanding the server flavor.
>  > [IrenaB] I am not sure that elaborating PCI device request into server
> flavor is the right approach for the PCI pass-through network case. vNIC by
> its nature is something dynamic that can be plugged or unplugged after VM
> boot. server flavor is  quite static.
>
> I was really just meaning the server flavor specify the type of NIC to
> attach.
>
> The existing port specs, etc, define how many nics, and you can hot
> plug as normal, just the VIF plugger code is told by the server flavor
> if it is able to PCI passthrough, and which devices it can pick from.
> The idea being combined with the neturon network-id you know what to
> plug.
>
> The more I talk about this approach the more I hate it :(
>

The thinking we had here is that nova would provide a VIF or a physical NIC
for each attachment.  Precisely what goes on here is a bit up for grabs,
but I would think:

Nova specifies the type at port-update, making it obvious to Neutron it's
getting a virtual interface or a passthrough NIC (and the type of that NIC,
probably, and likely also the path so that Neutron can distinguish between
NICs if it needs to know the specific attachment port)
Neutron does its magic on the network if it has any to do, like faffing(*)
with switches
Neutron selects the VIF/NIC plugging type that Nova should use, and in the
case that the NIC is a VF and it wants to set an encap, returns that encap
back to Nova
Nova plugs it in and sets it up (in libvirt, this is generally in the XML;
XenAPI and others are up for grabs).
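
To make that last libvirt step concrete, the sort of XML it ends up as is
sketched below - the PCI address and VLAN tag are invented, and the exact form
obviously depends on the attachment type Neutron picks:

    <interface type='hostdev' managed='yes'>
      <!-- the VF Nova allocated (address invented for illustration) -->
      <source>
        <address type='pci' domain='0x0000' bus='0x83' slot='0x00' function='0x1'/>
      </source>
      <!-- encap returned by Neutron, here a VLAN tag -->
      <vlan>
        <tag id='100'/>
      </vlan>
    </interface>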

 > We might also want a "nic-flavor" that tells neutron information it
> requires, but lets get to that later...
> > [IrenaB] nic flavor is definitely something that we need in order to
> choose if  high performance (PCI pass-through) or virtio (i.e. OVS) nic
> will be created.
>
> Well, I think its the right way go. Rather than overloading the server
> flavor with hints about which PCI devices you could use.
>

The issue here is that additional attach.  Since for passthrough that isn't
NICs (like crypto cards) you would almost certainly specify it in the
flavor, if you did the same for NICs then you would have a preallocated
pool of NICs from which to draw.  The flavor is also all you need to know
for billing, and the flavor lets you schedule.  If you have it on the list
of NICs, you have to work out how many physical NICs you need before you
schedule (admittedly not hard, but not in keeping) and if you then did a
subsequent attach it could fail because you have no more NICs on the
machine you scheduled to - and at this point you're kind of stuck.

Also with the former, if you've run out of NICs, the already-extant resize
call would allow you to pick a flavor with more NICs and you can then
reschedule the subsequent VM to wherever resources are available to fulfil
the new request.

One question here is whether Neutron should become a provider of billed
resources (specifically passthrough NICs) in the same way as Cinder is of
volumes - something we'd not discussed to date; we've largely worked on the
assumption that NICs are like any other passthrough resource, just one
where, once it's allocated out, Neutron can work magic with it.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Blueprint Bind dnsmasq in qrouter- namespace

2013-12-19 Thread Ian Wells
Interesting.  So you're suggesting we provision a single namespace (per
network, rather than subnet?) proactively, and use it for both routing and
DHCP.  Not unreasonable.  Also workable for v4, I think?
-- 
Ian.


On 20 December 2013 02:31, Shixiong Shang wrote:

> Hi, Ian:
>
> The use case brought by Comcast team today during the ipv6 sub-team
> meeting actually proved the point I made here, instead of against it. If I
> didn’t explain it clearly in my previous email, here it is.
>
> I was questioning the design with two namespaces and I believe we can use
> a SINGLE namespace as the common container to host two services, i.e. DHCP
> and ROUTING. If your use case needs DHCP instance, but not ROUTING, then
> just launch dnsmasq in THE namespace with qr- interface; If your use case
> needs default GW, then add qg- interface in THE namespace. Whether it is
> called qdhcp or qrouter, I don’t care. It is just a label.
>
> People follow the routine to use it, simply because this is what OpenStack
> offers. But my question is, why? And why NOT we design the system in the
> way that qg- and qr- interface collocate in the same namespace?
>
> It is because we intentionally separate the service, now the system become
> clumsy and less efficient. As you can see in IPv6 cases, we are forced to
> deal with two namespaces now. It just doesn’t make any sense.
>
> Shixiong
>
>
>
>
>
>
> On Dec 19, 2013, at 7:27 PM, Ian Wells  wrote:
>
> Per the discussions this evening, we did identify a reason why you might
> need a dhcp namespace for v6 - because networks don't actually have to have
> routers.  It's clear you need an agent in the router namespace for RAs and
> another one in the DHCP namespace for when the network's not connected to a
> router, though.
>
> We've not pinned down all the API details yet, but the plan is to
> implement an RA agent first, responding to subnets that router is attached
> to (which is very close to what Randy and Shixiong have already done).
> --
> Ian.
>
>
> On 19 December 2013 14:01, Randy Tuttle  wrote:
>
>> First, dnsmasq is not being "moved". Instead, it's a different instance
>> for the attached subnet in the qrouter namespace. If it's not in the
>> qrouter namespace, the default gateway (the local router interface) will be
>> the interface of qdhcp namespace interface. That will cause blackhole for
>> traffic from VM. As you know, routing tables and NAT all occur in qrouter
>> namespace. So we want the RA to contain the local interface as default
>> gateway in qrouter namespace
>>
>> Randy
>>
>> Sent from my iPhone
>>
>> On Dec 19, 2013, at 4:05 AM, Xuhan Peng  wrote:
>>
>> I am reading through the blueprint created by Randy to bind dnsmasq into
>> qrouter- namespace:
>>
>>
>> https://blueprints.launchpad.net/neutron/+spec/dnsmasq-bind-into-qrouter-namespace
>>
>> I don't think I can follow the reason that we need to change the
>> namespace which contains dnsmasq process and the device it listens to from
>> qdhcp- to qrouter-. Why the original namespace design conflicts with the
>> Router Advertisement sending from dnsmasq for SLAAC?
>>
>> From the attached POC result link, the reason is stated as:
>>
>> "Even if the dnsmasq process could send Router Advertisement, the default
>> gateway would bind to its own link-local address in the qdhcp- namespace.
>> As a result, traffic leaving tenant network will be drawn to DHCP
>> interface, instead of gateway port on router. That is not desirable! "
>>
>> Can Randy or Shixiong explain this more? Thanks!
>>
>> Xuhan
>>
>> ___
>>
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] packet forwarding

2013-12-21 Thread Ian Wells
Randy has it spot on.  The antispoofing rules prevent you from doing this
in Neutron.  Clearly a router transmits traffic that isn't from it, and
receives traffic that isn't addressed to it - and the port filtering
discards them.

You can disable them for the entire cloud by judiciously tweaking the Nova
config settings, or if you're using the Nicira plugin you'll find it has
extensions for modifying firewall behaviour (they could do with porting
around, or even becoming core, but at the moment they're Nicira-specific).
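
For reference, the usual cloud-wide tweak is the firewall driver in nova.conf -
the sketch below assumes the standard libvirt/iptables setup, and note that it
turns off *all* of Nova's port filtering, not just the antispoofing rules:

    [DEFAULT]
    # disable Nova's iptables filtering (including MAC/IP antispoofing)
    firewall_driver = nova.virt.firewall.NoopFirewallDriver
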
-- 
Ian.


On 20 December 2013 17:50, Abbass MAROUNI wrote:

> Hello,
>
> Is it true that a traffic from one OpenStack virtual network to another
> have to pass by an OpenStack router ? (using an OpenVirtual switch as the
> L2 ).
>
> I'm trying ti use a VM as a router between 2 OpenStack virtual networks
> but for some reason I'm not able.
>
> Appreciate any insights,
>
>
> Best regards,
> Abbass
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Re: [Blueprint vlan-aware-vms] VLAN aware VMs

2013-12-23 Thread Ian Wells
I think we have two different cases here - one where a 'trunk' network
passes all VLANs, which is potentially supportable by anything that's not
based on VLANs for separation, and one where a trunk can't feasibly do that
but where we could make it pass a restricted set of VLANs by mapping.

In the former case, obviously we need no special awareness of the nature of
the network to implement an L2 gateway.

In the latter case, we're looking at a specialisation of networks, one
where you would first create them with a set of VLANs you wanted to pass
(and - presumably - the driver would say 'ah, I must allocate multiple
VLANs to this network rather than just one'.  You've jumped in with two
optimisations on top of that:

- we can precalculate the VLANs the network needs to pass in some cases,
because it's the sum of VLANs that L2 gateways on that network know about
- we can use L2 gateways to make the mapping from 'tenant' VLANs to
'overlay' VLANs

They're good ideas but they add some limitations to what you can do with
trunk networks that aren't actually necessary in a number of solutions.

I wonder if we should try the general case first with e.g. a
Linuxbridge/GRE based infrastructure, and then add the optimisations
afterwards.  If I were going to do that optimisation I'd start with the
capability mechanism and add the ability to let the tenant specify the
specific VLAN tags which must be passed (as you normally would on a
physical switch). I'd then have two port types - a user-facing one that
ensures the entry and exit mapping is made on the port, and an
administrative one which exposes that mapping internally and lets the
client code (e.g. the L2 gateway) do the mapping itself.  But I think it
would be complicated, and maybe even has more complexity than is
immediately apparent (e.g. we're effectively allocating a cluster-wide
network to get backbone segmentation IDs for each VLAN we pass, which is
new and different) hence my thought that we should start with the easy case
first just to have something working, and see how the tenant API feels.  We
could do this with a basic bit of gateway code running on a system using
Linuxbridge + GRE, I think - the key seems to be avoiding VLANs in the
overlay and then the problem is drastically simplified.
-- 
Ian.


On 21 December 2013 23:00, Erik Moe  wrote:

> Hi Ian,
>
> I think your VLAN trunking capability proposal can be a good thing, so the
> user can request a Neutron network that can trunk VLANs without caring
> about detailed information regarding which VLANs to pass. This could be
> used for use cases there user wants to pass VLANs between endpoints on a L2
> network etc.
>
> For the use case there a VM wants to connect to several "normal" Neutron
> networks using VLANs, I would prefer a solution that did not require a
> Neutron trunk network. Possibly by connecting a L2-gateway directly to the
> Neutron 'vNic' port, or some other solution. IMHO it would be good to map
> VLAN to Neutron network as soon as possible.
>
> Thanks,
> Erik
>
>
>
> On Thu, Dec 19, 2013 at 2:15 PM, Ian Wells  wrote:
>
>> On 19 December 2013 06:35, Isaku Yamahata wrote:
>>
>>>
>>> Hi Ian.
>>>
>>> I can't see your proposal. Can you please make it public viewable?
>>>
>>
>> Crap, sorry - fixed.
>>
>>
>>> > Even before I read the document I could list three use cases.  Eric's
>>> > covered some of them himself.
>>>
>>> I'm not against trunking.
>>> I'm trying to understand what requirements need "trunk network" in
>>> the figure 1 in addition to "L2 gateway" directly connected to VM via
>>> "trunk port".
>>>
>>
>> No problem, just putting the information there for you.
>>
>> --
>> Ian.
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] Todays' meeting log: PCI pass-through network support

2013-12-23 Thread Ian Wells
On autodiscovery and configuration, we agree that each compute node finds
out what it has based on some sort of list of match expressions; we just
disagree on where they should live.

I know we've talked about APIs for setting that matching expression, but I would
prefer that compute nodes are responsible for their own physical
configuration - generally this seems wiser on the grounds that configuring
new hardware correctly is a devops problem and this pushes the problem into
the installer, clear devops territory.  It also makes the (I think likely)
assumption that the config may differ per compute node without having to
add more complexity to the API with host aggregates and so on.  And it
means that a compute node can start working without consulting the central
database or reporting its entire device list back to the central controller.

On PCI groups, I think it is a good idea to have them declared centrally
(their name, not their content).  Now, I would use config to define them
and maybe an API for the tenant to list their names, personally; that's
simpler and easier to implement and doesn't preclude adding an (admin) API
in the future.  But I don't imagine the list of groups will change
frequently so any update API would be very infrequently used, and if
someone really feels they want to implement it I'm not going to stop them.
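
To make that concrete, a rough sketch of the sort of per-compute-node config
entry I mean - the option name and keys here are purely illustrative, not a
settled schema:

    # illustrative: a matching expression that also tags the matched devices
    # with a group name that scheduling and 'nova boot --nic' can refer to
    pci_passthrough_whitelist = [
        {"vendor_id": "8086", "product_id": "10ed",
         "address": "0000:83:00.*", "group": "net-fast"}
    ]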

On nova boot, I completely agree that we need a new argument to --nic to
specify the PCI group of the NIC.  The rest of the arguments - I'm
wondering if we could perhaps do this in two stages:
1. Neutron will read those arguments (attachment type, additional stuff
like port group where relevant) from the port during an attach and pass
relevant information to the plugging driver in Nova
2. We add a feature to nova so that you can specify other properties in the
--nic option and they're passed straight to the port-create called
from within nova.

This is not specific to passthrough at all, just a useful general purpose
feature.  However, it would simplify both the problem and design here,
because these parameters, whatever they are, are now entirely the
responsibility of Neutron and Nova's simply transporting them into it.  A
PCI aware Neutron will presumably understand the attachment type, the port
group and so on, or will reject them if they're meaningless to it, and
we've even got room for future expansion without changing Nova or Neutron,
just the plugin.  We can propose it now and independently, put in a patch
and have it ready before we need it.  I think anything that helps to
clarify and divide the responsibilities of Nova and Neutron will be
helpful, because then we don't end up with too many
cross-project-interrelated patches.

I'm going to ignore the allocation problem for now.  If a single user can
allocate all the NICs in the cluster to himself, we still have a more
useful solution than the one now where he can't use them, so it's not the
top of our list.


Time seems to be running out for Icehouse. We need to come to agreement
> ASAP. I will be out from wednesday until after new year. I'm thinking that
> to move it forward after the new year, we may need to have the IRC meeting
> in a daily basis until we reach agreement. This should be one of our new
> year's resolutions?
>

Whatever gets it done.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][neutron][ipv6]Hairpinning in libvirt, once more

2014-01-07 Thread Ian Wells
See Sean Collins' review https://review.openstack.org/#/c/56381 which
disables hairpinning when Neutron is in use.  tl;dr - please upvote the
review.  Long form reasoning follows...

There's a solid logical reason for enabling hairpinning, but it only
applies to nova-network.  Hairpinning is used in nova-network so that
packets from a machine and destined for that same machine's floating IP
address are returned to it.  They then pass through the rewrite rules
(within the libvirt filters on the instance's tap interface) that do the
static NAT mapping to translate floating IP to fixed IP.

Whoever implemented this assumed that hairpinning in other situations is
harmless.  However, this same feature also prevents IPv6 from working -
returned neighbor discovery packets panic VMs into thinking they're using a
duplicate address on the network.  So we'd like to turn it off.  Accepting
that nova-network will change behaviour comprehensively if we just remove
the code, we've elected to turn it off only when Neutron is being used and
leave nova-network broken for ipv6.
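
For anyone unfamiliar, 'hairpinning' here is just the per-port hairpin_mode
flag on a Linux bridge, which the libvirt driver switches on for the instance's
tap device.  A minimal sketch of toggling it, assuming the standard bridge
sysfs attribute and an invented tap name:

    # minimal sketch: clear the bridge hairpin flag on a tap device
    # (assumes the standard /sys/class/net/<dev>/brport/hairpin_mode
    #  attribute; 'tapdeadbeef-12' is an invented device name)
    import os

    def set_hairpin(dev, enabled):
        path = '/sys/class/net/%s/brport/hairpin_mode' % dev
        if not os.path.exists(path):   # not plugged into a Linux bridge
            return
        with open(path, 'w') as f:
            f.write('1' if enabled else '0')

    set_hairpin('tapdeadbeef-12', False)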

Obviously, this presents an issue, because we're changing the way that
Openstack behaves in a user-visible way - hairpinning may not be necessary
or desirable for Neutron, but it's still detectable when it's on or off if
you try hard enough - so the review comments to date have been
conservatively suggesting that we avoid the functional change as much as
possible, and there's a downvote to that end.  But having done more
investigation I don't think there's sufficient justification to keep the
status quo.

We've also talked about leaving hairpinning off if and only if the Neutron
plugin explicitly says that it doesn't want to use hairpinning.  We can
certainly do this, and I've looked into it, but in practice it's not worth
the code and interface changes:

 - Neutron (not 'some drivers' - this is consistent across all of them)
does NAT rewriting in the routers now, not on the ports, so hairpinning
doesn't serve its intended purpose; what it actually does is waste CPU and
bandwidth by handing the instance back a packet every time it sends an outgoing
packet, and precious little else.  The instance doesn't expect these packets, it always
ignores these packets, but it receives them anyway.  It's a pointless
no-op, though there exists the theoretical possibility that someone is
relying on it for their application.
- it's *only* libvirt that ever turns hairpinning on in the first place -
none of the other drivers do it
- libvirt only turns it on sometimes - for hybrid VIFs it's enabled, if
generic VIFs are configured and linuxbridge is in use it's enabled, but for
generic VIFs and OVS is in use then the enable function fails silently
(and, indeed, has been designed to fail silently, it seems).

Given these details, there seems little point in making the code more
complex to support a feature that isn't universal and isn't needed; better
that we just disable it for Neutron and be done.  So (and test failures
aside) could I ask that the core devs check and approve the patch review?
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Ian Wells
I think I'm in agreement with all of this.  Nice summary, Robert.

It may not be where the work ends, but if we could get this done the rest
is just refinement.


On 9 January 2014 17:49, Robert Li (baoli)  wrote:

>Hi Folks,
>
>  With John joining the IRC, so far, we had a couple of productive
> meetings in an effort to come to consensus and move forward. Thanks John
> for doing that, and I appreciate everyone's effort to make it to the daily
> meeting. Let's reconvene on Monday.
>
>  But before that, and based on our today's conversation on IRC, I'd like
> to say a few things. I think that first of all, we need to get agreement on
> the terminologies that we are using so far. With the current nova PCI
> passthrough
>
>  PCI whitelist: defines all the available PCI passthrough devices
> on a compute node. pci_passthrough_whitelist=[{
> "vendor_id":"","product_id":""}]
> PCI Alias: criteria defined on the controller node with which
> requested PCI passthrough devices can be selected from all the PCI
> passthrough devices available in a cloud.
> Currently it has the following format: 
> pci_alias={"vendor_id":"",
> "product_id":"", "name":"str"}
>
> nova flavor extra_specs: request for PCI passthrough devices can
> be specified with extra_specs in the format for example:
> "pci_passthrough:alias"="name:count"
>
>  As you can see, currently a PCI alias has a name and is defined on the
> controller. The implications for it is that when matching it against the
> PCI devices, it has to match the vendor_id and product_id against all the
> available PCI devices until one is found. The name is only used for
> reference in the extra_specs. On the other hand, the whitelist is basically
> the same as the alias without a name.
>
>  What we have discussed so far is based on something called PCI groups
> (or PCI flavors as Yongli puts it). Without introducing other complexities,
> and with a little change of the above representation, we will have
> something like:
>
> pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"",
> "name":"str"}]
>
>  By doing so, we eliminated the PCI alias. And we call the "name" in
> above as a PCI group name. You can think of it as combining the definitions
> of the existing whitelist and PCI alias. And believe it or not, a PCI group
> is actually a PCI alias. However, with that change of thinking, a lot of
> benefits can be harvested:
>
>   * the implementation is significantly simplified
>  * provisioning is simplified by eliminating the PCI alias
>  * a compute node only needs to report stats with something like:
> PCI group name:count. A compute node processes all the PCI passthrough
> devices against the whitelist, and assign a PCI group based on the
> whitelist definition.
>  * on the controller, we may only need to define the PCI group
> names. if we use a nova api to define PCI groups (could be private or
> public, for example), one potential benefit, among other things
> (validation, etc),  they can be owned by the tenant that creates them. And
> thus a wholesale of PCI passthrough devices is also possible.
>  * scheduler only works with PCI group names.
>  * request for PCI passthrough device is based on PCI-group
>  * deployers can provision the cloud based on the PCI groups
>  * Particularly for SRIOV, deployers can design SRIOV PCI groups
> based on network connectivities.
>
>  Further, to support SRIOV, we are saying that PCI group names not only
> can be used in the extra specs, it can also be used in the —nic option and
> the neutron commands. This allows the most flexibilities and
> functionalities afforded by SRIOV.
>
>  Further, we are saying that we can define default PCI groups based on
> the PCI device's class.
>
>  For vnic-type (or nic-type), we are saying that it defines the link
> characteristics of the nic that is attached to a VM: a nic that's connected
> to a virtual switch, a nic that is connected to a physical switch, or a nic
> that is connected to a physical switch, but has a host macvtap device in
> between. The actual names of the choices are not important here, and can be
> debated.
>
>  I'm hoping that we can go over the above on Monday. But any comments are
> welcome by email.
>
>  Thanks,
> Robert
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Ian Wells
On 9 January 2014 20:19, Brian Schott wrote:

> Ian,
>
> The idea of pci flavors is a great and using vendor_id and product_id make
> sense, but I could see a case for adding the class name such as 'VGA
> compatible controller'. Otherwise, slightly different generations of
> hardware will mean custom whitelist setups on each compute node.
>

Personally, I think the important thing is to have a matching expression.
The more flexible the matching language, the better.

On the flip side, vendor_id and product_id might not be sufficient.
>  Suppose I have two identical NICs, one for nova internal use and the
> second for guest tenants?  So, bus numbering may be required.
>
> 01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
> GTX] (rev a1)
> 02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
> GTX] (rev a1)
>

I totally concur on this - with network devices in particular the PCI path
is important because you don't accidentally want to grab the Openstack
control network device ;)


> I know you guys are thinking of PCI devices, but any though of mapping to
> something like udev rather than pci?  Supporting udev rules might be easier
> and more robust rather than making something up.
>

Past experience has told me that udev rules are not actually terribly good,
which you soon discover when you have to write expressions like:

 SUBSYSTEM=="net", KERNELS==":83:00.0", ACTION=="add", NAME="eth8"

which took me a long time to figure out and is self-documenting only in
that it has a recognisable PCI path in there, 'KERNELS' not being a
meaningful name to me.  And self-documenting is key to udev rules, because
there's not much information on the tag meanings otherwise.

I'm comfortable with having a match format that covers what we know and
copes with extension for when we find we're short a feature, and what we
have now is close to that.  Yes, it needs the class adding, we all agree,
and you should be able to match on PCI path, which you can't now, but it's
close.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Ian Wells
On 9 January 2014 22:50, Ian Wells  wrote:

> On 9 January 2014 20:19, Brian Schott wrote:
> On the flip side, vendor_id and product_id might not be sufficient.
>  Suppose I have two identical NICs, one for nova internal use and the
> second for guest tenants?  So, bus numbering may be required.
>
>>
>> 01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
>> GTX] (rev a1)
>> 02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
>> GTX] (rev a1)
>>
>
> I totally concur on this - with network devices in particular the PCI path
> is important because you don't accidentally want to grab the Openstack
> control network device ;)
>

Redundant statement is redundant.  Sorry, yes, this has been a pet bugbear
of mine.  It applies equally to provider networks on the networking side of
thing, and, where Neutron is not your network device manager for a PCI
device, you may want several device groups bridged to different segments.
Network devices are one case of a category of device where there's
something about the device that you can't detect that means it's not
necessarily interchangeable with its peers.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
On 10 January 2014 07:40, Jiang, Yunhong  wrote:

>  Robert, sorry that I’m not fan of * your group * term. To me, *your
> group” mixed two thing. It’s an extra property provided by configuration,
> and also it’s a very-not-flexible mechanism to select devices (you can only
> select devices based on the ‘group name’ property).
>
>
It is exactly that.  It's 0 new config items, 0 new APIs, just an extra tag
on the whitelists that are already there (although the proposal suggests
changing the name of them to be more descriptive of what they now do).  And
you talk about flexibility as if this changes frequently, but in fact the
grouping / aliasing of devices almost never changes after installation,
which is, not coincidentally, when the config on the compute nodes gets set
up.

>  1)   A dynamic group is much better. For example, user may want to
> select GPU device based on vendor id, or based on vendor_id+device_id. In
> another word, user want to create group based on vendor_id, or
> vendor_id+device_id and select devices from these group.  John’s proposal
> is very good, to provide an API to create the PCI flavor(or alias). I
> prefer flavor because it’s more openstack style.
>
I disagree with this.  I agree that what you're saying offers more
flexibility after initial installation but I have various issues with
it.

This is directly related to the hardware configuation on each compute
node.  For (some) other things of this nature, like provider networks, the
compute node is the only thing that knows what it has attached to it, and
it is the store (in configuration) of that information.  If I add a new
compute node then it's my responsibility to configure it correctly on
attachment, but when I add a compute node (when I'm setting the cluster up,
or sometime later on) then it's at that precise point that I know how I've
attached it and what hardware it's got on it.  Also, it's at this
point in time that I write out the configuration file (not by hand, note;
there's almost certainly automation when configuring hundreds of nodes so
arguments that 'if I'm writing hundreds of config files one will be wrong'
are moot).

I'm also not sure there's much reason to change the available devices
dynamically after that, since that's normally an activity that results from
changing the physical setup of the machine which implies that actually
you're going to have access to and be able to change the config as you do
it.  John did come up with one case where you might be trying to remove old
GPUs from circulation, but it's a very uncommon case that doesn't seem
worth coding for, and it's still achievable by changing the config and
restarting the compute processes.

This also reduces the autonomy of the compute node in favour of centralised
tracking, which goes against the 'distributed where possible' philosophy of
Openstack.

Finally, you're not actually removing configuration from the compute node.
You still have to configure a whitelist there; in the grouping design you
also have to configure grouping (flavouring) on the control node as well.
The groups proposal adds one extra piece of information to the whitelists
that are already there to mark groups, not a whole new set of config lines.


To compare scheduling behaviour:

If I  need 4G of RAM, each compute node has reported its summary of free
RAM to the scheduler.  I look for a compute node with 4G free, and filter
the list of compute nodes down.  This is a query on n records, n being the
number of compute nodes.  I schedule to the compute node, which then
confirms it does still have 4G free and runs the VM or rejects the request.

If I need 3 PCI devices and use the current system, each machine has
reported its device allocations to the scheduler.  With SRIOV multiplying
up the number of available devices, it's reporting back hundreds of records
per compute node to the schedulers, and the filtering activity is 3
queries on n * (number of PCI devices in the cloud) records, which could easily
end up in the tens or even hundreds of thousands of records for a
moderately sized cloud.  The compute node also has a record of its device
allocations which is also checked and updated before the final request is
run.

If I need 3 PCI devices and use the groups system, each machine has
reported its device *summary* to the scheduler.  With SRIOV multiplying up
the number of available devices, it's still reporting one or a small number
of categories, i.e. { net: 100}.  The difficulty of scheduling is a query
on num groups * n records - fewer, in fact, if some machines have no
passthrough devices.

You can see that there's quite a cost to be paid for having those flexible
alias APIs.

> 4)   IMHO, the core for nova PCI support is **PCI property**. The
> property means not only generic PCI devices like vendor id, device id,
> device type, compute specific property like BDF address, the adjacent
> switch IP address,  but also user defined property like nuertron’s phys

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
In any case, we don't have to decide this now.  If we simply allowed the
whitelist to add extra arbitrary properties to the PCI record (like a group
name) and return it to the central server, we could use that in scheduling
for the minute as a group name, we wouldn't implement the APIs for flavors
yet, and we could get a working system that would be minimally changed from
what we already have.  We could worry about the scheduling in the
scheduling group, and we could leave the APIs (which, as I say, are a
minimally useful feature) until later.  Then we'd have something useful in
short order.
-- 
Ian.


On 10 January 2014 13:08, Ian Wells  wrote:

> On 10 January 2014 07:40, Jiang, Yunhong  wrote:
>
>>  Robert, sorry that I’m not fan of * your group * term. To me, *your
>> group” mixed two thing. It’s an extra property provided by configuration,
>> and also it’s a very-not-flexible mechanism to select devices (you can only
>> select devices based on the ‘group name’ property).
>>
>>
> It is exactly that.  It's 0 new config items, 0 new APIs, just an extra
> tag on the whitelists that are already there (although the proposal
> suggests changing the name of them to be more descriptive of what they now
> do).  And you talk about flexibility as if this changes frequently, but in
> fact the grouping / aliasing of devices almost never changes after
> installation, which is, not coincidentally, when the config on the compute
> nodes gets set up.
>
>>  1)   A dynamic group is much better. For example, user may want to
>> select GPU device based on vendor id, or based on vendor_id+device_id. In
>> another word, user want to create group based on vendor_id, or
>> vendor_id+device_id and select devices from these group.  John’s proposal
>> is very good, to provide an API to create the PCI flavor(or alias). I
>> prefer flavor because it’s more openstack style.
>>
> I disagree with this.  I agree that what you're saying offers a more
> flexibilibility after initial installation but I have various issues with
> it.
>
> This is directly related to the hardware configuation on each compute
> node.  For (some) other things of this nature, like provider networks, the
> compute node is the only thing that knows what it has attached to it, and
> it is the store (in configuration) of that information.  If I add a new
> compute node then it's my responsibility to configure it correctly on
> attachment, but when I add a compute node (when I'm setting the cluster up,
> or sometime later on) then it's at that precise point that I know how I've
> attached it and what hardware it's got on it.  Also, it's at this that
> point in time that I write out the configuration file (not by hand, note;
> there's almost certainly automation when configuring hundreds of nodes so
> arguments that 'if I'm writing hundreds of config files one will be wrong'
> are moot).
>
> I'm also not sure there's much reason to change the available devices
> dynamically after that, since that's normally an activity that results from
> changing the physical setup of the machine which implies that actually
> you're going to have access to and be able to change the config as you do
> it.  John did come up with one case where you might be trying to remove old
> GPUs from circulation, but it's a very uncommon case that doesn't seem
> worth coding for, and it's still achievable by changing the config and
> restarting the compute processes.
>
> This also reduces the autonomy of the compute node in favour of
> centralised tracking, which goes against the 'distributed where possible'
> philosophy of Openstack.
>
> Finally, you're not actually removing configuration from the compute
> node.  You still have to configure a whitelist there; in the grouping
> design you also have to configure grouping (flavouring) on the control node
> as well.  The groups proposal adds one extra piece of information to the
> whitelists that are already there to mark groups, not a whole new set of
> config lines.
>
>
> To compare scheduling behaviour:
>
> If I  need 4G of RAM, each compute node has reported its summary of free
> RAM to the scheduler.  I look for a compute node with 4G free, and filter
> the list of compute nodes down.  This is a query on n records, n being the
> number of compute nodes.  I schedule to the compute node, which then
> confirms it does still have 4G free and runs the VM or rejects the request.
>
> If I need 3 PCI devices and use the current system, each machine has
> reported its device allocations to the scheduler.  With SRIOV multiplying
> up the number of available devices, it'

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
On 10 January 2014 15:30, John Garbutt  wrote:

> We seemed happy with the current system (roughly) around GPU passthrough:
> nova flavor-key  set
> "pci_passthrough:alias"=" large_GPU:1,small_GPU:2"
> nova boot --image some_image --flavor  
>

Actually, I think we pretty solidly disagree on this point.  On the other
hand, Yongli's current patch (with pci_flavor in the whitelist) is pretty
OK.


> nova boot --flavor m1.large --image 
>   --nic net-id=
>   --nic net-id=,nic-type=fast

  --nic net-id=,nic-type=fast 
>

With flavor defined (wherever it's defined):

nova boot ..
   --nic net-id=,pci-flavor=xxx# ok, presumably defaults to
PCI passthrough
   --nic net-id=,pci-flavor=xxx,vnic-attach=macvtap # ok
   --nic net-id= # ok - no flavor = vnic
   --nic port-id=,pci-flavor=xxx# ok, gets vnic-attach from
port
   --nic port-id= # ok - no flavor = vnic



> or
>
> neutron port-create
>   --fixed-ip subnet_id=,ip_address=192.168.57.101
>   --nic-type=
>   
> nova boot --flavor m1.large --image  --nic port-id=
>

No, I think not - specifically because flavors are a nova concept and not a
neutron one, so putting them on the port is inappropriate. Conversely,
vnic-attach is a Neutron concept (fine, nova implements it, but Neutron
tells it how) so I think it *is* a port field, and we'd just set it on the
newly created port when doing nova boot ..,vnic-attach=thing

2) Expand PCI alias information

{
>  "name":"NIC_fast",
>   sriov_info: {
>   "nic_type":"fast",
>
  "network_ids": ["net-id-1", "net-id-2"]
>

Why can't we use the flavor name in --nic (because multiple flavors might
be on one NIC type, I guess)?  Where does e.g. switch/port information go,
particularly since it's per-device (not per-group) and non-scheduling?

I think the issue here is that you assume we group by flavor, then add
extra info, then group into a NIC group.  But for a lot of use cases there
is information that differs on every NIC port, so it makes more sense to
add extra info to a device, then group into flavor and that can also be
used for the --nic.

network_ids is interesting, but this is a nova config file and network_ids
are (a) from Neutron (b) ephemeral, so we can't put them in config.  They
could be provider network names, but that's not the same thing as a neutron
network name and not easily discoverable, outside of Neutron i.e. before
scheduling.

Again, Yongli's current change with pci-flavor in the whitelist records
leads to a reasonable way to make this work here, I think;
straightforward extra_info would be fine (though perhaps nice if it's
easier to spot it as being of a different type from the whitelist regex fields).
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
Hey Yunhong,

The thing about 'group' and 'flavor' and 'whitelist' is that they once
meant distinct things (and I think we've been trying to reduce them back
from three things to two or one):

- group: equivalent devices at a host level - use any one, no-one will
care, because they're either identical or as near as makes no difference
- flavor: equivalent devices to an end user - we may re-evaluate our
offerings and group them differently on the fly
- whitelist: either 'something to match the devices you may assign'
(originally) or 'something to match the devices you may assign *and* put
them in the group' (in the group proposal)

Bearing in mind what you said about scheduling, and if we skip 'group' for
a moment, then can I suggest (or possibly restate, because your comments
are pointing in this direction):

- we allow extra information to be added at what is now the whitelisting
stage, that just gets carried around with the device
- when we're turning devices into flavors, we can also match on that extra
information if we want (which means we can tag up the devices on the
compute node if we like, according to taste, and then bundle them up by tag
to make flavors; or we can add Neutron specific information and ignore it
when making flavors)
- we would need to add a config param on the control host to decide which
flags to group on when doing the stats (and they would additionally be the
only params that would work for flavors, I think)
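
For that last bullet, a sketch of the sort of control-node option it implies -
the option name is invented; in practice it amounts to making the currently
hard-coded pool keys in nova's PCI stats code configurable:

    [DEFAULT]
    # hypothetical: which device properties define a stats pool (and hence
    # which properties flavors are allowed to match on)
    pci_stats_pool_keys = vendor_id, product_id, group
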
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
On 11 January 2014 00:04, Jiang, Yunhong  wrote:

>  [yjiang5] Really thanks for the summary and it is quite clear. So what’s
> the object of “equivalent devices at host level”? Because ‘equivalent
> device * to an end user *” is flavor, so is it ‘equivalent to **scheduler**”
> or ‘equivalent to **xxx**’? If equivalent to scheduler, then I’d take the
> pci_stats as a flexible group for scheduler
>

To the scheduler, indeed.  And with the group proposal the scheduler and
end user equivalences are one and the same.


> Secondly, for your definition of ‘whitelist’, I’m hesitate to your ‘*and*’
> because IMHO, ‘and’ means mixed two things together, otherwise, we can
> state in simply one sentence. For example, I prefer to have another
> configuration option to define the ‘put devices in the group’, or, if we
> extend it , be “define extra information like ‘group name’ for devices”.
>

I'm not stating what we should do, or what the definitions should mean; I'm
saying how they've been interpreted as we've discussed this in the past.
We've had issues in the past where we've had continuing difficulties in
describing anything without coming back to a 'whitelist' (generally meaning
'matching expression'), as an actual 'whitelist' is implied, rather than
separately required, in a grouping system.

  Bearing in mind what you said about scheduling, and if we skip 'group'
> for a moment, then can I suggest (or possibly restate, because your
> comments are pointing in this direction):
>
> - we allow extra information to be added at what is now the whitelisting
> stage, that just gets carried around with the device
>
> [yjiang5] For ‘added at … whitelisting stage’, see my above statement
> about the configuration. However, if you do want to use whitelist, I’m ok,
> but please keep in mind that it’s two functionality combined: device you
> may assign **and** the group name for these devices.
>

Indeed - which is in fact what we've been proposing all along.


>
> - when we're turning devices into flavors, we can also match on that extra
> information if we want (which means we can tag up the devices on the
> compute node if we like, according to taste, and then bundle them up by tag
> to make flavors; or we can add Neutron specific information and ignore it
> when making flavors)
>
> [yjiang5] Agree. Currently we can only use vendor_id and device_id for
> flavor/alias, but we can extend it to cover such extra information since
> now it’s a API.
>
>
>
> - we would need to add a config param on the control host to decide which
> flags to group on when doing the stats (and they would additionally be the
> only params that would work for flavors, I think)
>
> [yjiang5] Agree. And this is achievable because we switch the flavor to be
> API, then we can control the flavor creation process.
>

OK - so if this is good then I think the question is how we could change
the 'pci_whitelist' parameter we have - which, as you say, should either
*only* do whitelisting or be renamed - to allow us to add information.
Yongli has something along those lines but it's not flexible and it
distinguishes poorly between which bits are extra information and which
bits are matching expressions (and it's still called pci_whitelist) - but
even with those criticisms it's very close to what we're talking about.
When we have that I think a lot of the rest of the arguments should simply
resolve themselves.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Building a new open source NFV system for Neutron

2014-01-10 Thread Ian Wells
Hey Luke,

If you look at the passthrough proposals, the overview is that part of the
passthrough work is to ensure there's an PCI function available to allocate
to the VM, and part is to pass that function on to the Neutron plugin via
conventional means.  There's nothing that actually mandates that you
connect the SRIOV port using the passthrough mechanism, and we've been
working on the assumption that we would be supporting the 'macvtap' method
of attachment that Mellanox came up with some time ago.

I think what we'll probably have is a set of standard attachments
(including passthrough) added to the Nova drivers - you'll see in the
virtualisation drivers that Neutron already gets to tell Nova how to attach
the port and can pass auxiliary information - and we will pass the PCI path
and, optionally, other parameters to Neutron in the port-update that
precedes VIF plugging.  That would leave you with the option of passing the
path back and requesting an actual passthrough or coming up with some other
mechanism of your own choosing (which may not involve changing Nova at all,
if you're using your standard virtual plugging mechanism).
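
To make the hand-off concrete, such a port-update might look something like the
sketch below - the binding:profile dictionary is one obvious vehicle, but the
key name is invented and none of this is settled API; Neutron's response would
then tell Nova which plugging type to use:

    PUT /v2.0/ports/<port-id>
    {
        "port": {
            "binding:profile": {"pci_slot": "0000:83:00.1"}
        }
    }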

-- 
Ian.


On 10 January 2014 19:26, Luke Gorrie  wrote:

> Hi Mike,
>
> On 10 January 2014 17:35, Michael Bright  wrote:
>
> > Very pleased to see this initiative in the OpenStack/NFV space.
>
> Glad to hear it!
>
> > A dumb question - how do you see this related to the ongoing
> >  "[openstack-dev] [nova] [neutron] PCI pass-through network support"
> >
> > discussion on this list?
> >
> > Do you see that work as one component within your proposed architecture
> for
> > example or an alternative implementation?
>
> Good question. I'd like to answer separately about the underlying
> technology on the one hand and the OpenStack API on the other.
>
> The underlying technology of SR-IOV and IOMMU hardware capabilities
> are the same in PCI pass-through and Snabb NFV. The difference is that
> we introduce a very thin layer of software over the top that preserves
> the basic zero-copy operation while adding a Virtio-net abstraction
> towards the VM, packet filtering, tunneling, and policing (to start
> off with). The design goal is to add quite a bit of functionality with
> only a modest processing cost.
>
> The OpenStack API question is more open. How should we best map our
> functionality onto Neutron APIs? This is something we need to thrash
> out together with the community. Our current best guess - which surely
> needs much revision, and is not based on the PCI pass-through
> blueprint - is here:
>
> https://github.com/SnabbCo/snabbswitch/tree/snabbnfv-readme/src/designs/nfv#neutron-configuration
>
> Cheers,
> -Luke
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
>
> OK - so if this is good then I think the question is how we could change
the 'pci_whitelist' parameter we have - which, as you say, should either
*only* do whitelisting or be renamed - to allow us to add information.
 Yongli has something along those lines but it's not flexible and it
distinguishes poorly between which bits are extra information and which
bits are matching expressions (and it's still called pci_whitelist) - but
even with those criticisms it's very close to what we're talking about.
 When we have that I think a lot of the rest of the arguments should simply
resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable
change to pci_whitelist is because it combined two things. So a
stupid/naive solution in my head is, change it to VERY generic name,
‘pci_devices_information’,
>
> and change schema as an array of {‘devices_property’=regex exp,
‘group_name’ = ‘g1’} dictionary, and the device_property expression can be
‘address ==xxx, vendor_id == xxx’ (i.e. similar with current white list),
 and we can squeeze more into the “pci_devices_information” in future, like
‘network_information’ = xxx or “Neutron specific information” you required
in previous mail.


We're getting to the stage that an expression parser would be useful,
annoyingly, but if we are going to try and squeeze it into JSON can I
suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we
like teh groups", volume = "11" } }

>
> All keys other than ‘device_property’ becomes extra information, i.e.
software defined property. These extra information will be carried with the
PCI devices,. Some implementation details, A)we can limit the acceptable
keys, like we only support ‘group_name’, ‘network_id’, or we can accept any
keys other than reserved (vendor_id, device_id etc) one.


Not sure we have a good list of reserved keys at the moment, and with two
dicts it isn't really necessary, I guess.  I would say that we have one
match parser which looks something like this:

import re

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for (k, v) in expression.items():
        if k.startswith('e.'):
            # 'e.'-prefixed keys match against the extra (software-defined) info
            mv = extra_specs.get(k[2:])
        else:
            mv = pci_details.get(k)
        # treat each value in the expression as a regex, as the whitelist does
        if mv is None or not re.match(v, str(mv)):
            return False
    return True

Usable in this matching (where 'e.' just won't work) and also for flavor
assignment (where e. will indeed match the extra values).
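
To illustrate the two uses (device details, group name and expressions all
invented):

    device = {'vendor_id': '8086', 'product_id': '10ed', 'address': '0000:83:00.1'}
    entry = {'match': {'vendor_id': '8086', 'product_id': '10ed'},
             'info':  {'group': 'net-fast'}}

    # whitelisting stage: no extra info exists yet, so 'e.' keys can't match
    if match(entry['match'], device, {}):
        extra = entry['info']      # carried around with the device from now on

    # flavor assignment stage: match on the software-defined info via 'e.'
    fast_nic_flavor = {'e.group': 'net-fast'}
    assert match(fast_nic_flavor, device, extra)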

> B) if a device match ‘device_property’ in several entries, raise
exception, or use the first one.

Use the first one, I think.  It's easier, and potentially more useful.

> [yjiang5_1] Another thing need discussed is, as you pointed out, “we
would need to add a config param on the control host to decide which flags
to group on when doing the stats”.  I agree with the design, but some
details need decided.

This is a patch that can come at any point after we do the above stuff
(which we need for Neutron), clearly.

> Where should it defined. If we a) define it in both control node and
compute node, then it should be static defined (just change pool_keys in
"/opt/stack/nova/nova/pci/pci_stats.py" to a configuration parameter) . Or
b) define only in control node, then I assume the control node should be
the scheduler node, and the scheduler manager need save such information,
present a API to fetch such information and the compute node need fetch it
on every update_available_resource() periodic task. I’d prefer to take a)
option in first step. Your idea?

I think it has to be (a), which is a shame.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Ian Wells
> {
>  "name":"NIC_fast",
>  devices:[
>   {"vendor_id":"1137","product_id":"0071", address:"0:[1-50]:2:*",
> "attach-type":"macvtap"}
>   {"vendor_id":"1234","product_id":"0081", address:"*",
> "attach-type":"direct"}  ],
>  sriov_info: {
>   "nic_type":"fast",
>   "network_ids": ["net-id-1", "net-id-2"]  } }
>
> {
>  "name":"NIC_slower",
>  devices:[
>   {"vendor_id":"1137","product_id":"0071", address:"*",
> "attach-type":"direct"}
>   {"vendor_id":"1234","product_id":"0081", address:"*",
> "attach-type":"direct"}  ],
>  sriov_info: {
>   "nic_type":"fast",
>   "network_ids": ["*"]  # this means could attach to any network  } }
>
> The idea being the VIF driver gets passed this info, when network_info
> includes a nic that matches.
> Any other details, like VLAN id, would come from neutron, and passed to
> the VIF driver as normal.
>
>
> 3) Reading "nic_type" and doing the PCI passthrough of NIC user requests
>
> Not sure we are agreed on this, but basically:
> * network_info contains "nic-type" from neutron
> * need to select the correct VIF driver
> * need to pass matching PCI alias information to VIF driver
> * neutron passes details other details (like VLAN id) as before
> * nova gives VIF driver an API that allows it to attach PCI devices that
> are in the whitelist to the VM being configured
> * with all this, the VIF driver can do what it needs to do
> * lets keep it simple, and expand it as the need arrises
>
> 4) Make changes to VIF drivers, so the above is implemented
>
> Depends on (3)
>
>
>
> These seems like some good steps to get the basics in place for PCI
> passthrough networking.
> Once its working, we can review it and see if there are things that need
> to evolve further.
>
> Does that seem like a workable approach?
> Who is willing to implement any of (1), (2) and (3)?
>
>
> Cheers,
> John
>
>
> On 9 January 2014 17:47, Ian Wells  wrote:
> > I think I'm in agreement with all of this.  Nice summary, Robert.
> >
> > It may not be where the work ends, but if we could get this done the
> > rest is just refinement.
> >
> >
> > On 9 January 2014 17:49, Robert Li (baoli)  wrote:
> >>
> >> Hi Folks,
> >>
> >>
> >> With John joining the IRC, so far, we had a couple of productive
> >> meetings in an effort to come to consensus and move forward. Thanks
> >> John for doing that, and I appreciate everyone's effort to make it to
> the daily meeting.
> >> Let's reconvene on Monday.
> >>
> >> But before that, and based on our today's conversation on IRC, I'd
> >> like to say a few things. I think that first of all, we need to get
> >> agreement on the terminologies that we are using so far. With the
> >> current nova PCI passthrough
> >>
> >> PCI whitelist: defines all the available PCI passthrough
> >> devices on a compute node. pci_passthrough_whitelist=[{
> >> "vendor_id":"","product_id":""}]
> >> PCI Alias: criteria defined on the controller node with which
> >> requested PCI passthrough devices can be selected from all the PCI
> >> passthrough devices available in a cloud.
> >> Currently it has the following format:
> >> pci_alias={"vendor_id":"", "product_id":"", "name":"str"}
> >>
> >> nova flavor extra_specs: request for PCI passthrough devices
> >> can be specified with extra_specs in the format for
> >> example:"pci_passthrough:alias"="name:count"
> >>
> >> As you can see, currently a PCI alias has a name and is defined on
> >> the controller. The implications for it is that when matching it
> >> against the PCI devices, it has to match the vendor_id and product_id
> >> against all the available PCI devices until one is found. The name is
> >> only used for reference in the extra_specs. On the other hand, the
> >> whitelist is basically the same as the alias without a name.
> >>
> >> What we have discussed so far is based on something called PCI groups
> >&

Re: [openstack-dev] [Neutron] Building a new open source NFV system for Neutron

2014-01-13 Thread Ian Wells
Understood.  You should be able to make that work but the issue is
allocating your VM to some machine that has spare hardware - which is
really what the patches are about: Nova manages allocations and Neutron
manages using the hardware when appropriate.  From past experience with the
patch that was around back in the Essex timeframe, you can get this to work
temporarily by rejecting the schedule in nova-compute when the machine is
short of hardware and using a high schedule retry count, which will get you
somewhere but is obviously a bit sucky in the long run.
-- 
Ian.


On 13 January 2014 18:44, Luke Gorrie  wrote:

> Howdy Ian!
>
> Thanks for the background on the Passthrough work.
>
> I reckon the best choice for us now is to use the traditional Neutron
> APIs instead of Passthrough. I think they cover all of our use cases
> as it stands now (many thanks to you for your earlier help with
> working this out :)). The idea is to put the SR-IOV hardware to work
> behind-the-scenes of a normal software switch.
>
> We will definitely check out the Passthrough when it's ready and see
> if we should also support that somehow.
>
>
> On 11 January 2014 01:04, Ian Wells  wrote:
> > Hey Luke,
> >
> > If you look at the passthrough proposals, the overview is that part of
> the
> > passthrough work is to ensure there's an PCI function available to
> allocate
> > to the VM, and part is to pass that function on to the Neutron plugin via
> > conventional means.  There's nothing that actually mandates that you
> connect
> > the SRIOV port using the passthrough mechanism, and we've been working on
> > the assumption that we would be supporting the 'macvtap' method of
> > attachment that Mellanox came up with some time ago.
> >
> > I think what we'll probably have is a set of standard attachments
> (including
> > passthrough) added to the Nova drivers - you'll see in the virtualisation
> > drivers that Neutron already gets to tell Nova how to attach the port and
> > can pass auxiliary information - and we will pass the PCI path and,
> > optionally, other parameters to Neutron in the port-update that precedes
> VIF
> > plugging.  That would leave you with the option of passing the path back
> and
> > requesting an actual passthrough or coming up with some other mechanism
> of
> > your own choosing (which may not involve changing Nova at all, if you're
> > using your standard virtual plugging mechanism).
> >
> > --
> > Ian.
> >
> >
> > On 10 January 2014 19:26, Luke Gorrie  wrote:
> >>
> >> Hi Mike,
> >>
> >> On 10 January 2014 17:35, Michael Bright  wrote:
> >>
> >> > Very pleased to see this initiative in the OpenStack/NFV space.
> >>
> >> Glad to hear it!
> >>
> >> > A dumb question - how do you see this related to the ongoing
> >> >  "[openstack-dev] [nova] [neutron] PCI pass-through network
> support"
> >> >
> >> > discussion on this list?
> >> >
> >> > Do you see that work as one component within your proposed
> architecture
> >> > for
> >> > example or an alternative implementation?
> >>
> >> Good question. I'd like to answer separately about the underlying
> >> technology on the one hand and the OpenStack API on the other.
> >>
> >> The underlying technology of SR-IOV and IOMMU hardware capabilities
> >> are the same in PCI pass-through and Snabb NFV. The difference is that
> >> we introduce a very thin layer of software over the top that preserves
> >> the basic zero-copy operation while adding a Virtio-net abstraction
> >> towards the VM, packet filtering, tunneling, and policing (to start
> >> off with). The design goal is to add quite a bit of functionality with
> >> only a modest processing cost.
> >>
> >> The OpenStack API question is more open. How should we best map our
> >> functionality onto Neutron APIs? This is something we need to thrash
> >> out together with the community. Our current best guess - which surely
> >> needs much revision, and is not based on the PCI pass-through
> >> blueprint - is here:
> >>
> >>
> https://github.com/SnabbCo/snabbswitch/tree/snabbnfv-readme/src/designs/nfv#neutron-configuration
> >>
> >> Cheers,
> >> -Luke
> >>
> >> ___
> >> OpenStack-dev mailing list
> >> OpenStack-dev@lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Subnet mode - API extension or change to core API?

2014-01-13 Thread Ian Wells
I would say that since v4 dhcp_mode is core, the DHCPv6/RA setting should
similarly be core.

To fill others in, we've had discussions on the rest of the patch and
Shixiong is working on it now, the current plan is:

New subnet attribute ipv6_address_auto_config (not catchy, but because of
the way that ipv6 works it's not simply DHCPv6) with the four values:

off - no packets sent to assign addresses for this subnet, do it yourself
slaac - RA packet sent telling the machine to choose its own address from
within the subnet; it will choose an address based on its own MAC; because
we're talking servers here, this will explicitly *not* work with ipv6
privacy extensions, because - as with the ipv4 implementation - we need
one, fixed, *known* address that's planned in advance to make firewalling
etc. work
dhcpv6-stateless - RA packet allocates the address as before, plus DHCPv6 running
to provide additional information if requested
dhcpv6-stateful - DHCPv6 will assign the address set on the port rather
than leaving the machine to work it out from the MAC, along with other
information as required.  (For the other settings, the address on the port
will be hard coded to the MAC-based address; for this one it may well be
hardcoded initially but will eventually be modifiable as for the v4
address.)
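
To make that concrete, a subnet create using the proposed attribute would look
something like this (the attribute name is just the working name above, and the
rest of the body is an illustrative value, not a finished API):

POST /v2.0/subnets
{
    "subnet": {
        "network_id": "NETWORK_UUID",
        "ip_version": 6,
        "cidr": "2001:db8:1234::/64",
        "ipv6_address_auto_config": "dhcpv6-stateless"
    }
}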

Port firewalling (i.e. security groups, antispoof) will consume the
information on the port and subnet as usual.

Obviously you can, as before, use static address config in your VM image or
config-drive setup, independent of the above options; this just determines
what network functions will be set up and running.
-- 
Ian.


On 13 January 2014 18:24, Collins, Sean wrote:

> Hi,
>
> I posted a message to the mailing list[1] when I first began work on the
> subnet mode keyword, asking if anyone had a suggestion about if it
> should be an API extension or can be a change to the core API.
>
> > I don't know if adding the "dhcp_mode" attribute to Subnets should be
> > considered an API extension (and the code should be converted to an API
> > extension) or if we're simply specifying behavior that was originally
> undefined.
>
> In the months since, we have iterated on the commit, and have continued
> working on IPv6 functionality in Neutron.
>
> Nachi recently -1'd the review[2], saying that it needs to be an API
> extension.
>
> I disagree that it should be an API extension, since I have added
> behavior that sets the subnet_mode keyword to default with the attribute
> is not specified, for backwards compatibility. Any plugin that inherits
> from the NeutronDbPluginV2 class will have backwards compatibility.
>
> Suggestions?
>
> [1]:
> http://lists.openstack.org/pipermail/openstack-dev/2013-October/017087.html
> [2]: https://review.openstack.org/#/c/52983/
> --
> Sean M. Collins
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Ian Wells
It's worth noting that this makes the scheduling a computationally hard
problem. The answer to that in this scheme is to reduce the number of
inputs to trivialise the problem.  It's going to be O(f(number of flavor
types requested, number of pci_stats pools)) and if you group appropriately
there shouldn't be an excessive number of pci_stats pools.  I am not going
to stand up and say this makes it achievable - and if it doesn't them I'm
not sure that anything would make overlapping flavors achievable - but I
think it gives us some hope.
-- 
Ian.


On 13 January 2014 19:27, Jiang, Yunhong  wrote:

>  Hi, Robert, scheduler keep count based on pci_stats instead of the pci
> flavor.
>
>
>
> As stated by Ian at
> https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.htmlalready,
>  the flavor will only use the tags used by pci_stats.
>
>
>
> Thanks
>
> --jyh
>
>
>
> *From:* Robert Li (baoli) [mailto:ba...@cisco.com]
> *Sent:* Monday, January 13, 2014 8:22 AM
>
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
>
>
> As I have responded in the other email, and If I understand PCI flavor
> correctly, then the issue that we need to deal with is the overlapping
> issue. A simplest case of this overlapping is that you can define a flavor
> F1 as [vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v']
> .  Let's assume that only the admin can define the flavors. It's not hard
> to see that a device can belong to the two different flavors in the same
> time. This introduces an issue in the scheduler. Suppose the scheduler
> (counts or stats based) maintains counts based on flavors (or the keys
> corresponding to the flavors). To request a device with the flavor F1,
>  counts in F2 needs to be subtracted by one as well. There may be several
> ways to achieve that. But regardless, it introduces tremendous overhead in
> terms of system processing and administrative costs.
>
>
>
> What are the use cases for that? How practical are those use cases?
>
>
>
> thanks,
>
> Robert
>
>
>
> On 1/10/14 9:34 PM, "Ian Wells"  wrote:
>
>
>
>
> >
> > OK - so if this is good then I think the question is how we could change
> the 'pci_whitelist' parameter we have - which, as you say, should either
> *only* do whitelisting or be renamed - to allow us to add information.
>  Yongli has something along those lines but it's not flexible and it
> distinguishes poorly between which bits are extra information and which
> bits are matching expressions (and it's still called pci_whitelist) - but
> even with those criticisms it's very close to what we're talking about.
>  When we have that I think a lot of the rest of the arguments should simply
> resolve themselves.
> >
> >
> >
> > [yjiang5_1] The reason that not easy to find a flexible/distinguishable
> change to pci_whitelist is because it combined two things. So a
> stupid/naive solution in my head is, change it to VERY generic name,
> ‘pci_devices_information’,
> >
> > and change schema as an array of {‘devices_property’=regex exp,
> ‘group_name’ = ‘g1’} dictionary, and the device_property expression can be
> ‘address ==xxx, vendor_id == xxx’ (i.e. similar with current white list),
>  and we can squeeze more into the “pci_devices_information” in future, like
> ‘network_information’ = xxx or “Neutron specific information” you required
> in previous mail.
>
>
> We're getting to the stage that an expression parser would be useful,
> annoyingly, but if we are going to try and squeeze it into JSON can I
> suggest:
>
> { match = { class = "Acme inc. discombobulator" }, info = { group = "we
> like teh groups", volume = "11" } }
>
> >
> > All keys other than ‘device_property’ becomes extra information, i.e.
> software defined property. These extra information will be carried with the
> PCI devices,. Some implementation details, A)we can limit the acceptable
> keys, like we only support ‘group_name’, ‘network_id’, or we can accept any
> keys other than reserved (vendor_id, device_id etc) one.
>
>
> Not sure we have a good list of reserved keys at the moment, and with two
> dicts it isn't really necessary, I guess.  I would say that we have one
> match parser which looks something like this:
>
> # does this PCI device match the expression given?
> def match(expression, pci_details, extra_specs):
>for (k, v) in expression:
> if k.starts_with('e.'):
>m

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Ian Wells
If there are N flavor types there are N match expressions so I think it's
pretty much equivalent in terms of complexity.  It looks like some sort of
packing problem to me, trying to fit N objects into M boxes, hence my
statement that it's not going to be easy, but that's just a gut feeling -
some of the matches can be vague, such as only the vendor ID or a vendor
and two device types, so it's not as simple as one flavor matching one
stats row.
-- 
Ian.


On 13 January 2014 21:00, Jiang, Yunhong  wrote:

>  Ian, not sure if I get your question. Why should scheduler get the
> number of flavor types requested? The scheduler will only translate the PCI
> flavor to the pci property match requirement like it does now, (either
> vendor_id, device_id, or item in extra_info), then match the translated pci
> flavor, i.e. pci requests, to the pci stats.
>
>
>
> Thanks
>
> --jyh
>
>
>
> *From:* Ian Wells [mailto:ijw.ubu...@cack.org.uk]
> *Sent:* Monday, January 13, 2014 10:57 AM
>
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
>
>
> It's worth noting that this makes the scheduling a computationally hard
> problem. The answer to that in this scheme is to reduce the number of
> inputs to trivialise the problem.  It's going to be O(f(number of flavor
> types requested, number of pci_stats pools)) and if you group appropriately
> there shouldn't be an excessive number of pci_stats pools.  I am not going
> to stand up and say this makes it achievable - and if it doesn't them I'm
> not sure that anything would make overlapping flavors achievable - but I
> think it gives us some hope.
> --
>
> Ian.
>
>
>
> On 13 January 2014 19:27, Jiang, Yunhong  wrote:
>
> Hi, Robert, scheduler keep count based on pci_stats instead of the pci
> flavor.
>
>
>
> As stated by Ian at
> https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.htmlalready,
>  the flavor will only use the tags used by pci_stats.
>
>
>
> Thanks
>
> --jyh
>
>
>
> *From:* Robert Li (baoli) [mailto:ba...@cisco.com]
> *Sent:* Monday, January 13, 2014 8:22 AM
>
>
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
>
>
> As I have responded in the other email, and If I understand PCI flavor
> correctly, then the issue that we need to deal with is the overlapping
> issue. A simplest case of this overlapping is that you can define a flavor
> F1 as [vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v']
> .  Let's assume that only the admin can define the flavors. It's not hard
> to see that a device can belong to the two different flavors in the same
> time. This introduces an issue in the scheduler. Suppose the scheduler
> (counts or stats based) maintains counts based on flavors (or the keys
> corresponding to the flavors). To request a device with the flavor F1,
>  counts in F2 needs to be subtracted by one as well. There may be several
> ways to achieve that. But regardless, it introduces tremendous overhead in
> terms of system processing and administrative costs.
>
>
>
> What are the use cases for that? How practical are those use cases?
>
>
>
> thanks,
>
> Robert
>
>
>
> On 1/10/14 9:34 PM, "Ian Wells"  wrote:
>
>
>
>
> >
> > OK - so if this is good then I think the question is how we could change
> the 'pci_whitelist' parameter we have - which, as you say, should either
> *only* do whitelisting or be renamed - to allow us to add information.
>  Yongli has something along those lines but it's not flexible and it
> distinguishes poorly between which bits are extra information and which
> bits are matching expressions (and it's still called pci_whitelist) - but
> even with those criticisms it's very close to what we're talking about.
>  When we have that I think a lot of the rest of the arguments should simply
> resolve themselves.
> >
> >
> >
> > [yjiang5_1] The reason that not easy to find a flexible/distinguishable
> change to pci_whitelist is because it combined two things. So a
> stupid/naive solution in my head is, change it to VERY generic name,
> ‘pci_devices_information’,
> >
> > and change schema as an array of {‘devices_property’=regex exp,
> ‘group_name’ = ‘g1’} dictionary, and the device_property expression can be
> ‘address ==xxx, vendor_id == xxx’ (i.e. similar with current white list),
>  and we can squeeze m

Re: [openstack-dev] [Neutron] About ports backing floating IPs

2014-01-15 Thread Ian Wells
Logically, the port is really an alias for the external port of the router,
rather than being just detached.  I'm not sure this adds much to the
discussion, but clearly that's where its traffic goes and is terminated.

From past experience (don't ask) weird things happen if you start creating
your own ports on the external network.  It's actually not a completely
daft thing to do, if you're brave, but I don't think the IPAM for the
network plays well with the IPAM for floating IPs.  The answer is probably
that floating IPs should be pulling out of the network's address pool, at
the least.
-- 
Ian.


On 15 January 2014 00:50, Salvatore Orlando  wrote:

> TL;DR;
> I have been looking back at the API and found out that it's a bit weird
> how floating IPs are mapped to ports. This might or might not be an issue,
> and several things can be done about it.
> The rest of this post is a boring description of the problem and a
> possibly even more boring list of potential solutions.
>
> Floating IPs are backed by ports on the external network where they are
> implemented; while there are good reason for doing so, this has some
> seemingly weird side effects, which are usually not visible to tenants as
> only admins are allowed (by default) to view the ports backing the floating
> IPs.
>
> Assigning an external port to a floating IP is an easy way for ensuring
> the IP address used for the floating IP is then not reused for other
> allocation purposes on the external network; indeed admin users might start
> VMs on external networks as well. Conceptually, it is also an example of
> port-level insertion for a network service (DNAT/SNAT).
>
> However these are the tricky aspects:
> - IP Address changes: The API allows IP address updates for a floating IP
> port. However as it might be expected, the IP of the floating IP entities
> does not change, as well as the actual floating IP implemented in the
> backend (l3 agent or whatever the plugin uses).
> - operational status: It is always down at least for plugins based on
> OVS/LB agents. This is because there is no actual VIF backing a floating
> IP, so there is nothing to wire.
> - admin status: updating it just has no effect at all
> - Security groups and  allowed address pairs: The API allows for updating
> them, but it is not clear whether something actually happens in the
> backend, and I'm even not entirely sure this makes sense at all.
>
> Why these things happen, whether it's intended behaviour, and whether it's
> the right behaviour it's debatable.
>
> From my perspective, this leads to inconsistent state, as:
> - the address reported in the floating IP entity might differ from the one
> on the port backing the floating IP
> - operational status is wrongly represented as down
> - expectations concerning operations on the port are not met (eg: admin
> status update)
> And I reckon state inconsistencies should always be avoided.
>
> Considering the situation described above, there are few possible options.
>
> 1- don't do anything, since the port backing the floating IP is hidden
> from the tenant.
> This might be ok provided that a compelling reason for ignoring entities
> not visible to tenants is provided.
> However it has to be noted that Neutron authZ logic, which is based on
> openstack.common would allow deployers to change that (*)
>
> 2- remove the need for a floating IP to be backed from a port
> While this might seem simple, this has non-trivial implications as IPAM
> logic would need to become aware of floating IPs, and should  be discussed
> further.
>
> 3- leverage policy-based APIs, and transform floating IPs in a "remote
> access policy"
> In this way the floating IP will become a policy to apply to a port; it
> will be easier to solve conflicts with security policies and it will be
> possible to just use IPs (or addressing policies) configured on the port.
> However, this will be hardly backward compatible, and its feasibility
> depends on the outcome of the more general discussions on policy-based APIs
> for neutron.
>
> 4- Document the current behaviour
> This is something which is probably worth doing anyway until a solution is
> agreed upon
>
> Summarising, since all the 'technical' options sounds not feasible for the
> upcoming Icehouse release, it seems worth at least documenting the current
> behaviour, and start a discussion on whether we should do something about
> this and, if yes, what.
>
> Regards and apologies for the long post,
> Salvatore
>
> (*) As an interesting corollary, the flexibility of making authZ policies
> super-configurable causes the API to be non-portable. However, this is a
> subject for a different discussion.
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-15 Thread Ian Wells
To clarify a couple of Robert's points, since we had a conversation earlier:
On 15 January 2014 23:47, Robert Li (baoli)  wrote:

>   ---  do we agree that BDF address (or device id, whatever you call it),
> and node id shouldn't be used as attributes in defining a PCI flavor?
>

Note that the current spec doesn't actually exclude it as an option.  It's
just an unwise thing to do.  In theory, you could elect to define your
flavors using the BDF attribute but determining 'the card in this slot is
equivalent to all the other cards in the same slot in other machines' is
probably not the best idea...  We could lock it out as an option or we
could just assume that administrators wouldn't be daft enough to try.

* the compute node needs to know the PCI flavor. [...]
>   - to support live migration, we need to use it to create
> network xml
>

I didn't understand this at first and it took me a while to get what Robert
meant here.

This is based on Robert's current code for macvtap based live migration.
The issue is that if you wish to migrate a VM and it's tied to a physical
interface, you can't guarantee that the same physical interface is going to
be used on the target machine, but at the same time you can't change the
libvirt.xml as it comes over with the migrating machine.  The answer is to
define a network and refer out to it from libvirt.xml.  In Robert's current
code he's using the group name of the PCI devices to create a network
containing the list of equivalent devices (those in the group) that can be
macvtapped.  Thus when the host migrates it will find another, equivalent,
interface.  This falls over in the use case under consideration where a
device can be mapped using more than one flavor, so we have to discard the
use case or rethink the implementation.

There's a more complex solution - I think - where we create a temporary
network for each macvtap interface a machine's going to use, with a name
based on the instance UUID and port number, and containing the device to
map.  Before starting the migration we would create a replacement network
containing only the new device on the target host; migration would find the
network from the name in the libvirt.xml, and the content of that network
would behave identically.  We'd be creating libvirt networks on the fly and
a lot more of them, and we'd need decent cleanup code too ('when freeing a
PCI device, delete any network it's a member of'), so it all becomes a lot
more hairy.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] auto configration of local_ip

2014-01-16 Thread Ian Wells
On 16 January 2014 10:51, Robert Collins  wrote:

> > 1. assigned to the interface attached to default gateway
>

Which you may not have, or may be on the wrong interface (if I'm setting up
a control node I usually have the default gateway on the interface with the
API endpoints, which I emphatically don't use for internal traffic like
tunnelling)


> > 2. being in the specified network (CIDR)
>
> 3. assigned to the specified interface
> >(1 can be considered a special case of 3)
>

Except that (1) and (2) specify a subnet and a single address, and an
interface in (3) can have multiple addresses.

How about 4. Send a few packets with a nonce in them to any of the
> already meshed nodes, and those nodes can report what ip they
> originated from.
>

Which doesn't work unless you've discovered the IP address on another
machine - chicken, meet egg...

I appreciate the effort but I've seen people try this repeatedly and it's a
much harder problem than it appears to be.  There's no easy way, for a
given machine, to guess which interface you should be using.  Robert's
suggestion of a broadcast is actually the best idea I've seen so far - you
could, for instance, use MDNS to work out where the control node is and
which interface is which when you add a compute node, which would certainly
be elegant - but I'm concerned about taking a stab in the dark at an
important config item when there really isn't a good way of working it out.

Sorry,
--
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread Ian Wells
On 16 January 2014 09:07, yongli he  wrote:

>  On 16 January 2014 08:28, Ian Wells wrote:
>
> This is based on Robert's current code for macvtap based live migration.
> The issue is that if you wish to migrate a VM and it's tied to a physical
> interface, you can't guarantee that the same physical interface is going to
> be used on the target machine, but at the same time you can't change the
> libvirt.xml as it comes over with the migrating machine.  The answer is to
> define a network and refer out to it from libvirt.xml.  In Robert's current
> code he's using the group name of the PCI devices to create a network
> containing the list of equivalent devices (those in the group) that can be
> macvtapped.  Thus when the host migrates it will find another, equivalent,
> interface.  This falls over in the use case under
>
> but, with flavor we defined, the group could be a tag for this purpose,
> and all Robert's design still work, so it ok, right?
>

Well, you could make a label up consisting of the values of the attributes
in the group, but since a flavor can encompass multiple groups (for
instance, I group by device and vendor and then I use two device types in
my flavor) this still doesn't work.  Irena's solution does, though.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] "Evil" Firmware

2014-01-17 Thread Ian Wells
On 17 January 2014 01:16, Chris Friesen  wrote:

> On 01/16/2014 05:12 PM, CARVER, PAUL wrote:
>
>  Jumping back to an earlier part of the discussion, it occurs to me
>> that this has broader implications. There's some discussion going on
>> under the heading of Neutron with regard to PCI passthrough. I
>> imagine it's under Neutron because of a desire to provide passthrough
>> access to NICs, but given some of the activity around GPU based
>> computing it seems like sooner or later someone is going to try to
>> offer multi-tenant cloud servers with the ability to do GPU based
>> computing if they haven't already.
>>
>
> I'd expect that the situation with PCI passthrough may be a bit different,
> at least in the common case.
>
> The usual scenario is to use SR-IOV to have a single physical device
> expose a bunch of virtual functions, and then a virtual function is passed
> through into a guest.
>

That entirely depends on the card in question.  Some cards support SRIOV
and some don't (you wouldn't normally use SRIOV on a GPU, as I understand
it, though you might reasonably expect it on a modern network card).  Even
on cards that do support SRIOV there's nothing stopping you assigning the
whole card.

But from the discussion here it seems that (whole card passthrough) +
(reprorgrammable firmware) would be the danger, and programmatically
there's no way to tell from the passthrough code in Nova whether any given
card has programmable firmware.  It's a fairly safe bet you can't reprogram
firmware permanently from a VF, agreed.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] "Evil" Firmware

2014-01-17 Thread Ian Wells
On 17 January 2014 09:12, Robert Collins  wrote:

> > The physical function is the one with the "real" PCI config space, so as
> > long as the host controls it then there should be minimal risk from the
> > guests since they have limited access via the virtual
> functions--typically
> > mostly just message-passing to the physical function.
>
> As long as its a whitelist of audited message handlers, thats fine. Of
> course, if the message handlers haven't been audited, who knows whats
> lurking in there.


The description doesn't quite gel with my understanding - SRIOV VFs *do*
have a PCI space that you can map in, and a DMA as well, typically (which
is virtualised via the page table for the VM).  However, some functions of
the card may not be controllable in that space (e.g., for network devices,
VLAN encapsulation, promiscuity, and so on) and you may have to make a
request from the VF in the VM to the PF in the host kernel.

The message channels in question are implemented in the PF and VF drivers
in the Linux kernel code (the PF end being the one where security matters,
since a sufficiently malicious VM can try it on at the VF end and see what
happens).  I don't know whether you consider that audited enough.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-20 Thread Ian Wells
On 20 January 2014 09:28, Irena Berezovsky  wrote:

> Hi,
> Having post PCI meeting discussion with Ian based on his proposal
> https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#
> ,
> I am  not sure that the case that quite usable for SR-IOV based networking
> is covered well by this proposal. The understanding I got is that VM can
> land on the Host that will lack suitable PCI resource.
>

The issue we have is if we have multiple underlying networks in the system
and only some Neutron networks are trunked on the network that the PCI
device is attached to.  This can specifically happen in the case of
provider versus trunk networks, though it's very dependent on the setup of
your system.

The issue is that, in the design we have, Neutron at present has no input
into scheduling, and also that all devices in a flavor are precisely
equivalent.  So if I say 'I want a 10G card attached to network X' I will
get one of the cases in the 10G flavor with no regard as to whether it can
actually attach to network X.

I can see two options here:

1. What I'd do right now is I would make it so that a VM that is given an
unsuitable network card fails to run in nova-compute when Neutron discovers
it can't attach the PCI device to the network.  This will get us a lot of
use cases and a Neutron driver without solving the problem elegantly.
You'd need to choose e.g. a provider or tenant network flavor, mindful of
the network you're connecting to, so that Neutron can actually succeed,
which is more visibility into the workings of Neutron than the user really
ought to need.

2. When Nova checks that all the networks exist - which, conveniently, is
in nova-api - it also gets attributes from the networks that can be used by
the scheduler to choose a device.  So the scheduler chooses from a flavor
*and*, within that flavor, from a subset of those devices with appropriate
connectivity.  If we do this then the Neutron connection code doesn't
change - it should still fail if the connection can't be made - but it
becomes an internal error, since it's now an issue of consistency of
setup.

To do this, I think we would tell Neutron 'PCI extra-info X should be set
to Y for this provider network and Z for tenant networks' - the precise
implementation would be somewhat up to the driver - and then add the
additional check in the scheduler.  The scheduling attributes list would
have to include that attribute.

Can you please provide an example for the required cloud admin PCI related
> configurations on nova-compute and controller node with regards to the
> following simplified scenario:
>  -- There are 2 provider networks (phy1, phy2), each one has associated
> range on vlan-ids
>  -- Each compute node has 2 vendor adapters with SR-IOV  enabled feature,
> exposing xx Virtual Functions.
>  -- Every VM vnic on virtual network on provider network  phy1 or phy2
>  should be pci pass-through vnic.
>

So, we would configure Neutron to check the 'e.physical_network' attribute
on connection and to return it as a requirement on networks.  Any PCI on
provider network 'phy1' would be tagged e.physical_network => 'phy1'.  When
returning the network, an extra attribute would be supplied (perhaps
something like 'pci_requirements => { e.physical_network => 'phy1'}').  And
nova-api would know that, in the case of macvtap and PCI directmap, it
would need to pass this additional information to the scheduler which would
need to make use of it in finding a device, over and above the flavor
requirements.
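
As a very rough sketch of what that could look like - key names and device IDs
here are placeholders, and the exact schema of pci_information is still up for
discussion:

# compute node config: whitelist match plus the admin-supplied tag
pci_information = [{"match": {"vendor_id": "8086", "product_id": "10fb"},
                    "info":  {"physical_network": "phy1"}}]

# attribute returned with the network and consumed by the scheduler
pci_requirements = {"e.physical_network": "phy1"}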

Neutron, when mapping a PCI port, would similarly work out from the Neutron
network the trunk it needs to connect to, and would reject any mapping that
didn't conform. If it did, it would work out how to encapsulate the traffic
from the PCI device and set that up on the PF of the port.

I'm not saying this is the only or best solution, but it does have the
advantage that it keeps all of the networking behaviour in Neutron -
hopefully Nova remains almost completely ignorant of what the network setup
is, since the only thing we have to do is pass on PCI requirements, and we
already have a convenient call flow we can use that's there for the network
existence check.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neturon] firewall_driver and ML2 and vif_security discussion

2014-01-20 Thread Ian Wells
On 20 January 2014 10:13, Mathieu Rohon  wrote:

> With such an architecture, we wouldn't have to tell neutron about
> vif_security or vif_type when it creates a port. When Neutron get
> called with port_create, it should only return the tap created.
>

Not entirely true.  Not every libvirt port is a tap; if you're doing things
with PCI passthrough attachment you want different libvirt configuration
(and, in this instance, also different Xen and everything else
configuration), and you still need vif_type to distinguish.  You just don't
need 101 values for 'this is a *special and unique* sort of software
bridge'.

I don't know if such a proposal is reasonable since I can't find good

> informations about the ability of libvirt to use an already created
> tap, when it creates a VM. It seem to be usable with KVM.
> But I would love to have feedback of the communnity on this
> architecture. May be it has already been discussed on the ML, so
> please give me the pointer.
>

libvirt will attach to many things, but I'm damned if I can work out if it
will attach to a tap, either.

To my mind, it would make that much more sense if Neutron created,
networked and firewalled a tap and returned it completely set up (versus
now, where the VM can start with a half-configured set of separation and
firewall rules that get patched up asynchronously).
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-21 Thread Ian Wells
Document updated to talk about network aware scheduling (
https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit#-
section just before the use case list).

Based on yesterday's meeting, rkukura would also like to see network-aware
scheduling work for non-PCI cases - where servers are not necessarily
connected to every physical segment and machines therefore need placing
based on where they can reach the networks they need.  I think this is an
exact parallel to the PCI case, except that we're also constrained by a
count of resources (you can connect an infinite number of VMs to a software
bridge, of course).  We should implement the scheduling changes as a
separate batch of work that solves both problems, if we can - and this
works with the two step approach, because step 1 brings us up to Neutron
parity and step 2 will add network-aware scheduling for both PCI and
non-PCI cases.

-- 
Ian.


On 20 January 2014 13:38, Ian Wells  wrote:

> On 20 January 2014 09:28, Irena Berezovsky  wrote:
>
>> Hi,
>> Having post PCI meeting discussion with Ian based on his proposal
>> https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#
>> ,
>> I am  not sure that the case that quite usable for SR-IOV based
>> networking is covered well by this proposal. The understanding I got is
>> that VM can land on the Host that will lack suitable PCI resource.
>>
>
> The issue we have is if we have multiple underlying networks in the system
> and only some Neutron networks are trunked on the network that the PCI
> device is attached to.  This can specifically happen in the case of
> provider versus trunk networks, though it's very dependent on the setup of
> your system.
>
> The issue is that, in the design we have, Neutron at present has no input
> into scheduling, and also that all devices in a flavor are precisely
> equivalent.  So if I say 'I want a 10G card attached to network X' I will
> get one of the cases in the 10G flavor with no regard as to whether it can
> actually attach to network X.
>
> I can see two options here:
>
> 1. What I'd do right now is I would make it so that a VM that is given an
> unsuitable network card fails to run in nova-compute when Neutorn discovers
> it can't attach the PCI device to the network.  This will get us a lot of
> use cases and a Neutron driver without solving the problem elegantly.
> You'd need to choose e.g. a provider or tenant network flavor, mindful of
> the network you're connecting to, so that Neutron can actually succeed,
> which is more visibility into the workings of Neutron than the user really
> ought to need.
>
> 2. When Nova checks that all the networks exist - which, conveniently, is
> in nova-api - it also gets attributes from the networks that can be used by
> the scheduler to choose a device.  So the scheduler chooses from a flavor
> *and*, within that flavor, from a subset of those devices with appopriate
> connectivity.  If we do this then the Neutron connection code doesn't
> change - it should still fail if the connection can't be made - but it
> becomes an internal error, since it's now an issue of consistency of
> setup.
>
> To do this, I think we would tell Neutron 'PCI extra-info X should be set
> to Y for this provider network and Z for tenant networks' - the precise
> implementation would be somewhat up to the driver - and then add the
> additional check in the scheduler.  The scheduling attributes list would
> have to include that attribute.
>
> Can you please provide an example for the required cloud admin PCI related
>> configurations on nova-compute and controller node with regards to the
>> following simplified scenario:
>>  -- There are 2 provider networks (phy1, phy2), each one has associated
>> range on vlan-ids
>>  -- Each compute node has 2 vendor adapters with SR-IOV  enabled feature,
>> exposing xx Virtual Functions.
>>  -- Every VM vnic on virtual network on provider network  phy1 or phy2
>>  should be pci pass-through vnic.
>>
>
> So, we would configure Neutron to check the 'e.physical_network' attribute
> on connection and to return it as a requirement on networks.  Any PCI on
> provider network 'phy1' would be tagged e.physical_network => 'phy1'.  When
> returning the network, an extra attribute would be supplied (perhaps
> something like 'pci_requirements => { e.physical_network => 'phy1'}'.  And
> nova-api would know that, in the case of macvtap and PCI directmap, it
> would need to pass this additional information to the scheduler which would
> need to make use of it in finding a device, 

Re: [openstack-dev] [Neutron] Selectively disabling certain built in iptables rules

2014-01-21 Thread Ian Wells
Paul,

There's an extension for this that is, I think, presently only implemented
by the Nicira plugin.  Look for portsecurity.  Whatever they do is probably
the way you should do it too.
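
If your plugin does implement it, the knob it adds is a port_security_enabled
attribute on ports (and networks); disabling it on a port looks roughly like
the following - treat this as a sketch, since client support varies:

PUT /v2.0/ports/{port_id}
{
    "port": {
        "port_security_enabled": false
    }
}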

Cheers,
-- 
Ian.


On 21 January 2014 13:10, CARVER, PAUL  wrote:

>  Feel free to tell me this is a bad idea and scold me for even asking,
> but please help me figure out how to do it anyway. This is for a specific
> tenant in a specific lab that was built specifically for that one tenant to
> do some experimental work that requires VMs to route and other VMs to act
> as DHCP/PXEBoot servers.
>
>
>
> I need to wrap a conditional around this line
> https://github.com/openstack/neutron/blob/master/neutron/agent/linux/iptables_firewall.py#L201and
>  this line
> https://github.com/openstack/neutron/blob/master/neutron/agent/linux/iptables_firewall.py#L241for
>  specific VM instances.
>
>
>
> The criteria could be something like pattern matching on the instance
> name, or based on a specific flavor image type. I don’t much care what the
> criteria is as long as it’s something the tenant can control. What I’m
> hoping someone can provide me with is an example line of code or two with
> which I can examine some property of the image that has been created from
> within the specific file referenced above in order to wrap if statements
> around those two lines of code so that I can prevent them from adding those
> specific iptables rules in the specific cases where my tenant needs to
> either route or respond to DHCP.
>
>
>
> Thanks
>
>
>
> --
>
> Paul Carver
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova] PCI flavor objects - please review this proposal

2014-01-21 Thread Ian Wells
Hi,

We've had a few comments about creating a new table specifically for PCI
flavors versus using the already existing host aggregates table, and John
Garbutt asked me to explain the concepts involved here to see what the body
of opinion was on the subject.  My opinion, which which this account is
biased towards, is that the PCI flavor serves a specific purpose and host
aggregates are not terribly useful for any of the concepts here, but I
would like to ensure that whatever we put forward is going to be accepted
at review, so please have a read and see if you agree.  Apologies for the
essay, but I'm trying to summarise from start to finish; I hope the
explanation makes sense if you stick with it.


The current situation - the PCI code in Havana - has one concept, a 'PCI
whitelist', which - on each compute host - is a config item that describes
the available PCI devices that VMs can use, by matching the vendor and
product IDs on the card (i.e. what sort of card it is).  The Nova flavor
then has an additional extra_specs item that is a set of matching
expressions and counts.  Free matching PCI devices will meet the
requirements of the Nova flavor; the VM will be scheduled to wherever we
can find a full set of devices.

This has advantages and issues.

I personally like the fact that the PCI whitelist lives on the compute node
- the hardware in the compute node is something very specific to that
compute node and doesn't change much, and I think that it's the right
approach to store it as config, therefore, rather than in the DB.  If a new
compute node is added, its config explains what PCI devices are available,
regardless of whether it's the same as, or different to, other nodes in the
system.  Speaking personally, managing these often repetitive configs
usually comes down to writing a bit of puppet, so while I do occasionally
get them wrong and have to fix them, it's not a massive overhead to roll
out new machines.

The biggest limitation is there are certain things with this scheme you
can't represent.  Sometimes you want to put your devices into Nova flavors,
but sometimes you want to use them in other ways.  For instance, I'd like
to do:

nova boot --nic pci-flavor=10g,net-id=XXX ...

... where I'm referring to the PCI type directly, not in a Nova flavor.

For this purpose we came up with the concept of a 'PCI flavor', a named,
user-available grouping of PCI devices.  The PCI flavor specifies one type
of device, however I'm grouping my devices together.

Also, we'd like administrators to have some control over the flavor at the
API level, even if the devices available are not API changeable.  I like to
think of this as the compute nodes reporting their resources, defined by
their configuration, and the API defining the requirements that a VM has,
based on what the administrator makes available (just as they do with Nova
flavors for an instance).

Finally, PCI devices are not all exactly the same, despite appearances.
Network devices can have specific connections; storage devices might be
connected to different SANs or have different devices attached.  You can't
do all your whitelisting and flavor defining using only the vendor and
product ID.


We've been through several design iterations, and where we stand at the
moment is that you can tag up the devices on the compute node with a config
item we've called pci_information, and you then group them using a PCI
flavor defined through the API.

pci_information lets you whitelist PCI devices as before.  This is still
important because you don't want to offer up your disk drives and control
network for users to map into their VMs.  But on top of that you can also
add extra information about PCI devices, generally information that details
things that can't be discovered about those devices but that you know about
when you're installing the machine - for instance, the network a NIC is
connected to.

PCI flavors, independently of the pci_information configuration, describe
which device groups are available to users.  So that would be the 10g
devices, the GPUs and so on.  If you want to change your offerings on the
fly you can do that, subject to the resources that the pci_information is
offering out.  You can select specific device types based on the basic PCI
information and the extra information that you put in the pci_information
configuration, which means you've got some flexibility with your
configuration.


Now we're good so far, but in recent meetings John Garbutt has been making
a strong case that host aggregates solve our problems better, and here's
where I'd like your opinions.

Firstly, they can be used to define the data that pci_information holds.
Instead of putting this in compute node configuration, you can use a host
aggregate with additional key-value information to define what devices
you're after from each compute node in the aggregate.  This will work, but
there are two issues I see with it - firstly, this information is precisely
the sor

Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks

2014-01-21 Thread Ian Wells
On 21 January 2014 21:23, Robert Collins  wrote:

> In OpenStack we've got documentation[1] that advises setting a low MTU
> for tenants to workaround this issue (but the issue itself is
> unsolved) - this is a problem because PMTU is fairly important :)
> Lowering *every* tenant when one tenant somewhere hits a new tunnel
> with a lower physical packet size limit isn't an answer.
>

The right answer is probably that (a) GRE drops packets it can't take (it
used to return a spoofed PMTU exceeded, which was faintly naughty cos it's
not a router, and it breaks non-IP protocols; sounds like it fragments now,
which is probably no better), (b) we use the DHCP option to advertise the
right MTU, and (c) we require Neutron plugins to work out the MTU, which
for any encap except VLAN is (host interface MTU - header size).
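
For what it's worth, the usual workaround today is (b) done by hand - pointing
the DHCP agent at a dnsmasq config fragment that advertises a lower MTU.
Something like the following (1454 is just the commonly suggested value, not
one computed per encapsulation):

# /etc/neutron/dnsmasq-neutron.conf, referenced via dnsmasq_config_file
# in dhcp_agent.ini; DHCP option 26 is the interface MTU
dhcp-option-force=26,1454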

At this point we probably discover that nothing respects the MTU option in
DHCP, mind you (I'm not saying it doesn't work; I'm just saying, have you
ever tried it?)

This solution is pedantically correct and I would actually like to see it
implemented, but there's probably something more pragmatic that can be done.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] Neutron should disallow /32 CIDR

2014-01-21 Thread Ian Wells
/30 is the minimum allowable mask, not /31.
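
For reference, the check being proposed further down the thread would only be a
couple of lines in _validate_allocation_pools(); a minimal sketch, with the
exception type as a stand-in and names following the quoted description:

import netaddr

def _reject_single_address_pools(allocation_pools):
    """Extra check proposed for NeutronDbPluginV2._validate_allocation_pools()."""
    for pool in allocation_pools:
        # A pool whose start and end are the same address (e.g. the one a /32
        # subnet produces) leaves nothing to allocate once the gateway takes it.
        if netaddr.IPAddress(pool['start']) == netaddr.IPAddress(pool['end']):
            raise ValueError("allocation pool %(start)s-%(end)s contains only "
                             "one address" % pool)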

On 21 January 2014 22:04, Edgar Magana  wrote:

> Wouldn't be easier just to check if:
>
> cidr is 32?
>
>  I believe it is a good idea to not allow /32 network but this is just my
> opinion
>
> Edgar
>
> From: Paul Ward 
> Reply-To: OpenStack List 
> Date: Tuesday, January 21, 2014 12:35 PM
> To: OpenStack List 
> Subject: [openstack-dev] [neutron] Neutron should disallow /32 CIDR
>
> Currently, NeutronDbPluginV2._validate_allocation_pools() does some very
> basic checking to be sure the specified subnet is valid.  One thing that's
> missing is checking for a CIDR of /32.  A subnet with one IP address in it
> is unusable as the sole IP address will be allocated to the gateway, and
> thus no IPs are left over to be allocated to VMs.
>
> The fix for this is simple.  In
> NeutronDbPluginV2._validate_allocation_pools(), we'd check for start_ip ==
> end_ip and raise an exception if that's true.
>
> I've opened lauchpad bug report 1271311 (
> https://bugs.launchpad.net/neutron/+bug/1271311) for this, but wanted to
> start a discussion here to see if others find this enhancement to be a
> valuable addition.
> ___ OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks

2014-01-22 Thread Ian Wells
On 22 January 2014 00:00, Robert Collins  wrote:

> I think dropping frames that can't be forwarded is entirely sane - at
> a guess it's what a physical ethernet switch would do if you try to
> send a 1600 byte frame (on a non-jumbo-frame switched network) - but
> perhaps there is an actual standard for this we could follow?
>

Speaking from bitter experience, if you've misconfigured your switch so
that it's dropping packets for this reason, you will have a period of hair
tearing out to solve the problem before you work it out.  Believe me, been
there, rabbit messages that don't turn up because they're the first ones
that were too big are not a helpful diagnostic indicator.

Getting the MTU *right* on all hosts seems to be key to keeping your hair
attached to your head for a little longer.  Hence the DHCP suggestion to
set it to the right value.

> > (c) we require Neutron plugins to work out the MTU, which for
> > any encap except VLAN is (host interface MTU - header size).
>
> do you mean tunnel wrap overheads? (What if a particular tunnel has a
> trailer.. crazy talk I know).
>

Yup, basically.  Unfortunately, thinking about this a bit more, you can't
easily be certain what the max packet size allowed in a GRE tunnel is going
to be, because you don't know which interface it's going over (or what's
between), but to a certain extent we can use config items to fix what we
can't discover.
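
As a worked example, assuming a 1500-byte MTU on the underlying interface
and GRE over IPv4 with a 4-byte key: 1500 - 20 (outer IPv4 header) - 8 (GRE
header plus key) - 14 (inner Ethernet header) = 1458 bytes for the tenant
interface MTU - and every one of those numbers changes if the encap, the
tunnel options or the outer IP version change, which is rather the point.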

-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks

2014-01-22 Thread Ian Wells
On 22 January 2014 12:01, Robert Collins  wrote:

> > Getting the MTU *right* on all hosts seems to be key to keeping your hair
> > attached to your head for a little longer.  Hence the DHCP suggestion to
> set
> > it to the right value.
>
> I certainly think having the MTU set to the right value is important.
> I wonder if there's a standard way we can signal the MTU (e.g. in the
> virtio interface) other than DHCP. Not because DHCP is bad, but
> because that would work with statically injected network configs as
> well.
>

To the best of my knowledge, no.  And it wants to be a part of the static
config too.


And the static config, the last I checked, also sucks - we really want the
data to be in a metadata format that cloud-init absorbs, but instead there's
a feature in config-drive et al that writes /etc/network/interfaces.  Which
is no use to anyone on Windows, or Redhat, or...



> One thing we could do is encourage OS vendors to turn
> /proc/sys/net/ipv4/tcp_mtu_probing
> (http://www.ietf.org/rfc/rfc4821.txt) on in combination with dropping
> over-size frames. That should detect the actual MTU.
>

Though it's really a bit of a workaround.
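
(If anyone does want to experiment with it, it's a one-line sysctl on the
guest - sysctl -w net.ipv4.tcp_mtu_probing=1 - and it only helps TCP, which
is part of why it's a workaround.)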

Another thing would be for encapsulation failures in the switch to be
> reflected in the vNIC in the instance - export back media errors (e.g.
> babbles) so that users can diagnose problems.
>

Ditto.


> Note that IPv6 doesn't *have* a DF bit, because routers are not
> permitted to fragment - arguably encapsulating an ipv6 frame in GRE
> and then fragmenting the outer layer is a violation of that.
>

Fragmentation is fine for the tunnel, *if* the tunnel also reassembles. The
issue with fragmentation is that it's horrible to implement on all your
endpoints, aiui, and it used to lead to innumerable fragmentation attacks.

As for automatically determining the size - we can determine the PMTU
> between all hosts in the mesh, report those back centrally and take
> the lowest then subtract the GRE overhead.
>

If there's one path, and if there's no lower MTU on the GRE path (which can
go via routers)...  We can make an educated guess at the MTU but we can't
know it without testing each GRE tunnel as we set it up (and multiple
routes defeat even that), so I would recommend a config option as the best
of a nasty set of choices.  It can still go wrong but it's then blatantly
and obviously a config fault rather than some code guessing wrong, which
would be harder for an end user to work around.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] A pair of mode keywords

2014-01-22 Thread Ian Wells
On 21 January 2014 22:46, Veiga, Anthony wrote:

>
>Hi, Sean and Xuhan:
>
>  I totally agree. This is not the ultimate solution with the assumption
> that we had to use “enable_dhcp”.
>
>  We haven’t decided the name of another parameter, however, we are open
> to any suggestions. As we mentioned during the meeting, the second
> parameter should highlight the need of addressing. If so, it should have at
> least four values:
>
>  1) off (i.e. address is assigned by external devices out of OpenStack
> control)
> 2) slaac (i.e. address is calculated based on RA sent by OpenStack dnsmasq)
> 3) dhcpv6-stateful (i.e. address is obtained from OpenStack dnsmasq acting
> as DHCPv6 stateful server)
> 4) dhcpv6-stateless (i.e. address is calculated based on RA sent from
> either OpenStack dnsmasq, or external router, and optional information is
> retrieved from OpenStack dnsmasq acting as DHCPv6 stateless server)
>
>
So how does this work if I have an external DHCPv6 server and an internal
router?  (How baroque do we have to get?)  enable_dhcp, for backward
compatibility reasons, should probably disable *both* RA and DHCPv6,
despite the name, so we can't use that to disable just the DHCP server.  We
could add a *third* attribute, which I hate as an idea but does resolve the
problem - one flag for each of the servers, one for the mode the servers
are operating in, and enable_dhcp which needs to DIAF but will persist till
the API is revved.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] PCI pass-through SRIOV

2014-01-27 Thread Ian Wells
On 27 January 2014 15:58, Robert Li (baoli)  wrote:

>  Hi Folks,
>
>  In today's meeting, we discussed a scheduler issue for SRIOV. The basic
> requirement is for coexistence of the following compute nodes in a cloud:
>   -- SRIOV only compute nodes
>   -- non-SRIOV only compute nodes
>   -- Compute nodes that can support both SRIOV and non-SRIOV ports.
> Lack of a proper name, let's call them compute nodes with hybrid NICs
> support, or simply hybrid compute nodes.
>
>  I'm not sure if it's practical in having hybrid compute nodes in a real
> cloud. But it may be useful in the lab to bench mark the performance
> differences between SRIOV, non-SRIOV, and coexistence of both.
>

I think in fact hybrid nodes would be the common case - there's nothing
wrong with mixing virtual and physical NICs in a VM and it's been the
general case we've been discussing till now.  Nodes that *only* support SRIOV
and have no soft switch sound like a complete outlier to me.  I'm assuming
that passthrough devices are a scarce resource and you wouldn't want to
waste them on a low traffic control connection, so you would always have a
softswitch on the host to take care of such cases.

I believe there *is* a use case here when  you have some, but not all,
machines that have SRIOV devices.  They will also have a softswitch of some
sort and are therefore not only 'SRIOV only' in that sense.  But the point
is that if you have a limited SRIOV resource you may want to preserve these
machines for VMs that have SRIOV requirements, and avoid mapping general
VMs with no SRIOV requirements onto them.

You can expand the problem further and avoid loading up machines with
specific PCI devices of any sort if you have a VM that doesn't need a
device of that sort, which comes down to prioritising your machines at
schedule time based on whether they're a good fit for the VM you intend to
schedule.

In any case, as discussed in the meeting, this is an optimisation and not
something we have to solve in the initial release, because:


>
> Irena brought up the idea of using host aggregate. This requires creation
> of a non-SRIOV host aggregate, and use that in the above 'nova boot'
> command. It should work.
>
>
So, while it's not the greatest solution, there's at least a way of
achieving it right now.
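
To spell that out, something along these lines - names illustrative only,
and it assumes AggregateInstanceExtraSpecsFilter is enabled in the scheduler
- keeps ordinary flavors on the non-SRIOV machines:

    nova aggregate-create non-sriov
    nova aggregate-add-host non-sriov compute-3
    nova aggregate-set-metadata non-sriov sriov=false
    nova flavor-key m1.small set aggregate_instance_extra_specs:sriov=false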
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] PCI pass-through SRIOV on Jan. 28th

2014-01-27 Thread Ian Wells
Live migration for the first release is intended to be covered by macvtap,
in my mind - direct mapped devices have limited support in hypervisors
aiui.  It seemed we had a working theory for that, which we can test out to
see if it's going to work.
-- 
Ian.


On 27 January 2014 21:38, Robert Li (baoli)  wrote:

>  Hi Folks,
>
>  Check out the Agenda for Jan 28th, 2014.
> Please update if I have missed any thing. Let's finalize who's doing what
> tomorrow.
>
>  I'm thinking to work on the nova SRIOV items, but the live migration may
> be a stretch for the initial release.
>
>  thanks,
> Robert
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-29 Thread Ian Wells
My proposals:

On 29 January 2014 16:43, Robert Li (baoli)  wrote:

> 1. pci-flavor-attrs is configured through configuration files and will be
> available on both the controller node and the compute nodes. Can the cloud
> admin decide to add a new attribute in a running cloud? If that's
> possible, how is that done?
>

When nova-compute starts up, it requests the VIF attributes that the
schedulers need.  (You could have multiple schedulers; they could be in
disagreement; it picks the last answer.)  It returns pci_stats by the
selected combination of VIF attributes.

When nova-scheduler starts up, it sends an unsolicited cast of the
attributes.  nova-compute updates the attributes, clears its pci_stats and
recreates them.

If nova-scheduler receives pci_stats with incorrect attributes it discards
them.

(There is a row from nova-compute summarising devices for each unique
combination of vif_stats, including 'None' where no attribute is set.)

I'm assuming here that the pci_flavor_attrs are read on startup of
nova-scheduler and could be re-read and different when nova-scheduler is
reset.  There's a relatively straightforward move from here to an API for
setting it if this turns out to be useful, but firstly I think it would be
an uncommon occurrence and secondly it's not something we should implement
now.

> 2. PCI flavor will be defined using the attributes in pci-flavor-attrs. A
> flavor is defined with a matching expression in the form of attr1 = val11
> [| val12 Š.], [attr2 = val21 [| val22 Š]], Š. And this expression is used
> to match one or more PCI stats groups until a free PCI device is located.
> In this case, both attr1 and attr2 can have multiple values, and both
> attributes need to be satisfied. Please confirm this understanding is
> correct
>

This looks right to me as we've discussed it, but I think we'll be wanting
something that allows a top-level OR of AND groups.  In the above example, I
can't say an Intel NIC and a Mellanox NIC are equally OK, because I can't
express (Intel AND product ID 1) OR (Mellanox AND product ID 2).  I'll leave
Yunhong to decide how the details should look, though.
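
To be concrete, the matching semantics I'm after are along these lines - the
syntax is entirely made up and the device IDs purely illustrative:

    # A device matches the flavor if it satisfies ANY entry in the list,
    # and ALL of the attributes within that entry.
    flavor_10g = [
        {"vendor_id": "8086", "product_id": "10fb"},  # an Intel NIC, say
        {"vendor_id": "15b3", "product_id": "1004"},  # a Mellanox NIC, say
    ]

    def device_matches(device, flavor):
        return any(all(device.get(k) == v for k, v in group.items())
                   for group in flavor)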

> 3. I'd like to see an example that involves multiple attributes. let's say
> pci-flavor-attrs = {gpu, net-group, device_id, product_id}. I'd like to
> know how PCI stats groups are formed on compute nodes based on that, and
> how many of PCI stats groups are there? What's the reasonable guidelines
> in defining the PCI flavors.
>

I need to write up the document for this, and it's overdue.  Leave it with
me.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] PCI pass-through SRIOV on Jan. 29th

2014-01-29 Thread Ian Wells
On 29 January 2014 23:50, Robert Kukura  wrote:

> On 01/29/2014 05:44 PM, Robert Li (baoli) wrote:
> > Hi Bob,
> >
> > that's a good find. profileid as part of IEEE 802.1br needs to be in
> > binding:profile, and can be specified by a normal user, and later
> possibly
> > the pci_flavor. Would it be wrong to say something as in below in the
> > policy.json?
> >  "create_port:binding:vnic_type": "rule:admin_or_network_owner"
> >  "create_port:binding:profile:profileid": "rule:admin_or_network_owner"
>
> Maybe, but a normal user that owns a network has no visibility into the
> underlying details (such as the providernet extension attributes).
>

I'm with Bob on this, I think - I would expect that vnic_type is passed in
by the user (user readable, and writeable, at least if the port is not
attached) and then may need to be reflected back, if present, in the
'binding' attribute via the port binding extension (unless Nova can just go
look for it - I'm not clear on what's possible here).


> Also, would a normal cloud user really know what pci_flavor to use?
> Isn't all this kind of detail hidden from a normal user within the nova
> VM flavor (or host aggregate or whatever) pre-configured by the admin?
>

Flavors are user-visible, analogous to Nova's machine flavors; they're just
not user-editable.  I'm not sure where port profiles come from.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] L2 gateway as a service

2014-11-18 Thread Ian Wells
Sorry I'm a bit late to this, but that's what you get from being on
holiday...  (Which is also why there are no new MTU and VLAN specs yet, but
I swear I'll get to them.)

On 17 November 2014 01:13, Mathieu Rohon  wrote:

> Hi
>
> On Fri, Nov 14, 2014 at 6:26 PM, Armando M.  wrote:
> > Last Friday I recall we had two discussions around this topic. One in the
> > morning, which I think led to Maruti to push [1]. The way I understood
> [1]
> > was that it is an attempt at unifying [2] and [3], by choosing the API
> > approach of one and the architectural approach of the other.
> >
> > [1] https://review.openstack.org/#/c/134179/
> > [2] https://review.openstack.org/#/c/100278/
> > [3] https://review.openstack.org/#/c/93613/
> >
> > Then there was another discussion in the afternoon, but I am not 100% of
> the
> > outcome.
>
> Me neither, that's why I'd like ian, who led this discussion, to sum
> up the outcome from its point of view.
>

So, the gist of what I said is that we have three, independent, use cases:

- connecting two VMs that like to tag packets to each other (VLAN clean
networks)
- connecting many networks to a single VM (trunking ports)
- connecting the outside world to a set of virtual networks

We're discussing that last use case here.  The point I made was that:

- there are more encaps in the world than just VLANs
- they can all be solved in the same way using an edge API
- if they are solved using an edge API, the job of describing the network
you're trying to bring in (be it switch/port/vlan, or MPLS label stack, or
l2tpv3 endpoint data) is best kept outside of Neutron's API, because
Neutron can't usefully do anything with it other than validate it and hand
it off to whatever network control code is being used.  (Note that most
encaps will likely *not* be implemented in Neutron's inbuilt control code.)

Now, the above argument says that we should keep this out of Neutron.  The
problem with that is that people are using the OVS mechanism driver and
would like a solution that works with that, implying something that's
*inside* Neutron.  For that case, it's certainly valid to consider another
means of implementation, but it wouldn't be my personal choice.  (For what
it's worth I'm looking at ODL based controller implementations, so this
isn't an issue for me personally.)

If one were to implement the code in the Neutron API, even as an extension,
I would question whether it's a sensible thing to attempt before the RPC
server/REST server split is done, since it also extends the API between
them.

> All this churn makes me believe that we probably just need to stop
> > pretending we can achieve any sort of consensus on the approach and let
> the
> > different alternatives develop independently, assumed they can all
> develop
> > independently, and then let natural evolution take its course :)
>
> I tend to agree, but I think that one of the reason why we are looking
> for a consensus, is because API evolutions proposed through
> Neutron-spec are rejected by core-dev, because they rely on external
> components (sdn controller, proprietary hardware...) or they are not a
> high priority for neutron core-dev.
> By finding a consensus, we show that several players are interested in
> such an API, and it helps to convince core-dev that this use-case, and
> its API, is missing in neutron.
>

There are lots of players interested in an API, that much is clear, and all
the more so if you consider that this feature has strong analogies with use
cases such as switch port exposure and MPLS.  The problem is that it's
clearly a fairly complex API with some variety of ways to implement it, and
both of these things work against its acceptance.  Additionally, per the
above discussion, I would say it's not essential for it to be core Neutron
functionality.

Now, if there is room for easily propose new API in Neutron, It make
> sense to leave new API appear and evolve, and then " let natural
> evolution take its course ", as you said.
>

Natural selection works poorly on APIs because once they exist they're hard
to change and/or retire, due to backward compatibility requirements.


> To me, this is in the scope of the "advanced services" project.
>

Advanced services or no, the point I was making is that this is not
something that should fit under the Neutron API endpoint.  Since it's not
really related to any of the other advanced services it's not particularly
necessary that it fit under the Advanced Services API endpoint either,
although it could.  My Unix design leanings say to me that if things are
not related they shouldn't be combined, though - the simplest thing that
does the job is the right answer.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tc][neutron] Proposal to split Neutron into separate repositories

2014-11-18 Thread Ian Wells
On 18 November 2014 15:33, Mark McClain  wrote:

>
> > On Nov 18, 2014, at 5:45 PM, Doug Hellmann 
> wrote:
> >
> > There would not be a service or REST API associated with the Advanced
> Services code base? Would the REST API to talk to those services be part of
> the Neutron repository?
> >
> > Doug
>
> We had considered having a standalone REST service, but the Advance
> Services need a level of integration with Neutron that is more than REST
> and RPC can provide.


Actually, I don't agree with this.  I'm fairly sure that a combination of
REST and notifications *could* provide advanced services with the
information they require, so they could be a standalone, and optional,
service with its own API endpoint.  But the issue is that the interface
Neutron has today is clearly not sufficient for the needs of the advanced
services we have, and we would need to add APIs to facilitate that split
when we wanted to take this further.

-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] DB: transaction isolation and related questions

2014-11-19 Thread Ian Wells
On 19 November 2014 11:58, Jay Pipes  wrote:

>> Some code paths that used locking in the past were rewritten to retry
>> the operation if they detect that an object was modified concurrently.
>> The problem here is that all DB operations (CRUD) are performed in the
>> scope of some transaction that makes complex operations to be executed
>> in atomic manner.
>>
>
> Yes. The root of the problem in Neutron is that the session object is
> passed through all of the various plugin methods and the
> session.begin(subtransactions=True) is used all over the place, when in
> reality many things should not need to be done in long-lived transactional
> containers.
>

I think the issue is one of design, and it's possible what we discussed at
the summit may address some of this.

At the moment, Neutron's a bit confused about what it is.  Some plugins
treat a call to Neutron as the period of time in which an action should be
completed - the 'atomicity' thing.  This is not really compatible with a
distributed system and it's certainly not compatible with the principle of
eventual consistency that Openstack is supposed to follow.  Some plugins
treat the call as a change to desired networking state, and the action on
the network is performed asynchronously to bring the network state into
alignment with the state of the database.  (Many plugins do a bit of both.)

When you have a plugin that's decided to be synchronous, then there are
cases where the DB lock is held for a technically indefinite period of
time.  This is basically broken.

What we said at the summit is that we should move to an entirely async
model for the API, which in turn gets us to the 'desired state' model for
the DB.  DB writes would take one of two forms:

- An API call has requested that the data be updated, which it can do
immediately - the DB transaction takes as long as it takes to write the DB
consistently, and can hold locks on referenced rows to main consistency
providing the whole operation remains brief
- A network change has completed and the plugin wants to update an object's
state - again, the DB transaction contains only DB ops and nothing else and
should be quick.

Now, if we moved to that model, DB locks would be very very brief for the
sort of queries we'd need to do.  Setting aside the joys of Galera (and I
believe we identified that using one Galera node and doing all writes
through it worked just fine, though we could probably distribute read-only
transactions across all of them in the future), would there be any need for
transaction retries in that scenario?  I would have thought that DB locking
would be just fine as long as there was nothing but DB operations for the
period a transaction was open, and thus significantly changing the DB
lock/retry model now is a waste of time because it's a problem that will go
away.

Does that theory hold water?
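
For clarity, by 'nothing but DB operations' above I mean the shape below - a
sketch with made-up names rather than real Neutron code - where whatever
backend or agent work brings the network into line happens entirely outside
the transaction:

    with context.session.begin(subtransactions=True):
        port = (context.session.query(models_v2.Port).
                filter_by(id=port_id).one())
        port.status = new_status
    # ...and only then, outside the transaction, kick the agent/backend.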

-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] L2 gateway as a service

2014-11-20 Thread Ian Wells
On 19 November 2014 17:19, Sukhdev Kapur  wrote:

> Folks,
>
> Like Ian, I am jumping in this very late as well - as I decided to travel
> Europe after the summit, just returned back and  catching up :-):-)
>
> I have noticed that this thread has gotten fairly convoluted and painful
> to read.
>
> I think Armando summed it up well in the beginning of the thread. There
> are basically three written proposals (listed in Armando's email - I pasted
> them again here).
>
> [1] https://review.openstack.org/#/c/134179/
> [2] https://review.openstack.org/#/c/100278/
> [3] https://review.openstack.org/#/c/93613/
>
> On this thread I see that the authors of first two proposals have already
> agreed to consolidate and work together. This leaves with two proposals.
> Both Ian and I were involved with the third proposal [3] and have
> reasonable idea about it. IMO, the use cases addressed by the third
> proposal are very similar to use cases addressed by proposal [1] and [2]. I
> can volunteer to  follow up with Racha and Stephen from Ericsson to see if
> their use case will be covered with the new combined proposal. If yes, we
> have one converged proposal. If no, then we modify the proposal to
> accommodate their use case as well. Regardless, I will ask them to review
> and post their comments on [1].
>
> Having said that, this covers what we discussed during the morning session
> on Friday in Paris. Now, comes the second part which Ian brought up in the
> afternoon session on Friday.
> My initial reaction was, when heard his use case, that this new
> proposal/API should cover that use case as well (I am being bit optimistic
> here :-)). If not, rather than going into the nitty gritty details of the
> use case, let's see what modification is required to the proposed API to
> accommodate Ian's use case and adjust it accordingly.
>

As far as I can see, marking a network as 'edge' and therefore bridged to
something you don't know about (my proposal), and attaching a block to it
that, behind the scenes, bridges to something you don't know about
(Maruti's, if you take out all of the details of *what* is being attached to
from the API), are basically as good as each other.

My API parallels the way that provider networks are used, because that's
what I had in mind at the time; Maruti's uses a block rather than marking
the network, and the only real difference that makes is that (a) you can
attach many networks to one block (which doesn't really seem to bring
anything special) and (b) it uses a port to connect to the network (which is
not massively helpful because there's nothing sensible you can put on the
port; there may be many things behind the gateway).  At this point it
becomes a completely religious argument about which is better.  I still
prefer mine, from gut feel, but the two are almost exactly equivalent.

Taking your statement above of 'let's take out the switch port stuff', then
Maruti's use case would need to explain where that data goes. The point I
made is that it becomes a Sisyphean task (endless and not useful) to
introduce a data model and API for that data into Neutron, and that's what I
didn't want to do.  Can we address that question?

-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id

2014-11-29 Thread Ian Wells
On 27 November 2014 at 12:11, Mohammad Hanif  wrote:

>  Folks,
>
>  Recently, as part of the L2 gateway thread, there was some discussion on
> BGP/MPLS/Edge VPN and how to bridge any overlay networks to the neutron
> network.  Just to update everyone in the community, Ian and I have
> separately submitted specs which make an attempt to address the cloud edge
> connectivity.  Below are the links describing it:
>
>  Edge-Id: https://review.openstack.org/#/c/136555/
> Edge-VPN: https://review.openstack.org/#/c/136929 .  This is a resubmit
> of https://review.openstack.org/#/c/101043/ for the kilo release under
> the “Edge VPN” title.  “Inter-datacenter connectivity orchestration” was
> just too long and just too generic of a title to continue discussing about
> :-(
>

Per the summit discussions, the difference is one of approach.

The Edge-VPN case addresses MPLS attachments via a set of APIs to be added
to the core of Neutron.  Those APIs are all new objects and don't really
change the existing API so much as extend it.  There's talk of making it a
'service plugin' but if it were me I would simply argue for a new service
endpoint.  Keystone's good at service discovery, endpoints are pretty easy
to create and I don't see why you need to fold it in.

The edge-id case says 'Neutron doesn't really care about what happens
outside of the cloud at this point in time, there are loads of different
edge termination types, and so the best solution would be one where the
description of the actual edge datamodel does not make its way into core
Neutron'.  This avoids us folding in the information about edges in the
same way that we folded in the information about services and later
regretted it.  The notable downside is that this method would work with an
external network controller such as ODL, but probably will never make its
way into the inbuilt OVS/ML2 network controller if it's implemented as
described (explicitly *because* it's designed in such a way as to keep the
functionality out of core Neutron).  Basically, it's not completely
incompatible with the datamodel that the Edge-VPN change describes, but
pushes that datamodel out to an independent service which would have its
own service endpoint to avoid complicating the Neutron API with information
that, likely, Neutron itself could probably only ever validate, store and
pass on to an external controller.

Also, the Edge-VPN case is specified for only MPLS VPNs, and doesn't
consider other edge cases such as Kevin's switch-based edges in
https://review.openstack.org/#/c/87825/ .  The edge-ID one is agnostic of
termination types (since it absolves Neutron of all of that responsibility)
and would leave the edge type description to the determination of an
external service.

Obviously, I'm biased, having written the competing spec; but I prefer the
simple change that pushes complexity out of the core to the larger but
comprehensive change that keeps it as a part of Neutron.  And in fact if
you look at the two specs with that in mind, they do go together; the
Edge-VPN model is almost precisely what you need to describe an endpoint
that you could then associate with an Edge-ID to attach it to Neutron.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id

2014-12-01 Thread Ian Wells
On 1 December 2014 at 04:43, Mathieu Rohon  wrote:

> This is not entirely true, as soon as a reference implementation,
> based on existing Neutron components (L2agent/L3agent...) can exist.
>

The specific thing I was saying is that that's harder with an edge-id
mechanism than one incorporated into Neutron, because the point of the
edge-id proposal is to make tunnelling explicitly *not* a responsibility of
Neutron.  So how do you get the agents to terminate tunnels when Neutron
doesn't know anything about tunnels and the agents are a part of Neutron?
Conversely, you can add a mechanism to the OVS subsystem so that you can
tap an L2 bridge into a network, which would probably be more
straightforward.

But even if it were true, this could at least give a standardized API
> to Operators that want to connect their Neutron networks to external
> VPNs, without coupling their cloud solution with whatever SDN
> controller. And to me, this is the main issue that we want to solve by
> proposing some neutron specs.
>

So the issue I worry about here is that if we start down the path of adding
the MPLS datamodels to Neutron we have to add Kevin's switch control work.
And the L2VPN descriptions for GRE, L2TPv3, VxLAN, and EVPN.  And whatever
else comes along.  And we get back to 'that's a lot of big changes that
aren't interesting to 90% of Neutron users' - difficult to get in and a lot
of overhead to maintain for the majority of Neutron developers who don't
want or need it.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id

2014-12-01 Thread Ian Wells
On 1 December 2014 at 09:01, Mathieu Rohon  wrote:

> This is an alternative that would say: you want an advanced service
> for your VM, please stretch your l2 network to this external
> component, that is driven by an external controller, and make your
> traffic goes to this component to take benefit of this advanced
> service. This is a valid alternative of course, but distributing the
> service directly to each compute node is much more valuable, ASA it is
> doable.
>

Right, so a lot rides on the interpretation of 'advanced service' here, and
also 'attachment'.

Firstly, the difference between this and the 'advanced services' (including
the L3 functionality, though it's not generally considered an 'advanced
service') is that advanced services that exist today attach via an
addressed port.  This bridges in.  That's quite a significant difference,
which is to an extent why I've avoided lumping the two together and haven't
called this an advanced service itself, although it's clearly similar.

Secondly, 'attachment' has historically meant a connection to that port.
But in DVRs, it can be a multipoint connection to the network - manifested
on several hosts - all through the auspices of a single port.  In the
edge-id proposal you'll note that I've carefully avoided defining what an
attachment is, largely because I have a natural tendency to want to see the
interface at the API level before I worry about the backend, I admit.  Your
point about distributed services is well taken, and I think would be
addressed by one of these distributed attachment types.

> So the issue I worry about here is that if we start down the path of
> adding
> > the MPLS datamodels to Neutron we have to add Kevin's switch control
> work.
> > And the L2VPN descriptions for GRE, L2TPv3, VxLAN, and EVPN.  And
> whatever
> > else comes along.  And we get back to 'that's a lot of big changes that
> > aren't interesting to 90% of Neutron users' - difficult to get in and a
> lot
> > of overhead to maintain for the majority of Neutron developers who don't
> > want or need it.
>
> This shouldn't be a lot of big changes, once interfaces between
> advanced services and neutron core services will be cleaner.


Well, incorporating a lot of models into Neutron *is*, clearly, quite a bit
of change, for starters.

The edge-id concept says 'the data models live outside neutron in a
separate system' and there, yes, absolutely, this proposes a clean model
for edge/Neutron separation in the way you're alluding to with advanced
services.  I think your primary complaint is that it doesn't define that
interface for an OVS driver based system.

The edge-vpn concept says 'the data models exist within neutron in an
integrated fashion' and, if you agree that separation is the way to go,
this seems to me to be exactly the wrong approach to be using.  It's the
way advanced services are working - for now - but that's because we believe
it would be hard to pull them out because the interfaces between service
and Neutron don't currently exist.  The argument for this seems to be 'we
should incorporate it so that we can pull it out at the same time as
advanced services' but it feels like that's making more work now so that we
can do even more work in the future.

For an entirely new thing that is in many respects not like a service I
would prefer not to integrate it in the first place, thus skipping over
that whole question of how to break it out in the future.  It's an open
question whether the work to make it play nicely with the existing ML2
model is worth the effort or not, because I didn't study that.  It's not
relevant to my needs, but if you're interested then we could talk about
what other specs would be required.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id

2014-12-04 Thread Ian Wells
On 1 December 2014 at 21:26, Mohammad Hanif  wrote:

>  I hope we all understand how edge VPN works and what interactions are
> introduced as part of this spec.  I see references to neutron-network
> mapping to the tunnel which is not at all case and the edge-VPN spec
> doesn’t propose it.  At a very high level, there are two main concepts:
>
>1. Creation of a per tenant VPN “service” on a PE (physical router)
>which has a connectivity to other PEs using some tunnel (not known to
>tenant or tenant-facing).  An attachment circuit for this VPN service is
>also created which carries a “list" of tenant networks (the list is
>initially empty) .
>2. Tenant “updates” the list of tenant networks in the attachment
>circuit which essentially allows the VPN “service” to add or remove the
>network from being part of that VPN.
>
> A service plugin implements what is described in (1) and provides an API
> which is called by what is described in (2).  The Neutron driver only
> “updates” the attachment circuit using an API (attachment circuit is also
> part of the service plugin’ data model).   I don’t see where we are
> introducing large data model changes to Neutron?
>

Well, you have attachment types, tunnels, and so on - these are all objects
with data models, and your spec is on Neutron so I'm assuming you plan on
putting them into the Neutron database - where they are, for ever more, a
Neutron maintenance overhead both on the dev side and also on the ops side,
specifically at upgrade.

How else one introduces a network service in OpenStack if it is not through
> a service plugin?
>

Again, I've missed something here, so can you define 'service plugin' for
me?  How similar is it to a Neutron extension - which we agreed at the
summit we should take pains to avoid, per Salvatore's session?

And the answer to that is to stop talking about plugins or trying to
integrate this into the Neutron API or the Neutron DB, and make it an
independent service with a small and well defined interaction with Neutron,
which is what the edge-id proposal suggests.  If we do incorporate it into
Neutron then there are probably 90% of Openstack users and developers who
don't want or need it but care a great deal if it breaks the tests.  If it
isn't in Neutron they simply don't install it.


> As we can see, tenant needs to communicate (explicit or otherwise) to
> add/remove its networks to/from the VPN.  There has to be a channel and the
> APIs to achieve this.
>

Agreed.  I'm suggesting it should be a separate service endpoint.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] Boundary between Nova and Neutron involvement in network setup?

2014-12-04 Thread Ian Wells
On 4 December 2014 at 08:00, Neil Jerram  wrote:

> Kevin Benton  writes:
> I was actually floating a slightly more radical option than that: the
> idea that there is a VIF type (VIF_TYPE_NOOP) for which Nova does
> absolutely _nothing_, not even create the TAP device.
>

Nova always does something, and that something amounts to 'attaches the VM
to where it believes the endpoint to be'.  Effectively you should view the
VIF type as the form that's decided on during negotiation between Neutron
and Nova - Neutron says 'I will do this much and you have to take it from
there'.  (In fact, I would prefer that it was *more* of a negotiation, in
the sense that the hypervisor driver had a say to Neutron of what VIF types
it supported and preferred, and Neutron could choose from a selection, but
I don't think it adds much value at the moment and I didn't want to propose
a change just for the sake of it.)  I think you're just proposing that the
hypervisor driver should do less of the grunt work of connection.

Also, libvirt is not the only hypervisor driver and I've found it
interesting to nose through the others for background reading, even if
you're not using them much.

For example, suppose someone came along and wanted to implement a new
> OVS-like networking infrastructure?  In principle could they do that
> without having to enhance the Nova VIF driver code?  I think at the
> moment they couldn't, but that they would be able to if VIF_TYPE_NOOP
> (or possibly VIF_TYPE_TAP) was already in place.  In principle I think
> it would then be possible for the new implementation to specify
> VIF_TYPE_NOOP to Nova, and to provide a Neutron agent that does the kind
> of configuration and vSwitch plugging that you've described above.
>

At the moment, the rule is that *if* you create a new type of
infrastructure then *at that point* you create your new VIF plugging type
to support it - vhostuser being a fine example, having been rejected on the
grounds that it was, at the end of Juno, speculative.  I'm not sure I
particularly like this approach but that's how things are at the moment -
largely down to not wanting to add code that isn't used and therefore
tested.

None of this is criticism of your proposal, which sounds reasonable; I was
just trying to provide a bit of context.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id

2014-12-05 Thread Ian Wells
I have no problem with standardising the API, and I would suggest that a
service that provided nothing but endpoints could begin as the next phase of
the 'advanced services' break-out projects to standardise that API.  I just
don't want it in Neutron itself.

On 5 December 2014 at 00:33, Erik Moe  wrote:

>
>
> One reason for trying to get an more complete API into Neutron is to have
> a standardized API. So users know what to expect and for providers to have
> something to comply to. Do you suggest we bring this standardization work
> to some other forum, OPNFV for example? Neutron provides low level hooks
> and the rest is defined elsewhere. Maybe this could work, but there would
> probably be other issues if the actual implementation is not on the edge or
> outside Neutron.
>
>
>
> /Erik
>
>
>
>
>
> *From:* Ian Wells [mailto:ijw.ubu...@cack.org.uk]
> *Sent:* den 4 december 2014 20:19
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [Neutron] Edge-VPN and Edge-Id
>
>
>
> On 1 December 2014 at 21:26, Mohammad Hanif  wrote:
>
>   I hope we all understand how edge VPN works and what interactions are
> introduced as part of this spec.  I see references to neutron-network
> mapping to the tunnel which is not at all case and the edge-VPN spec
> doesn’t propose it.  At a very high level, there are two main concepts:
>
>1. Creation of a per tenant VPN “service” on a PE (physical router)
>which has a connectivity to other PEs using some tunnel (not known to
>tenant or tenant-facing).  An attachment circuit for this VPN service is
>also created which carries a “list" of tenant networks (the list is
>initially empty) .
>2. Tenant “updates” the list of tenant networks in the attachment
>circuit which essentially allows the VPN “service” to add or remove the
>network from being part of that VPN.
>
>  A service plugin implements what is described in (1) and provides an API
> which is called by what is described in (2).  The Neutron driver only
> “updates” the attachment circuit using an API (attachment circuit is also
> part of the service plugin’ data model).   I don’t see where we are
> introducing large data model changes to Neutron?
>
>
>
> Well, you have attachment types, tunnels, and so on - these are all
> objects with data models, and your spec is on Neutron so I'm assuming you
> plan on putting them into the Neutron database - where they are, for ever
> more, a Neutron maintenance overhead both on the dev side and also on the
> ops side, specifically at upgrade.
>
>
>
>   How else one introduces a network service in OpenStack if it is not
> through a service plugin?
>
>
>
> Again, I've missed something here, so can you define 'service plugin' for
> me?  How similar is it to a Neutron extension - which we agreed at the
> summit we should take pains to avoid, per Salvatore's session?
>
> And the answer to that is to stop talking about plugins or trying to
> integrate this into the Neutron API or the Neutron DB, and make it an
> independent service with a small and well defined interaction with Neutron,
> which is what the edge-id proposal suggests.  If we do incorporate it into
> Neutron then there are probably 90% of Openstack users and developers who
> don't want or need it but care a great deal if it breaks the tests.  If it
> isn't in Neutron they simply don't install it.
>
>
>
>   As we can see, tenant needs to communicate (explicit or otherwise) to
> add/remove its networks to/from the VPN.  There has to be a channel and the
> APIs to achieve this.
>
>
>
> Agreed.  I'm suggesting it should be a separate service endpoint.
> --
>
> Ian.
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Neutron] out-of-tree plugin for Mech driver/L2 and vif_driver

2014-12-10 Thread Ian Wells
On 10 December 2014 at 01:31, Daniel P. Berrange 
wrote:

>
> So the problem of Nova review bandwidth is a constant problem across all
> areas of the code. We need to solve this problem for the team as a whole
> in a much broader fashion than just for people writing VIF drivers. The
> VIF drivers are really small pieces of code that should be straightforward
> to review & get merged in any release cycle in which they are proposed.
> I think we need to make sure that we focus our energy on doing this and
> not ignoring the problem by breaking stuff off out of tree.
>

The problem is that we effectively prevent running an out-of-tree Neutron
driver (which *is* perfectly legitimate) if it uses a VIF plugging
mechanism that isn't in Nova, as we can't load out-of-tree VIF code and we
won't accept in-tree VIF code for out-of-tree drivers.  This will get more confusing
as *all* of the Neutron drivers and plugins move out of the tree, as that
constraint becomes essentially arbitrary.

Your issue is one of testing.  Is there any way we could set up a better
testing framework for VIF drivers where Nova interacts with something to
test the plugging mechanism actually passes traffic?  I don't believe
there's any specific limitation on it being *Neutron* that uses the
plugging interaction.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] bridge name generator for vif plugging

2014-12-15 Thread Ian Wells
Hey Ryota,

A better way of describing it would be that the bridge name is, at present,
generated in *both* Nova *and* Neutron, and the VIF type semantics define
how it's calculated.  I think you're right that in both cases it would make
more sense for Neutron to tell Nova what the connection endpoint was going
to be rather than have Nova calculate it independently.  I'm not sure that
that necessarily requires two blueprints, and you don't have a spec there
at the moment, which is a problem because the Neutron spec deadline is upon
us, but the idea's a good one.  (You might get away without a Neutron spec,
since the change to Neutron to add the information should be small and
backward compatible, but that's not something I can make judgement on.)
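
To be concrete about the duplication, for the Linux bridge case both sides
effectively compute something like this today (a sketch, not the exact code
from either tree):

    DEV_NAME_LEN = 14  # stays within the kernel's interface name limit

    def get_bridge_name(network_id):
        # Both Nova and the Neutron Linux bridge agent derive the bridge
        # name from the network UUID in essentially this way.
        return ("brq" + network_id)[:DEV_NAME_LEN]

If Neutron simply handed the name over as part of the binding, only one copy
of that logic would need to exist.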

If we changed this, then your options are to make new plugging types where
the name is exchanged rather than calculated, or to keep the old plugging
types and have Neutron provide the name and Nova use it if provided.  You'd
need to think carefully about upgrade scenarios to make sure that changing
the version on either side is going to work.

VIF_TYPE_TAP, while somewhat different in its focus, is also moving in the
same direction of having a more logical interface between Nova and
Neutron.  That plus this suggests that we should have VIF_TYPE_TAP handing
over the TAP device name to use, and similarly create a VIF_TYPE_BRIDGE
(passing bridge name) and slightly modify VIF_TYPE_VHOSTUSER before it gets
established (to add the socket name).

Did you have any thoughts on how the metadata should be stored on the port?
-- 
Ian.


On 15 December 2014 at 10:01, Ryota Mibu  wrote:
>
> Hi all,
>
>
> We are proposing a change to move bridge name generator (creating bridge
> name from net-id or reading integration bridge name from nova.conf) from
> Nova to Neutron. The followings are BPs in Nova and Neutron.
>
> https://blueprints.launchpad.net/nova/+spec/neutron-vif-bridge-details
> https://blueprints.launchpad.net/neutron/+spec/vif-plugging-metadata
>
> I'd like to get your comments on this change whether this is relevant
> direction. I found related comment in Nova code [3] and guess these
> discussion had in context of vif-plugging and port-binding, but I'm not
> sure there was consensus about bridge name.
>
>
> https://github.com/openstack/nova/blob/2014.2/nova/network/neutronv2/api.py#L1298-1299
>
>
> Thanks,
> Ryota
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][neutron] bridge name generator for vif plugging

2014-12-15 Thread Ian Wells
Let me write a spec and see what you both think.  I have a couple of things
we could address here and while it's a bit late it wouldn't be a dramatic
thing to fix and it might be acceptable.

On 15 December 2014 at 11:28, Daniel P. Berrange 
wrote:
>
> On Mon, Dec 15, 2014 at 11:15:56AM +0100, Ian Wells wrote:
> > Hey Ryota,
> >
> > A better way of describing it would be that the bridge name is, at
> present,
> > generated in *both* Nova *and* Neutron, and the VIF type semantics define
> > how it's calculated.  I think you're right that in both cases it would
> make
> > more sense for Neutron to tell Nova what the connection endpoint was
> going
> > to be rather than have Nova calculate it independently.  I'm not sure
> that
> > that necessarily requires two blueprints, and you don't have a spec there
> > at the moment, which is a problem because the Neutron spec deadline is
> upon
> > us, but the idea's a good one.  (You might get away without a Neutron
> spec,
> > since the change to Neutron to add the information should be small and
> > backward compatible, but that's not something I can make judgement on.)
>
> Yep, the fact that both Nova & Neutron calculate the bridge name is a
> historical accident. Originally Nova did it, because nova-network was
> the only solution. Then Neutron did it too, so it matched what Nova
> was doing. Clearly if we had Neutron right from the start, then it
> would have been Neutrons responsibility todo this. Nothing in Nova
> cares what the names are from a functional POV - it just needs to
> be told what to use.
>
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/
> :|
> |: http://libvirt.org  -o- http://virt-manager.org
> :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/
> :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc
> :|
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][L2-Gateway] Meetings announcement

2015-01-03 Thread Ian Wells
Sukhdev,

Since the term is quite broad and has meant many things in the past, can
you define what you're thinking of when you say 'L2 gateway'?

Cheers,
-- 
Ian.

On 2 January 2015 at 18:28, Sukhdev Kapur  wrote:

> Hi all,
>
> HAPPY NEW YEAR.
>
> Starting Monday (Jan 5th, 2015) we will be kicking of bi-weekly meetings
> for L2 Gateway discussions.
>
> We are hoping to come up with an initial version of L2 Gateway API in Kilo
> cycle. The intent of these bi-weekly meetings is to discuss issues related
> to L2 Gateway API.
>
> Anybody interested in this topic is invited to join us in these meetings
> and share your wisdom with the similar minded members.
>
> Here is the details of these meetings:
>
> https://wiki.openstack.org/wiki/Meetings#Networking_L2_Gateway_meeting
>
> I have put together a wiki for this project. Next week is the initial
> meeting and the agenda is pretty much open. We will give introduction of
> the members of the team as well the progress made so far on this topic. If
> you would like to add anything to the agenda, feel free to update the
> agenda at the following wiki:
>
> https://wiki.openstack.org/wiki/Meetings/L2Gateway
>
> Look forward to on the IRC.
>
> -Sukhdev
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][neutron]VIF_VHOSTUSER

2015-01-09 Thread Ian Wells
Once more, I'd like to revisit the VIF_VHOSTUSER discussion [1].  I still
think this is worth getting into Nova's libvirt driver - specifically
because there's actually no way to distribute this as an extension; since
we removed the plugin mechanism for VIF drivers, it absolutely requires a
code change in the libvirt driver.  This means that there's no graceful way
of distributing an aftermarket VHOSTUSER driver for libvirt.

The standing counterargument to adding it is that nothing in the upstream
or 3rd party CI would currently test the VIF_VHOSTUSER code.  I'm not sure
that's a showstopper, given the code is zero risk to anyone when it's not
being used, and clearly is going to be experimental when it's enabled.  So,
Nova cores, would it be possible to incorporate this without a
corresponding driver in base Neutron?
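
For reference, the guest side of this is nothing exotic - it's the
vhost-user interface type libvirt already supports, roughly the following
(socket path illustrative; it's whatever the external forwarder created):

    <interface type='vhostuser'>
      <source type='unix' path='/var/run/vhostuser/vif-1234' mode='client'/>
      <model type='virtio'/>
    </interface>

which is essentially what the proposed libvirt driver change generates from
the VIF details Neutron hands over.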

Cheers,
-- 
Ian.

[1] https://review.openstack.org/#/c/96140/
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova][Neutron] Thoughts on the nova<->neutron interface

2015-01-25 Thread Ian Wells
Lots of open questions in here, because I think we need a long conversation
on the subject.

On 23 January 2015 at 15:51, Kevin Benton  wrote:

> It seems like a change to using internal RPC interfaces would be pretty
> unstable at this point.
>


> Can we start by identifying the shortcomings of the HTTP interface and see
> if we can address them before making the jump to using an interface which
> has been internal to Neutron so far?
>

I think the protocol being used is a distraction from the actual
shortcomings.

Firstly, you'd have to explain to me why HTTP is so much slower than RPC.
If HTTP is incredibly slow, can it be sped up?  If RPC is moving the data
around using the same calls, what changes?  Secondly, the problem seems
more that we make too many roundtrips - which would be the same over RPC -
and if that's true, perhaps we should be doing bulk operations - which is
not transport-specific.


I absolutely do agree that Neutron should be doing more of the work, and
Nova less, when it comes to port binding.  (And, in fact, I'd like us to
stop considering it 'Nova-Neutron' port binding, since in theory another
service attaching stuff to the network could request a port be bound; it
just happens at the moment that it's always Nova.)

One other problem, not yet raised, is that Nova doesn't express its needs
when it asks for a port to be bound, and this is actually becoming a
problem for me right now.  At the moment, Neutron knows, almost
psychically, what binding type Nova will accept, and hands it over; Nova
then deals with whatever binding type it receives (optimistically
expecting it's one it will support, and getting shirty if it isn't).  The
problem I'm seeing at the moment, and other people have mentioned, is that
certain forwarders can only bind a vhostuser port to a VM if the VM itself
has hugepages enabled.  They could fall back to another binding type but at
the moment that isn't an option: Nova doesn't tell Neutron anything about
what it supports, so there's no data on which to choose.  It should be
saying 'I will take these binding types in this preference order'.  I
think, in fact, that asking Neutron for bindings of a certain preference
type order, would give us much more flexibility - like, for instance, not
having to know exactly which binding type to deliver to which compute node
in multi-hypervisor environments, where at the moment the choice is made in
Neutron.
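
Purely as an illustration of what I mean - nothing below exists today, and
'binding:vif_types_accepted' is an invented name - the bind request could
carry an ordered list of acceptable VIF types, roughly:

    # Hypothetical sketch only: the preference attribute is made up; today
    # Nova sends no such hint and Neutron has to guess the binding type.
    port_update = {
        "port": {
            "binding:host_id": "compute-12",
            "binding:vif_types_accepted": ["vhostuser", "ovs", "bridge"],
        }
    }

Neutron would then bind with the first type in the list that the forwarder on
that host can actually support, and Nova would never be handed a binding it
can't plumb.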

I scanned through the etherpad and I really like Salvatore's idea of adding
> a service plugin to Neutron that is designed specifically for interacting
> with Nova. All of the Nova notification interactions can be handled there
> and we can add new API components designed for Nova's use (e.g. syncing
> data, etc). Does anyone have any objections to that approach?
>

I think we should be leaning the other way, actually - working out what a
generic service - think a container management service, or an edge network
service - would want to ask when it wanted to connect to a virtual network,
and making a Neutron interface that supports that properly *without* being
tailored to Nova.  The requirements are similar in all cases, so it's not
clear that a generic interface would be any more complex.

Notifications on data changes in Neutron to prevent orphaning are another
example of a repeating pattern.  It's probably the same for any service
that binds to Neutron, but right now Neutron has Nova-specific code in it.
Broadening the scope, it's also likely the same in Cinder, and in fact it's
also pretty similar to the problem you get when you delete a project in
Keystone and all your resources get orphaned.  Is a Nova-Neutron specific
solution the right thing to do?
-- 
Ian.
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-dev][Neutron] Port Mirroring Extension in Neutron

2014-05-15 Thread Ian Wells
There's a sheet down there.  There's actually something on at the Neutron
pod at that time, but you might as well meet up then and see who's
interested (I certainly am).


On 15 May 2014 09:31, Vinay Yadhav  wrote:

> Hi,
>
> Cool, I propose today at 2:20 PM near the neutron pod. Can you tell me how
> do i sign up for it.
>
>
> Cheers,
>  main(i){putchar((5852758>>((i-1)/2)*8)-!(1&i)*'\r')^89&&main(++i);}
>
>
> On Thu, May 15, 2014 at 12:03 PM, Kanzhe Jiang  > wrote:
>
>> Hi Vinay,
>>
>> I am interested. You could sign up a slot for a Network POD discussion.
>>
>> Thanks,
>> Kanzhe
>>
>>
>>  On Thu, May 15, 2014 at 7:13 AM, Vinay Yadhav wrote:
>>
>>>  Hi,
>>>
>>> I am Vinay, working with Ericsson.
>>>
>>> I am interested in the following blueprint regarding port mirroring
>>> extension in neutron:
>>> https://blueprints.launchpad.net/neutron/+spec/port-mirroring
>>>
>>> I am close to finishing an implementation for this extension in OVS
>>> plugin and would be submitting a neutron spec related to the blueprint soon.
>>>
>>> I would like to know other who are also interested in introducing Port
>>> Mirroring extension in neutron.
>>>
>>> It would be great if we can discuss and collaborate in development and
>>> testing this extension
>>>
>>> I am currently attending the OpenStack Summit in Atlanta, so if any of
>>> you are interested in the blueprint, we can meet here in the summit and
>>> discuss how to proceed with the blueprint.
>>>
>>> Cheers,
>>> main(i){putchar((5852758>>((i-1)/2)*8)-!(1&i)*'\r')^89&&main(++i);}
>>>
>>> ___
>>> OpenStack-dev mailing list
>>> OpenStack-dev@lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>>
>>
>>
>> --
>> Kanzhe Jiang
>> MTS at BigSwitch
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][IPv6] Privacy extension

2014-05-15 Thread Ian Wells
I was just about to respond to that in the session when we ran out of
time.  I would vote for simply insisting that VMs run without the privacy
extension enabled, and only permitting the expected ipv6 address based on
MAC.  Its primary purpose is to conceal your MAC address so that your IP
address can't be used to track you, as I understand it, and I don't think
that's as relevant in a cloud environment where the MAC addresses are
basically fake.  Someone interested in desktop virtualisation with
Openstack may wish to contradict me...
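
For what it's worth, the 'expected ipv6 address based on MAC' is just the
EUI-64 SLAAC address, so it's cheap to compute.  A minimal sketch, assuming a
/64 prefix and an example MAC:

    def slaac_address(prefix64, mac):
        """prefix64 is the first four groups, e.g. '2001:db8:0:1'."""
        octets = [int(x, 16) for x in mac.split(":")]
        octets[0] ^= 0x02                      # flip the universal/local bit
        eui64 = octets[:3] + [0xff, 0xfe] + octets[3:]
        host = ":".join("%02x%02x" % (eui64[i], eui64[i + 1])
                        for i in range(0, 8, 2))
        return "%s:%s" % (prefix64, host)

    # slaac_address("2001:db8:0:1", "fa:16:3e:12:34:56")
    #   -> '2001:db8:0:1:f816:3eff:fe12:3456'

Inside the guest, the privacy extension itself is turned off with the
net.ipv6.conf.*.use_tempaddr=0 sysctls, which is what we'd effectively be
asking image builders to do.
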
-- 
Ian.


On 15 May 2014 09:30, Shixiong Shang  wrote:

> Hi, guys:
>
> Nice to meet with all of you in the technical session and design session.
> I mentioned the challenge of privacy extension in the meeting, but would
> like to hear your opinions of how to address the problem. If you have any
> comments or suggestions, please let me know. I will create a BP for this
> problem.
>
> Thanks!
>
> Shixiong
>
>
>  *Shixiong Shang*
>
>  *!--- Stay Hungry, Stay Foolish ---!*
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][NFV] NFV BoF at design summit

2014-05-19 Thread Ian Wells
I think the Service VM discussion resolved itself in a way that reduces the
problem to a form of NFV - there are standing issues using VMs for
services, orchestration is probably not a responsibility that lies in
Neutron, and as such the importance is in identifying the problems with the
plumbing features of Neutron that cause implementation difficulties.  The
end result will be that VMs implementing tenant services and implementing
NFV should be much the same, with the addition of offering a multitenant
interface to Openstack users in the tenant service VM case.

Geoff Arnold is dealing with the collating of information from people that
have made the attempt to implement service VMs.  The problem areas should
fall out of his effort.  I also suspect that the key points of NFV that
cause problems (for instance, dealing with VLANs and trunking) will
actually appear quite high up the service VM list as well.
-- 
Ian.



On 18 May 2014 20:01, Steve Gordon  wrote:

> - Original Message -
> > From: "Sumit Naiksatam" 
> >
> > Thanks for initiating this conversation. Unfortunately I was not able
> > to participate during the summit on account of overlapping sessions.
> > As has been identified in the wiki and etherpad, there seem to be
> > obvious/potential touch points with the advanced services' discussion
> > we are having in Neutron [1]. Our sub team, and I, will track and
> > participate in this NFV discussion. Needless to say, we are definitely
> > very keen to understand and accommodate the NFV requirements.
> >
> > Thanks,
> > ~Sumit.
> > [1] https://wiki.openstack.org/wiki/Neutron/AdvancedServices
>
> Yes, there are definitely touch points across a number of different
> existing projects and sub teams. The consensus seemed to be that while a
> lot of people in the community have been working in independent groups on
> advancing the support for NFV use cases in OpenStack we haven't necessarily
> been coordinating our efforts effectively. Hopefully having a cross-project
> sub team will allow us to do this.
>
> In the BoF sessions we started adding relevant *existing* blueprints on
> the wiki page, we probably need to come up with a more robust way to track
> these from launchpad :). Further proposals will no doubt need to be built
> out from use cases as we discuss them further:
>
> https://wiki.openstack.org/wiki/Meetings/NFV
>
> Feel free to add any blueprints from the Advanced Services efforts that
> were missed!
>
> Thanks,
>
> Steve
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [NFV] Mission statement prosal

2014-05-19 Thread Ian Wells
I would go with 'define the use cases and identify and prioritise the
requirements', personally, but that's a nit.  We seem to have absolved our
members from actually providing the implementation, which is a bit cheeky...
-- 
Ian.


On 19 May 2014 10:19, Nicolas Barcet  wrote:

> Hello,
>
> As promised during the second BoF session (thanks a lot to Chris Wright
> for leading this), here is a first try at defining the purpose of our
> special interest group.
>
> ---
> Mission statement for the OpenStack NFV Special Interest Group:
>
> The SIG aims to define and prioritize the use cases which are required to
> run Network Function Virtualization (NFV) instances on top of OpenStack.
> The requirements are to be passed on to various projects within OpenStack
> to promote their implementation.
>
> The requirements expressed by this group should be made so that each of
> them have a test case which can be verified using a an OpenSource
> implementation. This is to ensure that tests can be done without any
> special hardware or proprietary software, which is key for continuous
> integration tests in the OpenStack gate.
> ---
>
> Comments, suggestions and fixes are obviously welcome!
>
> Best,
> Nick
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][L3] BGP Dynamic Routing Proposal

2014-05-31 Thread Ian Wells
I've tested exabgp against a v6 peer, and it's an independent feature, so I
added that as a row separately from whether v6 advertisements work.  Might
be worth making the page general and adding in the vpn feature set too.


On 30 May 2014 16:50, Nachi Ueno  wrote:

> Hi folks
>
> ExaBGP won't suit for BGPVPN implementation because it isn't support vpnv4.
> Ryu is supporting it, however they have no internal api to binding
> neutron network & route target.
> so I think contrail is a only solution for  BGPVPN implementation now.
>
>
>
> 2014-05-30 2:22 GMT-07:00 Mathieu Rohon :
> > Hi,
> >
> > I was about mentionning ExaBGP too! can we also consider using those
> > BGP speakers for BGPVPN implementation [1].
> > This would be consistent to have the same BGP speaker used for every
> > BGP needs inside Neutron.
> >
> > [1]https://review.openstack.org/#/c/93329/
> >
> >
> > On Fri, May 30, 2014 at 10:54 AM, Jaume Devesa 
> wrote:
> >> Hello Takashi,
> >>
> >> thanks for doing this! As we have proposed ExaBgp[1] in the Dynamic
> Routing
> >> blueprint[2], I've added a new column for this speaker in the wiki
> page. I
> >> plan to fill it soon.
> >>
> >> ExaBgp was our first choice because we thought that run something in
> library
> >> mode would be much more easy to deal with (especially the exceptions and
> >> corner cases) and the code would be much cleaner. But seems that Ryu BGP
> >> also can fit in this requirement. And having the help from a Ryu
> developer
> >> like you turns it into a promising candidate!
> >>
> >> I'll start working now in a proof of concept to run the agent with these
> >> implementations and see if we need more requirements to compare between
> the
> >> speakers.
> >>
> >> [1]: https://wiki.openstack.org/wiki/Neutron/BGPSpeakersComparison
> >> [2]: https://review.openstack.org/#/c/90833/
> >>
> >> Regards,
> >>
> >>
> >> On 29 May 2014 18:42, YAMAMOTO Takashi  wrote:
> >>>
> >>> as per discussions on l3 subteem meeting today, i started
> >>> a bgp speakers comparison wiki page for this bp.
> >>>
> >>> https://wiki.openstack.org/wiki/Neutron/BGPSpeakersComparison
> >>>
> >>> Artem, can you add other requirements as columns?
> >>>
> >>> as one of ryu developers, i'm naturally biased to ryu bgp.
> >>> i appreciate if someone provides more info for other bgp speakers.
> >>>
> >>> YAMAMOTO Takashi
> >>>
> >>> > Good afternoon Neutron developers!
> >>> >
> >>> > There has been a discussion about dynamic routing in Neutron for the
> >>> > past few weeks in the L3 subteam weekly meetings. I've submitted a
> review
> >>> > request of the blueprint documenting the proposal of this feature:
> >>> > https://review.openstack.org/#/c/90833/. If you have any feedback or
> >>> > suggestions for improvement, I would love to hear your comments and
> include
> >>> > your thoughts in the document.
> >>> >
> >>> > Thank you.
> >>> >
> >>> > Sincerely,
> >>> > Artem Dmytrenko
> >>>
> >>> ___
> >>> OpenStack-dev mailing list
> >>> OpenStack-dev@lists.openstack.org
> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >>
> >>
> >>
> >> --
> >> Jaume Devesa
> >> Software Engineer at Midokura
> >>
> >> ___
> >> OpenStack-dev mailing list
> >> OpenStack-dev@lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [NFV] No-brainer BPs submitted and specs on NFV meeting page

2014-06-04 Thread Ian Wells
I've written up three blueprints for some problems that were repeatedly
aired at the summit's NFV BOF.  This may well only be a start of the
blockers list, but at least it is a start, and I think at least two of them
are straightforward.  (I might even have found a workaround for one, not
had chance to check it yet - but if I have it's only more proof that the
issue itself wants fixing, so I wrote it up anyway.)  They're listed on the
NFV meeting page [1].  Don't forget the meeting tomorrow at some ungodly
hour of the morning (thanks, Russell!).

There are one or two other things I was considering adding (specifically
disabling firewalls on ports), but in the meantime I've had a few potential
workarounds pointed out to me for various of the problems that I and others
have encountered, and I would like to test a couple of things out before I
add proposals for completely unnecessary features. It may be that, as well
as the changes we require here, we also need to come up with a list of
design patterns (or workarounds, depending on your viewpoint) typically
used in NFV coding.

Aside from that, there's now a ton of NFV-related blueprints there, but we
have yet to really break them into priorities.  Clearly we're not going to
do all of them in the remaining time before Juno, so some prioritisation is
required.  For the minute I think we have to isolate the blockers and the
things holding us back.  Many of the ideas there are useful, but not
universal to all NFV work, and I think we need to concentrate our efforts
on making it possible before we worry about making it efficient, elegant,
or anything else.  Also, many of them are not NFV-centric, they're just
good ideas that stand on their own and also happen to improve NFV
functioning; where that's the case, please, don't wait on us, go away and
get them implemented!

Since the list is so long, we're clearly not going to go through them all
in a one-hour IRC meeting. If you have hands-on experience implementing NFV
functions or infrastructure, and if you spot something in there that is the
answer to all your problems (or, at least, one of your problems), flag it
to Russell, privately, before the meeting, and if you're unsatisfied with
the result, then there's a mailing list here to discuss it afterward.
Otherwise I think we need to have an offline hack at putting them in
priority order - after the meeting is over.

Cheers,
-- 
Ian.

[1] https://wiki.openstack.org/wiki/Meetings/NFV
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron][ipv6] Support ipv6 RA in neutron

2014-06-13 Thread Ian Wells
I'm only part way through reviewing this, but I think there's a fundamental
error in it.  We were at one point going to use 'enable_dhcp' in the
current set of flags to indicate something meaningful, but eventually we
decided that its current behaviour (despite the naming) really meant 'no
address assignment protocols are active' - which is why setting
enable_dhcp=False completely disables DHCP and RA activity in Neutron.  (I
believe Mark pointed that out, and agitated for backward compatibility,
though I couldn't tell you which forum it was in.)  Your proposal reverses
that decision, and says that it can be set to false and yet RAs will still
be sent out.

It's also why there are two additional attributes, because a single option
isn't really expressive enough once you consider the provider network
cases.  You are, in your own way, also using two attributes, by repurposing
enable_dhcp - your attribute values are different, but you're doing it for
similar reasons.

I'll put up my other review comments a bit later on when I've had another
think.
-- 
Ian.



On 11 June 2014 06:07, Robert Li (baoli)  wrote:

>  Hi,
>
>  I was mistakenly reusing the Blueprint
> https://blueprints.launchpad.net/neutron/+spec/neutron-ipv6-radvd-ra to
> draft up the ipv6 RA support in neutron. I apologize for any confusion that
> this may have caused. To correct it, I created a new blueprint
> https://blueprints.launchpad.net/neutron/+spec/neutron-ipv6-ra and linked
> it to a neutron spec https://review.openstack.org/92164.
>
>  The basic idea behind this blueprint is that neutron dhcp service as it
> is can hand out IPv6 addresses either in IPv6 only data network or in a
> dual stack data network. But the RA service including SLAAC support is
> missing in neutron. This service is important. Without RA, VMs won’t be
> able to automatically install default routes. Without SLAAC, a deployment
> that prefers SLAAC won’t be able to do it with neutron. The BP takes a
> straightforward approach to achieve that without requiring significant
> change to the existing dhcp service.
>
>  Any feedback is appreciated.
>
>  thanks,
> Robert
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Hosts within two Availability Zones : possible or not ?

2014-04-07 Thread Ian Wells
On 3 April 2014 08:21, Khanh-Toan Tran wrote:

> Otherwise we cannot provide redundancy to client except using Region which
> is dedicated infrastructure and networked separated and anti-affinity
> filter which IMO is not pragmatic as it has tendency of abusive usage.
>

I'm sorry, could you explain what you mean here by 'abusive usage'?
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron][Heat] The Neutron API and orchestration

2014-04-09 Thread Ian Wells
On 8 April 2014 10:35, Zane Bitter  wrote:

> To attach a port to a network and give it an IP from a specific subnet
>
>> on that network, you would use the *--fixed-ip subnet_id *option.
>>
>> Otherwise, the create port request will use the first subnet it finds
>> attached to that network to allocate the port an IP address. This is why
>> you are encountering the port-> subnet-> network chain. Subnets provide
>> the addresses. Networks are the actual layer 2 boundaries.�
>>
>
> It sounds like maybe Subnets need to be created independently of Networks
> and then passed as a list to the Network when it is created. In Heat
> there's no way to even predict which Subnet will be "first" unless the user
> adds explicit "depends_on" annotations (and even then, a Subnet could have
> been created outside of the template already).
>

A longstanding issue I've had with networks (now fixed, I believe, but
don't hold me to that) is that they don't work without subnets, but they
should - because ports don't work without an address, and yet, again, they
should - because our antispoofing is completely tied up with addresses and
has historically been hard-to-impossible to disable.  In fact, ports have
long been intended to have *one* ipv4 address - no more, which is annoying
for many sorts of IP based failover, and no fewer, which is annoying when
you're not using IP addresses in an obvious fashion (such as Openstack
deployments, if you've ever tried to use Openstack as your testbed for
testing Openstack itself).

Also, subnets seem to be branching out.

In ipv4, subnets are clearly 'here's another chunk of address space for
this network'.  You do need a router attached to be able to *reach* that
additional address space, and that's rather silly - but I've always seen
them as an artifact of ipv4 scarcity.

In ipv6, I believe we're using them, or going to use them, to allow
multiple global addresses on a port.  That's a pretty normal thing in ipv6,
which pretty much starts with the assumption that you have two addresses
per port and works upward from there.

-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Neutron] Enabling vlan trunking on neutron port.

2014-09-20 Thread Ian Wells
Aaron: untrue.  It does, but OVS doesn't, and so networks implemented with
the OVS driver will drop packets.  Use Linuxbridge instead.
-- 
Ian.

On 19 September 2014 22:27, Aaron Rosen  wrote:

> Neutron doesn't allow you to send tagged traffic from the guest today
> https://github.com/openstack/neutron/blob/master/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py#L384
>
>
> On Fri, Sep 19, 2014 at 7:01 AM, Parikshit Manur <
> parikshit.ma...@citrix.com> wrote:
>
>>  Hi All,
>>
>> I have a setup which has VM on flat provider network ,
>> and I want to reach VM on VLAN provider network. The packets are forwarded
>> till veth pair and are getting dropped by br-int.
>>
>>
>>
>> Can neutron port be configured to allow vlan  trunking?
>>
>>
>>
>> Thanks,
>>
>> Parikshit Manur
>>
>> ___
>> OpenStack-dev mailing list
>> OpenStack-dev@lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] [nfv] VM-based VLAN trunking blueprints

2014-10-23 Thread Ian Wells
There are two categories of problems:

1. some networks don't pass VLAN tagged traffic, and it's impossible to
detect this from the API
2. it's not possible to pass traffic from multiple networks to one port on
one machine as (e.g.) VLAN tagged traffic

(1) is addressed by the VLAN trunking network blueprint, XXX. Nothing else
addresses this, particularly in the case that one VM is emitting tagged
packets that another one should receive and Openstack knows nothing about
what's going on.

We should get this in, ideally quickly and in a simple form where it
simply tells you if a network is capable of passing tagged traffic.  In
general, this is possible to calculate but a bit tricky in ML2 - anything
using the OVS mechanism driver won't pass VLAN traffic, anything using
VLANs should probably also claim it doesn't pass VLAN traffic (though
actually it depends a little on the switch), and combinations of L3 tunnels
plus Linuxbridge seem to pass VLAN traffic just fine.  Beyond that, it's
got a backward compatibility mode, so it's possible to ensure that any
plugin that doesn't implement VLAN reporting is still behaving correctly
per the specification.
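
To make that concrete, the calculation I have in mind is no more than
something like the following - a sketch of the logic only, not actual ML2
code, with the driver and type names being the usual config strings:

    # Roughly the rule described above: the OVS agent drops tagged frames,
    # VLAN segmentation would need QinQ (switch-dependent, so claim no),
    # and L3 tunnels with Linuxbridge pass tagged traffic fine.
    def network_passes_vlans(mechanism_drivers, network_type):
        if "openvswitch" in mechanism_drivers:
            return False
        if network_type == "vlan":
            return False
        if network_type in ("vxlan", "gre") and "linuxbridge" in mechanism_drivers:
            return True
        return False  # unknown combination: don't claim VLAN transparency

Anything that can't answer at all just falls back to the backward
compatibility behaviour mentioned above.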

(2) is addressed by several blueprints, and these have overlapping ideas
that all solve the problem.  I would summarise the possibilities as follows:

A. Racha's L2 gateway blueprint,
https://blueprints.launchpad.net/neutron/+spec/gateway-api-extension, which
(at its simplest, though it's had features added on and is somewhat
OVS-specific in its detail) acts as a concentrator to multiplex multiple
networks onto one as a trunk.  This is a very simple approach and doesn't
attempt to resolve any of the hairier questions like making DHCP work as
you might want it to on the ports attached to the trunk network.
B. Isaku's L2 gateway blueprint, https://review.openstack.org/#/c/100278/,
which is more limited in that it refers only to external connections.
C. Erik's VLAN port blueprint,
https://blueprints.launchpad.net/neutron/+spec/vlan-aware-vms, which tries
to solve the addressing problem mentioned above by having ports within
ports (much as, on the VM side, interfaces passing trunk traffic tend to
have subinterfaces that deal with the traffic streams).
D. Not a blueprint, but an idea I've come across: create a network that is
a collection of other networks, each 'subnetwork' being a VLAN in the
network trunk.
E. Kyle's very old blueprint,
https://blueprints.launchpad.net/neutron/+spec/quantum-network-bundle-api -
where we attach a port, not a network, to multiple networks.  Probably
doesn't work with appliances.

I would recommend we try and find a solution that works with both external
hardware and internal networks.  (B) is only a partial solution.

Considering the others, note that (C) and (D) add significant complexity to
the data model, independently of the benefits they bring.  (A) adds one new
functional block to networking (similar to today's routers, or even today's
Nova instances).

Finally, I suggest we consider the most prominent use case for multiplexing
networks.  This seems to be condensing traffic from many networks to either
a service VM or a service appliance.  It's useful, but not essential, to
have Neutron control the addresses on the trunk port subinterfaces.

So, that said, I personally favour (A) as the simplest way to solve our
current needs, and I recommend paring (A) right down to its basics: a block
that has access ports that we tag with a VLAN ID, and one trunk port that
has all of the access networks multiplexed onto it.  This is a slightly
dangerous block, in that you can actually set up forwarding blocks with it,
and that's a concern; but it's a simple service block like a router, it's
very, very simple to implement, and it solves our immediate problems so
that we can make forward progress.  It also doesn't affect the other
solutions significantly, so someone could implement (C) or (D) or (E) in
the future.
-- 
Ian.


On 23 October 2014 02:13, Alan Kavanagh  wrote:

> +1 many thanks to Kyle for putting this as a priority, its most welcome.
> /Alan
>
> -Original Message-
> From: Erik Moe [mailto:erik@ericsson.com]
> Sent: October-22-14 5:01 PM
> To: Steve Gordon; OpenStack Development Mailing List (not for usage
> questions)
> Cc: iawe...@cisco.com
> Subject: Re: [openstack-dev] [neutron] [nfv] VM-based VLAN trunking
> blueprints
>
>
> Hi,
>
> Great that we can have more focus on this. I'll attend the meeting on
> Monday and also attend the summit, looking forward to these discussions.
>
> Thanks,
> Erik
>
>
> -Original Message-
> From: Steve Gordon [mailto:sgor...@redhat.com]
> Sent: den 22 oktober 2014 16:29
> To: OpenStack Development Mailing List (not for usage questions)
> Cc: Erik Moe; iawe...@cisco.com; calum.lou...@metaswitch.com
> Subject: Re: [openstack-dev] [neutron] [nfv] VM-based VLAN trunking
> blueprints
>
> - Original Message -
> > From: "Kyle Mestery" 
> > To: "OpenStack Development Mail

Re: [openstack-dev] [neutron] [nfv] VM-based VLAN trunking blueprints

2014-10-27 Thread Ian Wells
On 25 October 2014 15:36, Erik Moe  wrote:

>  Then I tried to just use the trunk network as a plain pipe to the
> L2-gateway and connect to normal Neutron networks. One issue is that the
> L2-gateway will bridge the networks, but the services in the network you
> bridge to is unaware of your existence. This IMO is ok then bridging
> Neutron network to some remote network, but if you have an Neutron VM and
> want to utilize various resources in another Neutron network (since the one
> you sit on does not have any resources), things gets, let’s say non
> streamlined.
>

Indeed.  However, non-streamlined is not the end of the world, and I
wouldn't want to have to tag all VLANs a port is using on the port in
advance of using it (this works for some use cases, and makes others
difficult, particularly if you just want a native trunk and are happy for
Openstack not to have insight into what's going on on the wire).


>  Another issue with trunk network is that it puts new requirements on the
> infrastructure. It needs to be able to handle VLAN tagged frames. For a
> VLAN based network it would be QinQ.
>

Yes, and that's the point of the VLAN trunk spec, where we flag a network
as passing VLAN tagged packets; if the operator-chosen network
implementation doesn't support trunks, the API can refuse to make a trunk
network.  Without it we're still in the situation that on some clouds
passing VLANs works and on others it doesn't, and that the tenant can't
actually tell in advance which sort of cloud they're working on.

Trunk networks are a requirement for some use cases independent of the port
awareness of VLANs.  Based on the maxim, 'make the easy stuff easy and the
hard stuff possible' we can't just say 'no Neutron network passes VLAN
tagged packets'.  And even if we did, we're evading a problem that exists
with exactly one sort of network infrastructure - VLAN tagging for network
separation - while making it hard to use for all of the many other cases in
which it would work just fine.

In summary, if we did port-based VLAN knowledge I would want to be able to
use VLANs without having to use it (in much the same way that I would like,
in certain circumstances, not to have to use Openstack's address allocation
and DHCP - it's nice that I can, but I shouldn't be forced to).

>  My requirements were to have low/no extra cost for VMs using VLAN trunks
> compared to normal ports, no new bottlenecks/single point of failure. Due
> to this and previous issues I implemented the L2 gateway in a distributed
> fashion and since trunk network could not be realized in reality I only had
> them in the model and optimized them away.
>

Again, this is down to your choice of VLAN tagged networking and/or the OVS
ML2 driver; it doesn't apply to all deployments.


> But the L2-gateway + trunk network has a flexible API, what if someone
> connects two VMs to one trunk network, well, hard to optimize away.
>

That's certainly true, but it wasn't really intended to be optimised away.

>  Anyway, due to these and other issues, I limited my scope and switched to
> the current trunk port/subport model.
>
>
>
> The code that is for review is functional, you can boot a VM with a trunk
> port + subports (each subport maps to a VLAN). The VM can send/receive VLAN
> traffic. You can add/remove subports on a running VM. You can specify IP
> address per subport and use DHCP to retrieve them etc.
>

I'm coming to realise that the two solutions address different needs - the
VLAN port one is much more useful for cases where you know what's going on
in the network and you want Openstack to help, but it's just not broad
enough to solve every problem.  It may well be that we want both solutions,
in which case we just need to agree that 'we shouldn't do trunk networking
because VLAN aware ports solve this problem' is not a valid argument during
spec review.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] vm can not transport large file under neutron ml2 + linux bridge + vxlan

2014-10-27 Thread Ian Wells
Path MTU discovery works on a path - something with an L3 router in the way
- where the outbound interface has a smaller MTU than the inbound one.
You're transmitting across an L2 network - no L3 routers present.  You send
a 1500 byte packet, the network fabric (which is not L3, has no address,
and therefore has no means to answer you) does all that it can do with that
packet - it drops it.  The sender retransmits, assuming congestion, but the
same thing happens.  Eventually the sender decides there's a network
problem and times out.

This is a common problem with Openstack deployments, although various
features of the virtual networking let you get away with it, with some
configs and not others.  OVS used to fake a PMTU exceeded message from the
destination if you tried to pass an overlarge packet - not in spec, but it
hid the problem nicely.  I have a suspicion that some implementations will
fragment the containing UDP packet, which is also not in spec and also
solves the problem (albeit with poor performance).

The right answer for you is to set the MTU in your machines to the same MTU
you've given the network, that is, 1450 bytes.  You can do this by setting
a DHCP option for MTU, providing your VMs support that option (search the
web for the solution, I don't have it offhand) or lower the MTU by hand or
by script when you start your VM.
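
As best I remember, the workaround looks roughly like the below - treat it as
a sketch and check the dnsmasq documentation for the exact option syntax.
DHCP option 26 is the interface MTU option, and 1450 is simply 1500 minus the
50 bytes of VXLAN encapsulation (inner Ethernet 14 + VXLAN 8 + UDP 8 + outer
IP 20):

    # /etc/neutron/dhcp_agent.ini
    [DEFAULT]
    dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

    # /etc/neutron/dnsmasq-neutron.conf
    dhcp-option-force=26,1450

    # or, by hand (or via a boot script) inside the guest:
    ip link set dev eth0 mtu 1450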

The right answer for everyone is to properly determine and advertise the
network MTU to VMs (which, with provider networks, is not even consistent
from one network to the next) and that's the spec Kyle is referring to.
We'll be fixing this in Kilo.
-- 
Ian.


On 27 October 2014 20:14, Li Tianqing  wrote:

>
>
>
>
>
> --
> Best
> Li Tianqing
>
>
> At 2014-10-27 17:42:41, "Ihar Hrachyshka"  wrote:
> >-BEGIN PGP SIGNED MESSAGE-
> >Hash: SHA512
> >
> >On 27/10/14 02:18, Li Tianqing wrote:
> >> Hello, Right now, we test neutron under havana release. We
> >> configured network_device_mtu=1450 in neutron.conf, After create
> >> vm, we found the vm interface's mtu is 1500, the ping, ssh, is ok.
> >> But if we scp large file between vms then scp display 'stalled'.
> >> And iperf is also can not completed. If we configured vm's mtu to
> >> 1450, then iperf, scp all is ok. If we iperf with -M 1300, then the
> >> iperf is ok too. The vms path mtu discovery is set by default. I do
> >> not know why the vm whose mtu is 1500 can not send large file.
> >
> >There is a neutron spec currently in discussion for Kilo to finally
> >fix MTU issues due to tunneling, that also tries to propagate MTU
> >inside instances: https://review.openstack.org/#/c/105989/
>
> The problem is i do not know why the vm with 1500 mtu can not send large file?
> I found the packet send out all with DF, and is it because the DF set default 
> by linux cause the packet
> be dropped? And the application do not handle the return back icmp packet 
> with the smaller mtu?
>
>  >
> >/Ihar
> >-BEGIN PGP SIGNATURE-
> >Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
> >
> >iQEcBAEBCgAGBQJUThORAAoJEC5aWaUY1u571u4H/3EqEVPL1Q9KgymrudLpAdRh
> >fwNarwPWT8Ed+0x7WIXAr7OFXX1P90cKRAZKTlAEEI94vOrdr0s608ZX8awMuLeu
> >+LB6IA7nMpgJammfDb8zNmYLHuTQGGatXblOinvtm3XXIcNbkNu8840MTV3y/Jdq
> >Mndtz69TrjTrjn7r9REJ4bnRIlL4DGo+gufXPD49+yax1y/woefqwZPU13kO6j6R
> >Q0+MAy13ptg2NwX26OI+Sb801W0kpDXby6WZjfekXqxqv62fY1/lPQ3oqqJBd95K
> >EFe5NuogLV7UGH5vydQJa0eO2jw5lh8qLuHSShGcDEp/N6oQWiDzXYYYoEQdUic=
> >=jRQ/
> >-END PGP SIGNATURE-
> >
> >___
> >OpenStack-dev mailing list
> >OpenStack-dev@lists.openstack.org
> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] [nfv] VM-based VLAN trunking blueprints

2014-10-28 Thread Ian Wells
This all appears to be referring to trunking ports, rather than anything
else, so I've addressed the points in that respect.

On 28 October 2014 00:03, A, Keshava  wrote:

>  Hi,
>
> 1.   How many Trunk ports can be created ?
>
Why would there be a limit?

> Will there be any Active-Standby concepts will be there ?
>
I don't believe active-standby, or any HA concept, is directly relevant.
Did you have something in mind?

>   2.   Is it possible to configure multiple IP address configured on
> these ports ?
>
Yes, in the sense that you can have addresses per port.  The usual
restrictions on ports would apply, and they don't currently allow multiple
IP addresses (with the exception of the address-pair extension).

> In case IPv6 there can be multiple primary address configured will this be
> supported ?
>
No reason why not - we're expecting to re-use the usual port, so you'd
expect the features there to apply (in addition to having multiple sets of
subnets on a trunking port).

>   3.   If required can these ports can be aggregated into single one
> dynamically ?
>
That's not really relevant to trunk ports or networks.

>  4.   Will there be requirement to handle Nested tagged packet on
> such interfaces ?
>
For trunking ports, I don't believe anyone was considering it.


>
>
>
>
>
>
> Thanks & Regards,
>
> Keshava
>
>
>
> *From:* Ian Wells [mailto:ijw.ubu...@cack.org.uk]
> *Sent:* Monday, October 27, 2014 9:45 PM
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [neutron] [nfv] VM-based VLAN trunking
> blueprints
>
>
>
> On 25 October 2014 15:36, Erik Moe  wrote:
>
>  Then I tried to just use the trunk network as a plain pipe to the
> L2-gateway and connect to normal Neutron networks. One issue is that the
> L2-gateway will bridge the networks, but the services in the network you
> bridge to is unaware of your existence. This IMO is ok then bridging
> Neutron network to some remote network, but if you have an Neutron VM and
> want to utilize various resources in another Neutron network (since the one
> you sit on does not have any resources), things gets, let’s say non
> streamlined.
>
>
>
> Indeed.  However, non-streamlined is not the end of the world, and I
> wouldn't want to have to tag all VLANs a port is using on the port in
> advance of using it (this works for some use cases, and makes others
> difficult, particularly if you just want a native trunk and are happy for
> Openstack not to have insight into what's going on on the wire).
>
>
>
>   Another issue with trunk network is that it puts new requirements on
> the infrastructure. It needs to be able to handle VLAN tagged frames. For a
> VLAN based network it would be QinQ.
>
>
>
> Yes, and that's the point of the VLAN trunk spec, where we flag a network
> as passing VLAN tagged packets; if the operator-chosen network
> implementation doesn't support trunks, the API can refuse to make a trunk
> network.  Without it we're still in the situation that on some clouds
> passing VLANs works and on others it doesn't, and that the tenant can't
> actually tell in advance which sort of cloud they're working on.
>
> Trunk networks are a requirement for some use cases independent of the
> port awareness of VLANs.  Based on the maxim, 'make the easy stuff easy and
> the hard stuff possible' we can't just say 'no Neutron network passes VLAN
> tagged packets'.  And even if we did, we're evading a problem that exists
> with exactly one sort of network infrastructure - VLAN tagging for network
> separation - while making it hard to use for all of the many other cases in
> which it would work just fine.
>
> In summary, if we did port-based VLAN knowledge I would want to be able to
> use VLANs without having to use it (in much the same way that I would like,
> in certain circumstances, not to have to use Openstack's address allocation
> and DHCP - it's nice that I can, but I shouldn't be forced to).
>
>  My requirements were to have low/no extra cost for VMs using VLAN trunks
> compared to normal ports, no new bottlenecks/single point of failure. Due
> to this and previous issues I implemented the L2 gateway in a distributed
> fashion and since trunk network could not be realized in reality I only had
> them in the model and optimized them away.
>
>
>
> Again, this is down to your choice of VLAN tagged networking and/or the
> OVS ML2 driver; it doesn't apply to all deployments.
>
>
>
>  But the L2-gateway + trunk network has a flexible API, what if someone
> connects two VMs to

Re: [openstack-dev] [neutron] vm can not transport large file under neutron ml2 + linux bridge + vxlan

2014-10-28 Thread Ian Wells
On 28 October 2014 00:18, A, Keshava  wrote:

>  Hi,
>
>
>
> Currently OpenStack have any framework to notify the Tennant/Service-VM
> for such kind of notification based on VM’s interest ?
>

It's possible to use DHCP or RA to notify a VM of the MTU but there are
limitations (RAs don't let you increase the MTU, only decrease it, and
obviously VMs must support the MTU element of DHCP) and Openstack doesn't
currently use it.  You can statically configure the DHCP MTU number that
DHCP transmits; this is useful to work around problems but not really the
right answer to the problem.


>  VM may be very much interested for such kind of notification like
>
> 1.   Path MTU.
>
This will be correctly discovered from the ICMP PMTU exceeded message, and
Neutron routers should certainly be expected to send that.  (In fact the
namespace implementation of routers would do this if the router ever had
different MTUs on its ports; it's in the kernel network stack.)  There's no
requirement for a special notification, and indeed you couldn't do it that
way anyway.

>  2.   Based on specific incoming Tennant traffic, block/Allow
>  particular traffic flow at infrastructure level itself, instead of at VM.
>
I don't see the relevance; and you appear to be describing security groups.

>  This may require OpenStack infrastructure notification support to
> Tenant/Service VM.
>
Not particularly, as MTU doesn't generally change, and I think we would
forbid changing the MTU of a network after creation.  It's only an initial
configuration thing, therefore.  It might involve better cloud-init support
for network configuration, something that gets discussed periodically.

-- 
Ian.

>
>
> …
>
> Thanks & regards,
>
> Keshava
>
>
>
> *From:* Ian Wells [mailto:ijw.ubu...@cack.org.uk]
> *Sent:* Tuesday, October 28, 2014 11:40 AM
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [neutron] vm can not transport large file
> under neutron ml2 + linux bridge + vxlan
>
>
>
> Path MTU discovery works on a path - something with an L3 router in the
> way - where the outbound interface has a smaller MTU than the inbound one.
> You're transmitting across an L2 network - no L3 routers present.  You send
> a 1500 byte packet, the network fabric (which is not L3, has no address,
> and therefore has no means to answer you) does all that it can do with that
> packet - it drops it.  The sender retransmits, assuming congestion, but the
> same thing happens.  Eventually the sender decides there's a network
> problem and times out.
>
> This is a common problem with Openstack deployments, although various
> features of the virtual networking let you get away with it, with some
> configs and not others.  OVS used to fake a PMTU exceeded message from the
> destination if you tried to pass an overlarge packet - not in spec, but it
> hid the problem nicely.  I have a suspicion that some implementations will
> fragment the containing UDP packet, which is also not in spec and also
> solves the problem (albeit with poor performance).
>
> The right answer for you is to set the MTU in your machines to the same
> MTU you've given the network, that is, 1450 bytes.  You can do this by
> setting a DHCP option for MTU, providing your VMs support that option
> (search the web for the solution, I don't have it offhand) or lower the MTU
> by hand or by script when you start your VM.
>
> The right answer for everyone is to properly determine and advertise the
> network MTU to VMs (which, with provider networks, is not even consistent
> from one network to the next) and that's the spec Kyle is referring to.
> We'll be fixing this in Kilo.
> --
>
> Ian.
>
>
>
> On 27 October 2014 20:14, Li Tianqing  wrote:
>
>
>
>
>
>  --
>
> Best
>
> Li Tianqing
>
>
>
>
> At 2014-10-27 17:42:41, "Ihar Hrachyshka"  wrote:
>
> >-BEGIN PGP SIGNED MESSAGE-
>
> >Hash: SHA512
>
> >
>
> >On 27/10/14 02:18, Li Tianqing wrote:
>
> >> Hello, Right now, we test neutron under havana release. We
>
> >> configured network_device_mtu=1450 in neutron.conf, After create
>
> >> vm, we found the vm interface's mtu is 1500, the ping, ssh, is ok.
>
> >> But if we scp large file between vms then scp display 'stalled'.
>
> >> And iperf is also can not completed. If we configured vm's mtu to
>
> >> 1450, then iperf, scp all is ok. If we iperf with -M 1300, then the
>
> >> iperf is ok too. The vms path mtu discovery is set by default. I do
>
> >> not

Re: [openstack-dev] [neutron] vm can not transport large file under neutron ml2 + linux bridge + vxlan

2014-10-28 Thread Ian Wells
On 28 October 2014 00:30, Li Tianqing  wrote:

> lan, you are right, the receiver only receive packet that small than 1450.
> Because the sender does not send large packets at the begining, so
> tcpdump can catch some small packets.
>
> Another question about the mtu, what if we clear the  DF in the ip
> packets?  Then l2 can split packets into smaller mtu size?
>

Routers can split packets.  L2 networks don't understand IP headers and
therefore can't fragment packets.  DF doesn't change that.

(DF is there to make PMTU discovery work, incidentally; it's what prompts
routers to return PMTU exceeded messages.)
-- 
Ian.

At 2014-10-28 15:15:51, "Li Tianqing"  wrote:
>
> The problem is that it is not at the begining to transmit large file. It
> is after some packets trasmited, then the connection is choked.
> After the connection choked, from the bridge in compute host we can see
> the sender send packets, and the receiver can not get the packets.
> If it is the pmtud, then at the very begining, the packet can not transmit
> from the begining.
>
> At 2014-10-28 14:10:09, "Ian Wells"  wrote:
>
> Path MTU discovery works on a path - something with an L3 router in the
> way - where the outbound interface has a smaller MTU than the inbound one.
> You're transmitting across an L2 network - no L3 routers present.  You send
> a 1500 byte packet, the network fabric (which is not L3, has no address,
> and therefore has no means to answer you) does all that it can do with that
> packet - it drops it.  The sender retransmits, assuming congestion, but the
> same thing happens.  Eventually the sender decides there's a network
> problem and times out.
>
> This is a common problem with Openstack deployments, although various
> features of the virtual networking let you get away with it, with some
> configs and not others.  OVS used to fake a PMTU exceeded message from the
> destination if you tried to pass an overlarge packet - not in spec, but it
> hid the problem nicely.  I have a suspicion that some implementations will
> fragment the containing UDP packet, which is also not in spec and also
> solves the problem (albeit with poor performance).
>
> The right answer for you is to set the MTU in your machines to the same
> MTU you've given the network, that is, 1450 bytes.  You can do this by
> setting a DHCP option for MTU, providing your VMs support that option
> (search the web for the solution, I don't have it offhand) or lower the MTU
> by hand or by script when you start your VM.
>
> The right answer for everyone is to properly determine and advertise the
> network MTU to VMs (which, with provider networks, is not even consistent
> from one network to the next) and that's the spec Kyle is referring to.
> We'll be fixing this in Kilo.
> --
> Ian.
>
>
> On 27 October 2014 20:14, Li Tianqing  wrote:
>
>>
>>
>>
>>
>>
>> --
>> Best
>> Li Tianqing
>>
>>
>> At 2014-10-27 17:42:41, "Ihar Hrachyshka"  wrote:
>> >-BEGIN PGP SIGNED MESSAGE-
>> >Hash: SHA512
>> >
>> >On 27/10/14 02:18, Li Tianqing wrote:
>> >> Hello, Right now, we test neutron under havana release. We
>> >> configured network_device_mtu=1450 in neutron.conf, After create
>> >> vm, we found the vm interface's mtu is 1500, the ping, ssh, is ok.
>> >> But if we scp large file between vms then scp display 'stalled'.
>> >> And iperf is also can not completed. If we configured vm's mtu to
>> >> 1450, then iperf, scp all is ok. If we iperf with -M 1300, then the
>> >> iperf is ok too. The vms path mtu discovery is set by default. I do
>> >> not know why the vm whose mtu is 1500 can not send large file.
>> >
>> >There is a neutron spec currently in discussion for Kilo to finally
>> >fix MTU issues due to tunneling, that also tries to propagate MTU
>> >inside instances: https://review.openstack.org/#/c/105989/
>>
>> The problem is i do not know why the vm with 1500 mtu can not send large 
>> file?
>> I found the packet send out all with DF, and is it because the DF set 
>> default by linux cause the packet
>> be dropped? And the application do not handle the return back icmp packet 
>> with the smaller mtu?
>>
>>  >
>> >/Ihar
>> >-BEGIN PGP SIGNATURE-
>> >Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
>> >
>> >iQEcBAEBCgAGBQJUThORAAoJEC5aWaUY1u571u4H/3EqEVPL1Q9KgymrudLpAdRh
>> >fwNarwPWT8Ed+0x7WIXAr7OFXX1P90cKRAZKTlAEEI94vOrdr0s608ZX8awMuLeu
>> >+LB6IA7nMpgJamm

Re: [openstack-dev] [neutron] Lightning talks during the Design Summit!

2014-10-31 Thread Ian Wells
Maruti's talk is, in fact, so interesting that we should probably get
together and talk about this earlier in the week.  I very much want to see
virtual-physical programmatic bridging, and I know Kevin Benton is also
interested.  Arguably the MPLS VPN stuff also is similar in scope.  Can I
propose we have a meeting on cloud edge functionality?
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [neutron] [nfv] VM-based VLAN trunking blueprints

2014-10-31 Thread Ian Wells
On 31 October 2014 06:29, Erik Moe  wrote:

>
>
>
>
> I thought Monday network meeting agreed on that “VLAN aware VMs”, Trunk
> network + L2GW were different use cases.
>
>
>
> Still I get the feeling that the proposals are put up against each other.
>

I think we agreed they were different, or at least the light was beginning
to dawn on the differences, but Maru's point was that if we really want to
decide what specs we have we need to show use cases not just for each spec
independently, but also include use cases where e.g. two specs are required
and the third doesn't help, so as to show that *all* of them are needed.
In fact, I suggest that first we do that - here - and then we meet up one
lunchtime and attack the specs in etherpad before submitting them.  In
theory we could have them reviewed and approved by the end of the week.
(This theory may not be very realistic, but it's good to set lofty goals,
my manager tells me.)

>  Here are some examples why bridging between Neutron internal networks
> using trunk network and L2GW IMO should be avoided. I am still fine with
> bridging to external networks.
>
>
>
> Assuming VM with trunk port wants to use floating IP on specific VLAN.
> Router has to be created on a Neutron network behind L2GW since Neutron
> router cannot handle VLANs. (Maybe not too common use case, but just to
> show what kind of issues you can get into)
>
> neutron floatingip-associate FLOATING_IP_ID INTERNAL_VM_PORT_ID
>
> The code to check if valid port has to be able to traverse the L2GW.
> Handing of IP addresses of VM will most likely be affected since VM port is
> connected to several broadcast domains. Alternatively new API can be
> created.
>

Now, this is a very good argument for 'trunk ports', yes.  It's not
actually an argument against bridging between networks.  I think the
bridging case addresses use cases (generally NFV use cases) where you're
not interested in Openstack managing addresses - often because you're
forwarding traffic rather than being an endpoint, and/or you plan on
disabling all firewalling for speed reasons, but perhaps because you wish
to statically configure an address rather than use DHCP.  The point is
that, in the absence of a need for address-aware functions, you don't
really care much about ports, and in fact configuring ports with many
addresses may simply be overhead.  Also, as you say, this doesn't address
the external bridging use case where what you're bridging to is not
necessarily in Openstack's domain of control.

>  In “VLAN aware VMs” trunk port mac address has to be globally unique since
> it can be connected to any network, other ports still only has to be unique
> per network. But for L2GW all mac addresses has to be globally unique since
> they might be bridged together at a later stage.
>

I'm not sure that that's particularly a problem - any VM with a port will
have one globally unique MAC address.  I wonder if I'm missing the point
here, though.

> Also some implementations might not be able to take VID into account when
> doing mac address learning, forcing at least unique macs on a trunk network.
>

If an implementation struggles with VLANs then the logical thing to do
would be not to implement them in that driver.  Which is fine: I would
expect (for instance) LB-driver networking to work for this and leave
OVS-driver networking to never work for this, because there's little point
in fixing it.


>  Benefits with “VLAN aware VMs” are integration with existing Neutron
> services.
>
> Benefits with Trunk networks are less consumption of Neutron networks,
> less management per VLAN.
>

Actually, the benefit of trunk networks is:

- if I use an infrastructure where all networks are trunks, I can find out
that a network is a trunk
- if I use an infrastructure where no networks are trunks, I can find out
that a network is not a trunk
- if I use an infrastructure where trunk networks are more expensive, my
operator can price accordingly

And, again, this is all entirely independent of either VLAN-aware ports or
L2GW blocks.

>  Benefits with L2GW is ease to do network stitching.
>
> There are other benefits with the different proposals, the point is that
> it might be beneficial to have all solutions.
>

I totally agree with this.

So, use cases that come to mind:

1. I want to pass VLAN-encapped traffic from VM A to VM B.  I do not know
at network setup time what VLANs I will use.
case A: I'm simulating a network with routers in.  The router config is not
under my control, so I don't know addresses or the number of VLANs in use.
(Yes, this use case exists, search for 'Cisco VIRL'.)
case B: NFV scenarios where the VNF orchestrator decides how few or many
VLANs are used, where the endpoints may or may not be addressed, and where
the addresses are selected by the VNF manager.  (For instance, every time I
add a customer to a VNF service I create another VLAN on an internal link.
The orchestrator is intelligent and selects the VLAN; telling Openstack 
