Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-30 Thread Robert Li (baoli)
Ian,

I hope that you guys are in agreement on this. But take a look at the wiki:
https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support and see whether it
differs from your proposals.  IMO, it's the critical piece of the proposal, and
it hasn't been specified in exact terms yet. I'm not sure about vif_attributes
or vif_stats, which I just heard from you. In any case, I'm not convinced by the
flexibility and/or complexity, and so far I haven't seen a use case that really
demands it. But I'd be happy to see one.

thanks,
Robert

On 1/29/14 4:43 PM, "Ian Wells" <ijw.ubu...@cack.org.uk> wrote:

My proposals:

On 29 January 2014 16:43, Robert Li (baoli) <ba...@cisco.com> wrote:
1. pci-flavor-attrs is configured through configuration files and will be
available on both the controller node and the compute nodes. Can the cloud
admin decide to add a new attribute in a running cloud? If that's
possible, how is that done?

When nova-compute starts up, it requests the VIF attributes that the schedulers 
need.  (You could have multiple schedulers; they could be in disagreement; it 
picks the last answer.)  It returns pci_stats by the selected combination of 
VIF attributes.

When nova-scheduler starts up, it sends an unsolicited cast of the attributes.  
nova-compute updates the attributes, clears its pci_stats and recreates them.

If nova-scheduler receives pci_stats with incorrect attributes it discards them.

(There is a row from nova-compute summarising devices for each unique 
combination of vif_stats, including 'None' where no attribute is set.)
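
A rough sketch of that summarisation, purely for illustration (the attribute
names and device values below are made up, and the real structures may differ):

    from collections import Counter

    # Attribute combination selected by the scheduler (hypothetical).
    selected_attrs = ('vendor_id', 'product_id', 'e.physical_network')

    # PCI devices discovered on one compute node (hypothetical).
    devices = [
        {'address': '0000:06:00.1', 'vendor_id': '8086', 'product_id': '10ed',
         'e.physical_network': 'phy1'},
        {'address': '0000:06:00.2', 'vendor_id': '8086', 'product_id': '10ed',
         'e.physical_network': 'phy1'},
        {'address': '0000:07:00.1', 'vendor_id': '15b3', 'product_id': '1004'},
    ]

    def build_pci_stats(devices, attrs):
        # One row per unique combination of the selected attributes; an
        # attribute a device doesn't carry shows up as None.
        counts = Counter(tuple(dev.get(a) for a in attrs) for dev in devices)
        return [dict(zip(attrs, combo), count=n) for combo, n in counts.items()]

    for row in build_pci_stats(devices, selected_attrs):
        print(row)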

I'm assuming here that the pci_flavor_attrs are read on startup of 
nova-scheduler and could be re-read and different when nova-scheduler is reset. 
 There's a relatively straightforward move from here to an API for setting it 
if this turns out to be useful, but firstly I think it would be an uncommon 
occurrence and secondly it's not something we should implement now.

2. PCI flavor will be defined using the attributes in pci-flavor-attrs. A
flavor is defined with a matching expression in the form of attr1 = val11
[| val12 …], [attr2 = val21 [| val22 …]], …. And this expression is used
to match one or more PCI stats groups until a free PCI device is located.
In this case, both attr1 and attr2 can have multiple values, and both
attributes need to be satisfied. Please confirm this understanding is
correct

This looks right to me as we've discussed it, but I think we'll be wanting 
something that allows a top level AND.  In the above example, I can't say an 
Intel NIC and a Mellanox NIC are equally OK, because I can't say (intel + 
product ID 1) AND (Mellanox + product ID 2).  I'll leave Yunhong to decide how 
the details should look, though.
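
To pin down the semantics, here is a minimal sketch of that matching, assuming
a flavor may list several alternative expressions, each expression ANDs its
attributes and ORs the values listed for an attribute, and any one matching
expression is enough (names and values are illustrative only):

    # Hypothetical flavor: two alternative expressions, either of which is
    # acceptable ("Intel, product ID 1" or "Mellanox, product ID 2").
    flavor = [
        {'vendor_id': {'8086'}, 'product_id': {'1'}},
        {'vendor_id': {'15b3'}, 'product_id': {'2'}},
    ]

    def expression_matches(expr, stats_group):
        # Every attribute in the expression must match one of its values.
        return all(stats_group.get(attr) in values
                   for attr, values in expr.items())

    def flavor_matches(flavor, stats_group):
        # Any one alternative expression is enough.
        return any(expression_matches(expr, stats_group) for expr in flavor)

    group = {'vendor_id': '15b3', 'product_id': '2', 'count': 4}
    print(flavor_matches(flavor, group))   # True via the second alternative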

3. I'd like to see an example that involves multiple attributes. let's say
pci-flavor-attrs = {gpu, net-group, device_id, product_id}. I'd like to
know how PCI stats groups are formed on compute nodes based on that, and
how many PCI stats groups are there? What are the reasonable guidelines
for defining the PCI flavors?

I need to write up the document for this, and it's overdue.  Leave it with me.
--
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-29 Thread Ian Wells
My proposals:

On 29 January 2014 16:43, Robert Li (baoli)  wrote:

> 1. pci-flavor-attrs is configured through configuration files and will be
> available on both the controller node and the compute nodes. Can the cloud
> admin decide to add a new attribute in a running cloud? If that's
> possible, how is that done?
>

When nova-compute starts up, it requests the VIF attributes that the
schedulers need.  (You could have multiple schedulers; they could be in
disagreement; it picks the last answer.)  It returns pci_stats by the
selected combination of VIF attributes.

When nova-scheduler starts up, it sends an unsolicited cast of the
attributes.  nova-compute updates the attributes, clears its pci_stats and
recreates them.

If nova-scheduler receives pci_stats with incorrect attributes it discards
them.

(There is a row from nova-compute summarising devices for each unique
combination of vif_stats, including 'None' where no attribute is set.)

I'm assuming here that the pci_flavor_attrs are read on startup of
nova-scheduler and could be re-read and different when nova-scheduler is
reset.  There's a relatively straightforward move from here to an API for
setting it if this turns out to be useful, but firstly I think it would be
an uncommon occurrence and secondly it's not something we should implement
now.

2. PCI flavor will be defined using the attributes in pci-flavor-attrs. A
> flavor is defined with a matching expression in the form of attr1 = val11
> [| val12 …], [attr2 = val21 [| val22 …]], …. And this expression is used
> to match one or more PCI stats groups until a free PCI device is located.
> In this case, both attr1 and attr2 can have multiple values, and both
> attributes need to be satisfied. Please confirm this understanding is
> correct
>

This looks right to me as we've discussed it, but I think we'll be wanting
something that allows a top level AND.  In the above example, I can't say
an Intel NIC and a Mellanox NIC are equally OK, because I can't say (intel
+ product ID 1) AND (Mellanox + product ID 2).  I'll leave Yunhong to
decide how the details should look, though.

3. I'd like to see an example that involves multiple attributes. let's say
> pci-flavor-attrs = {gpu, net-group, device_id, product_id}. I'd like to
> know how PCI stats groups are formed on compute nodes based on that, and
> how many of PCI stats groups are there? What's the reasonable guidelines
> in defining the PCI flavors.
>

I need to write up the document for this, and it's overdue.  Leave it with
me.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-29 Thread Robert Li (baoli)
Hi Yongli,

Thank you for addressing my comments, and for adding the encryption card
use case. One thing that I want to point out is that in this use case, you
may not use the pci-flavor in the --nic option because it's not a neutron
feature.

I have a few more questions:
1. pci-flavor-attrs is configured through configuration files and will be
available on both the controller node and the compute nodes. Can the cloud
admin decide to add a new attribute in a running cloud? If that's
possible, how is that done?
2. PCI flavor will be defined using the attributes in pci-flavor-attrs. A
flavor is defined with a matching expression in the form of attr1 = val11
[| val12 …], [attr2 = val21 [| val22 …]], …. And this expression is used
to match one or more PCI stats groups until a free PCI device is located.
In this case, both attr1 and attr2 can have multiple values, and both
attributes need to be satisfied. Please confirm this understanding is
correct
3. I'd like to see an example that involves multiple attributes. let's say
pci-flavor-attrs = {gpu, net-group, device_id, product_id}. I'd like to
know how PCI stats groups are formed on compute nodes based on that, and
how many PCI stats groups are there? What are the reasonable guidelines
for defining the PCI flavors?


thanks,
Robert



On 1/28/14 10:16 PM, "Robert Li (baoli)"  wrote:

>Hi,
>
>I added a few comments in this wiki that Yongli came up with:
>https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support
>
>Please check it out and look for Robert in the wiki.
>
>Thanks,
>Robert
>
>On 1/21/14 9:55 AM, "Robert Li (baoli)"  wrote:
>
>>Yunhong, 
>>
>>Just try to understand your use case:
>>-- a VM can only work with cards from vendor V1
>>-- a VM can work with cards from both vendor V1 and V2
>>
>>  So stats in the two flavors will overlap in the PCI flavor
>>solution.
>>I'm just trying to say that this is something that needs to be properly
>>addressed.
>>
>>
>>Just for the sake of discussion, another solution to meeting the above
>>requirement is able to say in the nova flavor's extra-spec
>>
>>   encrypt_card = card from vendor V1 OR encrypt_card = card from
>>vendor V2
>>
>>
>>In other words, this can be solved in the nova flavor, rather than
>>introducing a new flavor.
>>
>>Thanks,
>>Robert
>>   
>>
>>On 1/17/14 7:03 PM, "yunhong jiang" 
>>wrote:
>>
>>>On Fri, 2014-01-17 at 22:30 +, Robert Li (baoli) wrote:
 Yunhong,
 
 I'm hoping that these comments can be directly addressed:
   a practical deployment scenario that requires arbitrary
 attributes.
>>>
>>>I'm just strongly against to support only one attributes (your PCI
>>>group) for scheduling and management, that's really TOO limited.
>>>
>>>A simple scenario is, I have 3 encryption card:
>>> Card 1 (vendor_id is V1, device_id =0xa)
>>> card 2(vendor_id is V1, device_id=0xb)
>>> card 3(vendor_id is V2, device_id=0xb)
>>>
>>> I have two images. One image only support Card 1 and another image
>>>support Card 1/3 (or any other combination of the 3 card type). I don't
>>>only one attributes will meet such requirement.
>>>
>>>As to arbitrary attributes or limited list of attributes, my opinion is,
>>>as there are so many type of PCI devices and so many potential of PCI
>>>devices usage, support arbitrary attributes will make our effort more
>>>flexible, if we can push the implementation into the tree.
>>>
   detailed design on the following (that also take into account
 the
 introduction of predefined attributes):
 * PCI stats report since the scheduler is stats based
>>>
>>>I don't think there are much difference with current implementation.
>>>
 * the scheduler in support of PCI flavors with arbitrary
 attributes and potential overlapping.
>>>
>>>As Ian said, we need make sure the pci_stats and the PCI flavor have the
>>>same set of attributes, so I don't think there are much difference with
>>>current implementation.
>>>
   networking requirements to support multiple provider
 nets/physical
 nets
>>>
>>>Can't the extra info resolve this issue? Can you elaborate the issue?
>>>
>>>Thanks
>>>--jyh
 
 I guess that the above will become clear as the discussion goes on.
 And we
 also need to define the deliveries
  
 Thanks,
 Robert 
>>>
>>>
>>>___
>>>OpenStack-dev mailing list
>>>OpenStack-dev@lists.openstack.org
>>>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-28 Thread Robert Li (baoli)
Hi,

I added a few comments in this wiki that Yongli came up with:
https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support

Please check it out and look for Robert in the wiki.

Thanks,
Robert

On 1/21/14 9:55 AM, "Robert Li (baoli)"  wrote:

>Yunhong, 
>
>Just try to understand your use case:
>-- a VM can only work with cards from vendor V1
>-- a VM can work with cards from both vendor V1 and V2
>
>  So stats in the two flavors will overlap in the PCI flavor solution.
>I'm just trying to say that this is something that needs to be properly
>addressed.
>
>
>Just for the sake of discussion, another solution to meeting the above
>requirement is able to say in the nova flavor's extra-spec
>
>   encrypt_card = card from vendor V1 OR encrypt_card = card from
>vendor V2
>
>
>In other words, this can be solved in the nova flavor, rather than
>introducing a new flavor.
>
>Thanks,
>Robert
>   
>
>On 1/17/14 7:03 PM, "yunhong jiang"  wrote:
>
>>On Fri, 2014-01-17 at 22:30 +, Robert Li (baoli) wrote:
>>> Yunhong,
>>> 
>>> I'm hoping that these comments can be directly addressed:
>>>   a practical deployment scenario that requires arbitrary
>>> attributes.
>>
>>I'm just strongly against to support only one attributes (your PCI
>>group) for scheduling and management, that's really TOO limited.
>>
>>A simple scenario is, I have 3 encryption card:
>>  Card 1 (vendor_id is V1, device_id =0xa)
>>  card 2(vendor_id is V1, device_id=0xb)
>>  card 3(vendor_id is V2, device_id=0xb)
>>
>>  I have two images. One image only support Card 1 and another image
>>support Card 1/3 (or any other combination of the 3 card type). I don't
>>only one attributes will meet such requirement.
>>
>>As to arbitrary attributes or limited list of attributes, my opinion is,
>>as there are so many type of PCI devices and so many potential of PCI
>>devices usage, support arbitrary attributes will make our effort more
>>flexible, if we can push the implementation into the tree.
>>
>>>   detailed design on the following (that also take into account
>>> the
>>> introduction of predefined attributes):
>>> * PCI stats report since the scheduler is stats based
>>
>>I don't think there are much difference with current implementation.
>>
>>> * the scheduler in support of PCI flavors with arbitrary
>>> attributes and potential overlapping.
>>
>>As Ian said, we need make sure the pci_stats and the PCI flavor have the
>>same set of attributes, so I don't think there are much difference with
>>current implementation.
>>
>>>   networking requirements to support multiple provider
>>> nets/physical
>>> nets
>>
>>Can't the extra info resolve this issue? Can you elaborate the issue?
>>
>>Thanks
>>--jyh
>>> 
>>> I guess that the above will become clear as the discussion goes on.
>>> And we
>>> also need to define the deliveries
>>>  
>>> Thanks,
>>> Robert 
>>
>>
>>___
>>OpenStack-dev mailing list
>>OpenStack-dev@lists.openstack.org
>>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-21 Thread Robert Li (baoli)
Yunhong, 

Just try to understand your use case:
-- a VM can only work with cards from vendor V1
-- a VM can work with cards from both vendor V1 and V2

  So stats in the two flavors will overlap in the PCI flavor solution.
I'm just trying to say that this is something that needs to be properly
addressed.


Just for the sake of discussion, another solution to meeting the above
requirement is to be able to say in the nova flavor's extra-spec

   encrypt_card = card from vendor V1 OR encrypt_card = card from
vendor V2


In other words, this can be solved in the nova flavor, rather than
introducing a new flavor.
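
Purely as an illustration of that idea (the extra-spec key and the value
syntax below are invented, not an agreed format), such a request might be
carried in the instance type's extra_specs:

    # Hypothetical instance-type extra_specs; '|' separates alternatives and
    # ',' separates ANDed attribute requirements inside one alternative.
    extra_specs = {
        'encrypt_card': 'vendor_id=V1,device_id=0xa|vendor_id=V2,device_id=0xb',
    }

    def parse_alternatives(spec):
        # 'k=v,k=v|k=v,...' -> list of requirement dicts.
        return [dict(pair.split('=', 1) for pair in alt.split(','))
                for alt in spec.split('|')]

    def device_satisfies(device, spec):
        return any(all(device.get(k) == v for k, v in alt.items())
                   for alt in parse_alternatives(spec))

    device = {'vendor_id': 'V2', 'device_id': '0xb'}
    print(device_satisfies(device, extra_specs['encrypt_card']))   # True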

Thanks,
Robert
   

On 1/17/14 7:03 PM, "yunhong jiang"  wrote:

>On Fri, 2014-01-17 at 22:30 +, Robert Li (baoli) wrote:
>> Yunhong,
>> 
>> I'm hoping that these comments can be directly addressed:
>>   a practical deployment scenario that requires arbitrary
>> attributes.
>
>I'm just strongly against to support only one attributes (your PCI
>group) for scheduling and management, that's really TOO limited.
>
>A simple scenario is, I have 3 encryption card:
>   Card 1 (vendor_id is V1, device_id =0xa)
>   card 2(vendor_id is V1, device_id=0xb)
>   card 3(vendor_id is V2, device_id=0xb)
>
>   I have two images. One image only support Card 1 and another image
>support Card 1/3 (or any other combination of the 3 card type). I don't
>only one attributes will meet such requirement.
>
>As to arbitrary attributes or limited list of attributes, my opinion is,
>as there are so many type of PCI devices and so many potential of PCI
>devices usage, support arbitrary attributes will make our effort more
>flexible, if we can push the implementation into the tree.
>
>>   detailed design on the following (that also take into account
>> the
>> introduction of predefined attributes):
>> * PCI stats report since the scheduler is stats based
>
>I don't think there are much difference with current implementation.
>
>> * the scheduler in support of PCI flavors with arbitrary
>> attributes and potential overlapping.
>
>As Ian said, we need make sure the pci_stats and the PCI flavor have the
>same set of attributes, so I don't think there are much difference with
>current implementation.
>
>>   networking requirements to support multiple provider
>> nets/physical
>> nets
>
>Can't the extra info resolve this issue? Can you elaborate the issue?
>
>Thanks
>--jyh
>> 
>> I guess that the above will become clear as the discussion goes on.
>> And we
>> also need to define the deliveries
>>  
>> Thanks,
>> Robert 
>
>
>___
>OpenStack-dev mailing list
>OpenStack-dev@lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-21 Thread Robert Li (baoli)
Just one comment:
  The devices allocated for an instance are immediately known after
the domain is created. Therefore it's possible to do a port update and
have the device configured while the instance is booting.
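
A rough sketch of that flow with python-neutronclient, assuming some
binding:profile-style port field is used to carry the allocated device details
(the field name and its contents here are assumptions, not a settled
interface):

    from neutronclient.v2_0 import client

    neutron = client.Client(username='admin', password='secret',
                            tenant_name='admin',
                            auth_url='http://controller:5000/v2.0')

    # Once the domain has been defined, the PCI address of the allocated VF
    # is known, so it can be pushed to Neutron while the guest is still
    # booting.  Values below are hypothetical.
    allocated = {'pci_slot': '0000:06:10.1',
                 'vendor_id': '8086', 'product_id': '10ed'}

    neutron.update_port('PORT-UUID', {'port': {'binding:profile': allocated}})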

--Robert

On 1/19/14 2:15 AM, "Irena Berezovsky"  wrote:

>Hi Robert, Yonhong,
>Although network XML solution (option 1) is very elegant, it has one
>major disadvantage. As Robert mentioned, the disadvantage of the network
>XML is the inability to know what SR-IOV PCI device was actually
>allocated. When neutron is responsible to set networking configuration,
>manage admin status, set security groups, it should be able to identify
>the SR-IOV PCI device to apply configuration. Within current libvirt
>Network XML implementation, it does not seem possible.
>Between option (2) and (3), I do not have any preference, it should be as
>simple as possible.
>Option (3) that I raised can be achieved by renaming the network
>interface of Virtual Function via 'ip link set  name'. Interface logical
>name can be based on neutron port UUID. This will  allow neutron to
>discover devices, if backend plugin requires it. Once VM is migrating,
>suitable Virtual Function on the target node should be allocated, and
>then its corresponding network interface should be renamed to same
>logical name. This can be done without system rebooting. Still need to
>check how the Virtual Function corresponding network interface can be
>returned to its original name once is not used anymore as VM vNIC.
>
>Regards,
>Irena 
>
>-Original Message-
>From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
>Sent: Friday, January 17, 2014 9:06 PM
>To: OpenStack Development Mailing List (not for usage questions)
>Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>support
>
>Robert, thanks for your long reply. Personally I'd prefer option 2/3 as
>it keep Nova the only entity for PCI management.
>
>Glad you are ok with Ian's proposal and we have solution to resolve the
>libvirt network scenario in that framework.
>
>Thanks
>--jyh
>
>> -Original Message-
>> From: Robert Li (baoli) [mailto:ba...@cisco.com]
>> Sent: Friday, January 17, 2014 7:08 AM
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>> 
>> Yunhong,
>> 
>> Thank you for bringing that up on the live migration support. In
>> addition to the two solutions you mentioned, Irena has a different
>> solution. Let me put all the them here again:
>> 1. network xml/group based solution.
>>In this solution, each host that supports a provider
>> net/physical net can define a SRIOV group (it's hard to avoid the term
>> as you can see from the suggestion you made based on the PCI flavor
>> proposal). For each SRIOV group supported on a compute node, A network
>> XML will be created the first time the nova compute service is running
>> on that node.
>> * nova will conduct scheduling, but not PCI device allocation
>> * it's a simple and clean solution, documented in libvirt as
>> the way to support live migration with SRIOV. In addition, a network
>> xml is nicely mapped into a provider net.
>> 2. network xml per PCI device based solution
>>This is the solution you brought up in this email, and Ian
>> mentioned this to me as well. In this solution, a network xml is
>> created when A VM is created. the network xml needs to be removed once
>> the VM is removed. This hasn't been tried out as far as I  know.
>> 3. interface xml/interface rename based solution
>>Irena brought this up. In this solution, the ethernet interface
>> name corresponding to the PCI device attached to the VM needs to be
>> renamed. One way to do so without requiring system reboot is to change
>> the udev rule's file for interface renaming, followed by a udev
>> reload.
>> 
>> Now, with the first solution, Nova doesn't seem to have control over
>> or visibility of the PCI device allocated for the VM before the VM is
>> launched. This needs to be confirmed with the libvirt support and see
>> if such capability can be provided. This may be a potential drawback
>> if a neutron plugin requires detailed PCI device information for
>>operation.
>> Irena may provide more insight into this. Ideally, neutron shouldn't
>> need this information because the device configuration can be done by
>> libvirt invoking the PCI device driver.
>> 
>> The other two solutions are similar. For example, you can 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-21 Thread Ian Wells
Document updated to talk about network aware scheduling (
https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit#-
section just before the use case list).

Based on yesterday's meeting, rkukura would also like to see network-aware
scheduling to work for non-PCI cases - where servers are not necessarily
connected to every physical segment and machines therefore need placing
based on where they can reach the networks they need.  I think this is an
exact parallel to the PCI case, except that we're also constrained by a
count of resources (you can connect an infinite number of VMs to a software
bridge, of course).  We should implement the scheduling changes as a
separate batch of work that solves both problems, if we can - and this
works with the two step approach, because step 1 brings us up to Neutron
parity and step 2 will add network-aware scheduling for both PCI and
non-PCI cases.

-- 
Ian.


On 20 January 2014 13:38, Ian Wells  wrote:

> On 20 January 2014 09:28, Irena Berezovsky  wrote:
>
>> Hi,
>> Having post PCI meeting discussion with Ian based on his proposal
>> https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#
>> ,
>> I am  not sure that the case that quite usable for SR-IOV based
>> networking is covered well by this proposal. The understanding I got is
>> that VM can land on the Host that will lack suitable PCI resource.
>>
>
> The issue we have is if we have multiple underlying networks in the system
> and only some Neutron networks are trunked on the network that the PCI
> device is attached to.  This can specifically happen in the case of
> provider versus trunk networks, though it's very dependent on the setup of
> your system.
>
> The issue is that, in the design we have, Neutron at present has no input
> into scheduling, and also that all devices in a flavor are precisely
> equivalent.  So if I say 'I want a 10G card attached to network X' I will
> get one of the cases in the 10G flavor with no regard as to whether it can
> actually attach to network X.
>
> I can see two options here:
>
> 1. What I'd do right now is I would make it so that a VM that is given an
> unsuitable network card fails to run in nova-compute when Neutron discovers
> it can't attach the PCI device to the network.  This will get us a lot of
> use cases and a Neutron driver without solving the problem elegantly.
> You'd need to choose e.g. a provider or tenant network flavor, mindful of
> the network you're connecting to, so that Neutron can actually succeed,
> which is more visibility into the workings of Neutron than the user really
> ought to need.
>
> 2. When Nova checks that all the networks exist - which, conveniently, is
> in nova-api - it also gets attributes from the networks that can be used by
> the scheduler to choose a device.  So the scheduler chooses from a flavor
> *and*, within that flavor, from a subset of those devices with appropriate
> connectivity.  If we do this then the Neutron connection code doesn't
> change - it should still fail if the connection can't be made - but it
> becomes an internal error, since it's now an issue of consistency of
> setup.
>
> To do this, I think we would tell Neutron 'PCI extra-info X should be set
> to Y for this provider network and Z for tenant networks' - the precise
> implementation would be somewhat up to the driver - and then add the
> additional check in the scheduler.  The scheduling attributes list would
> have to include that attribute.
>
> Can you please provide an example for the required cloud admin PCI related
>> configurations on nova-compute and controller node with regards to the
>> following simplified scenario:
>>  -- There are 2 provider networks (phy1, phy2), each one has associated
>> range on vlan-ids
>>  -- Each compute node has 2 vendor adapters with SR-IOV  enabled feature,
>> exposing xx Virtual Functions.
>>  -- Every VM vnic on virtual network on provider network  phy1 or phy2
>>  should be pci pass-through vnic.
>>
>
> So, we would configure Neutron to check the 'e.physical_network' attribute
> on connection and to return it as a requirement on networks.  Any PCI on
> provider network 'phy1' would be tagged e.physical_network => 'phy1'.  When
> returning the network, an extra attribute would be supplied (perhaps
> something like 'pci_requirements => { e.physical_network => 'phy1'}'.  And
> nova-api would know that, in the case of macvtap and PCI directmap, it
> would need to pass this additional information to the scheduler which would
> need to make use of it in finding a device, over and above the flavor
> requirements.
>
> Neutron, when mapping a PCI port, would similarly work out from the
> Neutron network the trunk it needs to connect to, and would reject any
> mapping that didn't conform. If it did, it would work out how to
> encapsulate the traffic from the PCI device and set that up on the PF of
> the port.
>
> I'm not saying this is the only or best solution

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-20 Thread Ian Wells
On 20 January 2014 09:28, Irena Berezovsky  wrote:

> Hi,
> Having post PCI meeting discussion with Ian based on his proposal
> https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#
> ,
> I am  not sure that the case that quite usable for SR-IOV based networking
> is covered well by this proposal. The understanding I got is that VM can
> land on the Host that will lack suitable PCI resource.
>

The issue we have is if we have multiple underlying networks in the system
and only some Neutron networks are trunked on the network that the PCI
device is attached to.  This can specifically happen in the case of
provider versus trunk networks, though it's very dependent on the setup of
your system.

The issue is that, in the design we have, Neutron at present has no input
into scheduling, and also that all devices in a flavor are precisely
equivalent.  So if I say 'I want a 10G card attached to network X' I will
get one of the cases in the 10G flavor with no regard as to whether it can
actually attach to network X.

I can see two options here:

1. What I'd do right now is I would make it so that a VM that is given an
unsuitable network card fails to run in nova-compute when Neutron discovers
it can't attach the PCI device to the network.  This will get us a lot of
use cases and a Neutron driver without solving the problem elegantly.
You'd need to choose e.g. a provider or tenant network flavor, mindful of
the network you're connecting to, so that Neutron can actually succeed,
which is more visibility into the workings of Neutron than the user really
ought to need.

2. When Nova checks that all the networks exist - which, conveniently, is
in nova-api - it also gets attributes from the networks that can be used by
the scheduler to choose a device.  So the scheduler chooses from a flavor
*and*, within that flavor, from a subset of those devices with appropriate
connectivity.  If we do this then the Neutron connection code doesn't
change - it should still fail if the connection can't be made - but it
becomes an internal error, since it's now an issue of consistency of
setup.

To do this, I think we would tell Neutron 'PCI extra-info X should be set
to Y for this provider network and Z for tenant networks' - the precise
implementation would be somewhat up to the driver - and then add the
additional check in the scheduler.  The scheduling attributes list would
have to include that attribute.

Can you please provide an example for the required cloud admin PCI related
> configurations on nova-compute and controller node with regards to the
> following simplified scenario:
>  -- There are 2 provider networks (phy1, phy2), each one has associated
> range on vlan-ids
>  -- Each compute node has 2 vendor adapters with SR-IOV  enabled feature,
> exposing xx Virtual Functions.
>  -- Every VM vnic on virtual network on provider network  phy1 or phy2
>  should be pci pass-through vnic.
>

So, we would configure Neutron to check the 'e.physical_network' attribute
on connection and to return it as a requirement on networks.  Any PCI on
provider network 'phy1' would be tagged e.physical_network => 'phy1'.  When
returning the network, an extra attribute would be supplied (perhaps
something like 'pci_requirements => { e.physical_network => 'phy1'}'.  And
nova-api would know that, in the case of macvtap and PCI directmap, it
would need to pass this additional information to the scheduler which would
need to make use of it in finding a device, over and above the flavor
requirements.
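
A minimal sketch of the extra scheduler-side narrowing that implies, assuming
the PCI pools carry the same extra-info keys and nova-api hands the scheduler a
small requirements dict derived from the network (all names below are
assumptions):

    # Hypothetical PCI pools reported by one host.
    pools = [
        {'vendor_id': '8086', 'product_id': '10ed',
         'e.physical_network': 'phy1', 'count': 6},
        {'vendor_id': '8086', 'product_id': '10ed',
         'e.physical_network': 'phy2', 'count': 2},
    ]

    # Derived from the Neutron network by nova-api, e.g.
    # pci_requirements => {e.physical_network => 'phy1'}.
    pci_requirements = {'e.physical_network': 'phy1'}

    def usable_pools(pools, in_flavor, requirements):
        # Keep only pools that are in the requested flavor AND satisfy the
        # network-derived requirements, then schedule against those counts.
        return [p for p in pools if in_flavor(p)
                and all(p.get(k) == v for k, v in requirements.items())]

    ten_gig = lambda p: (p['vendor_id'], p['product_id']) == ('8086', '10ed')
    print(usable_pools(pools, ten_gig, pci_requirements))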

Neutron, when mapping a PCI port, would similarly work out from the Neutron
network the trunk it needs to connect to, and would reject any mapping that
didn't conform. If it did, it would work out how to encapsulate the traffic
from the PCI device and set that up on the PF of the port.

I'm not saying this is the only or best solution, but it does have the
advantage that it keeps all of the networking behaviour in Neutron -
hopefully Nova remains almost completely ignorant of what the network setup
is, since the only thing we have to do is pass on PCI requirements, and we
already have a convenient call flow we can use that's there for the network
existence check.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-20 Thread Irena Berezovsky
Hi,
Having post PCI meeting discussion with Ian based on his proposal 
https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#,
I am not sure that the case that is quite usable for SR-IOV based networking is 
covered well by this proposal. The understanding I got is that a VM can land on 
a host that lacks a suitable PCI resource.
Can you please provide an example for the required cloud admin PCI related 
configurations on nova-compute and controller node with regards to the 
following simplified scenario:
 -- There are 2 provider networks (phy1, phy2), each one has associated range 
on vlan-ids
 -- Each compute node has 2 vendor adapters with SR-IOV  enabled feature, 
exposing xx Virtual Functions.
 -- Every VM vnic on virtual network on provider network  phy1 or phy2  should 
be pci pass-through vnic. 

Thanks a lot,
Irena

-Original Message-
From: Robert Li (baoli) [mailto:ba...@cisco.com] 
Sent: Saturday, January 18, 2014 12:33 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Yunhong,

I'm hoping that these comments can be directly addressed:
  a practical deployment scenario that requires arbitrary attributes.
  detailed design on the following (that also take into account the 
introduction of predefined attributes):
* PCI stats report since the scheduler is stats based
* the scheduler in support of PCI flavors with arbitrary attributes and 
potential overlapping.
  networking requirements to support multiple provider nets/physical nets

I guess that the above will become clear as the discussion goes on. And we also 
need to define the deliveries
 
Thanks,
Robert

On 1/17/14 2:02 PM, "Jiang, Yunhong"  wrote:

>Robert, thanks for your long reply. Personally I'd prefer option 2/3 as 
>it keep Nova the only entity for PCI management.
>
>Glad you are ok with Ian's proposal and we have solution to resolve the 
>libvirt network scenario in that framework.
>
>Thanks
>--jyh
>
>> -Original Message-
>> From: Robert Li (baoli) [mailto:ba...@cisco.com]
>> Sent: Friday, January 17, 2014 7:08 AM
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through 
>> network support
>> 
>> Yunhong,
>> 
>> Thank you for bringing that up on the live migration support. In 
>>addition  to the two solutions you mentioned, Irena has a different 
>>solution. Let me  put all the them here again:
>> 1. network xml/group based solution.
>>In this solution, each host that supports a provider 
>>net/physical  net can define a SRIOV group (it's hard to avoid the 
>>term as you can see  from the suggestion you made based on the PCI 
>>flavor proposal). For each  SRIOV group supported on a compute node, A 
>>network XML will be  created the  first time the nova compute service 
>>is running on that node.
>> * nova will conduct scheduling, but not PCI device allocation
>> * it's a simple and clean solution, documented in libvirt as 
>>the  way to support live migration with SRIOV. In addition, a network 
>>xml is  nicely mapped into a provider net.
>> 2. network xml per PCI device based solution
>>This is the solution you brought up in this email, and Ian  
>>mentioned this to me as well. In this solution, a network xml is 
>>created  when A VM is created. the network xml needs to be removed 
>>once the  VM is  removed. This hasn't been tried out as far as I  
>>know.
>> 3. interface xml/interface rename based solution
>>Irena brought this up. In this solution, the ethernet 
>>interface  name corresponding to the PCI device attached to the VM 
>>needs to be  renamed. One way to do so without requiring system reboot 
>>is to change  the  udev rule's file for interface renaming, followed 
>>by a udev reload.
>> 
>> Now, with the first solution, Nova doesn't seem to have control over 
>>or  visibility of the PCI device allocated for the VM before the VM is  
>>launched. This needs to be confirmed with the libvirt support and see 
>>if  such capability can be provided. This may be a potential drawback 
>>if a  neutron plugin requires detailed PCI device information for operation.
>> Irena may provide more insight into this. Ideally, neutron shouldn't 
>>need  this information because the device configuration can be done by 
>>libvirt  invoking the PCI device driver.
>> 
>> The other two solutions are similar. For example, you can view the 
>>second  solution as one way to rename an 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-18 Thread Irena Berezovsky
Hi Robert, Yunhong,
Although the network XML solution (option 1) is very elegant, it has one major 
disadvantage. As Robert mentioned, the disadvantage of the network XML is the 
inability to know which SR-IOV PCI device was actually allocated. When neutron 
is responsible for setting the networking configuration, managing admin status 
and setting security groups, it should be able to identify the SR-IOV PCI 
device to apply the configuration to. Within the current libvirt network XML 
implementation, this does not seem possible.
Between option (2) and (3), I do not have any preference, it should be as 
simple as possible.
Option (3) that I raised can be achieved by renaming the network interface of 
the Virtual Function via 'ip link set <dev> name <logical name>'. The interface 
logical name can be based on the neutron port UUID. This will allow neutron to 
discover devices, if the backend plugin requires it. Once a VM is migrating, a 
suitable Virtual Function on the target node should be allocated, and then its 
corresponding network interface should be renamed to the same logical name. 
This can be done without rebooting the system. We still need to check how the 
Virtual Function's corresponding network interface can be returned to its 
original name once it is not used anymore as a VM vNIC.
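
A sketch of that rename step (illustrative only: the sysfs lookup, the name
prefix and the UUID truncation to fit the kernel's 15-character interface name
limit are all assumptions):

    import os
    import subprocess

    def vf_netdev(pci_address):
        # Current netdev name for the VF at this PCI address.
        return os.listdir('/sys/bus/pci/devices/%s/net' % pci_address)[0]

    def rename_vf(pci_address, port_uuid):
        # Derive a logical name from the neutron port UUID; kernel interface
        # names are limited to 15 characters, so only a prefix fits.
        current = vf_netdev(pci_address)
        logical = 'prt' + port_uuid.replace('-', '')[:12]
        # The link must be down while it is renamed.
        subprocess.check_call(['ip', 'link', 'set', current, 'down'])
        subprocess.check_call(['ip', 'link', 'set', current, 'name', logical])
        subprocess.check_call(['ip', 'link', 'set', logical, 'up'])
        return logical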

Regards,
Irena 

-Original Message-
From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com] 
Sent: Friday, January 17, 2014 9:06 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Robert, thanks for your long reply. Personally I'd prefer option 2/3 as it keep 
Nova the only entity for PCI management.

Glad you are ok with Ian's proposal and we have solution to resolve the libvirt 
network scenario in that framework.

Thanks
--jyh

> -Original Message-
> From: Robert Li (baoli) [mailto:ba...@cisco.com]
> Sent: Friday, January 17, 2014 7:08 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network 
> support
> 
> Yunhong,
> 
> Thank you for bringing that up on the live migration support. In 
> addition to the two solutions you mentioned, Irena has a different 
> solution. Let me put all the them here again:
> 1. network xml/group based solution.
>In this solution, each host that supports a provider 
> net/physical net can define a SRIOV group (it's hard to avoid the term 
> as you can see from the suggestion you made based on the PCI flavor 
> proposal). For each SRIOV group supported on a compute node, A network 
> XML will be created the first time the nova compute service is running 
> on that node.
> * nova will conduct scheduling, but not PCI device allocation
> * it's a simple and clean solution, documented in libvirt as 
> the way to support live migration with SRIOV. In addition, a network 
> xml is nicely mapped into a provider net.
> 2. network xml per PCI device based solution
>This is the solution you brought up in this email, and Ian 
> mentioned this to me as well. In this solution, a network xml is 
> created when A VM is created. the network xml needs to be removed once 
> the VM is removed. This hasn't been tried out as far as I  know.
> 3. interface xml/interface rename based solution
>Irena brought this up. In this solution, the ethernet interface 
> name corresponding to the PCI device attached to the VM needs to be 
> renamed. One way to do so without requiring system reboot is to change 
> the udev rule's file for interface renaming, followed by a udev 
> reload.
> 
> Now, with the first solution, Nova doesn't seem to have control over 
> or visibility of the PCI device allocated for the VM before the VM is 
> launched. This needs to be confirmed with the libvirt support and see 
> if such capability can be provided. This may be a potential drawback 
> if a neutron plugin requires detailed PCI device information for operation.
> Irena may provide more insight into this. Ideally, neutron shouldn't 
> need this information because the device configuration can be done by 
> libvirt invoking the PCI device driver.
> 
> The other two solutions are similar. For example, you can view the 
> second solution as one way to rename an interface, or camouflage an 
> interface under a network name. They all require additional works 
> before the VM is created and after the VM is removed.
> 
> I also agree with you that we should take a look at XenAPI on this.
> 
> 
> With regard to your suggestion on how to implement the first solution 
> with some predefined group attribute, I think it definitely can be 
> done. As I have pointed it out earlier, the PCI flavor proposal is 
> actually a generalized version of the PCI group. In othe

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-17 Thread yunhong jiang
On Fri, 2014-01-17 at 22:30 +, Robert Li (baoli) wrote:
> Yunhong,
> 
> I'm hoping that these comments can be directly addressed:
>   a practical deployment scenario that requires arbitrary
> attributes.

I'm just strongly against supporting only one attribute (your PCI
group) for scheduling and management; that's really TOO limited.

A simple scenario is, I have 3 encryption cards:
	Card 1 (vendor_id is V1, device_id=0xa)
	card 2 (vendor_id is V1, device_id=0xb)
	card 3 (vendor_id is V2, device_id=0xb)

I have two images. One image only supports Card 1 and another image
supports Card 1/3 (or any other combination of the 3 card types). I don't
think only one attribute will meet such a requirement.
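
To make that concrete, the two images might map onto two PCI flavors roughly
like the following (the flavor structure is illustrative only); note that the
two flavors overlap on card 1:

    cards = [
        {'name': 'card1', 'vendor_id': 'V1', 'device_id': '0xa'},
        {'name': 'card2', 'vendor_id': 'V1', 'device_id': '0xb'},
        {'name': 'card3', 'vendor_id': 'V2', 'device_id': '0xb'},
    ]

    # Each flavor is a list of alternative attribute requirements.
    flavor_image1 = [{'vendor_id': 'V1', 'device_id': '0xa'}]    # card 1 only
    flavor_image2 = [{'vendor_id': 'V1', 'device_id': '0xa'},    # card 1 or 3
                     {'vendor_id': 'V2', 'device_id': '0xb'}]

    def matches(flavor, card):
        return any(all(card.get(k) == v for k, v in alt.items())
                   for alt in flavor)

    print([c['name'] for c in cards if matches(flavor_image1, c)])  # ['card1']
    print([c['name'] for c in cards if matches(flavor_image2, c)])  # ['card1', 'card3']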

As to arbitrary attributes versus a limited list of attributes, my opinion is
that, since there are so many types of PCI devices and so many potential PCI
device usages, supporting arbitrary attributes will make our effort more
flexible, if we can push the implementation into the tree.

>   detailed design on the following (that also take into account
> the
> introduction of predefined attributes):
> * PCI stats report since the scheduler is stats based

I don't think there is much difference from the current implementation.

> * the scheduler in support of PCI flavors with arbitrary
> attributes and potential overlapping.

As Ian said, we need to make sure the pci_stats and the PCI flavor have the
same set of attributes, so I don't think there is much difference from the
current implementation.

>   networking requirements to support multiple provider
> nets/physical
> nets

Can't the extra info resolve this issue? Can you elaborate the issue?

Thanks
--jyh
> 
> I guess that the above will become clear as the discussion goes on.
> And we
> also need to define the deliveries
>  
> Thanks,
> Robert 


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-17 Thread Robert Li (baoli)
Yunhong,

I'm hoping that these comments can be directly addressed:
  a practical deployment scenario that requires arbitrary attributes.
  detailed design on the following (that also take into account the
introduction of predefined attributes):
* PCI stats report since the scheduler is stats based
* the scheduler in support of PCI flavors with arbitrary
attributes and potential overlapping.
  networking requirements to support multiple provider nets/physical
nets

I guess that the above will become clear as the discussion goes on. And we
also need to define the deliveries
 
Thanks,
Robert

On 1/17/14 2:02 PM, "Jiang, Yunhong"  wrote:

>Robert, thanks for your long reply. Personally I'd prefer option 2/3 as
>it keep Nova the only entity for PCI management.
>
>Glad you are ok with Ian's proposal and we have solution to resolve the
>libvirt network scenario in that framework.
>
>Thanks
>--jyh
>
>> -Original Message-
>> From: Robert Li (baoli) [mailto:ba...@cisco.com]
>> Sent: Friday, January 17, 2014 7:08 AM
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>> 
>> Yunhong,
>> 
>> Thank you for bringing that up on the live migration support. In
>>addition
>> to the two solutions you mentioned, Irena has a different solution. Let
>>me
>> put all the them here again:
>> 1. network xml/group based solution.
>>In this solution, each host that supports a provider net/physical
>> net can define a SRIOV group (it's hard to avoid the term as you can see
>> from the suggestion you made based on the PCI flavor proposal). For each
>> SRIOV group supported on a compute node, A network XML will be
>> created the
>> first time the nova compute service is running on that node.
>> * nova will conduct scheduling, but not PCI device allocation
>> * it's a simple and clean solution, documented in libvirt as the
>> way to support live migration with SRIOV. In addition, a network xml is
>> nicely mapped into a provider net.
>> 2. network xml per PCI device based solution
>>This is the solution you brought up in this email, and Ian
>> mentioned this to me as well. In this solution, a network xml is created
>> when A VM is created. the network xml needs to be removed once the
>> VM is
>> removed. This hasn't been tried out as far as I  know.
>> 3. interface xml/interface rename based solution
>>Irena brought this up. In this solution, the ethernet interface
>> name corresponding to the PCI device attached to the VM needs to be
>> renamed. One way to do so without requiring system reboot is to change
>> the
>> udev rule's file for interface renaming, followed by a udev reload.
>> 
>> Now, with the first solution, Nova doesn't seem to have control over or
>> visibility of the PCI device allocated for the VM before the VM is
>> launched. This needs to be confirmed with the libvirt support and see if
>> such capability can be provided. This may be a potential drawback if a
>> neutron plugin requires detailed PCI device information for operation.
>> Irena may provide more insight into this. Ideally, neutron shouldn't
>>need
>> this information because the device configuration can be done by libvirt
>> invoking the PCI device driver.
>> 
>> The other two solutions are similar. For example, you can view the
>>second
>> solution as one way to rename an interface, or camouflage an interface
>> under a network name. They all require additional works before the VM is
>> created and after the VM is removed.
>> 
>> I also agree with you that we should take a look at XenAPI on this.
>> 
>> 
>> With regard to your suggestion on how to implement the first solution
>>with
>> some predefined group attribute, I think it definitely can be done. As I
>> have pointed it out earlier, the PCI flavor proposal is actually a
>> generalized version of the PCI group. In other words, in the PCI group
>> proposal, we have one predefined attribute called PCI group, and
>> everything else works on top of that. In the PCI flavor proposal,
>> attribute is arbitrary. So certainly we can define a particular
>>attribute
>> for networking, which let's temporarily call sriov_group. But I can see
>> with this idea of predefined attributes, more of them will be required
>>by
>> different types of devices in the future. I'm sure it will keep us busy
>> although I'm not s

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-17 Thread Jiang, Yunhong
Robert, thanks for your long reply. Personally I'd prefer option 2/3 as it keeps 
Nova the only entity for PCI management.

Glad you are OK with Ian's proposal and that we have a solution to resolve the 
libvirt network scenario in that framework.

Thanks
--jyh

> -Original Message-
> From: Robert Li (baoli) [mailto:ba...@cisco.com]
> Sent: Friday, January 17, 2014 7:08 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
> 
> Yunhong,
> 
> Thank you for bringing that up on the live migration support. In addition
> to the two solutions you mentioned, Irena has a different solution. Let me
> put all the them here again:
> 1. network xml/group based solution.
>In this solution, each host that supports a provider net/physical
> net can define a SRIOV group (it's hard to avoid the term as you can see
> from the suggestion you made based on the PCI flavor proposal). For each
> SRIOV group supported on a compute node, A network XML will be
> created the
> first time the nova compute service is running on that node.
> * nova will conduct scheduling, but not PCI device allocation
> * it's a simple and clean solution, documented in libvirt as the
> way to support live migration with SRIOV. In addition, a network xml is
> nicely mapped into a provider net.
> 2. network xml per PCI device based solution
>This is the solution you brought up in this email, and Ian
> mentioned this to me as well. In this solution, a network xml is created
> when A VM is created. the network xml needs to be removed once the
> VM is
> removed. This hasn't been tried out as far as I  know.
> 3. interface xml/interface rename based solution
>Irena brought this up. In this solution, the ethernet interface
> name corresponding to the PCI device attached to the VM needs to be
> renamed. One way to do so without requiring system reboot is to change
> the
> udev rule's file for interface renaming, followed by a udev reload.
> 
> Now, with the first solution, Nova doesn't seem to have control over or
> visibility of the PCI device allocated for the VM before the VM is
> launched. This needs to be confirmed with the libvirt support and see if
> such capability can be provided. This may be a potential drawback if a
> neutron plugin requires detailed PCI device information for operation.
> Irena may provide more insight into this. Ideally, neutron shouldn't need
> this information because the device configuration can be done by libvirt
> invoking the PCI device driver.
> 
> The other two solutions are similar. For example, you can view the second
> solution as one way to rename an interface, or camouflage an interface
> under a network name. They all require additional works before the VM is
> created and after the VM is removed.
> 
> I also agree with you that we should take a look at XenAPI on this.
> 
> 
> With regard to your suggestion on how to implement the first solution with
> some predefined group attribute, I think it definitely can be done. As I
> have pointed it out earlier, the PCI flavor proposal is actually a
> generalized version of the PCI group. In other words, in the PCI group
> proposal, we have one predefined attribute called PCI group, and
> everything else works on top of that. In the PCI flavor proposal,
> attribute is arbitrary. So certainly we can define a particular attribute
> for networking, which let's temporarily call sriov_group. But I can see
> with this idea of predefined attributes, more of them will be required by
> different types of devices in the future. I'm sure it will keep us busy
> although I'm not sure it's in a good way.
> 
> I was expecting you or someone else can provide a practical deployment
> scenario that would justify the flexibilities and the complexities.
> Although I'd prefer to keep it simple and generalize it later once a
> particular requirement is clearly identified, I'm fine to go with it if
> that's most of the folks want to do.
> 
> --Robert
> 
> 
> 
> On 1/16/14 8:36 PM, "yunhong jiang" 
> wrote:
> 
> >On Thu, 2014-01-16 at 01:28 +0100, Ian Wells wrote:
> >> To clarify a couple of Robert's points, since we had a conversation
> >> earlier:
> >> On 15 January 2014 23:47, Robert Li (baoli)  wrote:
> >>   ---  do we agree that BDF address (or device id, whatever
> >> you call it), and node id shouldn't be used as attributes in
> >> defining a PCI flavor?
> >>
> >>
> >> Note that the current spec doesn't a

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-17 Thread Robert Li (baoli)
Yunhong,

Thank you for bringing that up on the live migration support. In addition
to the two solutions you mentioned, Irena has a different solution. Let me
put all of them here again:
1. network xml/group based solution.
   In this solution, each host that supports a provider net/physical
net can define a SRIOV group (it's hard to avoid the term as you can see
from the suggestion you made based on the PCI flavor proposal). For each
SRIOV group supported on a compute node, A network XML will be created the
first time the nova compute service is running on that node.
* nova will conduct scheduling, but not PCI device allocation
* it's a simple and clean solution, documented in libvirt as the
way to support live migration with SRIOV. In addition, a network xml is
nicely mapped into a provider net.
2. network xml per PCI device based solution
   This is the solution you brought up in this email, and Ian
mentioned this to me as well. In this solution, a network xml is created
when A VM is created. the network xml needs to be removed once the VM is
removed. This hasn't been tried out as far as I  know.
3. interface xml/interface rename based solution
   Irena brought this up. In this solution, the ethernet interface
name corresponding to the PCI device attached to the VM needs to be
renamed. One way to do so without requiring system reboot is to change the
udev rule's file for interface renaming, followed by a udev reload.
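
A sketch of that udev approach (the rule file path, the match key and the
reload commands below are illustrative assumptions, not tested code):

    import subprocess

    RULES_FILE = '/etc/udev/rules.d/70-sriov-rename.rules'

    def add_rename_rule(mac_address, new_name):
        # Persistent-net style rule: match the VF by MAC and force its name.
        rule = ('SUBSYSTEM=="net", ACTION=="add", '
                'ATTR{address}=="%s", NAME="%s"\n' % (mac_address, new_name))
        with open(RULES_FILE, 'a') as f:
            f.write(rule)
        # Ask udev to pick up the new rule and re-evaluate net devices.
        subprocess.check_call(['udevadm', 'control', '--reload-rules'])
        subprocess.check_call(['udevadm', 'trigger', '--subsystem-match=net'])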

Now, with the first solution, Nova doesn't seem to have control over or
visibility of the PCI device allocated for the VM before the VM is
launched. This needs to be confirmed with the libvirt support and see if
such capability can be provided. This may be a potential drawback if a
neutron plugin requires detailed PCI device information for operation.
Irena may provide more insight into this. Ideally, neutron shouldn't need
this information because the device configuration can be done by libvirt
invoking the PCI device driver.

The other two solutions are similar. For example, you can view the second
solution as one way to rename an interface, or camouflage an interface
under a network name. They all require additional work before the VM is
created and after the VM is removed.

I also agree with you that we should take a look at XenAPI on this.


With regard to your suggestion on how to implement the first solution with
some predefined group attribute, I think it definitely can be done. As I
have pointed it out earlier, the PCI flavor proposal is actually a
generalized version of the PCI group. In other words, in the PCI group
proposal, we have one predefined attribute called PCI group, and
everything else works on top of that. In the PCI flavor proposal,
attribute is arbitrary. So certainly we can define a particular attribute
for networking, which let's temporarily call sriov_group. But I can see
with this idea of predefined attributes, more of them will be required by
different types of devices in the future. I'm sure it will keep us busy
although I'm not sure it's in a good way.

I was expecting you or someone else could provide a practical deployment
scenario that would justify the flexibilities and the complexities.
Although I'd prefer to keep it simple and generalize it later once a
particular requirement is clearly identified, I'm fine to go with it if
that's what most of the folks want to do.

--Robert



On 1/16/14 8:36 PM, "yunhong jiang"  wrote:

>On Thu, 2014-01-16 at 01:28 +0100, Ian Wells wrote:
>> To clarify a couple of Robert's points, since we had a conversation
>> earlier:
>> On 15 January 2014 23:47, Robert Li (baoli)  wrote:
>>   ---  do we agree that BDF address (or device id, whatever
>> you call it), and node id shouldn't be used as attributes in
>> defining a PCI flavor?
>> 
>> 
>> Note that the current spec doesn't actually exclude it as an option.
>> It's just an unwise thing to do.  In theory, you could elect to define
>> your flavors using the BDF attribute but determining 'the card in this
>> slot is equivalent to all the other cards in the same slot in other
>> machines' is probably not the best idea...  We could lock it out as an
>> option or we could just assume that administrators wouldn't be daft
>> enough to try.
>> 
>> 
>> * the compute node needs to know the PCI flavor.
>> [...] 
>>   - to support live migration, we need to use
>> it to create network xml
>> 
>> 
>> I didn't understand this at first and it took me a while to get what
>> Robert meant here.
>> 
>> This is based on Robert's current code for macvtap based live
>> migration.  The issue is that if you wish to migrate a VM and it's
>> tied to a physical interface, you can't guarantee that the same
>> physical interface is going to be used on the target machine, but at
>> the same time you can't change the libvirt.xml as it comes over with
>> the migrating machine.  The answer is 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread yunhong jiang
On Thu, 2014-01-16 at 01:28 +0100, Ian Wells wrote:
> To clarify a couple of Robert's points, since we had a conversation
> earlier:
> On 15 January 2014 23:47, Robert Li (baoli)  wrote:
>   ---  do we agree that BDF address (or device id, whatever
> you call it), and node id shouldn't be used as attributes in
> defining a PCI flavor?
> 
> 
> Note that the current spec doesn't actually exclude it as an option.
> It's just an unwise thing to do.  In theory, you could elect to define
> your flavors using the BDF attribute but determining 'the card in this
> slot is equivalent to all the other cards in the same slot in other
> machines' is probably not the best idea...  We could lock it out as an
> option or we could just assume that administrators wouldn't be daft
> enough to try.
> 
> 
> * the compute node needs to know the PCI flavor.
> [...] 
>   - to support live migration, we need to use
> it to create network xml
> 
> 
> I didn't understand this at first and it took me a while to get what
> Robert meant here.
> 
> This is based on Robert's current code for macvtap based live
> migration.  The issue is that if you wish to migrate a VM and it's
> tied to a physical interface, you can't guarantee that the same
> physical interface is going to be used on the target machine, but at
> the same time you can't change the libvirt.xml as it comes over with
> the migrating machine.  The answer is to define a network and refer
> out to it from libvirt.xml.  In Robert's current code he's using the
> group name of the PCI devices to create a network containing the list
> of equivalent devices (those in the group) that can be macvtapped.
> Thus when the host migrates it will find another, equivalent,
> interface.  This falls over in the use case under consideration where
> a device can be mapped using more than one flavor, so we have to
> discard the use case or rethink the implementation.
> 
> There's a more complex solution - I think - where we create a
> temporary network for each macvtap interface a machine's going to use,
> with a name based on the instance UUID and port number, and containing
> the device to map.  Before starting the migration we would create a
> replacement network containing only the new device on the target host;
> migration would find the network from the name in the libvirt.xml, and
> the content of that network would behave identically.  We'd be
> creating libvirt networks on the fly and a lot more of them, and we'd
> need decent cleanup code too ('when freeing a PCI device, delete any
> network it's a member of'), so it all becomes a lot more hairy.
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Ian/Robert, below is my understanding to the method Robet want to use,
am I right?

a) Define a libvirt network as in the "Using a macvtap 'direct' connection"
section at http://libvirt.org/formatnetwork.html , for example like the
following one:

<network>
  <name>group_name1</name>
  <forward mode='passthrough'>
    <interface dev='eth1'/>
    <interface dev='eth2'/>
    <interface dev='eth3'/>
    <interface dev='eth4'/>
  </forward>
</network>


b) When assigning SRIOV NIC devices to an instance, as in the "Assignment
from a pool of SRIOV VFs in a libvirt <network> definition" section at
http://wiki.libvirt.org/page/Networking#PCI_Passthrough_of_host_network_devices
, use the libvirt network definition group_name1, for example like the
following one:

<interface type='network'>
  <source network='group_name1'/>
</interface>

If my understanding is correct, then I still have a few things unclear:
a) How will the libvirt network (i.e. libvirt network group_name1) be
created? Will it be created when the compute node boots up, or will it be
created before instance creation? I suppose, per Robert's design, it's
created when the compute node comes up, am I right?

b) If all the interfaces are used up by instances, what will happen?
Consider 4 interfaces allocated to the group_name1 libvirt network, and a
user trying to migrate 6 instances with the 'group_name1' network: what
will happen?

And below are my comments:

a) Yes, this is in fact different from the current nova PCI support
philosophy. Currently we assume Nova owns the devices and manages the
device assignment to each instance, while in this situation the libvirt
network is in fact another (although very thin) PCI device management
layer!

b) This also reminds me that other VMMs like XenAPI possibly have special
requirements, and we need input/confirmation from them as well.


As for how to resolve the issue, I think there are several solutions:

a) Create one libvirt network for each SRIOV NIC assigned to each instance,
i.e. the libvirt network always has only one interface included; it may be
statically or dynamically created. This solution in fact removes the
allocation functionality of the libvirt network and leaves only the
configuration functionality.

b) Change Nova PCI to support a special type of PCI device attribute
(like th

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread Sandhya Dasu (sadasu)
Hi Irena,
   Thanks for pointing out an alternative to the network xml solution for live
migration. I am still not clear on the solution.

Some questions:

  1.  Where does the rename of the PCI device network interface name occur?
  2.  Can this rename be done for a VF? I think your example shows rename of a 
PF.

Thanks,
Sandhya

From: Irena Berezovsky mailto:ire...@mellanox.com>>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
mailto:openstack-dev@lists.openstack.org>>
Date: Thursday, January 16, 2014 4:43 AM
To: "OpenStack Development Mailing List (not for usage questions)" 
mailto:openstack-dev@lists.openstack.org>>
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Ian,
Thank you for putting in writing the ongoing discussed specification.
I have added few comments on the Google doc [1].

As for live migration support, this can be done also without libvirt network 
usage.
Not very elegant, but working:  rename the interface of the PCI device to some 
logical name, let’s say based on neutron port UUID and put it into the 
interface XML, i.e.:
If PCI device network interface name  is eth8 and neutron port UUID is   
02bc4aec-b4f4-436f-b651-024 then rename it to something like: eth02bc4aec-b4'. 
The interface XML will look like this:

  ...
  <interface type='direct'>
    <source dev='eth02bc4aec-b4' mode='passthrough'/>
    ...
  </interface>
  ...

[1] 
https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#heading=h.308b0wqn1zde

BR,
Irena
From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 16, 2014 2:34 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

To clarify a couple of Robert's points, since we had a conversation earlier:
On 15 January 2014 23:47, Robert Li (baoli) 
mailto:ba...@cisco.com>> wrote:
  ---  do we agree that BDF address (or device id, whatever you call it), and 
node id shouldn't be used as attributes in defining a PCI flavor?

Note that the current spec doesn't actually exclude it as an option.  It's just 
an unwise thing to do.  In theory, you could elect to define your flavors using 
the BDF attribute but determining 'the card in this slot is equivalent to all 
the other cards in the same slot in other machines' is probably not the best 
idea...  We could lock it out as an option or we could just assume that 
administrators wouldn't be daft enough to try.
* the compute node needs to know the PCI flavor. [...]
  - to support live migration, we need to use it to create 
network xml

I didn't understand this at first and it took me a while to get what Robert 
meant here.

This is based on Robert's current code for macvtap based live migration.  The 
issue is that if you wish to migrate a VM and it's tied to a physical 
interface, you can't guarantee that the same physical interface is going to be 
used on the target machine, but at the same time you can't change the 
libvirt.xml as it comes over with the migrating machine.  The answer is to 
define a network and refer out to it from libvirt.xml.  In Robert's current 
code he's using the group name of the PCI devices to create a network 
containing the list of equivalent devices (those in the group) that can be 
macvtapped.  Thus when the host migrates it will find another, equivalent, 
interface.  This falls over in the use case under consideration where a device 
can be mapped using more than one flavor, so we have to discard the use case or 
rethink the implementation.

There's a more complex solution - I think - where we create a temporary network 
for each macvtap interface a machine's going to use, with a name based on the 
instance UUID and port number, and containing the device to map.  Before 
starting the migration we would create a replacement network containing only 
the new device on the target host; migration would find the network from the 
name in the libvirt.xml, and the content of that network would behave 
identically.  We'd be creating libvirt networks on the fly and a lot more of 
them, and we'd need decent cleanup code too ('when freeing a PCI device, delete 
any network it's a member of'), so it all becomes a lot more hairy.
--
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread Ian Wells
On 16 January 2014 09:07, yongli he  wrote:

>  On 2014-01-16 08:28, Ian Wells wrote:
>
> This is based on Robert's current code for macvtap based live migration.
> The issue is that if you wish to migrate a VM and it's tied to a physical
> interface, you can't guarantee that the same physical interface is going to
> be used on the target machine, but at the same time you can't change the
> libvirt.xml as it comes over with the migrating machine.  The answer is to
> define a network and refer out to it from libvirt.xml.  In Robert's current
> code he's using the group name of the PCI devices to create a network
> containing the list of equivalent devices (those in the group) that can be
> macvtapped.  Thus when the host migrates it will find another, equivalent,
> interface.  This falls over in the use case under
>
> but, with the flavor we defined, the group could be a tag for this purpose,
> and all of Robert's design still works, so it's OK, right?
>

Well, you could make a label up consisting of the values of the attributes
in the group, but since a flavor can encompass multiple groups (for
instance, I group by device and vendor and then I use two device types in
my flavor) this still doesn't work.  Irena's solution does, though.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread Irena Berezovsky
Ian,
Thank you for putting in writing the ongoing discussed specification.
I have added few comments on the Google doc [1].

As for live migration support, this can be done also without libvirt network 
usage.
Not very elegant, but working: rename the interface of the PCI device to some
logical name, let's say based on the neutron port UUID, and put it into the
interface XML, i.e.:
If the PCI device network interface name is eth8 and the neutron port UUID is
02bc4aec-b4f4-436f-b651-024 then rename it to something like 'eth02bc4aec-b4'.
The interface XML will look like this:

  ...
  <interface type='direct'>
    <source dev='eth02bc4aec-b4' mode='passthrough'/>
    ...
  </interface>
  ...
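
The rename itself can be done with standard ip link commands; a rough sketch
follows (the helper and naming scheme below are illustrative only, not part of
any agreed code):

# Rough sketch of the rename step described above; device and port values are
# illustrative. The link has to be down while it is renamed.
import subprocess

def rename_pci_netdev(old_name, port_uuid):
    new_name = 'eth' + port_uuid[:11]          # e.g. eth02bc4aec-b4
    subprocess.check_call(['ip', 'link', 'set', 'dev', old_name, 'down'])
    subprocess.check_call(['ip', 'link', 'set', 'dev', old_name,
                           'name', new_name])
    subprocess.check_call(['ip', 'link', 'set', 'dev', new_name, 'up'])
    return new_name

# e.g. rename_pci_netdev('eth8', neutron_port_uuid)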

[1] 
https://docs.google.com/document/d/1vadqmurlnlvZ5bv3BlUbFeXRS_wh-dsgi5plSjimWjU/edit?pli=1#heading=h.308b0wqn1zde

BR,
Irena
From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 16, 2014 2:34 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

To clarify a couple of Robert's points, since we had a conversation earlier:
On 15 January 2014 23:47, Robert Li (baoli) 
mailto:ba...@cisco.com>> wrote:
  ---  do we agree that BDF address (or device id, whatever you call it), and 
node id shouldn't be used as attributes in defining a PCI flavor?

Note that the current spec doesn't actually exclude it as an option.  It's just 
an unwise thing to do.  In theory, you could elect to define your flavors using 
the BDF attribute but determining 'the card in this slot is equivalent to all 
the other cards in the same slot in other machines' is probably not the best 
idea...  We could lock it out as an option or we could just assume that 
administrators wouldn't be daft enough to try.
* the compute node needs to know the PCI flavor. [...]
  - to support live migration, we need to use it to create 
network xml

I didn't understand this at first and it took me a while to get what Robert 
meant here.

This is based on Robert's current code for macvtap based live migration.  The 
issue is that if you wish to migrate a VM and it's tied to a physical 
interface, you can't guarantee that the same physical interface is going to be 
used on the target machine, but at the same time you can't change the 
libvirt.xml as it comes over with the migrating machine.  The answer is to 
define a network and refer out to it from libvirt.xml.  In Robert's current 
code he's using the group name of the PCI devices to create a network 
containing the list of equivalent devices (those in the group) that can be 
macvtapped.  Thus when the host migrates it will find another, equivalent, 
interface.  This falls over in the use case under consideration where a device 
can be mapped using more than one flavor, so we have to discard the use case or 
rethink the implementation.

There's a more complex solution - I think - where we create a temporary network 
for each macvtap interface a machine's going to use, with a name based on the 
instance UUID and port number, and containing the device to map.  Before 
starting the migration we would create a replacement network containing only 
the new device on the target host; migration would find the network from the 
name in the libvirt.xml, and the content of that network would behave 
identically.  We'd be creating libvirt networks on the fly and a lot more of 
them, and we'd need decent cleanup code too ('when freeing a PCI device, delete 
any network it's a member of'), so it all becomes a lot more hairy.
--
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-16 Thread yongli he

On 2014-01-16 08:28, Ian Wells wrote:
To clarify a couple of Robert's points, since we had a conversation 
earlier:
On 15 January 2014 23:47, Robert Li (baoli) wrote:


---  do we agree that BDF address (or device id, whatever you call
it), and node id shouldn't be used as attributes in defining a PCI
flavor?


Note that the current spec doesn't actually exclude it as an option.  
It's just an unwise thing to do.  In theory, you could elect to define 
your flavors using the BDF attribute but determining 'the card in this 
slot is equivalent to all the other cards in the same slot in other 
machines' is probably not the best idea...  We could lock it out as an 
option or we could just assume that administrators wouldn't be daft 
enough to try.


  * the compute node needs to know the PCI flavor. [...]
  - to support live migration, we need to use it
to create network xml


I didn't understand this at first and it took me a while to get what 
Robert meant here.


This is based on Robert's current code for macvtap based live 
migration.  The issue is that if you wish to migrate a VM and it's 
tied to a physical interface, you can't guarantee that the same 
physical interface is going to be used on the target machine, but at 
the same time you can't change the libvirt.xml as it comes over with 
the migrating machine.  The answer is to define a network and refer 
out to it from libvirt.xml.  In Robert's current code he's using the 
group name of the PCI devices to create a network containing the list 
of equivalent devices (those in the group) that can be macvtapped.  
Thus when the host migrates it will find another, equivalent, 
interface. This falls over in the use case under
but, with the flavor we defined, the group could be a tag for this purpose,
and all of Robert's design still works, so it's OK, right?
consideration where a device can be mapped using more than one flavor, 
so we have to discard the use case or rethink the implementation.


There's a more complex solution - I think - where we create a 
temporary network for each macvtap interface a machine's going to use, 
with a name based on the instance UUID and port number, and containing 
the device to map. Before starting the migration we would create a 
replacement network containing only the new device on the target host; 
migration would find the network from the name in the libvirt.xml, and 
the content of that network would behave identically.  We'd be 
creating libvirt networks on the fly and a lot more of them, and we'd 
need decent cleanup code too ('when freeing a PCI device, delete any 
network it's a member of'), so it all becomes a lot more hairy.

--
Ian.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-15 Thread Ian Wells
To clarify a couple of Robert's points, since we had a conversation earlier:
On 15 January 2014 23:47, Robert Li (baoli)  wrote:

>   ---  do we agree that BDF address (or device id, whatever you call it),
> and node id shouldn't be used as attributes in defining a PCI flavor?
>

Note that the current spec doesn't actually exclude it as an option.  It's
just an unwise thing to do.  In theory, you could elect to define your
flavors using the BDF attribute but determining 'the card in this slot is
equivalent to all the other cards in the same slot in other machines' is
probably not the best idea...  We could lock it out as an option or we
could just assume that administrators wouldn't be daft enough to try.

* the compute node needs to know the PCI flavor. [...]
>   - to support live migration, we need to use it to create
> network xml
>

I didn't understand this at first and it took me a while to get what Robert
meant here.

This is based on Robert's current code for macvtap based live migration.
The issue is that if you wish to migrate a VM and it's tied to a physical
interface, you can't guarantee that the same physical interface is going to
be used on the target machine, but at the same time you can't change the
libvirt.xml as it comes over with the migrating machine.  The answer is to
define a network and refer out to it from libvirt.xml.  In Robert's current
code he's using the group name of the PCI devices to create a network
containing the list of equivalent devices (those in the group) that can be
macvtapped.  Thus when the host migrates it will find another, equivalent,
interface.  This falls over in the use case under consideration where a
device can be mapped using more than one flavor, so we have to discard the
use case or rethink the implementation.

There's a more complex solution - I think - where we create a temporary
network for each macvtap interface a machine's going to use, with a name
based on the instance UUID and port number, and containing the device to
map.  Before starting the migration we would create a replacement network
containing only the new device on the target host; migration would find the
network from the name in the libvirt.xml, and the content of that network
would behave identically.  We'd be creating libvirt networks on the fly and
a lot more of them, and we'd need decent cleanup code too ('when freeing a
PCI device, delete any network it's a member of'), so it all becomes a lot
more hairy.
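
For what it's worth, a rough sketch of that flow using transient libvirt
networks (the naming scheme and helper functions below are made up for
illustration, and error handling is mostly elided):

# Rough sketch only: one transient libvirt network per instance/port, named so
# the migration target can recreate an equivalent one before migration starts.
import libvirt

def net_name(instance_uuid, port_index):
    return 'pci-%s-%d' % (instance_uuid, port_index)

def create_port_network(conn, instance_uuid, port_index, dev):
    xml = ("<network>"
           "  <name>%s</name>"
           "  <forward mode='passthrough'>"
           "    <interface dev='%s'/>"
           "  </forward>"
           "</network>") % (net_name(instance_uuid, port_index), dev)
    return conn.networkCreateXML(xml)      # transient: gone after destroy

def free_port_network(conn, instance_uuid, port_index):
    # 'when freeing a PCI device, delete any network it's a member of'
    try:
        conn.networkLookupByName(net_name(instance_uuid, port_index)).destroy()
    except libvirt.libvirtError:
        pass

# conn = libvirt.open('qemu:///system')
# create_port_network(conn, inst_uuid, 0, 'eth8')   # before boot/migration
# free_port_network(conn, inst_uuid, 0)             # when the device is freed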
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Ian Wells
If there are N flavor types there are N match expressions so I think it's
pretty much equivalent in terms of complexity.  It looks like some sort of
packing problem to me, trying to fit N objects into M boxes, hence my
statement that it's not going to be easy, but that's just a gut feeling -
some of the matches can be vague, such as only the vendor ID or a vendor
and two device types, so it's not as simple as one flavor matching one
stats row.
-- 
Ian.


On 13 January 2014 21:00, Jiang, Yunhong  wrote:

>  Ian, not sure if I get your question. Why should scheduler get the
> number of flavor types requested? The scheduler will only translate the PCI
> flavor to the pci property match requirement like it does now, (either
> vendor_id, device_id, or item in extra_info), then match the translated pci
> flavor, i.e. pci requests, to the pci stats.
>
>
>
> Thanks
>
> --jyh
>
>
>
> *From:* Ian Wells [mailto:ijw.ubu...@cack.org.uk]
> *Sent:* Monday, January 13, 2014 10:57 AM
>
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
>
>
> It's worth noting that this makes the scheduling a computationally hard
> problem. The answer to that in this scheme is to reduce the number of
> inputs to trivialise the problem.  It's going to be O(f(number of flavor
> types requested, number of pci_stats pools)) and if you group appropriately
> there shouldn't be an excessive number of pci_stats pools.  I am not going
> to stand up and say this makes it achievable - and if it doesn't them I'm
> not sure that anything would make overlapping flavors achievable - but I
> think it gives us some hope.
> --
>
> Ian.
>
>
>
> On 13 January 2014 19:27, Jiang, Yunhong  wrote:
>
> Hi, Robert, scheduler keep count based on pci_stats instead of the pci
> flavor.
>
>
>
> As stated by Ian at
> https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html
> already, the flavor will only use the tags used by pci_stats.
>
>
>
> Thanks
>
> --jyh
>
>
>
> *From:* Robert Li (baoli) [mailto:ba...@cisco.com]
> *Sent:* Monday, January 13, 2014 8:22 AM
>
>
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
>
>
> As I have responded in the other email, and If I understand PCI flavor
> correctly, then the issue that we need to deal with is the overlapping
> issue. A simplest case of this overlapping is that you can define a flavor
> F1 as [vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v']
> .  Let's assume that only the admin can define the flavors. It's not hard
> to see that a device can belong to the two different flavors in the same
> time. This introduces an issue in the scheduler. Suppose the scheduler
> (counts or stats based) maintains counts based on flavors (or the keys
> corresponding to the flavors). To request a device with the flavor F1,
>  counts in F2 needs to be subtracted by one as well. There may be several
> ways to achieve that. But regardless, it introduces tremendous overhead in
> terms of system processing and administrative costs.
>
>
>
> What are the use cases for that? How practical are those use cases?
>
>
>
> thanks,
>
> Robert
>
>
>
> On 1/10/14 9:34 PM, "Ian Wells"  wrote:
>
>
>
>
> >
> > OK - so if this is good then I think the question is how we could change
> the 'pci_whitelist' parameter we have - which, as you say, should either
> *only* do whitelisting or be renamed - to allow us to add information.
>  Yongli has something along those lines but it's not flexible and it
> distinguishes poorly between which bits are extra information and which
> bits are matching expressions (and it's still called pci_whitelist) - but
> even with those criticisms it's very close to what we're talking about.
>  When we have that I think a lot of the rest of the arguments should simply
> resolve themselves.
> >
> >
> >
> > [yjiang5_1] The reason that not easy to find a flexible/distinguishable
> change to pci_whitelist is because it combined two things. So a
> stupid/naive solution in my head is, change it to VERY generic name,
> ‘pci_devices_information’,
> >
> > and change schema as an array of {‘devices_property’=regex exp,
> ‘group_name’ = ‘g1’} dictionary, and the device_property expression can be
> ‘address ==xxx, vendor_id == xxx’ (i.e. similar with current white list),
>  and we can squeeze m

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Jiang, Yunhong
Ian, I'm not sure I get your question. Why should the scheduler get the number
of flavor types requested? The scheduler will only translate the PCI flavor to
the PCI property match requirement like it does now (either vendor_id,
device_id, or an item in extra_info), then match the translated PCI flavor,
i.e. the PCI requests, to the pci_stats.

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Monday, January 13, 2014 10:57 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

It's worth noting that this makes the scheduling a computationally hard 
problem. The answer to that in this scheme is to reduce the number of inputs to 
trivialise the problem.  It's going to be O(f(number of flavor types requested, 
number of pci_stats pools)) and if you group appropriately there shouldn't be 
an excessive number of pci_stats pools.  I am not going to stand up and say 
this makes it achievable - and if it doesn't them I'm not sure that anything 
would make overlapping flavors achievable - but I think it gives us some hope.
--
Ian.

On 13 January 2014 19:27, Jiang, Yunhong 
mailto:yunhong.ji...@intel.com>> wrote:
Hi, Robert, scheduler keep count based on pci_stats instead of the pci flavor.

As stated by Ian at 
https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html 
already, the flavor will only use the tags used by pci_stats.

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com<mailto:ba...@cisco.com>]
Sent: Monday, January 13, 2014 8:22 AM

To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

As I have responded in the other email, and If I understand PCI flavor 
correctly, then the issue that we need to deal with is the overlapping issue. A 
simplest case of this overlapping is that you can define a flavor F1 as 
[vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v'] .  Let's 
assume that only the admin can define the flavors. It's not hard to see that a 
device can belong to the two different flavors in the same time. This 
introduces an issue in the scheduler. Suppose the scheduler (counts or stats 
based) maintains counts based on flavors (or the keys corresponding to the 
flavors). To request a device with the flavor F1,  counts in F2 needs to be 
subtracted by one as well. There may be several ways to achieve that. But 
regardless, it introduces tremendous overhead in terms of system processing and 
administrative costs.

What are the use cases for that? How practical are those use cases?

thanks,
Robert

On 1/10/14 9:34 PM, "Ian Wells" 
mailto:ijw.ubu...@cack.org.uk>> wrote:


>
> OK - so if this is good then I think the question is how we could change the 
> 'pci_whitelist' parameter we have - which, as you say, should either *only* 
> do whitelisting or be renamed - to allow us to add information.  Yongli has 
> something along those lines but it's not flexible and it distinguishes poorly 
> between which bits are extra information and which bits are matching 
> expressions (and it's still called pci_whitelist) - but even with those 
> criticisms it's very close to what we're talking about.  When we have that I 
> think a lot of the rest of the arguments should simply resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
> change to pci_whitelist is because it combined two things. So a stupid/naive 
> solution in my head is, change it to VERY generic name, 
> 'pci_devices_information',
>
> and change schema as an array of {'devices_property'=regex exp, 'group_name' 
> = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
> vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
> more into the "pci_devices_information" in future, like 'network_information' 
> = xxx or "Neutron specific information" you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we like 
teh groups", volume = "11" } }

>
> All keys other than 'device_property' becomes extra information, i.e. 
> software defined property. These extra information will be carried with the 
> PCI devices,. Some implementation details, A)we can limit the acceptable 
> keys, like we only support 'group_name', 'network_id', or we can accept any 
> keys other than reserved (ven

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Jiang, Yunhong
I'm not a network engineer and I'm always lost in the 802.1Qbh/802.1BR specs
:(  So I'd wait for the requirements from Neutron. A quick check suggests my
discussion with Ian meets the requirement already?

Thanks
--jyh

From: Irena Berezovsky [mailto:ire...@mellanox.com]
Sent: Monday, January 13, 2014 12:51 AM
To: OpenStack Development Mailing List (not for usage questions)
Cc: Jiang, Yunhong; He, Yongli; Robert Li (baoli) (ba...@cisco.com); Sandhya 
Dasu (sadasu) (sad...@cisco.com); ijw.ubu...@cack.org.uk; j...@johngarbutt.com
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi,
After having a lot of discussions both on IRC and mailing list, I would like to 
suggest to define basic use cases for PCI pass-through network support with 
agreed list of limitations and assumptions  and implement it.  By doing this 
Proof of Concept we will be able to deliver basic PCI pass-through network 
support in Icehouse timeframe and understand better how to provide complete 
solution starting from  tenant /admin API enhancement, enhancing nova-neutron 
communication and eventually provide neutron plugin  supporting the PCI 
pass-through networking.
We can try to split tasks between currently involved participants and bring up 
the basic case. Then we can enhance the implementation.
Having more knowledge and experience with neutron parts, I would like  to start 
working on neutron mechanism driver support.  I have already started to arrange 
the following blueprint doc based on everyone's ideas:
https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit#

For the basic PCI pass-through networking case we can assume the following:

1.   Single provider network (PN1)

2.   White list of available SRIOV PCI devices for allocation as NIC for 
neutron networks on provider network  (PN1) is defined on each compute node

3.   Support directly assigned SRIOV PCI pass-through device as vNIC. (This 
will limit the number of tests)

4.   More 


If my suggestion seems reasonable to you, let's try to reach an agreement and 
split the work during our Monday IRC meeting.

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Saturday, January 11, 2014 8:36 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Comments with prefix [yjiang5_2], including the double confirmation.

I think we (you and me) are mostly on the same page; would you please give a
summary, and then we can have the community, including Irena/Robert, check it.
We need cores to sponsor it. We should check with John to see if this is
different from his mental picture, and we may need a neutron core (I assume
Cisco has a bunch of Neutron cores :) ) to sponsor it?

And, can anyone from Cisco help with the implementation? After this long
discussion, we are past the halfway point of the I release and I'm not sure
Yongli and I alone can finish this in the I release.

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 6:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support


>
> OK - so if this is good then I think the question is how we could change the 
> 'pci_whitelist' parameter we have - which, as you say, should either *only* 
> do whitelisting or be renamed - to allow us to add information.  Yongli has 
> something along those lines but it's not flexible and it distinguishes poorly 
> between which bits are extra information and which bits are matching 
> expressions (and it's still called pci_whitelist) - but even with those 
> criticisms it's very close to what we're talking about.  When we have that I 
> think a lot of the rest of the arguments should simply resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
> change to pci_whitelist is because it combined two things. So a stupid/naive 
> solution in my head is, change it to VERY generic name, 
> 'pci_devices_information',
>
> and change schema as an array of {'devices_property'=regex exp, 'group_name' 
> = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
> vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
> more into the "pci_devices_information" in future, like 'network_information' 
> = xxx or "Neutron specific information" you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ matc

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Ian Wells
It's worth noting that this makes the scheduling a computationally hard
problem. The answer to that in this scheme is to reduce the number of
inputs to trivialise the problem.  It's going to be O(f(number of flavor
types requested, number of pci_stats pools)) and if you group appropriately
there shouldn't be an excessive number of pci_stats pools.  I am not going
to stand up and say this makes it achievable - and if it doesn't then I'm
not sure that anything would make overlapping flavors achievable - but I
think it gives us some hope.
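
To make the shape of the problem concrete, a toy sketch of pci_stats pools and
flavor matching (the pool keys, devices and flavors below are invented for
illustration; real scheduling also has to choose which matching pool to
decrement when several satisfy the same flavor):

# Toy illustration only: devices grouped into pci_stats pools, then a flavor
# (a partial match expression) checked against the pools.
from collections import Counter

POOL_KEYS = ('vendor_id', 'product_id')     # the attributes we group on

devices = [
    {'vendor_id': '8086', 'product_id': '10ed'},
    {'vendor_id': '8086', 'product_id': '10ed'},
    {'vendor_id': '15b3', 'product_id': '1004'},
]

# pci_stats: count of free devices per unique combination of pool keys
pools = Counter(tuple(d[k] for k in POOL_KEYS) for d in devices)

def pools_for_flavor(flavor):
    # every attribute the flavor names must agree with the pool's key values
    return [p for p in pools
            if all(p[POOL_KEYS.index(k)] == v for k, v in flavor.items())]

print(pools)                                    # two pools, counts 2 and 1
print(pools_for_flavor({'vendor_id': '8086'}))  # one matching pool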
-- 
Ian.


On 13 January 2014 19:27, Jiang, Yunhong  wrote:

>  Hi, Robert, scheduler keep count based on pci_stats instead of the pci
> flavor.
>
>
>
> As stated by Ian at
> https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html
> already, the flavor will only use the tags used by pci_stats.
>
>
>
> Thanks
>
> --jyh
>
>
>
> *From:* Robert Li (baoli) [mailto:ba...@cisco.com]
> *Sent:* Monday, January 13, 2014 8:22 AM
>
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
>
>
> As I have responded in the other email, and If I understand PCI flavor
> correctly, then the issue that we need to deal with is the overlapping
> issue. A simplest case of this overlapping is that you can define a flavor
> F1 as [vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v']
> .  Let's assume that only the admin can define the flavors. It's not hard
> to see that a device can belong to the two different flavors in the same
> time. This introduces an issue in the scheduler. Suppose the scheduler
> (counts or stats based) maintains counts based on flavors (or the keys
> corresponding to the flavors). To request a device with the flavor F1,
>  counts in F2 needs to be subtracted by one as well. There may be several
> ways to achieve that. But regardless, it introduces tremendous overhead in
> terms of system processing and administrative costs.
>
>
>
> What are the use cases for that? How practical are those use cases?
>
>
>
> thanks,
>
> Robert
>
>
>
> On 1/10/14 9:34 PM, "Ian Wells"  wrote:
>
>
>
>
> >
> > OK - so if this is good then I think the question is how we could change
> the 'pci_whitelist' parameter we have - which, as you say, should either
> *only* do whitelisting or be renamed - to allow us to add information.
>  Yongli has something along those lines but it's not flexible and it
> distinguishes poorly between which bits are extra information and which
> bits are matching expressions (and it's still called pci_whitelist) - but
> even with those criticisms it's very close to what we're talking about.
>  When we have that I think a lot of the rest of the arguments should simply
> resolve themselves.
> >
> >
> >
> > [yjiang5_1] The reason that not easy to find a flexible/distinguishable
> change to pci_whitelist is because it combined two things. So a
> stupid/naive solution in my head is, change it to VERY generic name,
> ‘pci_devices_information’,
> >
> > and change schema as an array of {‘devices_property’=regex exp,
> ‘group_name’ = ‘g1’} dictionary, and the device_property expression can be
> ‘address ==xxx, vendor_id == xxx’ (i.e. similar with current white list),
>  and we can squeeze more into the “pci_devices_information” in future, like
> ‘network_information’ = xxx or “Neutron specific information” you required
> in previous mail.
>
>
> We're getting to the stage that an expression parser would be useful,
> annoyingly, but if we are going to try and squeeze it into JSON can I
> suggest:
>
> { match = { class = "Acme inc. discombobulator" }, info = { group = "we
> like teh groups", volume = "11" } }
>
> >
> > All keys other than ‘device_property’ becomes extra information, i.e.
> software defined property. These extra information will be carried with the
> PCI devices,. Some implementation details, A)we can limit the acceptable
> keys, like we only support ‘group_name’, ‘network_id’, or we can accept any
> keys other than reserved (vendor_id, device_id etc) one.
>
>
> Not sure we have a good list of reserved keys at the moment, and with two
> dicts it isn't really necessary, I guess.  I would say that we have one
> match parser which looks something like this:
>
> # does this PCI device match the expression given?
> def match(expression, pci_details, extra_specs):
>for (k, v) in expression:
> if k.starts_with('e.'):
>m

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Jiang, Yunhong
Hi, Robert, the scheduler keeps counts based on pci_stats instead of the PCI flavor.

As stated by Ian at 
https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html 
already, the flavor will only use the tags used by pci_stats.

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, January 13, 2014 8:22 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

As I have responded in the other email, and If I understand PCI flavor 
correctly, then the issue that we need to deal with is the overlapping issue. A 
simplest case of this overlapping is that you can define a flavor F1 as 
[vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v'] .  Let's 
assume that only the admin can define the flavors. It's not hard to see that a 
device can belong to the two different flavors in the same time. This 
introduces an issue in the scheduler. Suppose the scheduler (counts or stats 
based) maintains counts based on flavors (or the keys corresponding to the 
flavors). To request a device with the flavor F1,  counts in F2 needs to be 
subtracted by one as well. There may be several ways to achieve that. But 
regardless, it introduces tremendous overhead in terms of system processing and 
administrative costs.

What are the use cases for that? How practical are those use cases?

thanks,
Robert

On 1/10/14 9:34 PM, "Ian Wells" 
mailto:ijw.ubu...@cack.org.uk>> wrote:


>
> OK - so if this is good then I think the question is how we could change the 
> 'pci_whitelist' parameter we have - which, as you say, should either *only* 
> do whitelisting or be renamed - to allow us to add information.  Yongli has 
> something along those lines but it's not flexible and it distinguishes poorly 
> between which bits are extra information and which bits are matching 
> expressions (and it's still called pci_whitelist) - but even with those 
> criticisms it's very close to what we're talking about.  When we have that I 
> think a lot of the rest of the arguments should simply resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
> change to pci_whitelist is because it combined two things. So a stupid/naive 
> solution in my head is, change it to VERY generic name, 
> 'pci_devices_information',
>
> and change schema as an array of {'devices_property'=regex exp, 'group_name' 
> = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
> vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
> more into the "pci_devices_information" in future, like 'network_information' 
> = xxx or "Neutron specific information" you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we like 
teh groups", volume = "11" } }

>
> All keys other than 'device_property' becomes extra information, i.e. 
> software defined property. These extra information will be carried with the 
> PCI devices,. Some implementation details, A)we can limit the acceptable 
> keys, like we only support 'group_name', 'network_id', or we can accept any 
> keys other than reserved (vendor_id, device_id etc) one.


Not sure we have a good list of reserved keys at the moment, and with two dicts 
it isn't really necessary, I guess.  I would say that we have one match parser 
which looks something like this:

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for (k, v) in expression.items():
        if k.startswith('e.'):
            mv = extra_specs.get(k[2:])
        else:
            mv = pci_details.get(k)
        # value test: plain equality here; a regex or '|'-alternatives
        # comparison could slot in instead
        if mv is None or mv != v:
            return False
    return True

Usable in this matching (where 'e.' just won't work) and also for flavor 
assignment (where e. will indeed match the extra values).

> B) if a device match 'device_property' in several entries, raise exception, 
> or use the first one.

Use the first one, I think.  It's easier, and potentially more useful.

> [yjiang5_1] Another thing need discussed is, as you pointed out, "we would 
> need to add a config param on the control host to decide which flags to group 
> on when doing the stats".  I agree with the design, but some details need 
> decided.

This is a patch that can come at any point after we do the above stuff (which 
we need for Neutron), clearly.

&g

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Robert Li (baoli)
As I have responded in the other email, and if I understand PCI flavors
correctly, then the issue that we need to deal with is the overlapping issue.
The simplest case of this overlapping is that you can define a flavor F1 as
[vendor_id='v', product_id='p'] and a flavor F2 as [vendor_id='v']. Let's
assume that only the admin can define the flavors. It's not hard to see that a
device can belong to the two different flavors at the same time. This
introduces an issue in the scheduler. Suppose the scheduler (counts or stats
based) maintains counts based on flavors (or the keys corresponding to the
flavors). To satisfy a request for a device with flavor F1, the count for F2
needs to be decremented by one as well. There may be several ways to achieve
that, but regardless, it introduces tremendous overhead in terms of system
processing and administrative costs.
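
A toy illustration of the overlap (values invented): one device satisfies both
flavors, so per-flavor counters would have to be decremented together.

F1 = {'vendor_id': 'v', 'product_id': 'p'}
F2 = {'vendor_id': 'v'}

device = {'vendor_id': 'v', 'product_id': 'p'}

def flavor_matches(flavor, dev):
    return all(dev.get(k) == val for k, val in flavor.items())

print(flavor_matches(F1, device), flavor_matches(F2, device))  # True True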

What are the use cases for that? How practical are those use cases?

thanks,
Robert

On 1/10/14 9:34 PM, "Ian Wells" 
mailto:ijw.ubu...@cack.org.uk>> wrote:


>
> OK - so if this is good then I think the question is how we could change the 
> 'pci_whitelist' parameter we have - which, as you say, should either *only* 
> do whitelisting or be renamed - to allow us to add information.  Yongli has 
> something along those lines but it's not flexible and it distinguishes poorly 
> between which bits are extra information and which bits are matching 
> expressions (and it's still called pci_whitelist) - but even with those 
> criticisms it's very close to what we're talking about.  When we have that I 
> think a lot of the rest of the arguments should simply resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
> change to pci_whitelist is because it combined two things. So a stupid/naive 
> solution in my head is, change it to VERY generic name, 
> ‘pci_devices_information’,
>
> and change schema as an array of {‘devices_property’=regex exp, ‘group_name’ 
> = ‘g1’} dictionary, and the device_property expression can be ‘address ==xxx, 
> vendor_id == xxx’ (i.e. similar with current white list),  and we can squeeze 
> more into the “pci_devices_information” in future, like ‘network_information’ 
> = xxx or “Neutron specific information” you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we like 
teh groups", volume = "11" } }

>
> All keys other than ‘device_property’ becomes extra information, i.e. 
> software defined property. These extra information will be carried with the 
> PCI devices,. Some implementation details, A)we can limit the acceptable 
> keys, like we only support ‘group_name’, ‘network_id’, or we can accept any 
> keys other than reserved (vendor_id, device_id etc) one.


Not sure we have a good list of reserved keys at the moment, and with two dicts 
it isn't really necessary, I guess.  I would say that we have one match parser 
which looks something like this:

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for (k, v) in expression.items():
        if k.startswith('e.'):
            mv = extra_specs.get(k[2:])
        else:
            mv = pci_details.get(k)
        # value test: plain equality here; a regex or '|'-alternatives
        # comparison could slot in instead
        if mv is None or mv != v:
            return False
    return True

Usable in this matching (where 'e.' just won't work) and also for flavor 
assignment (where e. will indeed match the extra values).
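
For instance, against a hypothetical device (values invented):

pci_details = {'vendor_id': '8086', 'product_id': '10ed'}
extra_specs = {'group': 'phy1'}

print(match({'vendor_id': '8086'}, pci_details, extra_specs))           # True
print(match({'vendor_id': '8086', 'e.group': 'phy1'},
            pci_details, extra_specs))                                  # True
print(match({'e.group': 'phy2'}, pci_details, extra_specs))             # False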

> B) if a device match ‘device_property’ in several entries, raise exception, 
> or use the first one.

Use the first one, I think.  It's easier, and potentially more useful.

> [yjiang5_1] Another thing need discussed is, as you pointed out, “we would 
> need to add a config param on the control host to decide which flags to group 
> on when doing the stats”.  I agree with the design, but some details need 
> decided.

This is a patch that can come at any point after we do the above stuff (which 
we need for Neutron), clearly.

> Where should it defined. If we a) define it in both control node and compute 
> node, then it should be static defined (just change pool_keys in 
> "/opt/stack/nova/nova/pci/pci_stats.py" to a configuration parameter) . Or b) 
> define only in control node, then I assume the control node should be the 
> scheduler node, and the scheduler manager need save such information, present 
> a API to fetch such information and the compute node need fetch it on every 
> update_available_resource() periodic task. I’d prefer to take a) option in 
> first step. Your idea?

I think it has to be (a), which is a shame.
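
If we do take (a), a minimal sketch might look like this (the option name is
made up here, not an existing flag; the default just mirrors the pool_keys
mentioned above):

# Hypothetical sketch for option (a): expose the pci_stats pool keys as config
# read on both scheduler and compute nodes.
from oslo.config import cfg

pci_opts = [
    cfg.ListOpt('pci_stats_pool_keys',
                default=['vendor_id', 'product_id', 'extra_info'],
                help='Device attributes used to group PCI devices into '
                     'pci_stats pools; must be identical on scheduler '
                     'and compute nodes.'),
]

cfg.CONF.register_opts(pci_opts)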
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Irena Berezovsky
Ian,
It's great news.
Thank you for bringing this effort to Bob's attention. I'll look for Bob on
IRC to get the details.
And of course, core support raises our chances of getting PCI pass-through
networking into Icehouse.

BR,
Irena

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Monday, January 13, 2014 2:02 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Irena, have a word with Bob (rkukura on IRC, East coast), he was talking about 
what would be needed already and should be able to help you.  Conveniently he's 
also core. ;)
--
Ian.

On 12 January 2014 22:12, Irena Berezovsky 
mailto:ire...@mellanox.com>> wrote:
Hi John,
Thank you for taking the initiative and summing up the work that needs to be
done to provide PCI pass-through network support.
The only item I think is missing is the neutron support for PCI pass-through.
Currently we have the Mellanox plugin that supports PCI pass-through assuming
the Mellanox adapter card's embedded switch technology. But in order to have
fully integrated PCI pass-through networking support for the use cases Robert
listed in the previous mail, generic neutron PCI pass-through support is
required. This can be enhanced with vendor specific tasks that may differ
(Mellanox embedded switch vs Cisco 802.1BR), but there is still the common
part of being a PCI aware mechanism driver.
I have already started with definition for this part:
https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit#
I also plan to start coding soon.

Depends on how it goes, I can take also nova parts that integrate with neutron 
APIs from item 3.

Regards,
Irena

-Original Message-
From: John Garbutt [mailto:j...@johngarbutt.com<mailto:j...@johngarbutt.com>]
Sent: Friday, January 10, 2014 4:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
Apologies for this top post, I just want to move this discussion towards action.

I am traveling next week so it is unlikely that I can make the meetings. Sorry.

Can we please agree on some concrete actions, and who will do the coding?
This also means raising new blueprints for each item of work.
I am happy to review and eventually approve those blueprints, if you email me 
directly.

Ideas are taken from what we started to agree on, mostly written up here:
https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions


What doesn't need doing...


We have PCI whitelist and PCI alias at the moment; let's keep those names the
same for now.
I personally prefer PCI-flavor, rather than PCI-alias, but lets discuss any 
rename separately.

We seemed happy with the current system (roughly) around GPU passthrough:
nova flavor-key <flavor> set "pci_passthrough:alias"="large_GPU:1,small_GPU:2"
nova boot --image some_image --flavor <flavor> <vm-name>

Again, we seemed happy with the current PCI whitelist.

Sure, we could optimise the scheduling, but again, please keep that a separate 
discussion.
Something in the scheduler needs to know how many of each PCI alias are
available on each host.
How that information gets there can be changed at a later date.

PCI alias is in config, but it's probably better defined using host aggregates,
or some custom API.
But lets leave that for now, and discuss it separately.
If the need arises, we can migrate away from the config.


What does need doing...
==

1) API & CLI changes for "nic-type", and associated tempest tests

* Add a user visible "nic-type" so users can express one of several network
types.
* We need a default nic-type, for when the user doesn't specify one (might 
default to SRIOV in some cases)
* We can easily test the case where the default is virtual and the user 
expresses a preference for virtual
* Above is much better than not testing it at all.

nova boot --flavor m1.large --image <image>
  --nic net-id=<net-id>
  --nic net-id=<net-id>,nic-type=fast
  --nic net-id=<net-id>,nic-type=fast <vm-name>

or

neutron port-create
  --fixed-ip subnet_id=<subnet-id>,ip_address=192.168.57.101
  --nic-type=<nic-type>
  <net-id>
nova boot --flavor m1.large --image <image> --nic port-id=<port-id>

Where nic-type is just an extra bit of metadata, a string that is passed to
nova and the VIF driver.


2) Expand PCI alias information

We need extensions to PCI alias so we can group SRIOV devices better.

I still think we are yet to agree on a format, but I would suggest this as a 
starting point:

{
 "name":"GPU_fast",
 devices:[
  {"vendor_id":"1137","product_id":"0071", address:"*", "attach-type":"direct"},
  {"vendor_id":"1137","product_id":"0072", address:"*

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Ian Wells
Irena, have a word with Bob (rkukura on IRC, East coast), he was talking
about what would be needed already and should be able to help you.
Conveniently he's also core. ;)
-- 
Ian.


On 12 January 2014 22:12, Irena Berezovsky  wrote:

> Hi John,
> Thank you for taking an initiative and summing up the work that need to be
> done to provide PCI pass-through network support.
> The only item I think is missing is the neutron support for PCI
> pass-through. Currently we have Mellanox Plugin that supports PCI
> pass-through assuming Mellanox Adapter card embedded switch technology. But
> in order to have fully integrated  PCI pass-through networking support for
> the use cases Robert listed on previous mail, the generic neutron PCI
> pass-through support is required. This can be enhanced with vendor specific
> task that may differ (Mellanox Embedded switch vs Cisco 802.1BR), but there
> is still common part of being PCI aware mechanism driver.
> I have already started with definition for this part:
>
> https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit#
> I also plan to start coding soon.
>
> Depends on how it goes, I can take also nova parts that integrate with
> neutron APIs from item 3.
>
> Regards,
> Irena
>
> -Original Message-
> From: John Garbutt [mailto:j...@johngarbutt.com]
> Sent: Friday, January 10, 2014 4:34 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
> Apologies for this top post, I just want to move this discussion towards
> action.
>
> I am traveling next week so it is unlikely that I can make the meetings.
> Sorry.
>
> Can we please agree on some concrete actions, and who will do the coding?
> This also means raising new blueprints for each item of work.
> I am happy to review and eventually approve those blueprints, if you email
> me directly.
>
> Ideas are taken from what we started to agree on, mostly written up here:
> https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions
>
>
> What doesn't need doing...
> 
>
> We have PCI whitelist and PCI alias at the moment, let keep those names
> the same for now.
> I personally prefer PCI-flavor, rather than PCI-alias, but lets discuss
> any rename separately.
>
> We seemed happy with the current system (roughly) around GPU passthrough:
> nova flavor-key <flavor> set "pci_passthrough:alias"="large_GPU:1,small_GPU:2"
> nova boot --image some_image --flavor <flavor> <vm-name>
>
> Again, we seemed happy with the current PCI whitelist.
>
> Sure, we could optimise the scheduling, but again, please keep that a
> separate discussion.
> Something in the scheduler needs to know how many of each PCI alias are
> available on each host.
> How that information gets there can be change at a later date.
>
> PCI alias is in config, but its probably better defined using host
> aggregates, or some custom API.
> But lets leave that for now, and discuss it separately.
> If the need arrises, we can migrate away from the config.
>
>
> What does need doing...
> ==
>
> 1) API & CLI changes for "nic-type", and associated tempest tests
>
> * Add a user visible "nic-type" so users can express on of several network
> types.
> * We need a default nic-type, for when the user doesn't specify one (might
> default to SRIOV in some cases)
> * We can easily test the case where the default is virtual and the user
> expresses a preference for virtual
> * Above is much better than not testing it at all.
>
> nova boot --flavor m1.large --image <image>
>   --nic net-id=<net-id>
>   --nic net-id=<net-id>,nic-type=fast
>   --nic net-id=<net-id>,nic-type=fast <vm-name>
>
> or
>
> neutron port-create
>   --fixed-ip subnet_id=<subnet-id>,ip_address=192.168.57.101
>   --nic-type=<nic-type>
>   <net-id>
> nova boot --flavor m1.large --image <image> --nic port-id=<port-id>
>
> Where nic-type is just an extra bit metadata string that is passed to nova
> and the VIF driver.
>
>
> 2) Expand PCI alias information
>
> We need extensions to PCI alias so we can group SRIOV devices better.
>
> I still think we are yet to agree on a format, but I would suggest this as
> a starting point:
>
> {
>  "name":"GPU_fast",
>  devices:[
>   {"vendor_id":"1137","product_id":"0071", address:"*",
> "attach-type":"direct"},
>   {"vendor_id":"1137","product_id":"0072", address:"*",
> "attach-type":"direct"}  ],
>  sriov_info: {}
> }
>
> {
>  "name":&q

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Irena Berezovsky
Hi,
After having a lot of discussions both on IRC and on the mailing list, I would like 
to suggest defining basic use cases for PCI pass-through network support, with an 
agreed list of limitations and assumptions, and implementing them. By doing this 
Proof of Concept we will be able to deliver basic PCI pass-through network support 
in the Icehouse timeframe and better understand how to provide a complete solution, 
starting from tenant/admin API enhancement, enhancing nova-neutron communication, 
and eventually providing a neutron plugin supporting PCI pass-through networking.
We can try to split tasks between currently involved participants and bring up 
the basic case. Then we can enhance the implementation.
Having more knowledge and experience with neutron parts, I would like  to start 
working on neutron mechanism driver support.  I have already started to arrange 
the following blueprint doc based on everyone's ideas:
https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit#<https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit>

For the basic PCI pass-through networking case we can assume the following:

1.   Single provider network (PN1)

2.   White list of available SRIOV PCI devices for allocation as NIC for 
neutron networks on provider network  (PN1) is defined on each compute node

3.   Support directly assigned SRIOV PCI pass-through device as vNIC. (This 
will limit the number of tests)

4.   More 


If my suggestion seems reasonable to you, let's try to reach an agreement and 
split the work during our Monday IRC meeting.

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Saturday, January 11, 2014 8:36 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Comments with prefix [yjiang5_2] , including the double confirm.

I think we (you and me) are mostly on the same page. Would you please give a 
summary, and then we can have the community, including Irena/Robert, check it? 
We need cores to sponsor it. We should check with John to see if this differs 
from his mental picture, and we may need a neutron core (I assume Cisco has a 
bunch of Neutron cores :) ) to sponsor it?

And, can anyone from Cisco help with the implementation? After this long 
discussion, we are in the second half of the I release and I'm not sure Yongli 
and I alone can finish this in the I release.

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 6:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support


>
> OK - so if this is good then I think the question is how we could change the 
> 'pci_whitelist' parameter we have - which, as you say, should either *only* 
> do whitelisting or be renamed - to allow us to add information.  Yongli has 
> something along those lines but it's not flexible and it distinguishes poorly 
> between which bits are extra information and which bits are matching 
> expressions (and it's still called pci_whitelist) - but even with those 
> criticisms it's very close to what we're talking about.  When we have that I 
> think a lot of the rest of the arguments should simply resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
> change to pci_whitelist is because it combined two things. So a stupid/naive 
> solution in my head is, change it to VERY generic name, 
> 'pci_devices_information',
>
> and change schema as an array of {'devices_property'=regex exp, 'group_name' 
> = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
> vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
> more into the "pci_devices_information" in future, like 'network_information' 
> = xxx or "Neutron specific information" you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we like 
teh groups", volume = "11" } }

[yjiang5_2] Just to double-confirm: 'match' is the whitelist and 'info' is the 
'extra info', right? Could the keys be more meaningful, for example 
s/match/pci_device_property, s/info/pci_device_info, or s/match/pci_devices/, etc.?
Also, I assume the class should be the class code from the PCI configuration 
space, and be numeric, am I right? Otherwise it's not easy to get the 'Acme inc. 
discombobulator' information.


>
> All keys other than 'devi

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-12 Thread Irena Berezovsky
Hi John,
Thank you for taking the initiative and summing up the work that needs to be done 
to provide PCI pass-through network support.
The only item I think is missing is the neutron support for PCI pass-through. 
Currently we have Mellanox Plugin that supports PCI pass-through assuming 
Mellanox Adapter card embedded switch technology. But in order to have fully 
integrated  PCI pass-through networking support for the use cases Robert listed 
on previous mail, the generic neutron PCI pass-through support is required. 
This can be enhanced with vendor-specific tasks that may differ (Mellanox 
embedded switch vs Cisco 802.1BR), but there is still a common part of being a 
PCI-aware mechanism driver. 
I have already started with definition for this part:
https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit#
I also plan to start coding soon.

Depending on how it goes, I can also take the nova parts that integrate with 
neutron APIs from item 3.
 
Regards,
Irena

-Original Message-
From: John Garbutt [mailto:j...@johngarbutt.com] 
Sent: Friday, January 10, 2014 4:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Apologies for this top post, I just want to move this discussion towards action.

I am traveling next week so it is unlikely that I can make the meetings. Sorry.

Can we please agree on some concrete actions, and who will do the coding?
This also means raising new blueprints for each item of work.
I am happy to review and eventually approve those blueprints, if you email me 
directly.

Ideas are taken from what we started to agree on, mostly written up here:
https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions


What doesn't need doing...


We have PCI whitelist and PCI alias at the moment, let's keep those names the 
same for now.
I personally prefer PCI-flavor, rather than PCI-alias, but let's discuss any 
rename separately.

We seemed happy with the current system (roughly) around GPU passthrough:
nova flavor-key  set "pci_passthrough:alias"=" 
large_GPU:1,small_GPU:2"
nova boot --image some_image --flavor  

Again, we seemed happy with the current PCI whitelist.

Sure, we could optimise the scheduling, but again, please keep that a separate 
discussion.
Something in the scheduler needs to know how many of each PCI alias are 
available on each host.
How that information gets there can be change at a later date.

PCI alias is in config, but it's probably better defined using host aggregates, 
or some custom API.
But let's leave that for now, and discuss it separately.
If the need arises, we can migrate away from the config.


What does need doing...
==

1) API & CLI changes for "nic-type", and associated tempest tests

* Add a user visible "nic-type" so users can express one of several network 
types.
* We need a default nic-type, for when the user doesn't specify one (might 
default to SRIOV in some cases)
* We can easily test the case where the default is virtual and the user 
expresses a preference for virtual
* Above is much better than not testing it at all.

nova boot --flavor m1.large --image 
  --nic net-id=
  --nic net-id=,nic-type=fast
  --nic net-id=,nic-type=fast 

or

neutron port-create
  --fixed-ip subnet_id=,ip_address=192.168.57.101
  --nic-type=
  
nova boot --flavor m1.large --image  --nic port-id=

Where nic-type is just an extra bit metadata string that is passed to nova and 
the VIF driver.
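
As a rough sketch (purely illustrative; the helper name and dict keys are invented, 
not an agreed interface), the nic-type could simply be parsed out of the --nic spec 
and carried along as opaque metadata:

def parse_nic_spec(spec):
    # e.g. spec = 'net-id=some-net-uuid,nic-type=fast'
    nic = {'net-id': None, 'port-id': None, 'nic-type': None}
    for item in spec.split(','):
        key, _, value = item.partition('=')
        if key in nic:
            nic[key] = value
    return nic

print(parse_nic_spec('net-id=some-net-uuid,nic-type=fast'))
# {'net-id': 'some-net-uuid', 'port-id': None, 'nic-type': 'fast'}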


2) Expand PCI alias information

We need extensions to PCI alias so we can group SRIOV devices better.

I still think we are yet to agree on a format, but I would suggest this as a 
starting point:

{
 "name":"GPU_fast",
 devices:[
  {"vendor_id":"1137","product_id":"0071", address:"*", "attach-type":"direct"},
  {"vendor_id":"1137","product_id":"0072", address:"*", "attach-type":"direct"} 
 ],
 sriov_info: {}
}

{
 "name":"NIC_fast",
 devices:[
  {"vendor_id":"1137","product_id":"0071", address:"0:[1-50]:2:*", "attach-type":"macvtap"},
  {"vendor_id":"1234","product_id":"0081", address:"*", "attach-type":"direct"}
 ],
 sriov_info: {
  "nic_type":"fast",
  "network_ids": ["net-id-1", "net-id-2"]
 }
}

{
 "name":"NIC_slower",
 devices:[
  {"vendor_id":"1137","product_id":"0071", address:"*", "attach-type":"direct"}
  {"vendor_id":"1234","product_id":"0081", address:"*

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Comments with prefix [yjiang5_2] , including the double confirm.

I think we (you and me) are mostly on the same page. Would you please give a 
summary, and then we can have the community, including Irena/Robert, check it? 
We need cores to sponsor it. We should check with John to see if this differs 
from his mental picture, and we may need a neutron core (I assume Cisco has a 
bunch of Neutron cores :) ) to sponsor it?
And, can anyone from Cisco help with the implementation? After this long 
discussion, we are in the second half of the I release and I'm not sure Yongli 
and I alone can finish this in the I release.

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 6:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support


>
> OK - so if this is good then I think the question is how we could change the 
> 'pci_whitelist' parameter we have - which, as you say, should either *only* 
> do whitelisting or be renamed - to allow us to add information.  Yongli has 
> something along those lines but it's not flexible and it distinguishes poorly 
> between which bits are extra information and which bits are matching 
> expressions (and it's still called pci_whitelist) - but even with those 
> criticisms it's very close to what we're talking about.  When we have that I 
> think a lot of the rest of the arguments should simply resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
> change to pci_whitelist is because it combined two things. So a stupid/naive 
> solution in my head is, change it to VERY generic name, 
> 'pci_devices_information',
>
> and change schema as an array of {'devices_property'=regex exp, 'group_name' 
> = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
> vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
> more into the "pci_devices_information" in future, like 'network_information' 
> = xxx or "Neutron specific information" you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we like 
teh groups", volume = "11" } }

[yjiang5_2] Just to double-confirm: 'match' is the whitelist and 'info' is the 
'extra info', right? Could the keys be more meaningful, for example 
s/match/pci_device_property, s/info/pci_device_info, or s/match/pci_devices/, etc.?
Also, I assume the class should be the class code from the PCI configuration 
space, and be numeric, am I right? Otherwise it's not easy to get the 'Acme inc. 
discombobulator' information.


>
> All keys other than 'device_property' becomes extra information, i.e. 
> software defined property. These extra information will be carried with the 
> PCI devices,. Some implementation details, A)we can limit the acceptable 
> keys, like we only support 'group_name', 'network_id', or we can accept any 
> keys other than reserved (vendor_id, device_id etc) one.


Not sure we have a good list of reserved keys at the moment, and with two dicts 
it isn't really necessary, I guess.  I would say that we have one match parser 
which looks something like this:

# stand-in comparison helper: exact equality (a real version might allow regexes)
def value_match(v, mv):
    return mv is not None and v == mv

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for (k, v) in expression.items():
        if k.startswith('e.'):
            # 'e.'-prefixed keys match against the extra (software-defined) info
            mv = extra_specs.get(k[2:])
        else:
            mv = pci_details.get(k)
        if not value_match(v, mv):
            return False
    return True

Usable in this matching (where 'e.' just won't work) and also for flavor 
assignment (where e. will indeed match the extra values).
[yjiang5_2] I think whether we use the same function for the check or two 
functions for match/flavor is an implementation detail and can be discussed in 
the next step. Of course, we should always avoid code duplication.


> B) if a device match 'device_property' in several entries, raise exception, 
> or use the first one.

Use the first one, I think.  It's easier, and potentially more useful.
[yjiang5] good.


> [yjiang5_1] Another thing need discussed is, as you pointed out, "we would 
> need to add a config param on the control host to decide which flags to group 
> on when doing the stats".  I agree with the design, but some details need 
> decided.

This is a patch that can come at any point after we do the above stuff (which 
we need for Neutron), clearly.

> Where should it defined. I

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
>
> OK - so if this is good then I think the question is how we could change
the 'pci_whitelist' parameter we have - which, as you say, should either
*only* do whitelisting or be renamed - to allow us to add information.
 Yongli has something along those lines but it's not flexible and it
distinguishes poorly between which bits are extra information and which
bits are matching expressions (and it's still called pci_whitelist) - but
even with those criticisms it's very close to what we're talking about.
 When we have that I think a lot of the rest of the arguments should simply
resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable
change to pci_whitelist is because it combined two things. So a
stupid/naive solution in my head is, change it to VERY generic name,
‘pci_devices_information’,
>
> and change schema as an array of {‘devices_property’=regex exp,
‘group_name’ = ‘g1’} dictionary, and the device_property expression can be
‘address ==xxx, vendor_id == xxx’ (i.e. similar with current white list),
 and we can squeeze more into the “pci_devices_information” in future, like
‘network_information’ = xxx or “Neutron specific information” you required
in previous mail.


We're getting to the stage that an expression parser would be useful,
annoyingly, but if we are going to try and squeeze it into JSON can I
suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we
like teh groups", volume = "11" } }
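
Purely as an illustration (the attribute values are made up and this is not an 
agreed schema), a whitelist entry in that two-dict style might end up looking like:

pci_information = [
    {
        # matching expression: which physical devices this entry applies to
        "match": {"vendor_id": "8086", "product_id": "10fb"},
        # extra, software-defined information carried with the matched devices
        "info": {"group": "fast-nic", "physical_network": "physnet1"}
    },
]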

>
> All keys other than ‘device_property’ becomes extra information, i.e.
software defined property. These extra information will be carried with the
PCI devices,. Some implementation details, A)we can limit the acceptable
keys, like we only support ‘group_name’, ‘network_id’, or we can accept any
keys other than reserved (vendor_id, device_id etc) one.


Not sure we have a good list of reserved keys at the moment, and with two
dicts it isn't really necessary, I guess.  I would say that we have one
match parser which looks something like this:

# stand-in comparison helper: exact equality (a real version might allow regexes)
def value_match(v, mv):
    return mv is not None and v == mv

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for (k, v) in expression.items():
        if k.startswith('e.'):
            # 'e.'-prefixed keys match against the extra (software-defined) info
            mv = extra_specs.get(k[2:])
        else:
            mv = pci_details.get(k)
        if not value_match(v, mv):
            return False
    return True

Usable in this matching (where 'e.' just won't work) and also for flavor
assignment (where e. will indeed match the extra values).
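
A quick, hedged illustration of how such a parser might be driven (device 
attributes and extra info invented for the example; value_match is the stand-in 
comparison noted in the sketch above):

device = {'vendor_id': '8086', 'product_id': '10fb', 'address': '0000:01:00.1'}
extra = {'group': 'fast-nic', 'physical_network': 'physnet1'}

# whitelist-style matching: only raw PCI properties, so 'e.' keys would never resolve
match({'vendor_id': '8086', 'product_id': '10fb'}, device, {})        # True

# flavor-style matching: may also refer to the software-defined extra info
match({'vendor_id': '8086', 'e.group': 'fast-nic'}, device, extra)    # True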

> B) if a device match ‘device_property’ in several entries, raise
exception, or use the first one.

Use the first one, I think.  It's easier, and potentially more useful.

> [yjiang5_1] Another thing need discussed is, as you pointed out, “we
would need to add a config param on the control host to decide which flags
to group on when doing the stats”.  I agree with the design, but some
details need decided.

This is a patch that can come at any point after we do the above stuff
(which we need for Neutron), clearly.

> Where should it be defined? If we a) define it on both the control node and
the compute node, then it should be statically defined (just change pool_keys in
"/opt/stack/nova/nova/pci/pci_stats.py" to a configuration parameter). Or
b) define it only on the control node; then I assume the control node should be
the scheduler node, and the scheduler manager needs to save such information,
present an API to fetch it, and the compute node needs to fetch it
on every update_available_resource() periodic task. I'd prefer to take option a)
as a first step. Your idea?

I think it has to be (a), which is a shame.
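
For what option (a) might look like in practice, here is a minimal sketch, 
assuming the hard-coded pool_keys list simply becomes an oslo.config list option 
registered on both controller and compute nodes (the option name and default are 
invented for illustration):

from oslo.config import cfg

pci_pool_opts = [
    cfg.ListOpt('pci_stats_pool_keys',
                default=['vendor_id', 'product_id'],
                help='Device properties to group PCI devices on when building '
                     'the pci_stats pools reported to the scheduler'),
]

cfg.CONF.register_opts(pci_pool_opts)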
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
I have to use [yjiang5_1] prefix now :)

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 3:55 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

On 11 January 2014 00:04, Jiang, Yunhong 
mailto:yunhong.ji...@intel.com>> wrote:
[yjiang5] Really thanks for the summary and it is quite clear. So what's the 
object of "equivalent devices at host level"? Because 'equivalent device * to 
an end user *" is flavor, so is it 'equivalent to *scheduler*" or 'equivalent 
to *xxx*'? If equivalent to scheduler, then I'd take the pci_stats as a 
flexible group for scheduler

To the scheduler, indeed.  And with the group proposal the scheduler and end 
user equivalences are one and the same.
[yjiang5_1] Once we use the proposal, we lose the flexibility of 'end user 
equivalences', and that's the reason I'm against the group :)


Secondly, for your definition of 'whitelist', I'm hesitate to your '*and*' 
because IMHO, 'and' means mixed two things together, otherwise, we can state in 
simply one sentence. For example, I prefer to have another configuration option 
to define the 'put devices in the group', or, if we extend it , be "define 
extra information like 'group name' for devices".

I'm not stating what we should do, or what the definitions should mean; I'm 
saying how they've been interpreted as we've discussed this in the past.  We've 
had issues in the past where we've had continuing difficulties in describing 
anything without coming back to a 'whitelist' (generally meaning 'matching 
expression'), as an actual 'whitelist' is implied, rather than separately 
required, in a grouping system.
 Bearing in mind what you said about scheduling, and if we skip 'group' for a 
moment, then can I suggest (or possibly restate, because your comments are 
pointing in this direction):
- we allow extra information to be added at what is now the whitelisting stage, 
that just gets carried around with the device
[yjiang5] For 'added at ... whitelisting stage', see my above statement about 
the configuration. However, if you do want to use whitelist, I'm ok, but please 
keep in mind that it's two functionality combined: device you may assign *and* 
the group name for these devices.

Indeed - which is in fact what we've been proposing all along.


- when we're turning devices into flavors, we can also match on that extra 
information if we want (which means we can tag up the devices on the compute 
node if we like, according to taste, and then bundle them up by tag to make 
flavors; or we can add Neutron specific information and ignore it when making 
flavors)
[yjiang5] Agree. Currently we can only use vendor_id and device_id for 
flavor/alias, but we can extend it to cover such extra information since now 
it's a API.

- we would need to add a config param on the control host to decide which flags 
to group on when doing the stats (and they would additionally be the only 
params that would work for flavors, I think)
[yjiang5] Agree. And this is achievable because we switch the flavor to be API, 
then we can control the flavor creation process.

OK - so if this is good then I think the question is how we could change the 
'pci_whitelist' parameter we have - which, as you say, should either *only* do 
whitelisting or be renamed - to allow us to add information.  Yongli has 
something along those lines but it's not flexible and it distinguishes poorly 
between which bits are extra information and which bits are matching 
expressions (and it's still called pci_whitelist) - but even with those 
criticisms it's very close to what we're talking about.  When we have that I 
think a lot of the rest of the arguments should simply resolve themselves.

[yjiang5_1] The reason that not easy to find a flexible/distinguishable change 
to pci_whitelist is because it combined two things. So a stupid/naive solution 
in my head is, change it to VERY generic name, 'pci_devices_information', and 
change schema as an array of {'devices_property'=regex exp, 'group_name' = 
'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
more into the "pci_devices_information" in future, like 'network_information' = 
xxx or "Neutron specific information" you required in previous mail. All keys 
other than 'device_property' becomes extra information, i.e. software defined 
property. These extra information will be carried with the PCI devices,. Some 
implementation details, A)we can limit

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
On 11 January 2014 00:04, Jiang, Yunhong  wrote:

>  [yjiang5] Really thanks for the summary and it is quite clear. So what’s
> the object of “equivalent devices at host level”? Because ‘equivalent
> device * to an end user *” is flavor, so is it ‘equivalent to **scheduler**”
> or ‘equivalent to **xxx**’? If equivalent to scheduler, then I’d take the
> pci_stats as a flexible group for scheduler
>

To the scheduler, indeed.  And with the group proposal the scheduler and
end user equivalences are one and the same.


> Secondly, for your definition of ‘whitelist’, I’m hesitate to your ‘*and*’
> because IMHO, ‘and’ means mixed two things together, otherwise, we can
> state in simply one sentence. For example, I prefer to have another
> configuration option to define the ‘put devices in the group’, or, if we
> extend it , be “define extra information like ‘group name’ for devices”.
>

I'm not stating what we should do, or what the definitions should mean; I'm
saying how they've been interpreted as we've discussed this in the past.
We've had issues in the past where we've had continuing difficulties in
describing anything without coming back to a 'whitelist' (generally meaning
'matching expression'), as an actual 'whitelist' is implied, rather than
separately required, in a grouping system.

  Bearing in mind what you said about scheduling, and if we skip 'group'
> for a moment, then can I suggest (or possibly restate, because your
> comments are pointing in this direction):
>
> - we allow extra information to be added at what is now the whitelisting
> stage, that just gets carried around with the device
>
> [yjiang5] For ‘added at … whitelisting stage’, see my above statement
> about the configuration. However, if you do want to use whitelist, I’m ok,
> but please keep in mind that it’s two functionality combined: device you
> may assign **and** the group name for these devices.
>

Indeed - which is in fact what we've been proposing all along.


>
> - when we're turning devices into flavors, we can also match on that extra
> information if we want (which means we can tag up the devices on the
> compute node if we like, according to taste, and then bundle them up by tag
> to make flavors; or we can add Neutron specific information and ignore it
> when making flavors)
>
> [yjiang5] Agree. Currently we can only use vendor_id and device_id for
> flavor/alias, but we can extend it to cover such extra information since
> now it’s a API.
>
>
>
> - we would need to add a config param on the control host to decide which
> flags to group on when doing the stats (and they would additionally be the
> only params that would work for flavors, I think)
>
> [yjiang5] Agree. And this is achievable because we switch the flavor to be
> API, then we can control the flavor creation process.
>

OK - so if this is good then I think the question is how we could change
the 'pci_whitelist' parameter we have - which, as you say, should either
*only* do whitelisting or be renamed - to allow us to add information.
Yongli has something along those lines but it's not flexible and it
distinguishes poorly between which bits are extra information and which
bits are matching expressions (and it's still called pci_whitelist) - but
even with those criticisms it's very close to what we're talking about.
When we have that I think a lot of the rest of the arguments should simply
resolve themselves.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Ian, thanks for your reply. Please check comments prefix with [yjiang5].

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 12:17 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hey Yunhong,

The thing about 'group' and 'flavor' and 'whitelist' is that they once meant 
distinct things (and I think we've been trying to reduce them back from three 
things to two or one):

- group: equivalent devices at a host level - use any one, no-one will care, 
because they're either identical or as near as makes no difference
- flavor: equivalent devices to an end user - we may re-evaluate our offerings 
and group them differently on the fly
- whitelist: either 'something to match the devices you may assign' 
(originally) or 'something to match the devices you may assign *and* put them 
in the group (in the group proposal)

[yjiang5] Many thanks for the summary; it is quite clear. So what is the purpose 
of "equivalent devices at host level"? 'Equivalent device *to an end user*' is 
the flavor, so is it 'equivalent to the *scheduler*' or 'equivalent to *xxx*'? If 
equivalent to the scheduler, then I'd take pci_stats as a flexible group for the 
scheduler, and I'd see 'equivalent for the scheduler' as a restriction on 
'equivalent to the end user' for performance reasons; otherwise it's needless. 
Secondly, for your definition of 'whitelist', I hesitate over your '*and*' 
because IMHO 'and' means mixing two things together; otherwise we could state it 
in one simple sentence. For example, I prefer to have another configuration 
option to define 'put devices in the group', or, if we extend it, "define extra 
information like a 'group name' for devices".

Bearing in mind what you said about scheduling, and if we skip 'group' for a 
moment, then can I suggest (or possibly restate, because your comments are 
pointing in this direction):
- we allow extra information to be added at what is now the whitelisting stage, 
that just gets carried around with the device
[yjiang5] For 'added at ... the whitelisting stage', see my statement above 
about the configuration. However, if you do want to use the whitelist, I'm ok, 
but please keep in mind that it combines two functionalities: the devices you 
may assign *and* the group name for those devices.

- when we're turning devices into flavors, we can also match on that extra 
information if we want (which means we can tag up the devices on the compute 
node if we like, according to taste, and then bundle them up by tag to make 
flavors; or we can add Neutron specific information and ignore it when making 
flavors)
[yjiang5] Agree. Currently we can only use vendor_id and device_id for 
flavor/alias, but we can extend it to cover such extra information since now 
it's an API.

- we would need to add a config param on the control host to decide which flags 
to group on when doing the stats (and they would additionally be the only 
params that would work for flavors, I think)
[yjiang5] Agree. And this is achievable because we switch the flavor to be API, 
then we can control the flavor creation process.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
Hey Yunhong,

The thing about 'group' and 'flavor' and 'whitelist' is that they once
meant distinct things (and I think we've been trying to reduce them back
from three things to two or one):

- group: equivalent devices at a host level - use any one, no-one will
care, because they're either identical or as near as makes no difference
- flavor: equivalent devices to an end user - we may re-evaluate our
offerings and group them differently on the fly
- whitelist: either 'something to match the devices you may assign'
(originally) or 'something to match the devices you may assign *and* put
them in the group (in the group proposal)

Bearing in mind what you said about scheduling, and if we skip 'group' for
a moment, then can I suggest (or possibly restate, because your comments
are pointing in this direction):

- we allow extra information to be added at what is now the whitelisting
stage, that just gets carried around with the device
- when we're turning devices into flavors, we can also match on that extra
information if we want (which means we can tag up the devices on the
compute node if we like, according to taste, and then bundle them up by tag
to make flavors; or we can add Neutron specific information and ignore it
when making flavors)
- we would need to add a config param on the control host to decide which
flags to group on when doing the stats (and they would additionally be the
only params that would work for flavors, I think)
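
To make the flow in the three points above concrete, here is a rough sketch (all 
names invented; not a proposal for the actual data structures): devices carry 
extra tags from the whitelisting stage, and stats are grouped on a configured set 
of flags:

from collections import Counter

grouping_flags = ['group']          # stand-in for the config param mentioned above

devices = [
    {'vendor_id': '8086', 'product_id': '10fb',
     'extra_info': {'group': 'fast-nic', 'physical_network': 'physnet1'}},
    {'vendor_id': '8086', 'product_id': '10fb',
     'extra_info': {'group': 'fast-nic', 'physical_network': 'physnet1'}},
]

def pool_key(dev):
    # group devices on the configured flags carried in their extra info
    return tuple(dev['extra_info'].get(flag) for flag in grouping_flags)

pci_stats = Counter(pool_key(dev) for dev in devices)
print(pci_stats)    # Counter({('fast-nic',): 2})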
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
On 10 January 2014 15:30, John Garbutt  wrote:

> We seemed happy with the current system (roughly) around GPU passthrough:
> nova flavor-key  set
> "pci_passthrough:alias"=" large_GPU:1,small_GPU:2"
> nova boot --image some_image --flavor  
>

Actually, I think we pretty solidly disagree on this point.  On the other
hand, Yongli's current patch (with pci_flavor in the whitelist) is pretty
OK.


> nova boot --flavor m1.large --image 
>   --nic net-id=
>   --nic net-id=,nic-type=fast
>   --nic net-id=,nic-type=fast 
>

With flavor defined (wherever it's defined):

nova boot ..
   --nic net-id=,pci-flavor=xxx                      # ok, presumably defaults to PCI passthrough
   --nic net-id=,pci-flavor=xxx,vnic-attach=macvtap  # ok
   --nic net-id=                                     # ok - no flavor = vnic
   --nic port-id=,pci-flavor=xxx                     # ok, gets vnic-attach from port
   --nic port-id=                                    # ok - no flavor = vnic



> or
>
> neutron port-create
>   --fixed-ip subnet_id=,ip_address=192.168.57.101
>   --nic-type=
>   
> nova boot --flavor m1.large --image  --nic port-id=
>

No, I think not - specifically because flavors are a nova concept and not a
neutron one, so putting them on the port is inappropriate. Conversely,
vnic-attach is a Neutron concept (fine, nova implements it, but Neutron
tells it how) so I think it *is* a port field, and we'd just set it on the
newly created port when doing nova boot ..,vnic-attach=thing

2) Expand PCI alias information

> {
>  "name":"NIC_fast",
>   sriov_info: {
>   "nic_type":"fast",
>   "network_ids": ["net-id-1", "net-id-2"]
>

Why can't we use the flavor name in --nic (because multiple flavors might
be on one NIC type, I guess)?  Where does e.g. switch/port information go,
particularly since it's per-device (not per-group) and non-scheduling?

I think the issue here is that you assume we group by flavor, then add
extra info, then group into a NIC group.  But for a lot of use cases there
is information that differs on every NIC port, so it makes more sense to
add extra info to a device, then group into flavor and that can also be
used for the --nic.

network_ids is interesting, but this is a nova config file and network_ids
are (a) from Neutron (b) ephemeral, so we can't put them in config.  They
could be provider network names, but that's not the same thing as a neutron
network name and not easily discoverable, outside of Neutron i.e. before
scheduling.

Again, Yongli's current change with pci-flavor in the whitelist records
leads to a reasonable way to make this work here, I think; 
straightforward extra_info would be fine (though perhaps nice if it's
easier to spot it as of a different type from the whitelist regex fields).
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Brian, the issue with the 'class name' is that libvirt currently does not 
provide such information; otherwise we would be glad to add it :(
But this is a good point and we have already considered it. One solution is to 
retrieve it through some code that reads the configuration space directly. But 
that's not so easy, especially considering that different platforms have 
different methods of getting at the configuration space. A workaround (at least 
as a first step) is to use a user-defined property, so that the user can define 
it through the configuration.
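
(For what it's worth, on Linux the class code is also exposed through sysfs, so 
one possible Linux-only way to pick it up, shown here purely as an illustration 
rather than a proposal, would be:)

# Linux-only illustration: sysfs exposes the PCI class code directly,
# e.g. '0x030000' for a VGA compatible controller.
def pci_class_code(address):
    with open('/sys/bus/pci/devices/%s/class' % address) as f:
        return f.read().strip()

# pci_class_code('0000:01:00.0') would return something like '0x030000'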

The issue with udev is that it's Linux specific, and it may even vary between 
distributions.

Thanks
--jyh

From: Brian Schott [mailto:brian.sch...@nimbisservices.com]
Sent: Thursday, January 09, 2014 11:19 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Ian,

The idea of pci flavors is a great one, and using vendor_id and product_id makes 
sense, but I could see a case for adding the class name such as 'VGA compatible 
controller'. Otherwise, slightly different generations of hardware will mean 
custom whitelist setups on each compute node.

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)

On the flip side, vendor_id and product_id might not be sufficient.  Suppose I 
have two identical NICs, one for nova internal use and the second for guest 
tenants?  So, bus numbering may be required.

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)

Some possible combinations:

# take 2 gpus
pci_passthrough_whitelist=[
 { "vendor_id":"NVIDIA Corporation G71","product_id":"GeForce 7900 GTX", 
"name":"GPU"},
]

# only take the GPU on PCI 2
pci_passthrough_whitelist=[
 { "vendor_id":"NVIDIA Corporation G71","product_id":"GeForce 7900 GTX", 
'bus_id': '02:', "name":"GPU"},
]
pci_passthrough_whitelist=[
 {"bus_id": "01:00.0", "name": "GPU"},
 {"bus_id": "02:00.0", "name": "GPU"},
]

pci_passthrough_whitelist=[
 {"class": "VGA compatible controller", "name": "GPU"},
]

pci_passthrough_whitelist=[
 { "product_id":"GeForce 7900 GTX", "name":"GPU"},
]

I know you guys are thinking of PCI devices, but any thought of mapping to 
something like udev rather than pci?  Supporting udev rules might be easier and 
more robust rather than making something up.

Brian

-
Brian Schott, CTO
Nimbis Services, Inc.
brian.sch...@nimbisservices.com<mailto:brian.sch...@nimbisservices.com>
ph: 443-274-6064  fx: 443-274-6060



On Jan 9, 2014, at 12:47 PM, Ian Wells 
mailto:ijw.ubu...@cack.org.uk>> wrote:


I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) 
mailto:ba...@cisco.com>> wrote:

Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitel

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Ian, thanks for your reply. Please check my response prefix with 'yjiang5'.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 4:08 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

On 10 January 2014 07:40, Jiang, Yunhong 
mailto:yunhong.ji...@intel.com>> wrote:
Robert, sorry, I'm not a fan of *your group* term. To me, *your group* mixes 
two things: it's an extra property provided by configuration, and also a very 
inflexible mechanism to select devices (you can only select devices based on 
the 'group name' property).

It is exactly that.  It's 0 new config items, 0 new APIs, just an extra tag on 
the whitelists that are already there (although the proposal suggests changing 
the name of them to be more descriptive of what they now do).  And you talk 
about flexibility as if this changes frequently, but in fact the grouping / 
aliasing of devices almost never changes after installation, which is, not 
coincidentally, when the config on the compute nodes gets set up.

1)   A dynamic group is much better. For example, a user may want to select a 
GPU device based on vendor id, or on vendor_id+device_id. In other words, the 
user wants to create groups based on vendor_id, or on vendor_id+device_id, and 
select devices from those groups.  John's proposal, to provide an API to create 
the PCI flavor (or alias), is very good. I prefer flavor because it's more 
OpenStack style.
I disagree with this.  I agree that what you're saying offers more flexibility 
after initial installation, but I have various issues with it.
[yjiang5] I think what you're talking about is mostly the whitelist, rather than 
the PCI flavor. The PCI flavor is more about the PCI request, like "I want a 
device with vendor_id = cisco, device_id = 15454E", or 'vendor_id=intel, 
device_class=nic' (because the image has the driver for all Intel NIC cards :) ). 
The whitelist, in contrast, is there to decide which devices are assignable on a host.

This is directly related to the hardware configuration on each compute node.  
For (some) other things of this nature, like provider networks, the compute 
node is the only thing that knows what it has attached to it, and it is the 
store (in configuration) of that information.  If I add a new compute node then 
it's my responsibility to configure it correctly on attachment, but when I add 
a compute node (when I'm setting the cluster up, or sometime later on) then 
it's at that precise point that I know how I've attached it and what hardware 
it's got on it.  Also, it's at that point in time that I write out the 
configuration file (not by hand, note; there's almost certainly automation when 
configuring hundreds of nodes so arguments that 'if I'm writing hundreds of 
config files one will be wrong' are moot).

I'm also not sure there's much reason to change the available devices 
dynamically after that, since that's normally an activity that results from 
changing the physical setup of the machine which implies that actually you're 
going to have access to and be able to change the config as you do it.  John 
did come up with one case where you might be trying to remove old GPUs from 
circulation, but it's a very uncommon case that doesn't seem worth coding for, 
and it's still achievable by changing the config and restarting the compute 
processes.
[yjiang5] I totally agree with you that the whitelist is statically defined at 
provisioning time. I just want to separate the 'provider network' information 
into another configuration item (like extra information). The whitelist is just 
a white list to decide which devices are assignable. The provider network is 
information about the device; it's not in the scope of the whitelist.
This also reduces the autonomy of the compute node in favour of centralised 
tracking, which goes against the 'distributed where possible' philosophy of 
Openstack.
Finally, you're not actually removing configuration from the compute node.  You 
still have to configure a whitelist there; in the grouping design you also have 
to configure grouping (flavouring) on the control node as well.  The groups 
proposal adds one extra piece of information to the whitelists that are already 
there to mark groups, not a whole new set of config lines.
[yjiang5] Still, the whitelist is there to decide which devices are assignable, 
not to provide device information. We should not mix functionality into the 
configuration. If it's ok, I simply want to discard the 'group' term :) The nova 
PCI flow is simple: the compute node provides PCI devices (based on the 
whitelist), the scheduler tracks the PCI device information (abstracted as 
pci_stats for performance reasons), and the API provides a method for the user 
to specify the de

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Robert Li (baoli)
Hi Yongli,

Please also see my response to Yunhong. Here, I just want to add a comment 
about your local versus global argument. I took a brief look at your patches, 
and the PCI-flavor is added into the whitelist. The compute node needs to know 
these pci-flavors in order to report PCI stats based on them. Please correct me 
if I'm wrong.

Another comment is that a compute node doesn't need to consult with the 
controller, but its report or registration of resources may be rejected by the 
controller due to non-existing PCI groups.

thanks,
Robert

On 1/10/14 2:11 AM, "yongli he" 
mailto:yongli...@intel.com>> wrote:

On 2014年01月10日 00:49, Robert Li (baoli) wrote:
Hi Folks,
Hi, all

Basically I favor the pci-flavor style and am against messing with the 
whitelist. Please see my inline comments.



With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of
The whitelist configuration is mostly local to a host, so only addresses belong 
in there; in that respect John's proposal is good. Mixing the group into the 
whitelist means we turn a global thing into per-host style, which may be wrong.

benefits can be harvested:

 * the implementation is significantly simplified
but it is actually more of a mess; refer to my new patches already sent out.
 * provisioning is simplified by eliminating the PCI alias
The PCI alias provides a good way to define a globally referenceable name for 
PCI devices; we need this, and this is also true for John's pci-flavor.
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
Simplifying it like this seems good, but it does not actually simplify; keeping 
the local and the global separated is the natural simplification.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
This means you would have to consult the controller in order to deploy your 
host; if we keep the whitelist local, we simplify the deployment.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the —nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.
I still feel using the alias/PCI flavor is the better solution.

Further, we are saying that we can define default PCI groups based on the PCI 
dev

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Robert Li (baoli)
ts would certainly influence the overall PCI passthrough design, I 
presume. The bottom line is that we want those requirements to be met.


4)   IMHO, the core of nova PCI support is the *PCI property*. The property 
means not only generic PCI device properties like vendor id, device id and 
device type, and compute-specific properties like the BDF address or the 
adjacent switch IP address, but also user-defined properties like neutron's 
physical net name, etc. And then it's about how to get these properties, how to 
select/group devices based on them, and how to store/fetch them.



I agree. But that's exactly what we are trying to accomplish.

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Thursday, January 09, 2014 8:49 AM
To: OpenStack Development Mailing List (not for usage questions); Irena 
Berezovsky; Sandhya Dasu (sadasu); Jiang, Yunhong; Itzik Brown; 
j...@johngarbutt.com<mailto:j...@johngarbutt.com>; He, Yongli
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.
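
To make the "PCI group name:count" reporting in the list above concrete, here is 
a hedged sketch (the exact format is invented for illustration) of what a compute 
node might publish:

# Illustration only: per-host PCI stats keyed by PCI group name.
pci_stats = [
    {'group': 'GPU', 'count': 2},
    {'group': 'fast-nic', 'count': 4},
]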

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the —nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is con

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Alan Kavanagh
+1 PCI Flavor.

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: January-10-14 1:56 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

BTW, I like the PCI flavor :)

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Thursday, January 09, 2014 10:41 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Ian, when you say you are in agreement with all of this, do you agree with 
the 'group name', or with John's pci flavor?
I'm against the PCI group and will send out a reply later.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 09, 2014 9:47 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) 
mailto:ba...@cisco.com>> wrote:
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the -nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connected to a physical switch but has a host macvtap device in between.

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread John Garbutt
Apologies for this top post, I just want to move this discussion towards action.

I am traveling next week so it is unlikely that I can make the meetings. Sorry.

Can we please agree on some concrete actions, and who will do the coding?
This also means raising new blueprints for each item of work.
I am happy to review and eventually approve those blueprints, if you
email me directly.

Ideas are taken from what we started to agree on, mostly written up here:
https://wiki.openstack.org/wiki/Meetings/Passthrough#Definitions


What doesn't need doing...


We have PCI whitelist and PCI alias at the moment; let's keep those
names the same for now.
I personally prefer PCI-flavor rather than PCI-alias, but let's
discuss any rename separately.

We seemed happy with the current system (roughly) around GPU passthrough:
nova flavor-key  set
"pci_passthrough:alias"=" large_GPU:1,small_GPU:2"
nova boot --image some_image --flavor  

Again, we seemed happy with the current PCI whitelist.

Sure, we could optimise the scheduling, but again, please keep that a
separate discussion.
Something in the scheduler needs to know how many of each PCI alias
are available on each host.
How that information gets there can be changed at a later date.

PCI alias is in config, but it's probably better defined using host
aggregates, or some custom API.
But let's leave that for now, and discuss it separately.
If the need arises, we can migrate away from the config.


What does need doing...
=======================

1) API & CLI changes for "nic-type", and associated tempest tests

* Add a user visible "nic-type" so users can express one of several
network types.
* We need a default nic-type, for when the user doesn't specify one
(might default to SRIOV in some cases)
* We can easily test the case where the default is virtual and the
user expresses a preference for virtual
* Above is much better than not testing it at all.

nova boot --flavor m1.large --image 
  --nic net-id=
  --nic net-id=,nic-type=fast
  --nic net-id=,nic-type=fast 

or

neutron port-create
  --fixed-ip subnet_id=,ip_address=192.168.57.101
  --nic-type=
  
nova boot --flavor m1.large --image  --nic port-id=

Where nic-type is just an extra bit of metadata (a string) that is passed to
nova and the VIF driver.
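
To make that concrete, here is a minimal sketch in plain Python (purely
illustrative; the real nova/neutron request objects look different, and the
"virtual" default is just an assumption) of nic-type riding along as opaque
metadata from the CLI arguments down to whatever eventually configures the VIF:

# Illustrative sketch only; not the actual nova/neutron data model.
def parse_nic_args(nic_args, default_nic_type="virtual"):
    """Turn --nic net-id=X,nic-type=fast style arguments into dicts."""
    requested = []
    for arg in nic_args:
        fields = dict(item.split("=", 1) for item in arg.split(","))
        requested.append({
            "net_id": fields["net-id"],
            "nic_type": fields.get("nic-type", default_nic_type),
        })
    return requested

requested_nics = parse_nic_args(["net-id=net-1", "net-id=net-2,nic-type=fast"])
# -> [{'net_id': 'net-1', 'nic_type': 'virtual'},
#     {'net_id': 'net-2', 'nic_type': 'fast'}]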


2) Expand PCI alias information

We need extensions to PCI alias so we can group SRIOV devices better.

I still think we are yet to agree on a format, but I would suggest
this as a starting point:

{
 "name": "GPU_fast",
 "devices": [
  {"vendor_id": "1137", "product_id": "0071", "address": "*", "attach-type": "direct"},
  {"vendor_id": "1137", "product_id": "0072", "address": "*", "attach-type": "direct"}
 ],
 "sriov_info": {}
}

{
 "name": "NIC_fast",
 "devices": [
  {"vendor_id": "1137", "product_id": "0071", "address": "0:[1-50]:2:*",
   "attach-type": "macvtap"},
  {"vendor_id": "1234", "product_id": "0081", "address": "*", "attach-type": "direct"}
 ],
 "sriov_info": {
  "nic_type": "fast",
  "network_ids": ["net-id-1", "net-id-2"]
 }
}

{
 "name": "NIC_slower",
 "devices": [
  {"vendor_id": "1137", "product_id": "0071", "address": "*", "attach-type": "direct"},
  {"vendor_id": "1234", "product_id": "0081", "address": "*", "attach-type": "direct"}
 ],
 "sriov_info": {
  "nic_type": "slower",
  "network_ids": ["*"]  # this means it could attach to any network
 }
}

The idea being that the VIF driver gets passed this info when network_info
includes a nic that matches.
Any other details, like the VLAN id, would come from neutron and be passed
to the VIF driver as normal.
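
As an illustration of what "matches" could mean here, a rough sketch of
testing a discovered device against one of the alias entries above (the field
names follow the example; the address wildcarding is simplified to
fnmatch-style patterns, whereas a real implementation would need to parse
domain:bus:slot.function properly):

import fnmatch

def attach_type_for(device, alias_entry):
    """Return the attach-type if the device matches the alias, else None."""
    for spec in alias_entry["devices"]:
        if (spec["vendor_id"] == device["vendor_id"]
                and spec["product_id"] == device["product_id"]
                and fnmatch.fnmatch(device["address"], spec["address"])):
            return spec["attach-type"]
    return None

nic_fast = {
    "name": "NIC_fast",
    "devices": [
        {"vendor_id": "1137", "product_id": "0071",
         "address": "*", "attach-type": "macvtap"},
    ],
    "sriov_info": {"nic_type": "fast", "network_ids": ["net-id-1"]},
}

vf = {"vendor_id": "1137", "product_id": "0071", "address": "0000:08:00.2"}
print(attach_type_for(vf, nic_fast))   # -> macvtap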


3) Reading "nic_type" and doing the PCI passthrough of NIC user requests

Not sure we are agreed on this, but basically:
* network_info contains "nic-type" from neutron
* need to select the correct VIF driver
* need to pass matching PCI alias information to VIF driver
* neutron passes other details (like the VLAN id) as before
* nova gives the VIF driver an API that allows it to attach PCI devices
that are in the whitelist to the VM being configured
* with all this, the VIF driver can do what it needs to do
* let's keep it simple, and expand it as the need arises (see the sketch below)
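
A rough sketch of that flow follows; the helper names are invented for
illustration and do not correspond to existing nova APIs:

# Invented helper names; this only shows where each piece of information
# would come from, not real nova code.
def plug_passthrough_vif(vif, pci_aliases, allocate_from_whitelist):
    nic_type = vif["nic_type"]                   # from neutron via network_info
    alias = next(a for a in pci_aliases
                 if a.get("sriov_info", {}).get("nic_type") == nic_type)
    device = allocate_from_whitelist(alias)      # nova hands out a whitelisted device
    vlan = vif.get("vlan")                       # other details still come from neutron
    # The VIF driver would then attach `device` (direct or macvtap) and
    # apply `vlan`, just as it applies such details for any other port.
    return device, vlan

aliases = [{"name": "NIC_fast", "sriov_info": {"nic_type": "fast"}}]
vif = {"nic_type": "fast", "vlan": 100}
print(plug_passthrough_vif(vif, aliases, lambda alias: "0000:08:00.2"))
# -> ('0000:08:00.2', 100)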

4) Make changes to VIF drivers, so the above is implemented

Depends on (3)



These seem like some good steps to get the basics in place for PCI
passthrough networking.
Once it's working, we can review it and see if there are things that
need to evolve further.

Does that seem like a workable approach?
Who is willing to implement any of (1), (2) and (3)?


Cheers,
John


On 9 January 2014 17:47, Ian Wells  wrote:
> I think I'm in agreement with all of this.  Nice summary, Robert.
>
> It may not be where the work ends, but if we could get this done the rest is
> just refinement.
>
>
> On 9 January 2014 17:49, Robert Li (baoli)  wrote:
>>
>> Hi Folks,
>>
>>
>> With John joining the IRC, so far, we had a couple of productive meetings
>> in an effort to come to consensus and move forward. Thanks John for doing
>> that, and I appreciate everyone's effort to make it to the daily meeting.
>> Let's reconvene on Monday.

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
In any case, we don't have to decide this now.  If we simply allowed the
whitelist to add extra arbitrary properties to the PCI record (like a group
name) and return them to the central server, we could use that in scheduling
as a group name for the minute, we wouldn't implement the APIs for flavors
yet, and we could get a working system that would be minimally changed from
what we already have.  We could worry about the scheduling in the
scheduling group, and we could leave the APIs (which, as I say, are a
minimally useful feature) until later.  Then we'd have something useful in
short order.
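
As a sketch of what those extra arbitrary properties could look like (the key
names below are made up for illustration, not a proposed schema):

# Illustrative only; key names are made up.
whitelist_entry = {"vendor_id": "8086", "product_id": "10ed",
                   "group": "tenant-net"}            # the one extra property

# The device record the compute node returns keeps the tag...
device_record = {"address": "0000:06:10.1", "vendor_id": "8086",
                 "product_id": "10ed", "group": "tenant-net"}

# ...and the scheduler can simply treat "group" as the group name for now.
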
-- 
Ian.


On 10 January 2014 13:08, Ian Wells  wrote:

> On 10 January 2014 07:40, Jiang, Yunhong  wrote:
>
>>  Robert, sorry that I’m not fan of * your group * term. To me, *your
>> group” mixed two thing. It’s an extra property provided by configuration,
>> and also it’s a very-not-flexible mechanism to select devices (you can only
>> select devices based on the ‘group name’ property).
>>
>>
> It is exactly that.  It's 0 new config items, 0 new APIs, just an extra
> tag on the whitelists that are already there (although the proposal
> suggests changing the name of them to be more descriptive of what they now
> do).  And you talk about flexibility as if this changes frequently, but in
> fact the grouping / aliasing of devices almost never changes after
> installation, which is, not coincidentally, when the config on the compute
> nodes gets set up.
>
>>  1)   A dynamic group is much better. For example, user may want to
>> select GPU device based on vendor id, or based on vendor_id+device_id. In
>> another word, user want to create group based on vendor_id, or
>> vendor_id+device_id and select devices from these group.  John’s proposal
>> is very good, to provide an API to create the PCI flavor(or alias). I
>> prefer flavor because it’s more openstack style.
>>
> I disagree with this.  I agree that what you're saying offers a more
> flexibilibility after initial installation but I have various issues with
> it.
>
> This is directly related to the hardware configuation on each compute
> node.  For (some) other things of this nature, like provider networks, the
> compute node is the only thing that knows what it has attached to it, and
> it is the store (in configuration) of that information.  If I add a new
> compute node then it's my responsibility to configure it correctly on
> attachment, but when I add a compute node (when I'm setting the cluster up,
> or sometime later on) then it's at that precise point that I know how I've
> attached it and what hardware it's got on it.  Also, it's at this that
> point in time that I write out the configuration file (not by hand, note;
> there's almost certainly automation when configuring hundreds of nodes so
> arguments that 'if I'm writing hundreds of config files one will be wrong'
> are moot).
>
> I'm also not sure there's much reason to change the available devices
> dynamically after that, since that's normally an activity that results from
> changing the physical setup of the machine which implies that actually
> you're going to have access to and be able to change the config as you do
> it.  John did come up with one case where you might be trying to remove old
> GPUs from circulation, but it's a very uncommon case that doesn't seem
> worth coding for, and it's still achievable by changing the config and
> restarting the compute processes.
>
> This also reduces the autonomy of the compute node in favour of
> centralised tracking, which goes against the 'distributed where possible'
> philosophy of Openstack.
>
> Finally, you're not actually removing configuration from the compute
> node.  You still have to configure a whitelist there; in the grouping
> design you also have to configure grouping (flavouring) on the control node
> as well.  The groups proposal adds one extra piece of information to the
> whitelists that are already there to mark groups, not a whole new set of
> config lines.
>
>
> To compare scheduling behaviour:
>
> If I  need 4G of RAM, each compute node has reported its summary of free
> RAM to the scheduler.  I look for a compute node with 4G free, and filter
> the list of compute nodes down.  This is a query on n records, n being the
> number of compute nodes.  I schedule to the compute node, which then
> confirms it does still have 4G free and runs the VM or rejects the request.
>
> If I need 3 PCI devices and use the current system, each machine has
> reported its device allocations to the scheduler.  With SRIOV multiplying
> up the number of available devices, it's reporting back hundreds of records
> per compute node to the schedulers, and the filtering activity is a 3
> queries on n * number of PCI devices in cloud records, which could easily
> end up in the tens or even hundreds of thousands of records for a
> moderately sized cloud.  There compute node also has a record of its device
> allocations which is also checked and updated before the final request is run.

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Ian Wells
On 10 January 2014 07:40, Jiang, Yunhong  wrote:

>  Robert, sorry that I’m not fan of * your group * term. To me, *your
> group” mixed two thing. It’s an extra property provided by configuration,
> and also it’s a very-not-flexible mechanism to select devices (you can only
> select devices based on the ‘group name’ property).
>
>
It is exactly that.  It's 0 new config items, 0 new APIs, just an extra tag
on the whitelists that are already there (although the proposal suggests
changing the name of them to be more descriptive of what they now do).  And
you talk about flexibility as if this changes frequently, but in fact the
grouping / aliasing of devices almost never changes after installation,
which is, not coincidentally, when the config on the compute nodes gets set
up.

>  1)   A dynamic group is much better. For example, user may want to
> select GPU device based on vendor id, or based on vendor_id+device_id. In
> another word, user want to create group based on vendor_id, or
> vendor_id+device_id and select devices from these group.  John’s proposal
> is very good, to provide an API to create the PCI flavor(or alias). I
> prefer flavor because it’s more openstack style.
>
I disagree with this.  I agree that what you're saying offers more
flexibility after initial installation, but I have various issues with
it.

This is directly related to the hardware configuration on each compute
node.  For (some) other things of this nature, like provider networks, the
compute node is the only thing that knows what it has attached to it, and
it is the store (in configuration) of that information.  If I add a new
compute node then it's my responsibility to configure it correctly on
attachment, but when I add a compute node (when I'm setting the cluster up,
or sometime later on) then it's at that precise point that I know how I've
attached it and what hardware it's got on it.  Also, it's at this
point in time that I write out the configuration file (not by hand, note;
there's almost certainly automation when configuring hundreds of nodes so
arguments that 'if I'm writing hundreds of config files one will be wrong'
are moot).

I'm also not sure there's much reason to change the available devices
dynamically after that, since that's normally an activity that results from
changing the physical setup of the machine which implies that actually
you're going to have access to and be able to change the config as you do
it.  John did come up with one case where you might be trying to remove old
GPUs from circulation, but it's a very uncommon case that doesn't seem
worth coding for, and it's still achievable by changing the config and
restarting the compute processes.

This also reduces the autonomy of the compute node in favour of centralised
tracking, which goes against the 'distributed where possible' philosophy of
Openstack.

Finally, you're not actually removing configuration from the compute node.
You still have to configure a whitelist there; in the grouping design you
also have to configure grouping (flavouring) on the control node as well.
The groups proposal adds one extra piece of information to the whitelists
that are already there to mark groups, not a whole new set of config lines.


To compare scheduling behaviour:

If I  need 4G of RAM, each compute node has reported its summary of free
RAM to the scheduler.  I look for a compute node with 4G free, and filter
the list of compute nodes down.  This is a query on n records, n being the
number of compute nodes.  I schedule to the compute node, which then
confirms it does still have 4G free and runs the VM or rejects the request.

If I need 3 PCI devices and use the current system, each machine has
reported its device allocations to the scheduler.  With SRIOV multiplying
up the number of available devices, it's reporting back hundreds of records
per compute node to the schedulers, and the filtering activity is 3
queries on n * (number of PCI devices in the cloud) records, which could easily
end up in the tens or even hundreds of thousands of records for a
moderately sized cloud.  The compute node also has a record of its device
allocations which is also checked and updated before the final request is
run.

If I need 3 PCI devices and use the groups system, each machine has
reported its device *summary* to the scheduler.  With SRIOV multiplying up
the number of available devices, it's still reporting one or a small number
of categories, i.e. { net: 100}.  The difficulty of scheduling is a query
on num groups * n records - fewer, in fact, if some machines have no
passthrough devices.

You can see that there's quite a cost to be paid for having those flexible
alias APIs.
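
To illustrate the difference in what gets reported, a toy aggregation in
plain Python (not nova code): the grouped report collapses hundreds of
per-device records into one counter per group:

from collections import Counter

# Toy data: 100 VFs on one host, all tagged with the group "net".
devices = [{"id": "vf-%d" % i, "group": "net"} for i in range(100)]

per_device_report = devices                            # current system: 100 records
grouped_report = Counter(d["group"] for d in devices)  # groups system: {'net': 100}

print(len(per_device_report), dict(grouped_report))    # -> 100 {'net': 100}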

> 4)   IMHO, the core for nova PCI support is **PCI property**. The
> property means not only generic PCI devices like vendor id, device id,
> device type, compute specific property like BDF address, the adjacent
> switch IP address,  but also user defined property like nuertron’s phys

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread yongli he

On 2014-01-10 00:49, Robert Li (baoli) wrote:


Hi Folks,


Hi all,

Basically I favor the PCI-flavor style and am against complicating the 
whitelist. Please see my inline comments.





With John joining the IRC, so far, we had a couple of productive 
meetings in an effort to come to consensus and move forward. Thanks 
John for doing that, and I appreciate everyone's effort to make it to 
the daily meeting. Let's reconvene on Monday.


But before that, and based on our today's conversation on IRC, I'd 
like to say a few things. I think that first of all, we need to get 
agreement on the terminologies that we are using so far. With the 
current nova PCI passthrough


PCI whitelist: defines all the available PCI passthrough 
devices on a compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which 
requested PCI passthrough devices can be selected from all the PCI 
passthrough devices available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}
nova flavor extra_specs: request for PCI passthrough devices 
can be specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"


As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against 
the PCI devices, it has to match the vendor_id and product_id against 
all the available PCI devices until one is found. The name is only 
used for reference in the extra_specs. On the other hand, the 
whitelist is basically the same as the alias without a name.


What we have discussed so far is based on something called PCI groups 
(or PCI flavors as Yongli puts it). Without introducing other 
complexities, and with a little change of the above representation, we 
will have something like:
pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]


By doing so, we eliminated the PCI alias. And we call the "name" in 
above as a PCI group name. You can think of it as combining the 
definitions of the existing whitelist and PCI alias. And believe it or 
not, a PCI group is actually a PCI alias. However, with that change of 
thinking, a lot of
The whitelist configuration is mostly local to a host, so only host-local 
things like addresses belong in it, as in John's proposal. Mixing the group 
into the whitelist turns a global concept into per-host configuration, which 
is probably wrong.



benefits can be harvested:

 * the implementation is significantly simplified

but it actually becomes messier; see my new patches already sent out.

 * provisioning is simplified by eliminating the PCI alias
The PCI alias provides a good way to define a globally referenceable name for 
PCI devices; we need this, and the same is true of John's PCI flavor.
 * a compute node only needs to report stats with something 
like: PCI group name:count. A compute node processes all the PCI 
passthrough devices against the whitelist, and assign a PCI group 
based on the whitelist definition.
Simplifying this seems good, but it does not actually simplify; keeping the 
local and global configuration separate is the natural simplification.
 * on the controller, we may only need to define the PCI group 
names. if we use a nova api to define PCI groups (could be private or 
public, for example), one potential benefit, among other things 
(validation, etc),  they can be owned by the tenant that creates them. 
And thus a wholesale of PCI passthrough devices is also possible.
This means you have to consult the controller to deploy a host; if we keep 
the whitelist local, we simplify deployment.

 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI 
groups based on network connectivities.


Further, to support SRIOV, we are saying that PCI group names can be used 
not only in the extra specs but also in the --nic option and the neutron 
commands. This allows the full flexibility and functionality afforded by 
SRIOV.

I still feel that using the alias/PCI flavor is the better solution.


Further, we are saying that we can define default PCI groups based on 
the PCI device's class.
Default grouping makes our conceptual model messier; pre-defining a global 
thing in the API and in hard code is not a good way to go. I posted a -2 for this.


For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's 
connected to a virtual switch, a nic that is connected to a physical 
switch, or a nic that is connected to a physical switch, but has a 
host macvtap device in between. The actual names of the choices are 
not important here, and can be debated.


I'm hoping that we can go over the above on Monday. But any comments are 
welcome by email.

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Jiang, Yunhong
BTW, I like the PCI flavor :)

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Thursday, January 09, 2014 10:41 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Ian, when you say you're in agreement with all of this, do you agree with the 'group 
name', or with John's PCI flavor?
I'm against the PCI group and will send out a reply later.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 09, 2014 9:47 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) 
mailto:ba...@cisco.com>> wrote:
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the -nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connected to a physical switch, but has a host macvtap device in between.

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Jiang, Yunhong
Hi Ian, when you say you're in agreement with all of this, do you agree with the 'group 
name', or with John's PCI flavor?
I'm against the PCI group and will send out a reply later.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 09, 2014 9:47 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) 
mailto:ba...@cisco.com>> wrote:
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the -nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connected to a physical switch, but has a host macvtap device in between. 
The actual names of the choices are not important here, and can be debated.

I'm hoping that we can go over the above on Monday. But any comments are 
welcome by email.

Thanks,
Robert



Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Jiang, Yunhong
Robert, sorry that I'm not a fan of *your group* term. To me, *your group* 
mixes two things. It's an extra property provided by configuration, and it is 
also a rather inflexible mechanism for selecting devices (you can only select 
devices based on the 'group name' property).


1)   A dynamic group is much better. For example, a user may want to select a 
GPU device based on vendor_id, or on vendor_id+device_id. In other words, the 
user wants to create groups based on vendor_id, or vendor_id+device_id, and 
select devices from those groups.  John's proposal to provide an API to create 
the PCI flavor (or alias) is very good. I prefer 'flavor' because it's more 
OpenStack style.



2)   As for the second aspect of your 'group', I'd understand it as an extra 
property provided by configuration.  I don't think we should put it into the 
whitelist, whose job is to configure which devices are assignable.  I'd add 
another configuration option to provide extra attributes for devices. When 
nova-compute starts up, it will parse this configuration and add the attributes 
to the corresponding PCI devices. I don't think adding another configuration 
option will cause too much trouble for deployment; OpenStack already has a lot 
of configuration items :)
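
As a sketch of what that could look like (the option name pci_extra_attrs and
its format are invented here purely for illustration; no such option exists
today):

#   pci_passthrough_whitelist=[{"vendor_id":"8086","product_id":"10ed"}]
#   pci_extra_attrs=[{"vendor_id":"8086","product_id":"10ed",
#                     "physical_network":"physnet1"}]

import json

def annotate_devices(devices, extra_attrs_json):
    """On compute start-up, copy matching extra attributes onto each device."""
    extra_attrs = json.loads(extra_attrs_json)
    for dev in devices:
        for attrs in extra_attrs:
            match_keys = ("vendor_id", "product_id")
            if all(dev.get(k) == attrs[k] for k in match_keys if k in attrs):
                dev.update({k: v for k, v in attrs.items()
                            if k not in match_keys})
    return devices

devs = annotate_devices(
    [{"vendor_id": "8086", "product_id": "10ed"}],
    '[{"vendor_id": "8086", "product_id": "10ed", "physical_network": "physnet1"}]')
print(devs)
# -> [{'vendor_id': '8086', 'product_id': '10ed', 'physical_network': 'physnet1'}]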



3)   I think we are currently mixing the neutron and nova designs. To me, 
neutron SRIOV support is a user of nova PCI support. Thus we should first 
analyse the requirements that neutron SRIOV support places on nova PCI support 
in a more generic way, and then we can discuss how to enhance the nova PCI 
support, or, if you want, redesign it. IMHO, if we don't consider networking, 
the current implementation should be OK.



4)   IMHO, the core of nova PCI support is the *PCI property*. Properties 
include not only generic PCI properties like vendor id, device id and device 
type, and compute-specific properties like the BDF address or the adjacent 
switch IP address, but also user-defined properties like neutron's physical 
network name. The questions are then how to get these properties, how to 
select/group devices based on them, and how to store/fetch them.
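
By way of illustration (the field names here are examples only, not a fixed
schema), a device under that view is just a bag of properties gathered from
several sources:

# Example property bag; the field names are illustrative only.
pci_device = {
    # generic PCI properties, discoverable on the host
    "vendor_id": "8086", "product_id": "10ed", "device_type": "VF",
    # compute-node specific properties
    "address": "0000:06:10.1", "adjacent_switch": "10.1.1.1",
    # operator/user defined properties, e.g. supplied via configuration
    "physical_network": "physnet1",
}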



Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Thursday, January 09, 2014 8:49 AM
To: OpenStack Development Mailing List (not for usage questions); Irena 
Berezovsky; Sandhya Dasu (sadasu); Jiang, Yunhong; Itzik Brown; 
j...@johngarbutt.com; He, Yongli
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Ian Wells
On 9 January 2014 22:50, Ian Wells  wrote:

> On 9 January 2014 20:19, Brian Schott wrote:
> On the flip side, vendor_id and product_id might not be sufficient.
>  Suppose I have two identical NICs, one for nova internal use and the
> second for guest tenants?  So, bus numbering may be required.
>
>>
>> 01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
>> GTX] (rev a1)
>> 02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
>> GTX] (rev a1)
>>
>
> I totally concur on this - with network devices in particular the PCI path
> is important because you don't accidentally want to grab the Openstack
> control network device ;)
>

Redundant statement is redundant.  Sorry, yes, this has been a pet bugbear
of mine.  It applies equally to provider networks on the networking side of
things, and, where Neutron is not your network device manager for a PCI
device, you may want several device groups bridged to different segments.
Network devices are one case of a category of device where there's
something about the device that you can't detect that means it's not
necessarily interchangeable with its peers.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Ian Wells
On 9 January 2014 20:19, Brian Schott wrote:

> Ian,
>
> The idea of pci flavors is a great and using vendor_id and product_id make
> sense, but I could see a case for adding the class name such as 'VGA
> compatible controller'. Otherwise, slightly different generations of
> hardware will mean custom whitelist setups on each compute node.
>

Personally, I think the important thing is to have a matching expression.
The more flexible the matching language, the better.

On the flip side, vendor_id and product_id might not be sufficient.
>  Suppose I have two identical NICs, one for nova internal use and the
> second for guest tenants?  So, bus numbering may be required.
>
> 01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
> GTX] (rev a1)
> 02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900
> GTX] (rev a1)
>

I totally concur on this - with network devices in particular the PCI path
is important because you don't accidentally want to grab the Openstack
control network device ;)


> I know you guys are thinking of PCI devices, but any though of mapping to
> something like udev rather than pci?  Supporting udev rules might be easier
> and more robust rather than making something up.
>

Past experience has told me that udev rules are not actually terribly good,
which you soon discover when you have to write expressions like:

 SUBSYSTEM=="net", KERNELS==":83:00.0", ACTION=="add", NAME="eth8"

which took me a long time to figure out and is self-documenting only in
that it has a recognisable PCI path in there, 'KERNELS' not being a
meaningful name to me.  And self-documenting is key to udev rules, because
there's not much information on the tag meanings otherwise.

I'm comfortable with having a match format that covers what we know and
copes with extension for when we find we're short a feature, and what we
have now is close to that.  Yes, it needs the class adding, we all agree,
and you should be able to match on PCI path, which you can't now, but it's
close.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Sandhya Dasu (sadasu)
Hi,
 One use case was brought up in today's meeting that I think is not valid.

It is the use case where all 3 vnic types: virtio, direct and macvtap (the 
terms used in the meeting were slow, fast, faster/foobar) could be attached to 
the same VM.  The main difference between a direct and macvtap interface is 
that the former does not support live migration. So, attaching both direct and 
macvtap pci-passthrough interfaces to the same VM would mean that it cannot 
support live migration. In that case assigning the macvtap interface is in 
essence a waste.

So, it would be ideal to disallow such an assignment or at least warn the user 
that the VM will now not be able to support live migration.  We can however 
still combine direct or macvtap pci-passthrough interfaces with virtio vnic 
types without issue.
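
A minimal sketch of the kind of check being suggested (purely illustrative; 
neither the function nor the warning text exists anywhere today):

import warnings

def check_vnic_mix(requested_nics):
    # Purely illustrative validation of the suggestion above.
    types = {nic.get("vnic_type", "virtio") for nic in requested_nics}
    if {"direct", "macvtap"} <= types:
        warnings.warn("Mixing direct and macvtap interfaces on one VM: the "
                      "direct interface already rules out live migration, so "
                      "the macvtap interface gains nothing.")

check_vnic_mix([{"vnic_type": "direct"}, {"vnic_type": "macvtap"}])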

Thanks,
Sandhya

From: Ian Wells mailto:ijw.ubu...@cack.org.uk>>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
mailto:openstack-dev@lists.openstack.org>>
Date: Thursday, January 9, 2014 12:47 PM
To: "OpenStack Development Mailing List (not for usage questions)" 
mailto:openstack-dev@lists.openstack.org>>
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

I think I'm in agreement with all of this.  Nice summary, Robert.

It may not be where the work ends, but if we could get this done the rest is 
just refinement.


On 9 January 2014 17:49, Robert Li (baoli) 
mailto:ba...@cisco.com>> wrote:
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names can be used not 
only in the extra specs but also in the --nic option and the neutron commands.

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Brian Schott
Ian,

The idea of PCI flavors is a great one, and using vendor_id and product_id 
makes sense, but I could see a case for adding the class name, such as 'VGA 
compatible controller'. Otherwise, slightly different generations of hardware 
will mean custom whitelist setups on each compute node.

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)

On the flip side, vendor_id and product_id might not be sufficient.  Suppose I 
have two identical NICs, one for nova internal use and the second for guest 
tenants?  So, bus numbering may be required.  

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)

Some possible combinations:

# take 2 gpus
pci_passthrough_whitelist=[
{ "vendor_id":"NVIDIA Corporation G71","product_id":"GeForce 7900 GTX", 
"name":"GPU"},
]

# only take the GPU on PCI 2
pci_passthrough_whitelist=[
{ "vendor_id":"NVIDIA Corporation G71","product_id":"GeForce 7900 GTX", 
'bus_id': '02:', "name":"GPU"},
]
pci_passthrough_whitelist=[
{"bus_id": "01:00.0", "name": "GPU"},
{"bus_id": "02:00.0", "name": "GPU"},
]

pci_passthrough_whitelist=[
{"class": "VGA compatible controller", "name": "GPU"},
]

pci_passthrough_whitelist=[
{ "product_id":"GeForce 7900 GTX", "name":"GPU"},
]

I know you guys are thinking of PCI devices, but any thought of mapping to 
something like udev rather than PCI?  Supporting udev rules might be easier 
and more robust than making something up.

Brian

-
Brian Schott, CTO
Nimbis Services, Inc.
brian.sch...@nimbisservices.com
ph: 443-274-6064  fx: 443-274-6060



On Jan 9, 2014, at 12:47 PM, Ian Wells  wrote:

> I think I'm in agreement with all of this.  Nice summary, Robert.
> 
> It may not be where the work ends, but if we could get this done the rest is 
> just refinement.
> 
> 
> On 9 January 2014 17:49, Robert Li (baoli)  wrote:
> Hi Folks,
> 
> 
> With John joining the IRC, so far, we had a couple of productive meetings in 
> an effort to come to consensus and move forward. Thanks John for doing that, 
> and I appreciate everyone's effort to make it to the daily meeting. Let's 
> reconvene on Monday. 
> 
> But before that, and based on our today's conversation on IRC, I'd like to 
> say a few things. I think that first of all, we need to get agreement on the 
> terminologies that we are using so far. With the current nova PCI passthrough
> 
> PCI whitelist: defines all the available PCI passthrough devices on a 
> compute node. pci_passthrough_whitelist=[{
>  "vendor_id":"","product_id":""}] 
> PCI Alias: criteria defined on the controller node with which 
> requested PCI passthrough devices can be selected from all the PCI 
> passthrough devices available in a cloud. 
> Currently it has the following format: 
> pci_alias={"vendor_id":"",
>  "product_id":"", "name":"str"}
> 
> nova flavor extra_specs: request for PCI passthrough devices can be 
> specified with extra_specs in the format for 
> example:"pci_passthrough:alias"="name:count"
> 
> As you can see, currently a PCI alias has a name and is defined on the 
> controller. The implications for it is that when matching it against the PCI 
> devices, it has to match the vendor_id and product_id against all the 
> available PCI devices until one is found. The name is only used for reference 
> in the extra_specs. On the other hand, the whitelist is basically the same as 
> the alias without a name.
> 
> What we have discussed so far is based on something called PCI groups (or PCI 
> flavors as Yongli puts it). Without introducing other complexities, and with 
> a little change of the above representation, we will have something like:
> 
> pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"",
>  "name":"str"}] 
> 
> By doing so, we eliminated the PCI alias. And we call the "name" in above as 
> a PCI group name. You can think of it as combining the definitions of the 
> existing whitelist and PCI alias. And believe it or not, a PCI group is 
> actually a PCI alias. However, with that change of thinking, a lot of 
> benefits can be harvested:
> 
>  * the implementation is significantly simplified
>  * provisioning is simplified by eliminating the PCI alias
>  * a compute node only needs to report stats with something like: PCI 
> group name:count. A compute node processes all the PCI passthrough devices 
> against the whitelist, and assign a PCI group based on the whitelist 
> definition.
>  * on the controller, we may only need to define the PCI group names. 
> if we use a nova api to define PCI groups (could be private or public, for 
> example), one potential benefit, among other things (validation, etc),  they 
> can be owned by the tenant that creates them.

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Ian Wells
I think I'm in agreement with all of this.  Nice summary, Robert.

It may not be where the work ends, but if we could get this done the rest
is just refinement.


On 9 January 2014 17:49, Robert Li (baoli)  wrote:

>Hi Folks,
>
>  With John joining the IRC, so far, we had a couple of productive
> meetings in an effort to come to consensus and move forward. Thanks John
> for doing that, and I appreciate everyone's effort to make it to the daily
> meeting. Let's reconvene on Monday.
>
>  But before that, and based on our today's conversation on IRC, I'd like
> to say a few things. I think that first of all, we need to get agreement on
> the terminologies that we are using so far. With the current nova PCI
> passthrough
>
>  PCI whitelist: defines all the available PCI passthrough devices
> on a compute node. pci_passthrough_whitelist=[{
> "vendor_id":"","product_id":""}]
> PCI Alias: criteria defined on the controller node with which
> requested PCI passthrough devices can be selected from all the PCI
> passthrough devices available in a cloud.
> Currently it has the following format: 
> pci_alias={"vendor_id":"",
> "product_id":"", "name":"str"}
>
> nova flavor extra_specs: request for PCI passthrough devices can
> be specified with extra_specs in the format for example:
> "pci_passthrough:alias"="name:count"
>
>  As you can see, currently a PCI alias has a name and is defined on the
> controller. The implications for it is that when matching it against the
> PCI devices, it has to match the vendor_id and product_id against all the
> available PCI devices until one is found. The name is only used for
> reference in the extra_specs. On the other hand, the whitelist is basically
> the same as the alias without a name.
>
>  What we have discussed so far is based on something called PCI groups
> (or PCI flavors as Yongli puts it). Without introducing other complexities,
> and with a little change of the above representation, we will have
> something like:
>
> pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"",
> "name":"str"}]
>
>  By doing so, we eliminated the PCI alias. And we call the "name" in
> above as a PCI group name. You can think of it as combining the definitions
> of the existing whitelist and PCI alias. And believe it or not, a PCI group
> is actually a PCI alias. However, with that change of thinking, a lot of
> benefits can be harvested:
>
>   * the implementation is significantly simplified
>  * provisioning is simplified by eliminating the PCI alias
>  * a compute node only needs to report stats with something like:
> PCI group name:count. A compute node processes all the PCI passthrough
> devices against the whitelist, and assign a PCI group based on the
> whitelist definition.
>  * on the controller, we may only need to define the PCI group
> names. if we use a nova api to define PCI groups (could be private or
> public, for example), one potential benefit, among other things
> (validation, etc),  they can be owned by the tenant that creates them. And
> thus a wholesale of PCI passthrough devices is also possible.
>  * scheduler only works with PCI group names.
>  * request for PCI passthrough device is based on PCI-group
>  * deployers can provision the cloud based on the PCI groups
>  * Particularly for SRIOV, deployers can design SRIOV PCI groups
> based on network connectivities.
>
>  Further, to support SRIOV, we are saying that PCI group names not only
> can be used in the extra specs, it can also be used in the —nic option and
> the neutron commands. This allows the most flexibilities and
> functionalities afforded by SRIOV.
>
>  Further, we are saying that we can define default PCI groups based on
> the PCI device's class.
>
>  For vnic-type (or nic-type), we are saying that it defines the link
> characteristics of the nic that is attached to a VM: a nic that's connected
> to a virtual switch, a nic that is connected to a physical switch, or a nic
> that is connected to a physical switch, but has a host macvtap device in
> between. The actual names of the choices are not important here, and can be
> debated.
>
>  I'm hoping that we can go over the above on Monday. But any comments are
> welcome by email.
>
>  Thanks,
> Robert
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Robert Li (baoli)
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implication is that when matching it against the PCI devices, 
the vendor_id and product_id have to be matched against all the available PCI 
devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminate the PCI alias, and we call the "name" above a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assigns a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
If we use a nova API to define PCI groups (which could be private or public, 
for example), one potential benefit, among other things (validation, etc.), is 
that they can be owned by the tenant that creates them. And thus wholesaling 
of PCI passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.
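
To illustrate the stats reporting point above, here is a rough Python sketch 
(the function and field names are mine, purely for illustration; this is not 
existing nova code):

    def build_pci_group_stats(devices, whitelist):
        # devices: PCI devices discovered on the host, e.g.
        #   [{"vendor_id": "8086", "product_id": "10ca"}, ...]
        # whitelist: entries of the proposed form, e.g.
        #   [{"vendor_id": "8086", "product_id": "10ca", "name": "sriov-physnet1"}]
        stats = {}
        for dev in devices:
            for entry in whitelist:
                if (dev["vendor_id"] == entry["vendor_id"] and
                        dev["product_id"] == entry["product_id"]):
                    # the device falls into the PCI group named by the whitelist
                    stats[entry["name"]] = stats.get(entry["name"], 0) + 1
                    break
        return stats   # e.g. {"sriov-physnet1": 7}, reported to the scheduler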

Further, to support SRIOV, we are saying that PCI group names can be used not 
only in the extra specs, but also in the --nic option and the neutron 
commands. This allows the most flexibility and functionality afforded by 
SRIOV.
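
As a purely illustrative example of what the request side could then look 
like (the syntax is not settled and the group name is invented):

    nova boot --flavor m1.large --image <image> \
        --nic net-id=<net-uuid>,pci-group=sriov-physnet1 <vm-name>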

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connected to a physical switch, but has a host macvtap device in between. 
The actual names of the choices are not important here, and can be debated.

I'm hoping that we can go over the above on Monday. But any comments are 
welcome by email.

Thanks,
Robert

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-02 Thread Robert Li (baoli)
Hi John,

We had one on 12/24/2013 with the log:

http://eavesdrop.openstack.org/meetings/pci_passthrough_meeting/2013/pci_passthrough_meeting.2013-12-24-14.02.log.html

The next one will be at UTC 1400 on Jan. 7th, Tuesday.


--Robert

On 1/2/14 10:06 AM, "John Garbutt"  wrote:

>On 22 December 2013 12:07, Irena Berezovsky  wrote:
>> Hi Ian,
>>
>> My comments are inline
>>
>> I would like to suggest focusing the next PCI pass-through IRC meeting
>> on:
>>
>> 1. Closing the use cases for administration and for the tenant that
>> powers the VM.
>>
>> 2. Decoupling the nova and neutron parts to start focusing on the
>> neutron-related details.
>
>When is the next meeting?
>
>I have lost track due to holidays, etc.
>
>John
>
>___
>OpenStack-dev mailing list
>OpenStack-dev@lists.openstack.org
>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-02 Thread John Garbutt
On 22 December 2013 12:07, Irena Berezovsky  wrote:
> Hi Ian,
>
> My comments are inline
>
> I would like to suggest focusing the next PCI pass-through IRC meeting on:
>
> 1. Closing the use cases for administration and for the tenant that
> powers the VM.
>
> 2. Decoupling the nova and neutron parts to start focusing on the
> neutron-related details.

When is the next meeting?

I have lost track due to holidays, etc.

John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-23 Thread Jay Pipes

On 12/17/2013 10:09 AM, Ian Wells wrote:

Reiterating from the IRC meeting, largely, so apologies.

Firstly, I disagree that
https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support is an
accurate reflection of the current state.  It's a very unilateral view,
largely because the rest of us had been focussing on the google document
that we've been using for weeks.

Secondly, I totally disagree with this approach.  This assumes that
description of the (cloud-internal, hardware) details of each compute
node is best done with data stored centrally and driven by an API.  I
don't agree with either of these points.

Firstly, the best place to describe what's available on a compute node
is in the configuration on the compute node.  For instance, I describe
which interfaces do what in Neutron on the compute node.  This is
because when you're provisioning nodes, that's the moment you know how
you've attached it to the network and what hardware you've put in it and
what you intend the hardware to be for - or conversely your deployment
puppet or chef or whatever knows it, and Razor or MAAS has enumerated
it, but the activities are equivalent.  Storing it centrally distances
the compute node from its descriptive information for no good purpose
that I can see and adds the complexity of having to go make remote
requests just to start up.

Secondly, even if you did store this centrally, it's not clear to me
that an API is very useful.  As far as I can see, the need for an API is
really the need to manage PCI device flavors.  If you want that to be
API-managed, then the rest of a (rather complex) API cascades from that
one choice.  Most of the things that API lets you change (expressions
describing PCI devices) are the sort of thing that you set once and only
revisit when you start - for instance - deploying new hosts in a
different way.

I look at the parallel in Neutron provider networks.  They're config driven,
largely on the compute hosts.  Agents know which ports on their machine
(the hardware tie) are associated with provider networks, by provider
network name.  The controller takes 'neutron net-create ...
--provider:physical_network <name>' and uses that to tie a virtual network to the
provider network definition on each host.  What we absolutely don't do
is have a complex admin API that lets us say 'in host aggregate 4,
provider network x (which I made earlier) is connected to eth6'.
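
For illustration, that config-driven pattern looks roughly like this (the 
bridge mapping and VLAN id are invented values):

    # on each compute host (OVS agent configuration)
    bridge_mappings = physnet1:br-physnet1

    # once, on the controller
    neutron net-create tenant-net --provider:network_type vlan \
        --provider:physical_network physnet1 --provider:segmentation_id 42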


FWIW, I could not agree more. The Neutron API already suffers from 
overcomplexity. There's really no need to make it even more complex than 
it already is, especially for a feature that more naturally fits in 
configuration data (Puppet/Chef/etc) and isn't something that you would 
really ever change for a compute host once set.


Best,
-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-23 Thread Jose Gavine Cueto
Hi,

I would just like to share my idea on managing SR-IOV networking attributes
in neutron (e.g. mac address, ip address, vlan).  I've had experience
implementing this, and that was before the pci-passthrough feature in nova
existed.  Basically, nova still did the plugging and unplugging of vifs and
neutron did all the provisioning of networking attributes.  At that time, the
best hack I could do was to treat SR-IOV nics as ordinary vifs that were
distinguishable by nova and neutron.  To implement that, when booting an
instance in nova, a certain SR-IOV-VF-specific extra_spec was used (e.g. vfs
:= 1) indicating the number of SR-IOV VFs to create, and these were
eventually represented as mere vif objects in nova.  In nova, the SR-IOV VFs
were represented as vifs, but a special exception was made so that SR-IOV VFs
aren't really plugged, because of course that isn't necessary.  In effect,
the vifs that represent the VFs were accounted for in the db, including their
ip and mac addresses and vlan tags.  With respect to L2 isolation, the vlan
tags were retrieved through the neutron api when booting the instance and
were applied in the libvirt xml.  To summarize, the networking attributes
such as ip and mac addresses and vlan tags were applied normally to VFs and
thus preserved the normal "OS way" of managing these like ordinary vifs.

However, since it's just a hack, some consequences and issues surfaced:
proper migration of these networking attributes wasn't tested, libvirt seemed
to mistakenly swap the mac addresses when rebooting the instances, and most
importantly the vifs that represented the VFs lacked passthrough-specific
information.  Since OS already has the concept of PCI passthrough today, I'm
thinking this could be combined with the idea of a VF that is represented by
a vif, to have a complete abstraction of a manageable SR-IOV VF.  I have not
read the preceding replies thoroughly, so this idea might be redundant or
irrelevant already.

Cheers,
Pepe


On Thu, Oct 17, 2013 at 4:32 AM, Irena Berezovsky wrote:

>  Hi,
>
> As one of the next steps for PCI pass-through, I would like to discuss
> the support for PCI pass-through vNIC.
>
> While nova takes care of PCI pass-through device resources  management and
> VIF settings, neutron should manage their networking configuration.
>
> I would like to register a summit proposal to discuss the support for PCI
> pass-through networking.
>
> I am not sure what would be the right topic to discuss the PCI
> pass-through networking, since it involves both nova and neutron.
>
> There is already a session registered by Yongli on nova topic to discuss
> the PCI pass-through next steps.
>
> I think PCI pass-through networking is quite a big topic and it is worth
> having a separate discussion.
>
> Are there any other people who are interested in discussing it and sharing
> their thoughts and experience?
>
>
>
> Regards,
>
> Irena
>
>
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


-- 
To stop learning is like to stop loving.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-22 Thread Irena Berezovsky
Hi Ian,
My comments are inline
I would like to suggest focusing the next PCI pass-through IRC meeting on:

1. Closing the use cases for administration and for the tenant that powers
the VM.

2. Decoupling the nova and neutron parts to start focusing on the
neutron-related details.

BR,
Irena

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, December 20, 2013 2:50 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

On 19 December 2013 15:15, John Garbutt <j...@johngarbutt.com> wrote:
> Note, I don't see the person who boots the server ever seeing the pci-flavor, 
> only understanding the server flavor.
> [IrenaB] I am not sure that elaborating PCI device request into server flavor 
> is the right approach for the PCI pass-through network case. vNIC by its 
> nature is something dynamic that can be plugged or unplugged after VM boot. 
> server flavor is  quite static.
I was really just meaning that the server flavor specifies the type of NIC to attach.

The existing port specs, etc, define how many nics, and you can hot
plug as normal, just the VIF plugger code is told by the server flavor
if it is able to PCI passthrough, and which devices it can pick from.
The idea being that, combined with the neutron network-id, you know what to
plug.

The more I talk about this approach the more I hate it :(

The thinking we had here is that nova would provide a VIF or a physical NIC for 
each attachment.  Precisely what goes on here is a bit up for grabs, but I 
would think:
Nova specifies the type at port-update, making it obvious to Neutron it's 
getting a virtual interface or a passthrough NIC (and the type of that NIC, 
probably, and likely also the path so that Neutron can distinguish between NICs 
if it needs to know the specific attachment port)
Neutron does its magic on the network if it has any to do, like faffing(*) with 
switches
Neutron selects the VIF/NIC plugging type that Nova should use, and in the case 
that the NIC is a VF and it wants to set an encap, returns that encap back to 
Nova
Nova plugs it in and sets it up (in libvirt, this is generally in the XML; 
XenAPI and others are up for grabs).
[IrenaB] I agree on the described flow. We still need to close on how to 
express the request for a pass-through vNIC in 'nova boot'.
> We might also want a "nic-flavor" that tells neutron information it requires, 
> but lets get to that later...
> [IrenaB] nic flavor is definitely something that we need in order to choose 
> if  high performance (PCI pass-through) or virtio (i.e. OVS) nic will be 
> created.
Well, I think it's the right way to go. Rather than overloading the server
flavor with hints about which PCI devices you could use.

The issue here is the additional attach.  Since for passthrough that isn't 
NICs (like crypto cards) you would almost certainly specify it in the flavor, 
if you did the same for NICs then you would have a preallocated pool of NICs 
from which to draw.  The flavor is also all you need to know for billing, and 
the flavor lets you schedule.  If you have it on the list of NICs, you have to 
work out how many physical NICs you need before you schedule (admittedly not 
hard, but not in keeping) and if you then did a subsequent attach it could fail 
because you have no more NICs on the machine you scheduled to - and at this 
point you're kind of stuck.

Also with the former, if you've run out of NICs, the already-extant resize call 
would allow you to pick a flavor with more NICs and you can then reschedule the 
subsequent VM to wherever resources are available to fulfil the new request.
[IrenaB] Still think that putting the PCI NIC request into the server flavor 
is not the right approach. You would need to create different server flavors 
for every possible combination of tenant network attachment options, or else 
assume the tenant connects to all of them. As for billing, you can use the 
type of vNIC in addition to packets in/out for billing per vNIC. This way, the 
tenant will be charged only for the vNICs actually used.
One question here is whether Neutron should become a provider of billed 
resources (specifically passthrough NICs) in the same way as Cinder is of 
volumes - something we'd not discussed to date; we've largely worked on the 
assumption that NICs are like any other passthrough resource, just one where, 
once it's allocated out, Neutron can work magic with it.
[IrenaB] I am not so familiar with Ceilometer, but it seems that if we are talking 
about network resources, neutron should be in charge.

--
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread Ian Wells
On 19 December 2013 15:15, John Garbutt  wrote:

> > Note, I don't see the person who boots the server ever seeing the
> pci-flavor, only understanding the server flavor.
>  > [IrenaB] I am not sure that elaborating PCI device request into server
> flavor is the right approach for the PCI pass-through network case. vNIC by
> its nature is something dynamic that can be plugged or unplugged after VM
> boot. server flavor is  quite static.
>
> I was really just meaning the server flavor specify the type of NIC to
> attach.
>
> The existing port specs, etc, define how many nics, and you can hot
> plug as normal, just the VIF plugger code is told by the server flavor
> if it is able to PCI passthrough, and which devices it can pick from.
> The idea being that, combined with the neutron network-id, you know what to
> plug.
>
> The more I talk about this approach the more I hate it :(
>

The thinking we had here is that nova would provide a VIF or a physical NIC
for each attachment.  Precisely what goes on here is a bit up for grabs,
but I would think:

Nova specifies the type at port-update, making it obvious to Neutron it's
getting a virtual interface or a passthrough NIC (and the type of that NIC,
probably, and likely also the path so that Neutron can distinguish between
NICs if it needs to know the specific attachment port)
Neutron does its magic on the network if it has any to do, like faffing(*)
with switches
Neutron selects the VIF/NIC plugging type that Nova should use, and in the
case that the NIC is a VF and it wants to set an encap, returns that encap
back to Nova
Nova plugs it in and sets it up (in libvirt, this is generally in the XML;
XenAPI and others are up for grabs).
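
Purely as an illustration of that flow (the field names below are invented 
for the example, not an agreed API), the exchange might carry something like:

    # Hypothetical sketch only -- field names invented for illustration.
    # Nova -> Neutron, at port-update time:
    port_update = {"port": {
        "nic_type": "passthrough",              # as opposed to "virtual"
        "pci_details": {"vendor_id": "8086",
                        "product_id": "10ca",
                        "address": "0000:08:00.2"},  # lets Neutron tell NICs apart
    }}
    # Neutron -> Nova, in the reply: the plugging type Nova should use and,
    # when the device is a VF, the encap (e.g. a VLAN id) to apply when
    # plugging it in (for libvirt, in the domain XML).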

 > We might also want a "nic-flavor" that tells neutron information it
> requires, but lets get to that later...
> > [IrenaB] nic flavor is definitely something that we need in order to
> choose if  high performance (PCI pass-through) or virtio (i.e. OVS) nic
> will be created.
>
> Well, I think it's the right way to go. Rather than overloading the server
> flavor with hints about which PCI devices you could use.
>

The issue here is the additional attach.  Since for passthrough that isn't
NICs (like crypto cards) you would almost certainly specify it in the
flavor, if you did the same for NICs then you would have a preallocated
pool of NICs from which to draw.  The flavor is also all you need to know
for billing, and the flavor lets you schedule.  If you have it on the list
of NICs, you have to work out how many physical NICs you need before you
schedule (admittedly not hard, but not in keeping) and if you then did a
subsequent attach it could fail because you have no more NICs on the
machine you scheduled to - and at this point you're kind of stuck.

Also with the former, if you've run out of NICs, the already-extant resize
call would allow you to pick a flavor with more NICs and you can then
reschedule the subsequent VM to wherever resources are available to fulfil
the new request.

One question here is whether Neutron should become a provider of billed
resources (specifically passthrough NICs) in the same way as Cinder is of
volumes - something we'd not discussed to date; we've largely worked on the
assumption that NICs are like any other passthrough resource, just one
where, once it's allocated out, Neutron can work magic with it.
-- 
Ian.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread John Garbutt
Response inline...

On 19 December 2013 13:05, Irena Berezovsky  wrote:
> Hi John,
> I totally agree that we should define the use cases both for administration 
> and tenant that powers the VM.
> Since we are trying to support PCI pass-through network, let's focus on the 
> related use cases.
> Please see my comments inline.

Cool.

> Regards,
> Irena
> -Original Message-
> From: John Garbutt [mailto:j...@johngarbutt.com]
> Sent: Thursday, December 19, 2013 1:42 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
>
> Apologies for being late onto this thread, and not making the meeting the 
> other day.
> Also apologies this is almost totally a top post.
>
> On 17 December 2013 15:09, Ian Wells  wrote:
>> Firstly, I disagree that
>> https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support is an
>> accurate reflection of the current state.  It's a very unilateral
>> view, largely because the rest of us had been focussing on the google
>> document that we've been using for weeks.
>
> I haven't seen the google doc. I got involved through the blueprint review of 
> this:
> https://blueprints.launchpad.net/nova/+spec/pci-extra-info
>
> I assume its this one?
> https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs
>
> On a quick read, my main concern is separating out the "user" more:
> * administration (defines pci-flavor, defines which hosts can provide it, 
> defines server flavor...)
> * person who boots server (picks server flavor, defines neutron ports)
>
> Note, I don't see the person who boots the server ever seeing the pci-flavor, 
> only understanding the server flavor.
> [IrenaB] I am not sure that elaborating PCI device request into server flavor 
> is the right approach for the PCI pass-through network case. vNIC by its 
> nature is something dynamic that can be plugged or unplugged after VM boot. 
> server flavor is  quite static.

I was really just meaning that the server flavor specifies the type of NIC to attach.

The existing port specs, etc, define how many nics, and you can hot
plug as normal, just the VIF plugger code is told by the server flavor
if it is able to PCI passthrough, and which devices it can pick from.
The idea being that, combined with the neutron network-id, you know what to
plug.

The more I talk about this approach the more I hate it :(

> We might also want a "nic-flavor" that tells neutron information it requires, 
> but lets get to that later...
> [IrenaB] nic flavor is definitely something that we need in order to choose 
> if  high performance (PCI pass-through) or virtio (i.e. OVS) nic will be 
> created.

Well, I think it's the right way to go. Rather than overloading the server
flavor with hints about which PCI devices you could use.

>> Secondly, I totally disagree with this approach.  This assumes that
>> description of the (cloud-internal, hardware) details of each compute
>> node is best done with data stored centrally and driven by an API.  I
>> don't agree with either of these points.
>
> Possibly, but I would like to first agree on the use cases and data model we 
> want.
>
> Nova has generally gone for APIs over config in recent times.
> Mostly so you can do run-time configuration of the system.
> But lets just see what makes sense when we have the use cases agreed.
>
>>> On 2013年12月16日 22:27, Robert Li (baoli) wrote:
>>> I'd like to give you guys a summary of the current state, let's discuss it
>>> then.
>>> https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support
>>>
>>>
>>> 1)  fade out alias (I think this is OK for all)
>>> 2)  whitelist became pci-flavor (I think this is OK for all)
>>> 3)  address simple regular expression support: only * and a number
>>> range [hex-hex] are supported. (I think this is OK?)
>>> 4)  aggregate: now it's clear enough, and won't impact SRIOV.  (I
>>> think this is irrelevant to SRIOV now)
>
> So... this means we have:
>
> PCI-flavor:
> * i.e. standardGPU, standardGPUnew, fastGPU, hdFlash1TB etc
>
> Host mapping:
> * decide which hosts you allow a particular flavor to be used
> * note, the scheduler still needs to find out if any devices are "free"
>
> flavor (of the server):
> * usual RAM, CPU, Storage
> * use extra specs to add PCI devices
> * example:
> ** add one PCI device, choice of standardGPU or standardGPUnew
> ** also add: one hdFlash1TB
>
> Now, the other bit is SRIOV... At a high level:
>
> Neutron:
> * user wants to connect to a particular neutron network

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread Irena Berezovsky
Hi John,
I totally agree that we should define the use cases both for administration and 
for the tenant that powers the VM.
Since we are trying to support PCI pass-through network, let's focus on the 
related use cases.
Please see my comments inline.

Regards,
Irena
-Original Message-
From: John Garbutt [mailto:j...@johngarbutt.com] 
Sent: Thursday, December 19, 2013 1:42 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Apologies for being late onto this thread, and not making the meeting the other 
day.
Also apologies this is almost totally a top post.

On 17 December 2013 15:09, Ian Wells  wrote:
> Firstly, I disagree that
> https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support is an 
> accurate reflection of the current state.  It's a very unilateral 
> view, largely because the rest of us had been focussing on the google 
> document that we've been using for weeks.

I haven't seen the google doc. I got involved through the blueprint review of 
this:
https://blueprints.launchpad.net/nova/+spec/pci-extra-info

I assume its this one?
https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs

On a quick read, my main concern is separating out the "user" more:
* administration (defines pci-flavor, defines which hosts can provide it, 
defines server flavor...)
* person who boots server (picks server flavor, defines neutron ports)

Note, I don't see the person who boots the server ever seeing the pci-flavor, 
only understanding the server flavor.
[IrenaB] I am not sure that elaborating PCI device request into server flavor 
is the right approach for the PCI pass-through network case. vNIC by its nature 
is something dynamic that can be plugged or unplugged after VM boot. server 
flavor is  quite static.

We might also want a "nic-flavor" that tells neutron information it requires, 
but lets get to that later...
[IrenaB] nic flavor is definitely something that we need in order to choose 
whether a high performance (PCI pass-through) or a virtio (i.e. OVS) nic will 
be created.

> Secondly, I totally disagree with this approach.  This assumes that 
> description of the (cloud-internal, hardware) details of each compute 
> node is best done with data stored centrally and driven by an API.  I 
> don't agree with either of these points.

Possibly, but I would like to first agree on the use cases and data model we 
want.

Nova has generally gone for APIs over config in recent times.
Mostly so you can do run-time configuration of the system.
But lets just see what makes sense when we have the use cases agreed.

>> On 2013年12月16日 22:27, Robert Li (baoli) wrote:
>> I'd like to give you guys a summary of the current state, let's discuss it 
>> then.
>> https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support
>>
>>
>> 1)  fade out alias (I think this is OK for all)
>> 2)  whitelist became pci-flavor (I think this is OK for all)
>> 3)  address simple regular expression support: only * and a number
>> range [hex-hex] are supported. (I think this is OK?)
>> 4)  aggregate: now it's clear enough, and won't impact SRIOV.  (I
>> think this is irrelevant to SRIOV now)

So... this means we have:

PCI-flavor:
* i.e. standardGPU, standardGPUnew, fastGPU, hdFlash1TB etc

Host mapping:
* decide which hosts you allow a particular flavor to be used
* note, the scheduler still needs to find out if any devices are "free"

flavor (of the server):
* usual RAM, CPU, Storage
* use extra specs to add PCI devices
* example:
** add one PCI device, choice of standardGPU or standardGPUnew
** also add: one hdFlash1TB

Now, the other bit is SRIOV... At a high level:

Neutron:
* user wants to connect to a particular neutron network
* user wants a super-fast SRIOV connection

Administration:
* needs to map PCI devices to the neutron network they connect to

The big question is:
* is this a specific SRIOV only (provider) network
* OR... are other non-SRIOV connections also made to that same network

I feel we have to go for that latter. Imagine a network on VLAN 42, you might 
want some SRIOV into that network, and some OVS connecting into the same 
network. The user might have VMs connected using both methods, so wants the 
same IP address ranges and same network id spanning both.
[IrenaB] Agree. SRIOV connection is the choice for certain VM on certain 
network. The same VM can be connected to other network via virtio nic as well 
as other VMs can be connected to the same network via virtio nics.

If we go for the latter we either need:
* some kind of nic-flavor
** boot ... -nic nic-id:"public-id:,nic-flavor:"10GBpassthrough"
** but neutron could store nic-flavor, and pass it through to VIF driver, and 
user says port-id
* OR add NIC config into the server flavor

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread John Garbutt
On 19 December 2013 12:54, John Garbutt  wrote:
> On 19 December 2013 12:21, Ian Wells  wrote:
>>
>> John:
>>>
>>> At a high level:
>>>
>>> Neutron:
>>> * user wants to connect to a particular neutron network
>>> * user wants a super-fast SRIOV connection
>>>
>>> Administration:
>>> * needs to map PCI devices to the neutron network they connect to
>>>
>>> The big question is:
>>> * is this a specific SRIOV only (provider) network
>>> * OR... are other non-SRIOV connections also made to that same network
>>>
>>> I feel we have to go for that latter. Imagine a network on VLAN 42,
>>> you might want some SRIOV into that network, and some OVS connecting
>>> into the same network. The user might have VMs connected using both
>>> methods, so wants the same IP address ranges and same network id
>>> spanning both.
>>>
>>>
>>> If we go for the latter we either need:
>>> * some kind of nic-flavor
>>> ** boot ... -nic nic-id:"public-id:,nic-flavor:"10GBpassthrough"
>>> ** but neutron could store nic-flavor, and pass it through to VIF
>>> driver, and user says port-id
>>> * OR add NIC config into the server flavor
>>> ** extra spec to say, tell VIF driver it could use one of this list of
>>> PCI devices: (list pci-flavors)
>>> * OR do both
>>>
>>> I vote for nic-flavor only, because it matches the volume-type we have
>>> with cinder.
>>
>>
>> I think the issue there is that Nova is managing the supply of PCI devices
>> (which is limited and limited on a per-machine basis).  Indisputably you
>> need to select the NIC you want to use as a passthrough rather than a vnic
>> device, so there's something in the --nic argument, but you have to answer
>> two questions:
>>
>> - how many devices do you need (which is now not a flavor property but in
>> the --nic list, which seems to me an odd place to be defining billable
>> resources)
>> - what happens when someone does nova interface-attach?
>
> Agreed.

Apologies, I misread what you put, maybe we don't agree...

I am just trying not to make a passthrough NIC an odd special case.

In my mind, it should just be a regular neutron port connection that
happens to be implemented using PCI passthrough.

I agree we need to sort out the scheduling of that, because its a
finite resource.

> The --nic list specifies how many NICs.
>
> I was suggesting adding a nic-flavor on each --nic spec to say if it's
> PCI passthrough vs virtual NIC.
>
>> Cinder's an indirect parallel because the resources it's adding to the
>> hypervisor are virtual and unlimited, I think, or am I missing something
>> here?
>
> I was referring more to the different "volume-types" i.e. "fast
> volume" or "normal volume".
> And how that is similar to "virtual" vs "fast PCI passthough" vs "slow
> PCI passthrough"
>
> Local volumes probably have the same issues as PCI passthrough with
> "finite resources".
> But I am not sure we have a good solution for that yet.
>
> Mostly, it seems right that Cinder and Neutron own the configuration
> about the volume and network resources.
>
> The VIF driver and volume drivers seem to have a similar sort of
> relationship with Cinder and Neutron vs Nova.
>
> Then the issue boils down to visibility into that data so we can
> schedule efficiently, which is no easy problem.
>
>>>
>>> However, it does suggest that Nova should leave all the SRIOV work to
>>> the VIF driver.
>>> So the VIF driver, as activated by neutron, will understand which PCI
>>> devices to passthrough.
>>>
>>> Similar to the plan with brick, we could have an oslo lib that helps
>>> you attach SRIOV devices that could be used by the neutron VIF drivers
>>> and the nova PCI passthrough code.
>>
>> I'm not clear that this is necessary.
>>
>> At the moment with vNICs, you pass through devices by having a co-operation
>> between Neutron (which configures a way of attaching them to put them on a
>> certain network) and the hypervisor specific code (which creates them in the
>> instance and attaches them as instructed by Neutron).  Why would we not
>> follow the same pattern with passthrough devices?  In this instance, neutron
>> would tell nova that when it's plugging this device it should be a
>> passthrough device, and pass any additional parameters like the VF encap,
>> and Nova would do as instructed, then Neutron would reconfigure whatever
>> parts of the network need to be reconfigured in concert with the
>> hypervisor's settings to make the NIC a part of the specified network.
>
> I agree, in general terms.
>
> Firstly, do you agree the neutron network-id can be used for
> passthrough and non-passthrough VIF connections? i.e. a neutron
> network-id does not imply PCI-passthrough.
>
> Secondly, we need to agree on the information flow around defining the
> "flavor" of the NIC. i.e. virtual or passthroughFast or
> passthroughNormal.
>
> My gut feeling is that neutron port description should somehow define
> this via a nic-flavor that maps to a group of pci-flavors.
>
> But from a billing point of view, I like the idea of the server flavor
> saying to the VIF plug code, by the way, for this server, please
> support all the nics using devices in pciflavor:fastNic should that be
> possible for the users given port configuration. But this is leaking
> neutron/networking information into Nova, which seems bad.

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread John Garbutt
On 19 December 2013 12:21, Ian Wells  wrote:
>
> John:
>>
>> At a high level:
>>
>> Neutron:
>> * user wants to connect to a particular neutron network
>> * user wants a super-fast SRIOV connection
>>
>> Administration:
>> * needs to map PCI devices to the neutron network they connect to
>>
>> The big question is:
>> * is this a specific SRIOV only (provider) network
>> * OR... are other non-SRIOV connections also made to that same network
>>
>> I feel we have to go for that latter. Imagine a network on VLAN 42,
>> you might want some SRIOV into that network, and some OVS connecting
>> into the same network. The user might have VMs connected using both
>> methods, so wants the same IP address ranges and same network id
>> spanning both.
>>
>>
>> If we go for the latter we either need:
>> * some kind of nic-flavor
>> ** boot ... -nic nic-id:"public-id:,nic-flavor:"10GBpassthrough"
>> ** but neutron could store nic-flavor, and pass it through to VIF
>> driver, and user says port-id
>> * OR add NIC config into the server flavor
>> ** extra spec to say, tell VIF driver it could use one of this list of
>> PCI devices: (list pci-flavors)
>> * OR do both
>>
>> I vote for nic-flavor only, because it matches the volume-type we have
>> with cinder.
>
>
> I think the issue there is that Nova is managing the supply of PCI devices
> (which is limited and limited on a per-machine basis).  Indisputably you
> need to select the NIC you want to use as a passthrough rather than a vnic
> device, so there's something in the --nic argument, but you have to answer
> two questions:
>
> - how many devices do you need (which is now not a flavor property but in
> the --nic list, which seems to me an odd place to be defining billable
> resources)
> - what happens when someone does nova interface-attach?

Agreed.

The --nic list specifies how many NICs.

I was suggesting adding a nic-flavor on each --nic spec to say if it's
PCI passthrough vs virtual NIC.

> Cinder's an indirect parallel because the resources it's adding to the
> hypervisor are virtual and unlimited, I think, or am I missing something
> here?

I was referring more to the different "volume-types" i.e. "fast
volume" or "normal volume".
And how that is similar to "virtual" vs "fast PCI passthough" vs "slow
PCI passthrough"

Local volumes probably have the same issues as PCI passthrough with
"finite resources".
But I am not sure we have a good solution for that yet.

Mostly, it seems right that Cinder and Neutron own the configuration
about the volume and network resources.

The VIF driver and volume drivers seem to have a similar sort of
relationship with Cinder and Neutron vs Nova.

Then the issue boils down to visibility into that data so we can
schedule efficiently, which is no easy problem.

>>
>> However, it does suggest that Nova should leave all the SRIOV work to
>> the VIF driver.
>> So the VIF driver, as activated by neutron, will understand which PCI
>> devices to passthrough.
>>
>> Similar to the plan with brick, we could have an oslo lib that helps
>> you attach SRIOV devices that could be used by the neutron VIF drivers
>> and the nova PCI passthrough code.
>
> I'm not clear that this is necessary.
>
> At the moment with vNICs, you pass through devices by having a co-operation
> between Neutron (which configures a way of attaching them to put them on a
> certain network) and the hypervisor specific code (which creates them in the
> instance and attaches them as instructed by Neutron).  Why would we not
> follow the same pattern with passthrough devices?  In this instance, neutron
> would tell nova that when it's plugging this device it should be a
> passthrough device, and pass any additional parameters like the VF encap,
> and Nova would do as instructed, then Neutron would reconfigure whatever
> parts of the network need to be reconfigured in concert with the
> hypervisor's settings to make the NIC a part of the specified network.

I agree, in general terms.

Firstly, do you agree the neutron network-id can be used for
passthrough and non-passthrough VIF connections? i.e. a neutron
network-id does not imply PCI-passthrough.

Secondly, we need to agree on the information flow around defining the
"flavor" of the NIC. i.e. virtual or passthroughFast or
passthroughNormal.

My gut feeling is that neutron port description should somehow define
this via a nic-flavor that maps to a group of pci-flavors.

But from a billing point of view, I like the idea of the server flavor
saying to the VIF plug code, by the way, for this server, please
support all the nics using devices in pciflavor:fastNic should that be
possible for the users given port configuration. But this is leaking
neutron/networking information into Nova, which seems bad.

John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread Ian Wells
John:

> At a high level:
>
> Neutron:
> * user wants to connect to a particular neutron network
> * user wants a super-fast SRIOV connection

Administration:
> * needs to map PCI devices to the neutron network they connect to
>
The big question is:
> * is this a specific SRIOV only (provider) network
> * OR... are other non-SRIOV connections also made to that same network
>
> I feel we have to go for that latter. Imagine a network on VLAN 42,
> you might want some SRIOV into that network, and some OVS connecting
> into the same network. The user might have VMs connected using both
> methods, so wants the same IP address ranges and same network id
> spanning both.
>

> If we go for the latter we either need:
> * some kind of nic-flavor
> ** boot ... -nic nic-id:"public-id:,nic-flavor:"10GBpassthrough"
> ** but neutron could store nic-flavor, and pass it through to VIF
> driver, and user says port-id
> * OR add NIC config into the server flavor
> ** extra spec to say, tell VIF driver it could use one of this list of
> PCI devices: (list pci-flavors)
> * OR do both
>
> I vote for nic-flavor only, because it matches the volume-type we have
> with cinder.
>

I think the issue there is that Nova is managing the supply of PCI devices
(which is limited and limited on a per-machine basis).  Indisputably you
need to select the NIC you want to use as a passthrough rather than a vnic
device, so there's something in the --nic argument, but you have to answer
two questions:

- how many devices do you need (which is now not a flavor property but in
the --nic list, which seems to me an odd place to be defining billable
resources)
- what happens when someone does nova interface-attach?

Cinder's an indirect parallel because the resources it's adding to the
hypervisor are virtual and unlimited, I think, or am I missing something
here?


> However, it does suggest that Nova should leave all the SRIOV work to
> the VIF driver.
> So the VIF driver, as activated by neutron, will understand which PCI
> devices to passthrough.
>
> Similar to the plan with brick, we could have an oslo lib that helps
> you attach SRIOV devices that could be used by the neutron VIF drivers
> and the nova PCI passthrough code.
>

I'm not clear that this is necessary.

At the moment with vNICs, you pass through devices by having a co-operation
between Neutron (which configures a way of attaching them to put them on a
certain network) and the hypervisor specific code (which creates them in
the instance and attaches them as instructed by Neutron).  Why would we not
follow the same pattern with passthrough devices?  In this instance,
neutron would tell nova that when it's plugging this device it should be a
passthrough device, and pass any additional parameters like the VF encap,
and Nova would do as instructed, then Neutron would reconfigure whatever
parts of the network need to be reconfigured in concert with the
hypervisor's settings to make the NIC a part of the specified network.
-- 
Ian.


>
> Thanks,
> John
>
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-12-19 Thread John Garbutt
Apologies for being late onto this thread, and not making the meeting
the other day.
Also apologies this is almost totally a top post.

On 17 December 2013 15:09, Ian Wells  wrote:
> Firstly, I disagree that
> https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support is an accurate
> reflection of the current state.  It's a very unilateral view, largely
> because the rest of us had been focussing on the google document that we've
> been using for weeks.

I haven't seen the google doc. I got involved through the blueprint
review of this:
https://blueprints.launchpad.net/nova/+spec/pci-extra-info

I assume its this one?
https://docs.google.com/document/d/1EMwDg9J8zOxzvTnQJ9HwZdiotaVstFWKIuKrPse6JOs

On a quick read, my main concern is separating out the "user" more:
* administration (defines pci-flavor, defines which hosts can provide
it, defines server flavor...)
* person who boots server (picks server flavor, defines neutron ports)

Note, I don't see the person who boots the server ever seeing the
pci-flavor, only understanding the server flavor.

We might also want a "nic-flavor" that tells neutron information it
requires, but lets get to that later...

> Secondly, I totally disagree with this approach.  This assumes that
> description of the (cloud-internal, hardware) details of each compute node
> is best done with data stored centrally and driven by an API.  I don't agree
> with either of these points.

Possibly, but I would like to first agree on the use cases and data
model we want.

Nova has generally gone for APIs over config in recent times.
Mostly so you can do run-time configuration of the system.
But lets just see what makes sense when we have the use cases agreed.

>> On 2013年12月16日 22:27, Robert Li (baoli) wrote:
>> I'd like to give you guys a summary of the current state, let's discuss it
>> then.
>> https://wiki.openstack.org/wiki/PCI_passthrough_SRIOV_support
>>
>>
>> 1)  fade out alias (I think this is OK for all)
>> 2)  whitelist became pci-flavor (I think this is OK for all)
>> 3)  address simple regular expression support: only * and a number range
>> [hex-hex] are supported. (I think this is OK?)
>> 4)  aggregate: now it's clear enough, and won't impact SRIOV.  (I think
>> this is irrelevant to SRIOV now)

So... this means we have:

PCI-flavor:
* i.e. standardGPU, standardGPUnew, fastGPU, hdFlash1TB etc

Host mapping:
* decide which hosts you allow a particular flavor to be used
* note, the scheduler still needs to find out if any devices are "free"

flavor (of the server):
* usual RAM, CPU, Storage
* use extra specs to add PCI devices
* example:
** add one PCI device, choice of standardGPU or standardGPUnew
** also add: one hdFlash1TB
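
(Purely as an illustration of how that might be encoded -- the key name and 
value syntax are invented, not an agreed format:)

    # hypothetical extra_specs on the server flavor
    {"pci_passthrough:devices": "standardGPU|standardGPUnew:1, hdFlash1TB:1"}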

Now, the other bit is SRIOV... At a high level:

Neutron:
* user wants to connect to a particular neutron network
* user wants a super-fast SRIOV connection

Administration:
* needs to map PCI devices to the neutron network they connect to

The big question is:
* is this a specific SRIOV only (provider) network
* OR... are other non-SRIOV connections also made to that same network

I feel we have to go for that latter. Imagine a network on VLAN 42,
you might want some SRIOV into that network, and some OVS connecting
into the same network. The user might have VMs connected using both
methods, so wants the same IP address ranges and same network id
spanning both.

If we go for the latter we either need:
* some kind of nic-flavor
** boot ... -nic nic-id:"public-id:,nic-flavor:"10GBpassthrough"
** but neutron could store nic-flavor, and pass it through to VIF
driver, and user says port-id
* OR add NIC config into the server flavor
** extra spec to say, tell VIF driver it could use one of this list of
PCI devices: (list pci-flavors)
* OR do both

I vote for nic-flavor only, because it matches the volume-type we have
with cinder.

However, it does suggest that Nova should leave all the SRIOV work to
the VIF driver.
So the VIF driver, as activated by neutron, will understand which PCI
devices to passthrough.

Similar to the plan with brick, we could have an oslo lib that helps
you attach SRIOV devices that could be used by the neutron VIF drivers
and the nova PCI passthrough code.

Thanks,
John

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-11-01 Thread Isaku Yamahata
Port profile is a generic way for Neutron to pass plugin-specific data
as a dictionary. The Cisco plugin uses it to pass VM-FEX specific data.
Robert, correct me if I'm wrong.

thanks,
---
Isaku Yamahata 


On Thu, Oct 31, 2013 at 10:21:20PM +,
"Jiang, Yunhong"  wrote:

> Robert, I think your change request for pci alias should be covered by the 
> extra info enhancement. 
> https://blueprints.launchpad.net/nova/+spec/pci-extra-info  and Yongli is 
> working on it.
> 
> I'm not sure how the port profile is passed to the connected switch. Is it a 
> Cisco VM-FEX specific method or a libvirt method? Sorry, I'm not well versed 
> on the network side.
> 
> --jyh
> 
> From: Robert Li (baoli) [mailto:ba...@cisco.com]
> Sent: Wednesday, October 30, 2013 10:13 AM
> To: Irena Berezovsky; Jiang, Yunhong; prashant.upadhy...@aricent.com; 
> chris.frie...@windriver.com; He, Yongli; Itzik Brown
> Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
> (kmestery); Sandhya Dasu (sadasu)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support
> 
> Hi,
> 
> Regarding physical network mapping, this is what I thought.
> 
> consider the following scenarios:
>1. a compute node with SRIOV only interfaces attached to a physical 
> network. the node is connected to one upstream switch
>2. a compute node with both SRIOV interfaces and non-SRIOV interfaces 
> attached to a physical network. the node is connected to one upstream switch
>3. in addition to case 1 &2, a compute node may have multiple vNICs that 
> are connected to different upstream switches.
> 
> CASE 1:
>  -- the mapping from a virtual network (in terms of neutron) to a physical 
> network is actually done by binding a port profile to a neutron port. With 
> cisco's VM-FEX, a port profile is associated with one or multiple vlans. Once 
> the neutron port is bound with this port-profile in the upstream switch, it's 
> effectively plugged into the physical network.
>  -- since the compute node is connected to one upstream switch, the existing 
> nova PCI alias will be sufficient. For example, one can boot a Nova instance 
> that is attached to a SRIOV port with the following command:
>   nova boot -flavor m1.large -image  --nic 
> net-id=,pci-alias=,sriov=,port-profile=
> the net-id will be useful in terms of allocating IP address, enable dhcp, 
> etc that is associated with the network.
> -- the pci-alias specified in the nova boot command is used to create a PCI 
> request for scheduling purpose. a PCI device is bound to a neutron port 
> during the instance build time in the case of nova boot. Before invoking the 
> neutron API to create a port, an allocated PCI device out of a PCI alias will 
> be located from the PCI device list object. This device info among other 
> information will be sent to neutron to create the port.
> 
> CASE 2:
> -- Assume that OVS is used for the non-SRIOV interfaces. An example of 
> configuration with ovs plugin would look like:
> bridge_mappings = physnet1:br-vmfex
> network_vlan_ranges = physnet1:15:17
> tenant_network_type = vlan
> When a neutron network is created, a vlan is either allocated or 
> specified in the neutron net-create command. Attaching a physical interface 
> to the bridge (in the above example br-vmfex) is an administrative task.
> -- to create a Nova instance with non-SRIOV port:
>nova boot -flavor m1.large -image  --nic net-id=
> -- to create a Nova instance with SRIOV port:
>nova boot -flavor m1.large -image  --nic 
> net-id=,pci-alias=,sriov=,port-profile=
> it's essentially the same as in the first case. But since the net-id is 
> already associated with a vlan, the vlan associated with the port-profile 
> must be identical to that vlan. This has to be enforced by neutron.
> again, since the node is connected to one upstream switch, the existing 
> nova PCI alias should be sufficient.
> 
> CASE 3:
> -- A compute node might be connected to multiple upstream switches, with each 
> being a separate network. This means SRIOV PFs/VFs are already implicitly 
> associated with physical networks. In the none-SRIOV case, a physical 
> interface is associated with a physical network by plugging it into that 
> network, and attaching this interface to the ovs bridge that represents this 
> physical network on the compute node. In the SRIOV case, we need a way to 
> group the SRIOV VFs that belong to the same physical networks. The existing 
> nova PCI alias is to facilitate PCI device allocation by associating 
>  with an alias name. This will no longer be 
> sufficient. But it can be enhanced to achieve our goal

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-30 Thread Isaku Yamahata
On Wed, Oct 30, 2013 at 04:14:40AM +,
"Jiang, Yunhong"  wrote:

> > But how about long term direction?
> > Neutron should know/manage such network related resources on
> > compute nodes?
> 
> So you mean the PCI device management will be split between Nova and 
> Neutron? For example, non-NIC device owned by nova and NIC device owned by 
> neutron?

Yes. But I'd like to hear from other Neutron developers.


> There have been so many discussions of the scheduler enhancement, like 
> https://etherpad.openstack.org/p/grizzly-split-out-scheduling , so possibly 
> that's the right direction? Let's wait for the summit discussion.

Interesting. Yeah, I look forward for the summit discussion.
Let's try to involve not only Nova developers, but also other Neutron
developers.

thanks,
-- 
Isaku Yamahata 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong


> -Original Message-
> From: Isaku Yamahata [mailto:isaku.yamah...@gmail.com]
> Sent: Tuesday, October 29, 2013 8:24 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Cc: isaku.yamah...@gmail.com; Itzik Brown
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
> 
> Hi Yunhong.
> 
> On Tue, Oct 29, 2013 at 08:22:40PM +,
> "Jiang, Yunhong"  wrote:
> 
> > > * describe resource external to nova that is attached to VM in the API
> > > (block device mapping and/or vif references)
> > > * ideally the nova scheduler needs to be aware of the local capacity,
> > > and how that relates to the above information (relates to the cross
> > > service scheduling issues)
> >
> > I think this is possibly a bit different. For volume, it's sure managed by
> Cinder, but for PCI devices, currently
> > it's managed by nova. So we possibly need nova to translate the
> information (possibly before nova scheduler).
> >
> > > * state of the device should be stored by Neutron/Cinder
> > > (attached/detached, capacity, IP, etc), but still exposed to the
> > > "scheduler"
> >
> > I'm not sure if we can keep the state of the device in Neutron. Currently
> nova manages all PCI devices.
> 
> Yes, with the current implementation, nova manages PCI devices and it
> works.
> That's great. It will remain so in Icehouse cycle (maybe also J?).
> 
> But how about long term direction?
> Neutron should know/manage such network related resources on
> compute nodes?

So you mean the PCI device management will be split between Nova and Neutron? 
For example, non-NIC device owned by nova and NIC device owned by neutron?

There have been so many discussions of the scheduler enhancement, like 
https://etherpad.openstack.org/p/grizzly-split-out-scheduling , so possibly 
that's the right direction? Let's wait for the summit discussion.

> The implementation in Nova will be moved into Neutron like what Cinder
> did?
> any opinions/thoughts?
> It seems that not so many Neutron developers are interested in PCI
> passthrough at the moment, though.
> 
> There are use cases for this, I think.
> For example, some compute nodes use the OVS plugin, other nodes the LB
> plugin.
> (Right now it may not be possible easily, but it will be with ML2 plugin and
> mechanism driver). User wants their VMs to run on nodes with OVS plugin
> for
> some reason(e.g. performance difference).
> Such usage would be handled similarly.
> 
> Thanks,
> ---
> Isaku Yamahata
> 
> 
> >
> > Thanks
> > --jyh
> >
> >
> > > * connection params get given to Nova from Neutron/Cinder
> > > * nova still has the vif driver or volume driver to make the final
> connection
> > > * the disk should be formatted/expanded, and network info injected in
> > > the same way as before (cloud-init, config drive, DHCP, etc)
> > >
> > > John
> > >
> > > On 29 October 2013 10:17, Irena Berezovsky
> 
> > > wrote:
> > > > Hi Jiang, Robert,
> > > >
> > > > IRC meeting option works for me.
> > > >
> > > > If I understand your question below, you are looking for a way to tie
> up
> > > > between requested virtual network(s) and requested PCI device(s).
> The
> > > way we
> > > > did it in our solution  is to map a provider:physical_network to an
> > > > interface that represents the Physical Function. Every virtual
> network is
> > > > bound to the provider:physical_network, so the PCI device should
> be
> > > > allocated based on this mapping.  We can  map a PCI alias to the
> > > > provider:physical_network.
> > > >
> > > >
> > > >
> > > > Another topic to discuss is where the mapping between neutron
> port
> > > and PCI
> > > > device should be managed. One way to solve it, is to propagate the
> > > allocated
> > > > PCI device details to neutron on port creation.
> > > >
> > > > In case  there is no qbg/qbh support, VF networking configuration
> > > should be
> > > > applied locally on the Host.
> > > >
> > > > The question is when and how to apply networking configuration on
> the
> > > PCI
> > > > device?
> > > >
> > > > We see the following options:
> > > >
> > > > * it can be done on port creation.
> > > >
> > > > *

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Isaku Yamahata
Hi Yunhong.

On Tue, Oct 29, 2013 at 08:22:40PM +,
"Jiang, Yunhong"  wrote:

> > * describe resource external to nova that is attached to VM in the API
> > (block device mapping and/or vif references)
> > * ideally the nova scheduler needs to be aware of the local capacity,
> > and how that relates to the above information (relates to the cross
> > service scheduling issues)
> 
> I think this is possibly a bit different. For volume, it's sure managed by 
> Cinder, but for PCI devices, currently
> it's managed by nova. So we possibly need nova to translate the information 
> (possibly before nova scheduler).
> 
> > * state of the device should be stored by Neutron/Cinder
> > (attached/detached, capacity, IP, etc), but still exposed to the
> > "scheduler"
> 
> I'm not sure if we can keep the state of the device in Neutron. Currently 
> nova manages all PCI devices.

Yes, with the current implementation, nova manages PCI devices and it works.
That's great. It will remain so in Icehouse cycle (maybe also J?).

But how about long term direction?
Neutron should know/manage such network related resources on compute nodes?
The implementation in Nova will be moved into Neutron like what Cinder did?
any opinions/thoughts?
It seems that not so many Neutron developers are interested in PCI
passthrough at the moment, though.

There are use cases for this, I think.
For example, some compute nodes use the OVS plugin, other nodes the LB plugin.
(Right now it may not be possible easily, but it will be with ML2 plugin and
mechanism driver). User wants their VMs to run on nodes with OVS plugin for
some reason(e.g. performance difference).
Such usage would be handled similarly.

Thanks,
---
Isaku Yamahata


> 
> Thanks
> --jyh
> 
> 
> > * connection params get given to Nova from Neutron/Cinder
> > * nova still has the vif driver or volume driver to make the final 
> > connection
> > * the disk should be formatted/expanded, and network info injected in
> > the same way as before (cloud-init, config drive, DHCP, etc)
> > 
> > John
> > 
> > On 29 October 2013 10:17, Irena Berezovsky 
> > wrote:
> > > Hi Jiang, Robert,
> > >
> > > IRC meeting option works for me.
> > >
> > > If I understand your question below, you are looking for a way to tie up
> > > between requested virtual network(s) and requested PCI device(s). The
> > way we
> > > did it in our solution  is to map a provider:physical_network to an
> > > interface that represents the Physical Function. Every virtual network is
> > > bound to the provider:physical_network, so the PCI device should be
> > > allocated based on this mapping.  We can  map a PCI alias to the
> > > provider:physical_network.
> > >
> > >
> > >
> > > Another topic to discuss is where the mapping between neutron port
> > and PCI
> > > device should be managed. One way to solve it, is to propagate the
> > allocated
> > > PCI device details to neutron on port creation.
> > >
> > > In case  there is no qbg/qbh support, VF networking configuration
> > should be
> > > applied locally on the Host.
> > >
> > > The question is when and how to apply networking configuration on the
> > PCI
> > > device?
> > >
> > > We see the following options:
> > >
> > > * it can be done on port creation.
> > >
> > > * It can be done when nova VIF driver is called for vNIC
> > plugging.
> > > This will require to  have all networking configuration available to the
> > VIF
> > > driver or send request to the neutron server to obtain it.
> > >
> > > * It can be done by  having a dedicated L2 neutron agent on
> > each
> > > Host that scans for allocated PCI devices  and then retrieves networking
> > > configuration from the server and configures the device. The agent will
> > be
> > > also responsible for managing update requests coming from the neutron
> > > server.
> > >
> > >
> > >
> > > For macvtap vNIC type assignment, the networking configuration can be
> > > applied by a dedicated L2 neutron agent.
> > >
> > >
> > >
> > > BR,
> > >
> > > Irena
> > >
> > >
> > >
> > > From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> > > Sent: Tuesday, October 29, 2013 9:04 AM
> > >
> > >
> > > To: Robert Li (baoli); Irena Berezovsky;
> > prashant.upadhy...@aricent.com;
> >

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Henry Gessau
On Tue, Oct 29, at 5:52 pm, Jiang, Yunhong  wrote:

>> -Original Message-
>> From: Henry Gessau [mailto:ges...@cisco.com]
>> Sent: Tuesday, October 29, 2013 2:23 PM
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>> 
>> On Tue, Oct 29, at 4:31 pm, Jiang, Yunhong 
>> wrote:
>> 
>> > Henry,why do you think the "service VM" need the entire PF instead of a
>> > VF? I think the SR-IOV NIC should provide QoS and performance
>> isolation.
>> 
>> I was speculating. I just thought it might be a good idea to leave open the
>> possibility of assigning a PF to a VM if the need arises.
>> 
>> Neutron service VMs are a new thing. I will be following the discussions
>> and
>> there is a summit session for them. It remains to be seen if there is any
>> desire/need for full PF ownership of NICs. But if a service VM owns the PF
>> and has the right NIC driver it could do some advanced features with it.
>> 
> At least in current PCI implementation, if a device has no SR-IOV
> enabled, then that device will be exposed and can be assigned (is this
> your so-called PF?).

Apologies, this was not clear to me until now. Thanks. I am not aware of a
use-case for a service VM needing to control VFs. So you are right, I should
not have talked about PF but rather just the entire NIC device in
passthrough mode, no SR-IOV needed.

So the admin will need to know: Put a NIC in SR-IOV mode if it is to be used
by multiple VMs. Put a NIC in single device passthrough mode if it is to be
used by one service VM.
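
For what it's worth, one common way to flip a NIC between the two modes on
reasonably recent Linux kernels (the exact mechanism is driver-dependent; older
drivers use a module parameter such as ixgbe's max_vfs instead) is the
sriov_numvfs sysfs attribute, e.g.:

  # enable 7 VFs on eth2 (device name is just an example); writing 0 disables SR-IOV
  echo 7 > /sys/class/net/eth2/device/sriov_numvfs

Leaving the count at 0 keeps the NIC as a single device that can be passed
through whole to a service VM.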

> If a device has SR-IOV enabled, then only VF be
> exposed and the PF is hidden from resource tracker. The reason is, when
> SR-IOV enabled, the PF is mostly used to configure and management the
> VFs, and it will be security issue to expose the PF to a guest.

Thanks for bringing up the security issue. If a physical network interface
is connected in a special way to some switch/router with the intention being
for it to be used only by a service VM, then close attention must be paid to
security. The device owner might get some low-level network access that can
be misused.

> I'm not sure if you are talking about the PF, are you talking about the
> PF w/ or w/o SR-IOV enabled.
> 
> I totally agree that assign a PCI NIC to service VM have a lot of benefit
> from both performance and isolation point of view.
> 
> Thanks
> --jyh
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong


> -Original Message-
> From: Henry Gessau [mailto:ges...@cisco.com]
> Sent: Tuesday, October 29, 2013 2:23 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
> 
> On Tue, Oct 29, at 4:31 pm, Jiang, Yunhong 
> wrote:
> 
> > Henry,why do you think the "service VM" need the entire PF instead of a
> > VF? I think the SR-IOV NIC should provide QoS and performance
> isolation.
> 
> I was speculating. I just thought it might be a good idea to leave open the
> possibility of assigning a PF to a VM if the need arises.
> 
> Neutron service VMs are a new thing. I will be following the discussions
> and
> there is a summit session for them. It remains to be seen if there is any
> desire/need for full PF ownership of NICs. But if a service VM owns the PF
> and has the right NIC driver it could do some advanced features with it.
> 
At least in the current PCI implementation, if a device does not have SR-IOV
enabled, then that device will be exposed and can be assigned (is this what you
call the PF?). If a device has SR-IOV enabled, then only the VFs are exposed and
the PF is hidden from the resource tracker. The reason is that, with SR-IOV
enabled, the PF is mostly used to configure and manage the VFs, and it would be
a security issue to expose the PF to a guest.

I'm not sure which PF you are talking about: the PF with or without SR-IOV
enabled?

I totally agree that assigning a PCI NIC to a service VM has a lot of benefit
from both the performance and isolation points of view.

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Chris Friesen

On 10/29/2013 03:23 PM, Henry Gessau wrote:

On Tue, Oct 29, at 4:31 pm, Jiang, Yunhong  wrote:



As to assign entire PCI device to a guest, that should be ok since
usually PF and VF has different device ID, the tricky thing is, at least
for some PCI devices, you can't configure that some NIC will have SR-IOV
enabled while others not.


Thanks for the warning. :) Perhaps the cloud admin might plug in an extra
NIC in just a few nodes (one or two per rack, maybe) for the purpose of
running service VMs there. Again, just speculating. I don't know how hard it
is to manage non-homogenous nodes.


Perhaps those nodes could be identified using a host-aggregate with 
suitable metadata?
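
For what it's worth, that could be as simple as the standard aggregate
machinery (names below are made up for illustration, and the
AggregateInstanceExtraSpecsFilter has to be enabled in the scheduler):

  nova aggregate-create svc-vm-hosts
  nova aggregate-add-host <aggregate-id> compute-rack1-07
  nova aggregate-set-metadata <aggregate-id> dedicated_nic=true
  nova flavor-key svcvm.large set aggregate_instance_extra_specs:dedicated_nic=true

Service VMs booted with that flavor would then land only on the tagged nodes.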


Chris


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Henry Gessau
On Tue, Oct 29, at 4:31 pm, Jiang, Yunhong  wrote:

> Henry,why do you think the "service VM" need the entire PF instead of a
> VF? I think the SR-IOV NIC should provide QoS and performance isolation.

I was speculating. I just thought it might be a good idea to leave open the
possibility of assigning a PF to a VM if the need arises.

Neutron service VMs are a new thing. I will be following the discussions and
there is a summit session for them. It remains to be seen if there is any
desire/need for full PF ownership of NICs. But if a service VM owns the PF
and has the right NIC driver it could do some advanced features with it.

> As to assign entire PCI device to a guest, that should be ok since
> usually PF and VF has different device ID, the tricky thing is, at least
> for some PCI devices, you can't configure that some NIC will have SR-IOV
> enabled while others not.

Thanks for the warning. :) Perhaps the cloud admin might plug in an extra
NIC in just a few nodes (one or two per rack, maybe) for the purpose of
running service VMs there. Again, just speculating. I don't know how hard it
is to manage non-homogenous nodes.

> 
> Thanks
> --jyh
> 
>> -Original Message-
>> From: Henry Gessau [mailto:ges...@cisco.com]
>> Sent: Tuesday, October 29, 2013 8:10 AM
>> To: OpenStack Development Mailing List (not for usage questions)
>> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>> 
>> Lots of great info and discussion going on here.
>> 
>> One additional thing I would like to mention is regarding PF and VF usage.
>> 
>> Normally VFs will be assigned to instances, and the PF will either not be
>> used at all, or maybe some agent in the host of the compute node might
>> have
>> access to the PF for something (management?).
>> 
>> There is a neutron design track around the development of "service VMs".
>> These are dedicated instances that run neutron services like routers,
>> firewalls, etc. It is plausible that a service VM would like to use PCI
>> passthrough and get the entire PF. This would allow it to have complete
>> control over a physical link, which I think will be wanted in some cases.
>> 
>> --
>> Henry
>> 
>> On Tue, Oct 29, at 10:23 am, Irena Berezovsky 
>> wrote:
>> 
>> > Hi,
>> >
>> > I would like to share some details regarding the support provided by
>> > Mellanox plugin. It enables networking via SRIOV pass-through devices
>> or
>> > macvtap interfaces.  It plugin is available here:
>> >
>> https://github.com/openstack/neutron/tree/master/neutron/plugins/mln
>> x.
>> >
>> > To support either PCI pass-through device and macvtap interface type of
>> > vNICs, we set neutron port profile:vnic_type according to the required
>> VIF
>> > type and then use the created port to 'nova boot' the VM.
>> >
>> > To  overcome the missing scheduler awareness for PCI devices which
>> was not
>> > part of the Havana release yet, we
>> >
>> > have an additional service (embedded switch Daemon) that runs on each
>> > compute node.
>> >
>> > This service manages the SRIOV resources allocation,  answers vNICs
>> > discovery queries and applies VLAN/MAC configuration using standard
>> Linux
>> > APIs (code is here:
>> https://github.com/mellanox-openstack/mellanox-eswitchd
>> > ).  The embedded switch Daemon serves as a glue layer between VIF
>> Driver and
>> > Neutron Agent.
>> >
>> > In the Icehouse Release when SRIOV resources allocation is already part
>> of
>> > the Nova, we plan to eliminate the need in embedded switch daemon
>> service.
>> > So what is left to figure out is how to tie up between neutron port and
>> PCI
>> > device and invoke networking configuration.
>> >
>> >
>> >
>> > In our case what we have is actually the Hardware VEB that is not
>> programmed
>> > via either 802.1Qbg or 802.1Qbh, but configured locally by Neutron
>> Agent. We
>> > also support both Ethernet and InfiniBand physical network L2
>> technology.
>> > This means that we apply different configuration commands  to set
>> > configuration on VF.
>> >
>> >
>> >
>> > I guess what we have to figure out is how to support the generic case for
>> > the PCI device networking support, for HW VEB, 802.1Qbg and
>> 802.1Qbh cases.
>> >
>> >
>> >
>> > BR,
>> >
>> > Irena

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong
Henry, why do you think the "service VM" needs the entire PF instead of a VF? I
think the SR-IOV NIC should provide QoS and performance isolation.

As to assigning an entire PCI device to a guest, that should be OK since the PF
and VF usually have different device IDs. The tricky thing is that, at least for
some PCI devices, you can't configure some NICs to have SR-IOV enabled while
others don't.

Thanks
--jyh

> -Original Message-
> From: Henry Gessau [mailto:ges...@cisco.com]
> Sent: Tuesday, October 29, 2013 8:10 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
> 
> Lots of great info and discussion going on here.
> 
> One additional thing I would like to mention is regarding PF and VF usage.
> 
> Normally VFs will be assigned to instances, and the PF will either not be
> used at all, or maybe some agent in the host of the compute node might
> have
> access to the PF for something (management?).
> 
> There is a neutron design track around the development of "service VMs".
> These are dedicated instances that run neutron services like routers,
> firewalls, etc. It is plausible that a service VM would like to use PCI
> passthrough and get the entire PF. This would allow it to have complete
> control over a physical link, which I think will be wanted in some cases.
> 
> --
> Henry
> 
> On Tue, Oct 29, at 10:23 am, Irena Berezovsky 
> wrote:
> 
> > Hi,
> >
> > I would like to share some details regarding the support provided by
> > Mellanox plugin. It enables networking via SRIOV pass-through devices
> or
> > macvtap interfaces.  It plugin is available here:
> >
> https://github.com/openstack/neutron/tree/master/neutron/plugins/mln
> x.
> >
> > To support either PCI pass-through device and macvtap interface type of
> > vNICs, we set neutron port profile:vnic_type according to the required
> VIF
> > type and then use the created port to 'nova boot' the VM.
> >
> > To  overcome the missing scheduler awareness for PCI devices which
> was not
> > part of the Havana release yet, we
> >
> > have an additional service (embedded switch Daemon) that runs on each
> > compute node.
> >
> > This service manages the SRIOV resources allocation,  answers vNICs
> > discovery queries and applies VLAN/MAC configuration using standard
> Linux
> > APIs (code is here:
> https://github.com/mellanox-openstack/mellanox-eswitchd
> > ).  The embedded switch Daemon serves as a glue layer between VIF
> Driver and
> > Neutron Agent.
> >
> > In the Icehouse Release when SRIOV resources allocation is already part
> of
> > the Nova, we plan to eliminate the need in embedded switch daemon
> service.
> > So what is left to figure out is how to tie up between neutron port and
> PCI
> > device and invoke networking configuration.
> >
> >
> >
> > In our case what we have is actually the Hardware VEB that is not
> programmed
> > via either 802.1Qbg or 802.1Qbh, but configured locally by Neutron
> Agent. We
> > also support both Ethernet and InfiniBand physical network L2
> technology.
> > This means that we apply different configuration commands  to set
> > configuration on VF.
> >
> >
> >
> > I guess what we have to figure out is how to support the generic case for
> > the PCI device networking support, for HW VEB, 802.1Qbg and
> 802.1Qbh cases.
> >
> >
> >
> > BR,
> >
> > Irena
> >
> >
> >
> > *From:*Robert Li (baoli) [mailto:ba...@cisco.com]
> > *Sent:* Tuesday, October 29, 2013 3:31 PM
> > *To:* Jiang, Yunhong; Irena Berezovsky;
> prashant.upadhy...@aricent.com;
> > chris.frie...@windriver.com; He, Yongli; Itzik Brown
> > *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen);
> Kyle
> > Mestery (kmestery); Sandhya Dasu (sadasu)
> > *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through
> network support
> >
> >
> >
> > Hi Yunhong,
> >
> >
> >
> > I haven't looked at Mellanox in much detail. I think that we'll get more
> > details from Irena down the road. Regarding your question, I can only
> answer
> > based on my experience with Cisco's VM-FEX. In a nutshell:
> >
> >  -- a vNIC is connected to an external switch. Once the host is
> booted
> > up, all the PFs and VFs provisioned on the vNIC will be created, as well as
> > all the corresponding ethernet interfaces .
> >
> >  -- As far as Neutron is

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong
Your explanation of the virtual network and physical network is quite clear and
should work well. We need to change the nova code to achieve it, including
getting the physical network for the virtual network and passing the physical network
requirement to the filter properties etc.
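
Roughly, I imagine something like the following (purely an illustration, not
the real scheduler/filter code): the physical network ends up as part of the
PCI request spec in the filter properties, and a host passes only if one of its
pci_stats pools matches the spec and still has free devices.

  # Purely illustrative sketch -- names are invented, not actual nova code.
  def host_passes(host_pci_pools, request_spec, count=1):
      # A pool matches if every key/value in the request spec matches it
      # and the pool still has enough free devices.
      return any(all(pool.get(k) == v for k, v in request_spec.items())
                 and pool.get('count', 0) >= count
                 for pool in host_pci_pools)

  pools = [{'vendor_id': '8086', 'physical_network': 'physnet1', 'count': 4}]
  print(host_passes(pools, {'physical_network': 'physnet1'}))   # True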

For your port method, do you mean that we always pass a network id to 'nova
boot' and nova will create the port during VM boot, am I right?  Also, how can
nova know that it needs to allocate a PCI device for the port? I'd suppose that
in an SR-IOV NIC environment, the user doesn't need to specify the PCI
requirement. Instead, the PCI requirement should come from the network
configuration and image properties. Or do you think the user still needs to
pass a flavor with a PCI request?

--jyh


From: Irena Berezovsky [mailto:ire...@mellanox.com]
Sent: Tuesday, October 29, 2013 3:17 AM
To: Jiang, Yunhong; Robert Li (baoli); prashant.upadhy...@aricent.com; 
chris.frie...@windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Jiang, Robert,
IRC meeting option works for me.
If I understand your question below, you are looking for a way to tie up 
between requested virtual network(s) and requested PCI device(s). The way we 
did it in our solution  is to map a provider:physical_network to an interface 
that represents the Physical Function. Every virtual network is bound to the 
provider:physical_network, so the PCI device should be allocated based on this 
mapping.  We can  map a PCI alias to the provider:physical_network.

Another topic to discuss is where the mapping between neutron port and PCI 
device should be managed. One way to solve it, is to propagate the allocated 
PCI device details to neutron on port creation.
In case  there is no qbg/qbh support, VF networking configuration should be 
applied locally on the Host.
The question is when and how to apply networking configuration on the PCI 
device?
We see the following options:

* it can be done on port creation.

* It can be done when nova VIF driver is called for vNIC plugging. This 
will require to  have all networking configuration available to the VIF driver 
or send request to the neutron server to obtain it.

* It can be done by  having a dedicated L2 neutron agent on each Host 
that scans for allocated PCI devices  and then retrieves networking 
configuration from the server and configures the device. The agent will be also 
responsible for managing update requests coming from the neutron server.


For macvtap vNIC type assignment, the networking configuration can be applied 
by a dedicated L2 neutron agent.

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Tuesday, October 29, 2013 9:04 AM

To: Robert Li (baoli); Irena Berezovsky; 
prashant.upadhy...@aricent.com<mailto:prashant.upadhy...@aricent.com>; 
chris.frie...@windriver.com<mailto:chris.frie...@windriver.com>; He, Yongli; 
Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Robert, is it possible to have a IRC meeting? I'd prefer to IRC meeting because 
it's more openstack style and also can keep the minutes clearly.

To your flow, can you give more detailed example. For example, I can consider 
user specify the instance with -nic option specify a network id, and then how 
nova device the requirement to the PCI device? I assume the network id should 
define the switches that the device can connect to , but how is that 
information translated to the PCI property requirement? Will this translation 
happen before the nova scheduler make host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; 
prashant.upadhy...@aricent.com<mailto:prashant.upadhy...@aricent.com>; Jiang, 
Yunhong; chris.frie...@windriver.com<mailto:chris.frie...@windriver.com>; He, 
Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, "Irena Berezovsky" 
mailto:ire...@mellanox.com>> wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end to end flow is actually encompassed in the blueprints in a nutshell. I 
will reiterate it in below. The binding between Nova and Neutron occurs with 
the neutron v2 API that nova invokes in order to provision the neutron 
servic

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong

> * describe resource external to nova that is attached to VM in the API
> (block device mapping and/or vif references)
> * ideally the nova scheduler needs to be aware of the local capacity,
> and how that relates to the above information (relates to the cross
> service scheduling issues)

I think this is possibly a bit different. A volume is certainly managed by
Cinder, but PCI devices are currently
managed by nova. So we possibly need nova to translate the information
(possibly before the nova scheduler).

> * state of the device should be stored by Neutron/Cinder
> (attached/detached, capacity, IP, etc), but still exposed to the
> "scheduler"

I'm not sure if we can keep the state of the device in Neutron. Currently nova
manages all PCI devices.

Thanks
--jyh


> * connection params get given to Nova from Neutron/Cinder
> * nova still has the vif driver or volume driver to make the final connection
> * the disk should be formatted/expanded, and network info injected in
> the same way as before (cloud-init, config drive, DHCP, etc)
> 
> John
> 
> On 29 October 2013 10:17, Irena Berezovsky 
> wrote:
> > Hi Jiang, Robert,
> >
> > IRC meeting option works for me.
> >
> > If I understand your question below, you are looking for a way to tie up
> > between requested virtual network(s) and requested PCI device(s). The
> way we
> > did it in our solution  is to map a provider:physical_network to an
> > interface that represents the Physical Function. Every virtual network is
> > bound to the provider:physical_network, so the PCI device should be
> > allocated based on this mapping.  We can  map a PCI alias to the
> > provider:physical_network.
> >
> >
> >
> > Another topic to discuss is where the mapping between neutron port
> and PCI
> > device should be managed. One way to solve it, is to propagate the
> allocated
> > PCI device details to neutron on port creation.
> >
> > In case  there is no qbg/qbh support, VF networking configuration
> should be
> > applied locally on the Host.
> >
> > The question is when and how to apply networking configuration on the
> PCI
> > device?
> >
> > We see the following options:
> >
> > * it can be done on port creation.
> >
> > * It can be done when nova VIF driver is called for vNIC
> plugging.
> > This will require to  have all networking configuration available to the
> VIF
> > driver or send request to the neutron server to obtain it.
> >
> > * It can be done by  having a dedicated L2 neutron agent on
> each
> > Host that scans for allocated PCI devices  and then retrieves networking
> > configuration from the server and configures the device. The agent will
> be
> > also responsible for managing update requests coming from the neutron
> > server.
> >
> >
> >
> > For macvtap vNIC type assignment, the networking configuration can be
> > applied by a dedicated L2 neutron agent.
> >
> >
> >
> > BR,
> >
> > Irena
> >
> >
> >
> > From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> > Sent: Tuesday, October 29, 2013 9:04 AM
> >
> >
> > To: Robert Li (baoli); Irena Berezovsky;
> prashant.upadhy...@aricent.com;
> > chris.frie...@windriver.com; He, Yongli; Itzik Brown
> >
> >
> > Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
> Mestery
> > (kmestery); Sandhya Dasu (sadasu)
> > Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network
> > support
> >
> >
> >
> > Robert, is it possible to have a IRC meeting? I'd prefer to IRC meeting
> > because it's more openstack style and also can keep the minutes
> clearly.
> >
> >
> >
> > To your flow, can you give more detailed example. For example, I can
> > consider user specify the instance with -nic option specify a network id,
> > and then how nova device the requirement to the PCI device? I assume
> the
> > network id should define the switches that the device can connect to ,
> but
> > how is that information translated to the PCI property requirement? Will
> > this translation happen before the nova scheduler make host decision?
> >
> >
> >
> > Thanks
> >
> > --jyh
> >
> >
> >
> > From: Robert Li (baoli) [mailto:ba...@cisco.com]
> > Sent: Monday, October 28, 2013 12:22 PM
> > To: Irena Berezovsky; prashant.upadhy...@aricent.com; Jiang, Yunhong;
> > chris.frie...@windriver.com; He, Yongli; Itzik Brown
> > Cc:

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Henry Gessau
Lots of great info and discussion going on here.

One additional thing I would like to mention is regarding PF and VF usage.

Normally VFs will be assigned to instances, and the PF will either not be
used at all, or maybe some agent in the host of the compute node might have
access to the PF for something (management?).

There is a neutron design track around the development of "service VMs".
These are dedicated instances that run neutron services like routers,
firewalls, etc. It is plausible that a service VM would like to use PCI
passthrough and get the entire PF. This would allow it to have complete
control over a physical link, which I think will be wanted in some cases.

-- 
Henry

On Tue, Oct 29, at 10:23 am, Irena Berezovsky  wrote:

> Hi,
> 
> I would like to share some details regarding the support provided by
> Mellanox plugin. It enables networking via SRIOV pass-through devices or
> macvtap interfaces.  It plugin is available here:
> https://github.com/openstack/neutron/tree/master/neutron/plugins/mlnx.
> 
> To support either PCI pass-through device and macvtap interface type of
> vNICs, we set neutron port profile:vnic_type according to the required VIF
> type and then use the created port to ‘nova boot’ the VM.
> 
> To  overcome the missing scheduler awareness for PCI devices which was not
> part of the Havana release yet, we
> 
> have an additional service (embedded switch Daemon) that runs on each
> compute node.  
> 
> This service manages the SRIOV resources allocation,  answers vNICs
> discovery queries and applies VLAN/MAC configuration using standard Linux
> APIs (code is here: https://github.com/mellanox-openstack/mellanox-eswitchd
> ).  The embedded switch Daemon serves as a glue layer between VIF Driver and
> Neutron Agent.
> 
> In the Icehouse Release when SRIOV resources allocation is already part of
> the Nova, we plan to eliminate the need in embedded switch daemon service.
> So what is left to figure out is how to tie up between neutron port and PCI
> device and invoke networking configuration.
> 
>  
> 
> In our case what we have is actually the Hardware VEB that is not programmed
> via either 802.1Qbg or 802.1Qbh, but configured locally by Neutron Agent. We
> also support both Ethernet and InfiniBand physical network L2 technology.
> This means that we apply different configuration commands  to set
> configuration on VF.
> 
>  
> 
> I guess what we have to figure out is how to support the generic case for
> the PCI device networking support, for HW VEB, 802.1Qbg and 802.1Qbh cases.
> 
>  
> 
> BR,
> 
> Irena
> 
>  
> 
> *From:*Robert Li (baoli) [mailto:ba...@cisco.com]
> *Sent:* Tuesday, October 29, 2013 3:31 PM
> *To:* Jiang, Yunhong; Irena Berezovsky; prashant.upadhy...@aricent.com;
> chris.frie...@windriver.com; He, Yongli; Itzik Brown
> *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
> Mestery (kmestery); Sandhya Dasu (sadasu)
> *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through network 
> support
> 
>  
> 
> Hi Yunhong,
> 
>  
> 
> I haven't looked at Mellanox in much detail. I think that we'll get more
> details from Irena down the road. Regarding your question, I can only answer
> based on my experience with Cisco's VM-FEX. In a nutshell:
> 
>  -- a vNIC is connected to an external switch. Once the host is booted
> up, all the PFs and VFs provisioned on the vNIC will be created, as well as
> all the corresponding ethernet interfaces . 
> 
>  -- As far as Neutron is concerned, a neutron port can be associated
> with a VF. One way to do so is to specify this requirement in the —nic
> option, providing information such as:
> 
>. PCI alias (this is the same alias as defined in your nova
> blueprints)
> 
>. direct pci-passthrough/macvtap
> 
>. port profileid that is compliant with 802.1Qbh
> 
>  -- similar to how you translate the nova flavor with PCI requirements
> to PCI requests for scheduling purpose, Nova API (the nova api component)
> can translate the above to PCI requests for scheduling purpose. I can give
> more detail later on this. 
> 
>  
> 
> Regarding your last question, since the vNIC is already connected with the
> external switch, the vNIC driver will be responsible for communicating the
> port profile to the external switch. As you have already known, libvirt
> provides several ways to specify a VM to be booted up with SRIOV. For
> example, in the following interface definition: 
> 
>   <interface type='hostdev' managed='yes'>
>     <source>
>       <address type='pci' domain='0x0000' bus='0x09' slot='0x0' function='0x01'/>
>     </source>
>     <virtualport type='802.1Qbh'>
>       <parameters profileid='my-port-profile'/>
>     </virtualport>
>   </interface>
> 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Robert Li (baoli)
Hi Yunhong,

I haven't looked at Mellanox in much detail. I think that we'll get more 
details from Irena down the road. Regarding your question, I can only answer 
based on my experience with Cisco's VM-FEX. In a nutshell:
 -- a vNIC is connected to an external switch. Once the host is booted up, 
all the PFs and VFs provisioned on the vNIC will be created, as well as all the 
corresponding ethernet interfaces .
 -- As far as Neutron is concerned, a neutron port can be associated with a 
VF. One way to do so is to specify this requirement in the —nic option, 
providing information such as:
   . PCI alias (this is the same alias as defined in your nova 
blueprints)
   . direct pci-passthrough/macvtap
   . port profileid that is compliant with 802.1Qbh
 -- similar to how you translate the nova flavor with PCI requirements to 
PCI requests for scheduling purpose, Nova API (the nova api component) can 
translate the above to PCI requests for scheduling purpose. I can give more 
detail later on this.
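
To make that concrete, a boot request of the proposed form might look something
like --nic net-id=<net-uuid>,pci-alias=vmfex-vf,vnic-type=direct,profileid=my-port-profile
(all of these attribute names are placeholders, nothing is settled yet), and the
translation step in the nova api could be sketched roughly as follows
(illustrative only, not actual nova internals):

  # Rough illustration only -- names and structures are invented.
  def nic_to_pci_request(nic):
      # The alias is what the scheduler matches against the pci_stats pools.
      pci_request = {'count': 1, 'alias_name': nic['pci_alias']}
      # The vnic type and the 802.1Qbh profileid are not scheduling
      # constraints; they are carried through to the VIF driver and the
      # neutron port instead.
      vif_info = {'vnic_type': nic.get('vnic_type', 'direct'),
                  'profileid': nic.get('profileid')}
      return pci_request, vif_info

  pci_request, vif_info = nic_to_pci_request(
      {'net_id': '<net-uuid>', 'pci_alias': 'vmfex-vf',
       'vnic_type': 'direct', 'profileid': 'my-port-profile'})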

Regarding your last question, since the vNIC is already connected with the 
external switch, the vNIC driver will be responsible for communicating the port 
profile to the external switch. As you already know, libvirt provides
several ways to specify a VM to be booted up with SRIOV. For example, consider
the following interface definition:

<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0000' bus='0x09' slot='0x0' function='0x01'/>
  </source>
  <virtualport type='802.1Qbh'>
    <parameters profileid='my-port-profile'/>
  </virtualport>
</interface>


The SRIOV VF (bus 0x09, VF 0x01) will be allocated, and the port profile
'my-port-profile' will be used to provision this VF. Libvirt will be
responsible for invoking the vNIC driver to configure this VF with the port
profile my-port-profile. The driver will talk to the external switch using the
802.1Qbh standards to complete the VF's configuration and binding with the VM.


Now that nova PCI passthrough is responsible for
discovering/scheduling/allocating a VF, the rest of the puzzle is to associate
this PCI device with the feature that's going to use it, and the feature will
be responsible for configuring it. You can also see from the above example that
in one implementation of SRIOV, the feature (in this case neutron) may not need
to do much in terms of working with the external switch; the work is actually
done by libvirt behind the scenes.


Now the questions are:

-- how the port profile gets defined/managed

-- how the port profile gets associated with a neutron network

The first question will be specific to the particular product, and therefore a
particular neutron plugin has to manage that.

There may be several approaches to address the second question. For example, in 
the simplest case, a port profile can be associated with a neutron network. 
This has some significant drawbacks. Since the port profile defines features 
for all the ports that use it, the one port profile to one neutron network 
mapping would mean all the ports on the network will have exactly the same 
features (for example, QoS characteristics). To make it flexible, the binding 
of a port profile to a port may be done at the port creation time.
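
As a strawman for the flexible case, the profile could simply ride along in the
port's binding details at creation time, e.g. with python-neutronclient (the
binding:profile key and its contents below are just placeholders for whatever
attribute the plugin ends up exposing):

  from neutronclient.v2_0 import client

  # Credentials and endpoint are placeholders.
  neutron = client.Client(username='admin', password='secret',
                          tenant_name='admin',
                          auth_url='http://controller:5000/v2.0')

  port = neutron.create_port({'port': {
      'network_id': '<net-uuid>',
      'binding:profile': {'profileid': 'my-port-profile'},  # placeholder key
  }})
  # The resulting port id would then be handed to 'nova boot' via --nic port-id=...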


Let me know if the above answered your question.


thanks,

Robert

On 10/29/13 3:03 AM, "Jiang, Yunhong" 
mailto:yunhong.ji...@intel.com>> wrote:

Robert, is it possible to have a IRC meeting? I’d prefer to IRC meeting because 
it’s more openstack style and also can keep the minutes clearly.

To your flow, can you give more detailed example. For example, I can consider 
user specify the instance with –nic option specify a network id, and then how 
nova device the requirement to the PCI device? I assume the network id should 
define the switches that the device can connect to , but how is that 
information translated to the PCI property requirement? Will this translation 
happen before the nova scheduler make host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; 
prashant.upadhy...@aricent.com<mailto:prashant.upadhy...@aricent.com>; Jiang, 
Yunhong; chris.frie...@windriver.com<mailto:chris.frie...@windriver.com>; He, 
Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, "Irena Berezovsky" 
mailto:ire...@mellanox.com>> wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end to end flow is actually encompassed in the blueprints in a nutshell. I 
will reiterate it in below. The binding between Nova and Neutron occurs with 
the neutron v2 API that nova invokes in order

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Robert Li (baoli)
Hi John,

Great to hear from you on Cinder with PCI passthrough. I thought that it
would be coming. I like the idea.

thanks,
Robert

On 10/29/13 6:46 AM, "John Garbutt"  wrote:

>I would love to see a symmetry between Cinder local volumes and
>Neutron PCI passthrough VIFs.
>
>Not entirely sure I have that clear in my head right now, but I just
>wanted to share the idea:
>* describe resource external to nova that is attached to VM in the API
>(block device mapping and/or vif references)
>* ideally the nova scheduler needs to be aware of the local capacity,
>and how that relates to the above information (relates to the cross
>service scheduling issues)
>* state of the device should be stored by Neutron/Cinder
>(attached/detached, capacity, IP, etc), but still exposed to the
>"scheduler"
>* connection params get given to Nova from Neutron/Cinder
>* nova still has the vif driver or volume driver to make the final
>connection
>* the disk should be formatted/expanded, and network info injected in
>the same way as before (cloud-init, config drive, DHCP, etc)
>
>John
>
>On 29 October 2013 10:17, Irena Berezovsky  wrote:
>> Hi Jiang, Robert,
>>
>> IRC meeting option works for me.
>>
>> If I understand your question below, you are looking for a way to tie up
>> between requested virtual network(s) and requested PCI device(s). The
>>way we
>> did it in our solution  is to map a provider:physical_network to an
>> interface that represents the Physical Function. Every virtual network
>>is
>> bound to the provider:physical_network, so the PCI device should be
>> allocated based on this mapping.  We can  map a PCI alias to the
>> provider:physical_network.
>>
>>
>>
>> Another topic to discuss is where the mapping between neutron port and
>>PCI
>> device should be managed. One way to solve it, is to propagate the
>>allocated
>> PCI device details to neutron on port creation.
>>
>> In case  there is no qbg/qbh support, VF networking configuration
>>should be
>> applied locally on the Host.
>>
>> The question is when and how to apply networking configuration on the
>>PCI
>> device?
>>
>> We see the following options:
>>
>> · it can be done on port creation.
>>
>> · It can be done when nova VIF driver is called for vNIC
>>plugging.
>> This will require to  have all networking configuration available to
>>the VIF
>> driver or send request to the neutron server to obtain it.
>>
>> · It can be done by  having a dedicated L2 neutron agent on each
>> Host that scans for allocated PCI devices  and then retrieves networking
>> configuration from the server and configures the device. The agent will
>>be
>> also responsible for managing update requests coming from the neutron
>> server.
>>
>>
>>
>> For macvtap vNIC type assignment, the networking configuration can be
>> applied by a dedicated L2 neutron agent.
>>
>>
>>
>> BR,
>>
>> Irena
>>
>>
>>
>> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
>> Sent: Tuesday, October 29, 2013 9:04 AM
>>
>>
>> To: Robert Li (baoli); Irena Berezovsky; prashant.upadhy...@aricent.com;
>> chris.frie...@windriver.com; He, Yongli; Itzik Brown
>>
>>
>> Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
>>Mestery
>> (kmestery); Sandhya Dasu (sadasu)
>> Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network
>> support
>>
>>
>>
>> Robert, is it possible to have a IRC meeting? I¹d prefer to IRC meeting
>> because it¹s more openstack style and also can keep the minutes clearly.
>>
>>
>>
>> To your flow, can you give more detailed example. For example, I can
>> consider user specify the instance with ­nic option specify a network
>>id,
>> and then how nova device the requirement to the PCI device? I assume the
>> network id should define the switches that the device can connect to ,
>>but
>> how is that information translated to the PCI property requirement? Will
>> this translation happen before the nova scheduler make host decision?
>>
>>
>>
>> Thanks
>>
>> --jyh
>>
>>
>>
>> From: Robert Li (baoli) [mailto:ba...@cisco.com]
>> Sent: Monday, October 28, 2013 12:22 PM
>> To: Irena Berezovsky; prashant.upadhy...@aricent.com; Jiang, Yunhong;
>> chris.frie...@windriver.com; He, Yongli; Itzik Brown
>> Cc: OpenStack Development Mailing List; Brian Bowen (b

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Robert Li (baoli)
Hi,

Sounds like there is enough interest for an IRC meeting before the summit. Do
you guys know how to schedule a #openstack IRC meeting?

thanks,
Robert

On 10/29/13 6:17 AM, "Irena Berezovsky" 
mailto:ire...@mellanox.com>> wrote:

Hi Jiang, Robert,
IRC meeting option works for me.
If I understand your question below, you are looking for a way to tie up 
between requested virtual network(s) and requested PCI device(s). The way we 
did it in our solution  is to map a provider:physical_network to an interface 
that represents the Physical Function. Every virtual network is bound to the 
provider:physical_network, so the PCI device should be allocated based on this 
mapping.  We can  map a PCI alias to the provider:physical_network.

Another topic to discuss is where the mapping between neutron port and PCI 
device should be managed. One way to solve it, is to propagate the allocated 
PCI device details to neutron on port creation.
In case  there is no qbg/qbh support, VF networking configuration should be 
applied locally on the Host.
The question is when and how to apply networking configuration on the PCI 
device?
We see the following options:

· it can be done on port creation.

· It can be done when nova VIF driver is called for vNIC plugging. This 
will require to  have all networking configuration available to the VIF driver 
or send request to the neutron server to obtain it.

· It can be done by  having a dedicated L2 neutron agent on each Host 
that scans for allocated PCI devices  and then retrieves networking 
configuration from the server and configures the device. The agent will be also 
responsible for managing update requests coming from the neutron server.


For macvtap vNIC type assignment, the networking configuration can be applied 
by a dedicated L2 neutron agent.

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Tuesday, October 29, 2013 9:04 AM

To: Robert Li (baoli); Irena Berezovsky; 
prashant.upadhy...@aricent.com<mailto:prashant.upadhy...@aricent.com>; 
chris.frie...@windriver.com<mailto:chris.frie...@windriver.com>; He, Yongli; 
Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Robert, is it possible to have a IRC meeting? I’d prefer to IRC meeting because 
it’s more openstack style and also can keep the minutes clearly.

To your flow, can you give more detailed example. For example, I can consider 
user specify the instance with –nic option specify a network id, and then how 
nova device the requirement to the PCI device? I assume the network id should 
define the switches that the device can connect to , but how is that 
information translated to the PCI property requirement? Will this translation 
happen before the nova scheduler make host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; 
prashant.upadhy...@aricent.com<mailto:prashant.upadhy...@aricent.com>; Jiang, 
Yunhong; chris.frie...@windriver.com<mailto:chris.frie...@windriver.com>; He, 
Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, "Irena Berezovsky" 
mailto:ire...@mellanox.com>> wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end to end flow is actually encompassed in the blueprints in a nutshell. I 
will reiterate it in below. The binding between Nova and Neutron occurs with 
the neutron v2 API that nova invokes in order to provision the neutron 
services. The vif driver is responsible for plugging in an instance onto the 
networking setup that neutron has created on the host.

Normally, one will invoke "nova boot" api with the —nic options to specify the 
nic with which the instance will be connected to the network. It currently 
allows net-id, fixed ip and/or port-id to be specified for the option. However, 
it doesn't allow one to specify special networking requirements for the 
instance. Thanks to the nova pci-passthrough work, one can specify PCI 
passthrough device(s) in the nova flavor. But it doesn't provide means to tie 
up these PCI devices in the case of ethernet adpators with networking services. 
Therefore the idea is actually simple as indicated by the blueprint titles, to 
provide means to tie up SRIOV devices with neutron services. A work flow would 
roughly look like this for 'nova boot':

  -- Specifies networking requiremen

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread John Garbutt
I would love to see a symmetry between Cinder local volumes and
Neutron PCI passthrough VIFs.

Not entirely sure I have that clear in my head right now, but I just
wanted to share the idea:
* describe resource external to nova that is attached to VM in the API
(block device mapping and/or vif references)
* ideally the nova scheduler needs to be aware of the local capacity,
and how that relates to the above information (relates to the cross
service scheduling issues)
* state of the device should be stored by Neutron/Cinder
(attached/detached, capacity, IP, etc), but still exposed to the
"scheduler"
* connection params get given to Nova from Neutron/Cinder
* nova still has the vif driver or volume driver to make the final connection
* the disk should be formatted/expanded, and network info injected in
the same way as before (cloud-init, config drive, DHCP, etc)

John

On 29 October 2013 10:17, Irena Berezovsky  wrote:
> Hi Jiang, Robert,
>
> IRC meeting option works for me.
>
> If I understand your question below, you are looking for a way to tie up
> between requested virtual network(s) and requested PCI device(s). The way we
> did it in our solution  is to map a provider:physical_network to an
> interface that represents the Physical Function. Every virtual network is
> bound to the provider:physical_network, so the PCI device should be
> allocated based on this mapping.  We can  map a PCI alias to the
> provider:physical_network.
>
>
>
> Another topic to discuss is where the mapping between neutron port and PCI
> device should be managed. One way to solve it, is to propagate the allocated
> PCI device details to neutron on port creation.
>
> In case  there is no qbg/qbh support, VF networking configuration should be
> applied locally on the Host.
>
> The question is when and how to apply networking configuration on the PCI
> device?
>
> We see the following options:
>
> · it can be done on port creation.
>
> · It can be done when nova VIF driver is called for vNIC plugging.
> This will require to  have all networking configuration available to the VIF
> driver or send request to the neutron server to obtain it.
>
> · It can be done by  having a dedicated L2 neutron agent on each
> Host that scans for allocated PCI devices  and then retrieves networking
> configuration from the server and configures the device. The agent will be
> also responsible for managing update requests coming from the neutron
> server.
>
>
>
> For macvtap vNIC type assignment, the networking configuration can be
> applied by a dedicated L2 neutron agent.
>
>
>
> BR,
>
> Irena
>
>
>
> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> Sent: Tuesday, October 29, 2013 9:04 AM
>
>
> To: Robert Li (baoli); Irena Berezovsky; prashant.upadhy...@aricent.com;
> chris.frie...@windriver.com; He, Yongli; Itzik Brown
>
>
> Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery
> (kmestery); Sandhya Dasu (sadasu)
> Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
>
>
> Robert, is it possible to have a IRC meeting? I’d prefer to IRC meeting
> because it’s more openstack style and also can keep the minutes clearly.
>
>
>
> To your flow, can you give more detailed example. For example, I can
> consider user specify the instance with –nic option specify a network id,
> and then how nova device the requirement to the PCI device? I assume the
> network id should define the switches that the device can connect to , but
> how is that information translated to the PCI property requirement? Will
> this translation happen before the nova scheduler make host decision?
>
>
>
> Thanks
>
> --jyh
>
>
>
> From: Robert Li (baoli) [mailto:ba...@cisco.com]
> Sent: Monday, October 28, 2013 12:22 PM
> To: Irena Berezovsky; prashant.upadhy...@aricent.com; Jiang, Yunhong;
> chris.frie...@windriver.com; He, Yongli; Itzik Brown
> Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery
> (kmestery); Sandhya Dasu (sadasu)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
>
>
>
> Hi Irena,
>
>
>
> Thank you very much for your comments. See inline.
>
>
>
> --Robert
>
>
>
> On 10/27/13 3:48 AM, "Irena Berezovsky"  wrote:
>
>
>
> Hi Robert,
>
> Thank you very much for sharing the information regarding your efforts. Can
> you please share your idea of the end to end flow? How do you suggest  to
> bind Nova and Neutron?
>
>
>
> The end to end flow is actually encompassed in the blueprints in a nutshell.
> I will reiterate it in below. The binding between No

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Irena Berezovsky
Hi Jiang, Robert,
IRC meeting option works for me.
If I understand your question below, you are looking for a way to tie the
requested virtual network(s) to the requested PCI device(s). The way we
did it in our solution is to map a provider:physical_network to an interface
that represents the Physical Function. Every virtual network is bound to the
provider:physical_network, so the PCI device should be allocated based on this
mapping.  We can  map a PCI alias to the provider:physical_network.
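
Just to illustrate the idea (the data structures and names below are invented
for the example, they are not our actual plugin code):

  # Invented illustration: pick a free VF from the PF that backs the
  # network's provider:physical_network.
  PHYSNET_TO_PF = {'physnet1': 'eth2', 'physnet2': 'eth3'}   # admin-configured

  def allocate_vf_for_network(network, free_vfs_by_pf):
      physnet = network['provider:physical_network']
      pf = PHYSNET_TO_PF[physnet]           # PF interface backing this physnet
      return pf, free_vfs_by_pf[pf].pop()   # hand out any free VF on that PF

  pf, vf_pci_addr = allocate_vf_for_network(
      {'provider:physical_network': 'physnet1'},
      {'eth2': ['0000:09:00.1', '0000:09:00.2'], 'eth3': []})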

Another topic to discuss is where the mapping between the neutron port and the
PCI device should be managed. One way to solve it is to propagate the allocated
PCI device details to neutron on port creation.
In case there is no qbg/qbh support, the VF networking configuration should be
applied locally on the Host.
The question is when and how to apply the networking configuration on the PCI
device?
We see the following options:

* It can be done on port creation.

* It can be done when the nova VIF driver is called for vNIC plugging. This
will require having all of the networking configuration available to the VIF
driver or sending a request to the neutron server to obtain it.

* It can be done by having a dedicated L2 neutron agent on each Host
that scans for allocated PCI devices and then retrieves the networking
configuration from the server and configures the device. The agent will also be
responsible for managing update requests coming from the neutron server (a
small sketch of this option follows below).


For macvtap vNIC type assignment, the networking configuration can be applied 
by a dedicated L2 neutron agent.
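
A minimal sketch of what the dedicated L2 agent option could boil down to on
the Host (device discovery and the neutron lookup are stubbed out; only the
standard iproute2 VF calls are real):

  import subprocess

  def configure_vf(pf_dev, vf_index, mac, vlan):
      # Apply the MAC/VLAN that neutron assigned to the port bound to this VF.
      subprocess.check_call(['ip', 'link', 'set', pf_dev,
                             'vf', str(vf_index), 'mac', mac])
      subprocess.check_call(['ip', 'link', 'set', pf_dev,
                             'vf', str(vf_index), 'vlan', str(vlan)])

  # Example values; a real agent would discover these and poll the server.
  configure_vf('eth2', 3, 'fa:16:3e:11:22:33', 100)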

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Tuesday, October 29, 2013 9:04 AM

To: Robert Li (baoli); Irena Berezovsky; prashant.upadhy...@aricent.com; 
chris.frie...@windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Robert, is it possible to have a IRC meeting? I'd prefer to IRC meeting because 
it's more openstack style and also can keep the minutes clearly.

To your flow, can you give more detailed example. For example, I can consider 
user specify the instance with -nic option specify a network id, and then how 
nova device the requirement to the PCI device? I assume the network id should 
define the switches that the device can connect to , but how is that 
information translated to the PCI property requirement? Will this translation 
happen before the nova scheduler make host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; 
prashant.upadhy...@aricent.com<mailto:prashant.upadhy...@aricent.com>; Jiang, 
Yunhong; chris.frie...@windriver.com<mailto:chris.frie...@windriver.com>; He, 
Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, "Irena Berezovsky" 
mailto:ire...@mellanox.com>> wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end to end flow is actually encompassed in the blueprints in a nutshell. I 
will reiterate it in below. The binding between Nova and Neutron occurs with 
the neutron v2 API that nova invokes in order to provision the neutron 
services. The vif driver is responsible for plugging in an instance onto the 
networking setup that neutron has created on the host.

Normally, one will invoke "nova boot" api with the -nic options to specify the 
nic with which the instance will be connected to the network. It currently 
allows net-id, fixed ip and/or port-id to be specified for the option. However, 
it doesn't allow one to specify special networking requirements for the 
instance. Thanks to the nova pci-passthrough work, one can specify PCI 
passthrough device(s) in the nova flavor. But it doesn't provide means to tie 
up these PCI devices in the case of ethernet adpators with networking services. 
Therefore the idea is actually simple as indicated by the blueprint titles, to 
provide means to tie up SRIOV devices with neutron services. A work flow would 
roughly look like this for 'nova boot':

  -- Specifies networking requirements in the -nic option. Specifically for 
SRIOV, allow the following to be specified in addition to the existing required 
information:
   . PCI alias
   . direct pci-passthrough/macvtap
   . port profileid that is compliant with 802.1Qbh

The above information is optional. In the 

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Robert Li (baoli)
erwise, we can continue the discussion with email.



Regards,
Irena

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Friday, October 25, 2013 11:16 PM
To: prashant.upadhy...@aricent.com<mailto:prashant.upadhy...@aricent.com>; 
Irena Berezovsky; yunhong.ji...@intel.com<mailto:yunhong.ji...@intel.com>; 
chris.frie...@windriver.com<mailto:chris.frie...@windriver.com>; 
yongli...@intel.com<mailto:yongli...@intel.com>
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

This is Robert Li from Cisco Systems. Recently, I was tasked to investigate 
such support for Cisco's systems that support VM-FEX, which is a SRIOV 
technology supporting 802-1Qbh. I was able to bring up nova instances with 
SRIOV interfaces, and establish networking in between the instances that 
employes the SRIOV interfaces. Certainly, this was accomplished with hacking 
and some manual intervention. Based on this experience and my study with the 
two existing nova pci-passthrough blueprints that have been implemented and 
committed into Havana 
(https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and
https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt),  I 
registered a couple of blueprints (one on Nova side, the other on the Neutron 
side):

https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov

in order to address SRIOV support in openstack.

Please take a look at them and see if they make sense, and let me know any 
comments and questions. We can also discuss this in the summit, I suppose.

I noticed that there is another thread on this topic, so copy those folks  from 
that thread as well.

thanks,
Robert

On 10/16/13 4:32 PM, "Irena Berezovsky" 
<ire...@mellanox.com> wrote:

Hi,
As one of the next steps for PCI pass-through, I would like to discuss the 
support for PCI pass-through vNIC.
While nova takes care of PCI pass-through device resource management and VIF 
settings, neutron should manage their networking configuration.
I would like to register a summit proposal to discuss the support for PCI 
pass-through networking.
I am not sure what would be the right topic to discuss the PCI pass-through 
networking, since it involves both nova and neutron.
There is already a session registered by Yongli on a nova topic to discuss the 
PCI pass-through next steps.
I think PCI pass-through networking is quite a big topic and it is worth having 
a separate discussion.
Are there any other people who are interested in discussing it and sharing 
their thoughts and experience?

Regards,
Irena

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong
Robert, is it possible to have an IRC meeting? I'd prefer an IRC meeting because 
it's more OpenStack style and also keeps clear minutes.

Regarding your flow, can you give a more detailed example? For example, consider 
a user specifying the instance with a -nic option that specifies a network id: 
how does nova translate that into a requirement on the PCI device? I assume the 
network id should define the switches that the device can connect to, but how is 
that information translated into the PCI property requirement? Will this 
translation happen before the nova scheduler makes the host decision?

Thanks
--jyh
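
To illustrate the kind of translation being asked about here, below is a rough, 
standalone Python sketch (not actual Nova code) of turning a -nic option that 
names an SRIOV-capable network into a PCI request before scheduling. Every field 
name in it is an assumption for illustration only; in the flow Robert describes, 
this step would sit in the Nova API layer, i.e. before the scheduler makes the 
host decision.

def nic_to_pci_request(nic, network):
    """Build a PCI request from a --nic option and the network it names.
    Returns None when the network has no SR-IOV requirement."""
    if not network.get("sriov_required"):
        return None                      # ordinary virtio NIC, nothing to request
    return {
        "alias": nic.get("pci-alias", network.get("default_pci_alias")),
        "count": 1,
        # Properties the scheduler could match against per-host PCI stats groups.
        "spec": {
            "vendor_id": network.get("vendor_id"),
            "product_id": network.get("product_id"),
            "physical_network": network.get("physical_network"),
        },
    }

nic = {"net-id": "NET_UUID", "pci-alias": "vf-alias", "vnic-type": "macvtap"}
net = {"sriov_required": True, "vendor_id": "8086", "product_id": "10ed",
       "physical_network": "physnet1"}
print(nic_to_pci_request(nic, net))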

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; prashant.upadhy...@aricent.com; Jiang, Yunhong; 
chris.frie...@windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, "Irena Berezovsky" 
<ire...@mellanox.com> wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end-to-end flow? How do you suggest binding Nova 
and Neutron?

The end-to-end flow is, in a nutshell, encompassed in the blueprints; I will 
reiterate it below. The binding between Nova and Neutron occurs with the 
neutron v2 API that nova invokes in order to provision the neutron services. 
The vif driver is responsible for plugging an instance into the networking 
setup that neutron has created on the host.

Normally, one will invoke the "nova boot" API with the -nic options to specify 
the NIC with which the instance will be connected to the network. It currently 
allows net-id, fixed IP and/or port-id to be specified for the option. However, 
it doesn't allow one to specify special networking requirements for the 
instance. Thanks to the nova pci-passthrough work, one can specify PCI 
passthrough device(s) in the nova flavor. But it doesn't provide a means to tie 
these PCI devices, in the case of Ethernet adaptors, to networking services. 
Therefore the idea, as the blueprint titles indicate, is simply to provide a 
means to tie SRIOV devices to neutron services. A work flow would roughly look 
like this for 'nova boot':

  -- Specifies networking requirements in the -nic option. Specifically for 
SRIOV, allow the following to be specified in addition to the existing required 
information:
   . PCI alias
   . direct pci-passthrough/macvtap
   . port profileid that is compliant with 802.1Qbh

The above information is optional. In the absence of them, the existing 
behavior remains.

  -- If special networking requirements exist, the Nova API creates PCI requests 
in the nova instance type for scheduling purposes

 -- Nova scheduler schedules the instance based on the requested flavor 
plus the PCI requests that are created for networking.

 -- Nova compute invokes neutron services with PCI passthrough information 
if any

 --  Neutron performs its normal operations based on the request, such as 
allocating a port, assigning ip addresses, etc. Specific to SRIOV, it should 
validate the information such as profileid, and store them in its db. It's 
also possible to associate a port profileid with a neutron network so that port 
profileid becomes optional in the -nic option. Neutron returns  nova the port 
information, especially for PCI passthrough related information in the port 
binding object. Currently, the port binding object contains the following 
information:
  binding:vif_type
  binding:host_id
  binding:profile
  binding:capabilities

-- nova constructs the domain xml and plugs in the instance by calling the 
vif driver. The vif driver can build up the interface xml based on the port 
binding information.
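
As a rough illustration of that last step (this is not Nova's actual vif 
driver), the standalone Python sketch below turns port binding information of 
the shape listed above into a libvirt <interface> element. The binding:profile 
keys pci_slot and vf_netdev and the vif_type value 'hostdev' are assumptions 
for the example; the XML shapes themselves are the standard libvirt forms for 
hostdev and macvtap ('direct') interfaces.

from xml.sax.saxutils import quoteattr

def interface_xml(binding):
    """Render a libvirt <interface> element from port-binding information."""
    profile = binding.get("binding:profile") or {}
    mac = binding["mac_address"]
    if binding.get("binding:vif_type") == "hostdev":
        # Full passthrough of the VF: libvirt 'hostdev' interface type.
        domain, bus, rest = profile["pci_slot"].split(":")   # e.g. "0000:08:10.2"
        slot, function = rest.split(".")
        return (
            "<interface type='hostdev' managed='yes'>\n"
            "  <mac address=%s/>\n"
            "  <source>\n"
            "    <address type='pci' domain='0x%s' bus='0x%s' slot='0x%s' function='0x%s'/>\n"
            "  </source>\n"
            "</interface>" % (quoteattr(mac), domain, bus, slot, function))
    # macvtap on top of the VF's netdev: libvirt 'direct' interface type.
    return (
        "<interface type='direct'>\n"
        "  <mac address=%s/>\n"
        "  <source dev=%s mode='passthrough'/>\n"
        "  <model type='virtio'/>\n"
        "</interface>" % (quoteattr(mac), quoteattr(profile["vf_netdev"])))

print(interface_xml({
    "mac_address": "fa:16:3e:11:22:33",
    "binding:vif_type": "hostdev",
    "binding:profile": {"pci_slot": "0000:08:10.2"},
}))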




The blueprints you registered make sense. On the Nova side, there is a need to 
bind between the requested virtual network and the PCI device/interface to be 
allocated as a vNIC.
On the Neutron side, there is a need to support networking configuration of the 
vNIC. Neutron should be able to identify the PCI device/macvtap interface in 
order to apply configuration. I think it makes sense to provide neutron 
integration via a dedicated Modular Layer 2 Mechanism Driver to allow PCI 
pass-through vNIC support along with other networking technologies.

I haven't sorted through this yet. A neutron port could be associated with a 
PCI device or not, which is a common feature, IMHO. However, an ML2 driver 
specific to a particular SRIOV technology may be needed.
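
For a feel of what such an ML2 mechanism driver could look like, here is a 
rough, self-contained Python skeleton. A real driver would subclass 
neutron.plugins.ml2.driver_api.MechanismDriver instead of a plain class, and the 
binding:profile keys used below (pci_slot, physical_network) are illustrative 
assumptions rather than an agreed format:

class SriovMechanismDriverSketch(object):
    """Sketch of an SR-IOV mechanism driver; mimics the precommit/postcommit
    hook shape only, so the example stays self-contained."""

    def initialize(self):
        # Physical networks this driver knows how to wire VFs into (illustrative).
        self.supported_physnets = {"physnet1", "physnet2"}

    def create_port_precommit(self, context):
        """Validate the SR-IOV binding information before the DB commit."""
        port = context["current"]                       # port dict (illustrative shape)
        profile = port.get("binding:profile") or {}
        if not profile:
            return                                      # not an SR-IOV port, nothing to do
        if "pci_slot" not in profile:
            raise ValueError("SR-IOV port %s has no pci_slot in binding:profile"
                             % port["id"])
        if profile.get("physical_network") not in self.supported_physnets:
            raise ValueError("unsupported physical network for port %s" % port["id"])

    def create_port_postcommit(self, context):
        """Apply the VLAN/profile configuration to the VF after the commit."""
        port = context["current"]
        profile = port.get("binding:profile") or {}
        if profile:
            # A real driver would program the embedded switch / 802.1Qbh profile
            # for the VF identified by profile["pci_slot"] here.
            print("configuring VF %s for port %s" % (profile["pci_slot"], port["id"]))

driver = SriovMechanismDriverSketch()
driver.initialize()
driver.create_port_precommit({"current": {
    "id": "port-1",
    "binding:profile": {"pci_slot": "0000:08:10.2", "physical_network": "physnet1"},
}})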


During the Havana release, we introduced the Mellanox Neutron plugin that enables 
networking via SRIOV pass-through devices or macvtap interfaces.

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-28 Thread yongli he

On 2013?10?27? 15:48, Irena Berezovsky wrote:


Hi Robert,

Thank you very much for sharing the information regarding your 
efforts. Can you please share your idea of the end-to-end flow? How do 
you suggest binding Nova and Neutron?


The blueprints you registered make sense. On the Nova side, there is a 
need to bind between the requested virtual network and the PCI 
device/interface to be allocated as a vNIC.


On the Neutron side, there is a need to support networking 
configuration of the vNIC. Neutron should be able to identify the PCI 
device/macvtap interface in order to apply configuration. I think it 
makes sense to provide neutron integration via a dedicated Modular Layer 
2 Mechanism Driver to allow PCI pass-through vNIC support along with 
other networking technologies.


During the Havana release, we introduced the Mellanox Neutron plugin that 
enables networking via SRIOV pass-through devices or macvtap interfaces.



Hi, Irena & Robert

I'm very interested in your work on the Mellanox Neutron plugin, which enables 
SRIOV devices or macvtap interfaces. Could you provide more information about 
it: blueprints/patches/current work flow/what is expected from nova PCI 
passthrough? Then, together with Robert's requirements and this discussion, I 
can learn in more detail what is expected from nova PCI and what the next PCI 
work should be.


The current status as I understand it:

a) fine-grained classification of devices by auto discovery and request
1) enable the white list to specify the address
2) enable the white list to append group info (IN/OUT/... anything)
3) enable the pci request to append more information into the extra info
   (I need input here: what should it contain? Even though PCI doesn't care 
   about the extra info, being clear is better.)

   i.e. Robert's
    . direct pci-passthrough/macvtap
    . port profile

b) extra-info-aware allocation ('feature pci' by Robert)

   1) an API- and code-level interface to access the extra info
   2) scheduler awareness of extra info and/or device type so vNICs can 
be differentiated
   3) boot/interface-attach APIs: an API interface for converting neutron NIC 
info into a PCI request:

from binding:capabilities and binding:profile to
   PCI alias (request)
   direct pci-passthrough/macvtap (does it need to be stored in the pci 
device extra info?)

   port profile (does it need to be stored in the pci device extra info?)
   4) scheduler enhancements to meet the NIC requirements
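
To give the white list / extra info items above a concrete shape, here is a 
purely illustrative Python sketch of what extended whitelist and alias entries 
could look like once parsed. The "address" and "extra_info" field names are 
assumptions taken from the list above, not a final format:

import json

# Hypothetical extended whitelist entry: address matching plus free-form extra info.
pci_passthrough_whitelist = json.loads("""
[
  {"vendor_id": "8086", "product_id": "10ed",
   "address": "0000:08:*.*",
   "extra_info": {"physical_network": "physnet1", "group": "OUT"}}
]
""")

# Hypothetical alias entry carrying the same extra info for request matching.
pci_alias = json.loads("""
{"name": "vf-alias", "vendor_id": "8086", "product_id": "10ed",
 "extra_info": {"physical_network": "physnet1"}}
""")

def device_matches(dev, entry):
    """Match a discovered PCI device against a whitelist entry (simplified:
    exact vendor/product match only; address wildcards left out)."""
    return (dev["vendor_id"] == entry["vendor_id"]
            and dev["product_id"] == entry["product_id"])

dev = {"vendor_id": "8086", "product_id": "10ed", "address": "0000:08:10.2"}
print([e for e in pci_passthrough_whitelist if device_matches(dev, e)])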

Yongli He@intel

We want to integrate our solution with PCI pass-through Nova support. 
 I will be glad to share more details if you are interested.


The PCI pass-through networking support is planned to be discussed 
during the summit: http://summit.openstack.org/cfp/details/129. I 
think it's worth drilling down into a more detailed proposal and presenting 
it during the summit, especially since it impacts both the nova and 
neutron projects.


Would you be interested in collaborating on this effort? Would you be 
interested in exchanging more emails or setting up an IRC/WebEx meeting 
this week before the summit?


Regards,

Irena

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Friday, October 25, 2013 11:16 PM
To: prashant.upadhy...@aricent.com; Irena Berezovsky; yunhong.ji...@intel.com; 
chris.frie...@windriver.com; yongli...@intel.com
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle 
Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support


Hi Irena,

This is Robert Li from Cisco Systems. Recently, I was tasked to 
investigate such support for Cisco's systems that support VM-FEX, 
which is a SRIOV technology supporting 802-1Qbh. I was able to bring 
up nova instances with SRIOV interfaces, and establish networking 
between the instances that employ the SRIOV interfaces. Certainly, 
this was accomplished with hacking and some manual intervention. Based 
on this experience and my study with the two existing nova 
pci-passthrough blueprints that have been implemented and committed 
into Havana 
(https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and
https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt), 
 I registered a couple of blueprints (one on Nova side, the other on 
the Neutron side):


https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov

https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov

in order to address SRIOV support in openstack.

Please take a look at them and see if they make sense, and let me know 
any comments and questions. We can also discuss this in the summit, I 
suppose.


I noticed that there is another thread on this topic, so copy those 
folks  from that thread as well.


thanks,

Robert

On 10/16/13 4:32 PM, "Irena Berezovsky" <ire...@mellanox.com> wrote:


Hi,

As one of the next steps for PCI pass-through, I would like to
discuss the support for PCI pass-through vNIC.

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-28 Thread yongli he
We want to integrate our solution with PCI pass-through Nova support. I will be glad to share more details if you are interested.


The PCI pass-through networking support is planned to be discussed
during the summit: http://summit.openstack.org/cfp/details/129. I
think it's worth drilling down into a more detailed proposal and
presenting it during the summit, especially since it impacts both
the nova and neutron projects.

I agree. Maybe we can steal some time in that discussion.

Would you be interested in collaborating on this effort? Would you
be interested in exchanging more emails or setting up an IRC/WebEx meeting
this week before the summit?


Sure. If folks want to discuss it before the summit, we can schedule a 
webex later this week. Or otherwise, we can continue the discussion 
with email.


Regards,

Irena

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Friday, October 25, 2013 11:16 PM
To: prashant.upadhy...@aricent.com; Irena Berezovsky;
yunhong.ji...@intel.com; chris.frie...@windriver.com;
yongli...@intel.com
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen);
Kyle Mestery (kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through
network support

Hi Irena,

This is Robert Li from Cisco Systems. Recently, I was tasked to
investigate such support for Cisco's systems that support VM-FEX,
which is a SRIOV technology supporting 802-1Qbh. I was able to
bring up nova instances with SRIOV interfaces, and establish
networking between the instances that employ the SRIOV
interfaces. Certainly, this was accomplished with hacking and some
manual intervention. Based on this experience and my study with
the two existing nova pci-passthrough blueprints that have been
implemented and committed into Havana
(https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and
https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt),
 I registered a couple of blueprints (one on Nova side, the other
on the Neutron side):

https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov

https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov

in order to address SRIOV support in openstack.

Please take a look at them and see if they make sense, and let me
know any comments and questions. We can also discuss this in the
summit, I suppose.

I noticed that there is another thread on this topic, so copy
those folks  from that thread as well.

thanks,

Robert

On 10/16/13 4:32 PM, "Irena Berezovsky" <ire...@mellanox.com> wrote:

Hi,

As one of the next steps for PCI pass-through, I would like to
discuss the support for PCI pass-through vNIC.

While nova takes care of PCI pass-through device resource
management and VIF settings, neutron should manage their
networking configuration.

I would like to register a summit proposal to discuss the
support for PCI pass-through networking.

I am not sure what would be the right topic to discuss the PCI
pass-through networking, since it involves both nova and neutron.

There is already a session registered by Yongli on nova topic
to discuss the PCI pass-through next steps.

I think PCI pass-through networking is quite a big topic and
it is worth having a separate discussion.

Are there any other people who are interested in discussing it and
sharing their thoughts and experience?

Regards,

Irena



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-25 Thread Robert Li (baoli)
Hi Irena,

This is Robert Li from Cisco Systems. Recently, I was tasked to investigate 
such support for Cisco's systems that support VM-FEX, which is a SRIOV 
technology supporting 802-1Qbh. I was able to bring up nova instances with 
SRIOV interfaces, and establish networking between the instances that 
employ the SRIOV interfaces. Certainly, this was accomplished with hacking 
and some manual intervention. Based on this experience and my study with the 
two existing nova pci-passthrough blueprints that have been implemented and 
committed into Havana 
(https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base and
https://blueprints.launchpad.net/nova/+spec/pci-passthrough-libvirt),  I 
registered a couple of blueprints (one on Nova side, the other on the Neutron 
side):

https://blueprints.launchpad.net/nova/+spec/pci-passthrough-sriov
https://blueprints.launchpad.net/neutron/+spec/pci-passthrough-sriov

in order to address SRIOV support in openstack.

Please take a look at them and see if they make sense, and let me know any 
comments and questions. We can also discuss this in the summit, I suppose.

I noticed that there is another thread on this topic, so copy those folks  from 
that thread as well.

thanks,
Robert

On 10/16/13 4:32 PM, "Irena Berezovsky" 
<ire...@mellanox.com> wrote:

Hi,
As one of the next steps for PCI pass-through, I would like to discuss the 
support for PCI pass-through vNIC.
While nova takes care of PCI pass-through device resource management and VIF 
settings, neutron should manage their networking configuration.
I would like to register a summit proposal to discuss the support for PCI 
pass-through networking.
I am not sure what would be the right topic to discuss the PCI pass-through 
networking, since it involves both nova and neutron.
There is already a session registered by Yongli on a nova topic to discuss the 
PCI pass-through next steps.
I think PCI pass-through networking is quite a big topic and it is worth having 
a separate discussion.
Are there any other people who are interested in discussing it and sharing 
their thoughts and experience?

Regards,
Irena

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev