Re: [openstack-dev] [Nova] pci stats format and functional tests

2015-03-05 Thread Jiang, Yunhong
Paul, you are right that the 'extra_info' should not be in the 
os-pci:pci_stats, since it's not part of 'pool-keys' anymore, but I'm not sure 
if both 'key1' and 'phys_function' will be part of the pci_stats.

Thanks
--jyh

From: Murray, Paul (HP Cloud) [mailto:pmur...@hp.com]
Sent: Thursday, March 5, 2015 11:39 AM
To: openstack-dev@lists.openstack.org
Subject: [openstack-dev] [Nova] pci stats format and functional tests

Hi All,

I know Yunhong Jiang and Daniel Berrange have been involved in the following, 
but I thought it worth sending to the list for visibility.

While writing code to convert the resource tracker to use the ComputeNode 
object, I realized that the api samples used in the functional tests are not in the 
same format as the PciDevicePool object. For example, 
hypervisor-pci-detail-resp.json has something like this:

"os-pci:pci_stats": [
{
"count": 5,
"extra_info": {
"key1": "value1",
"phys_function": "[[\"0x\", \"0x04\", \"0x00\", \"0x1\"]]"
},
"keya": "valuea",
"product_id": "1520",
"vendor_id": "8086"
}
],

My understanding from interactions with yjiang5 in the past leads me to think 
that something like this is what is actually expected:

"os-pci:pci_stats": [
{
"count": 5,
"key1": "value1",
"phys_function": "[[\"0x\", \"0x04\", \"0x00\", \"0x1\"]]",
"keya": "valuea",
"product_id": "1520",
"vendor_id": "8086"
}
],

This is the way the PciDevicePool object expects the data structure to be and 
is also the way the libvirt virt driver creates pci device information (i.e. 
without the "extra_info" key). Other than that (which is actually pretty clear) 
I couldn't find anything to tell me definitively if my interpretation is 
correct and I don't want to change the functional tests without being sure they 
are wrong. So if anyone can give some guidance here I would appreciate it.
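If it helps, here is a minimal sketch of the conversion I have in mind, assuming the 
pool really is meant to be a flat dict of count/vendor_id/product_id plus arbitrary 
tag keys (the helper name below is made up for illustration; it is not existing code):

def flatten_pci_pool(pool):
    # Hypothetical helper: merge the nested 'extra_info' keys into the
    # top level, which is the shape PciDevicePool appears to expect.
    flat = {k: v for k, v in pool.items() if k != 'extra_info'}
    flat.update(pool.get('extra_info', {}))
    return flat

old_sample = {
    "count": 5,
    "extra_info": {"key1": "value1",
                   "phys_function": "[[\"0x\", \"0x04\", \"0x00\", \"0x1\"]]"},
    "keya": "valuea",
    "product_id": "1520",
    "vendor_id": "8086",
}
# flatten_pci_pool(old_sample) yields the second form shown above.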

I separated this stuff out into a patch with a couple of other minor cleanups 
in preparation for the ComputeNode change, see: 
https://review.openstack.org/#/c/161843

Let me know if I am on the right track,

Cheers,
Paul


Paul Murray
Nova Technical Lead, HP Cloud
+44 117 316 2527

Hewlett-Packard Limited registered Office: Cain Road, Bracknell, Berks RG12 1HN 
Registered No: 690597 England. The contents of this message and any attachments 
to it are confidential and may be legally privileged. If you have received this 
message in error, you should delete it from your system immediately and advise 
the sender. To any recipient of this message within HP, unless otherwise stated 
you should consider this message and attachments as "HP CONFIDENTIAL".

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][libvirt] The None and 'none' for CONF.libvirt.cpu_mode

2015-03-05 Thread Jiang, Yunhong

> -Original Message-
> From: Daniel P. Berrange [mailto:berra...@redhat.com]
> Sent: Wednesday, March 4, 2015 9:56 AM
> To: Jiang, Yunhong
> Cc: openstack-dev@lists.openstack.org; Xu, Hejie
> Subject: Re: [nova][libvirt] The None and 'none' for CONF.libvirt.cpu_mode
> 
> On Wed, Mar 04, 2015 at 05:24:53PM +, Jiang, Yunhong wrote:
> > Daniel, thanks for your clarification.
> >
> > Another related question is, what will the guest's real cpu model be
> > if the cpu_model is None? This is about a reported regression at
> 
> The guest CPU will be unspecified - it will be some arbitrary
> hypervisor decided default which nova cannot know.
> 
> > https://bugs.launchpad.net/nova/+bug/1082414 . When the
> > instance.vcpu_model.mode is None, we should compare the source/target
> > cpu model, as the suggestion from Tony, am I right?
> 
> If CPU model is none, best we can do is compare the *host* CPU of
> the two hosts to make sure the host doesn't lose any features, as
> we have no way of knowing what features the guest is relying on.

Thanks for the clarification. I will cook a patch for this issue.
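Roughly what I have in mind is something like the sketch below. This is illustrative 
only: it uses the libvirt python binding's getCapabilities() and compareCPU() directly 
with placeholder connection URIs, while the real patch will of course go through the 
existing check_can_live_migrate_destination() path.

import libvirt
from lxml import etree

def dest_host_cpu_is_compatible(source_uri, dest_uri):
    # Pull the *host* CPU description out of the source capabilities XML
    # and ask the destination libvirt whether its own host CPU is at
    # least a superset of it.
    src_conn = libvirt.open(source_uri)
    dst_conn = libvirt.open(dest_uri)
    caps = etree.fromstring(src_conn.getCapabilities())
    src_host_cpu = etree.tostring(caps.find('./host/cpu'))
    result = dst_conn.compareCPU(src_host_cpu, 0)
    return result in (libvirt.VIR_CPU_COMPARE_IDENTICAL,
                      libvirt.VIR_CPU_COMPARE_SUPERSET)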

--jyh

> 
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][libvirt] The None and 'none' for CONF.libvirt.cpu_mode

2015-03-04 Thread Jiang, Yunhong
Daniel, thanks for your clarification.

Another related question is, what will the guest's real cpu model be if the 
cpu_model is None? This is about a reported regression at 
https://bugs.launchpad.net/nova/+bug/1082414 . When 
instance.vcpu_model.mode is None, we should compare the source/target cpu 
models, as Tony suggested, am I right?

Thanks
--jyh

> -Original Message-
> From: Daniel P. Berrange [mailto:berra...@redhat.com]
> Sent: Wednesday, March 4, 2015 6:56 AM
> To: Jiang, Yunhong
> Cc: openstack-dev@lists.openstack.org; Xu, Hejie
> Subject: Re: [nova][libvirt] The None and 'none' for CONF.libvirt.cpu_mode
> 
> On Wed, Mar 04, 2015 at 02:52:06PM +, Jiang, Yunhong wrote:
> > Hi, Daniel
> > I'm a bit confused about the None and 'none' values for CONF.libvirt.cpu_mode.
> > Per my understanding, None means no configuration was provided and libvirt will
> > select the default value based on the virt_type, while 'none' means no cpu_mode
> > information should be specified for the guest. Am I right?
> >
> >   In _get_guest_cpu_model_config() in virt/libvirt/driver.py,
> > if mode is 'none', the kvm/qemu virt_type will return a
> > vconfig.LibvirtConfigGuestCPU() while other virt types will return None.
> > What is the reason for this difference in return values?
> 
> The LibvirtConfigGuestCPU  object is used for more than just configuring
> the CPU model. It is also used for expressing CPU topology (sockets, cores,
> threads) and NUMA topology. So even if cpu model is None, we still need
> that object in the kvm case.
> 
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][libvirt] The None and 'none' for CONF.libvirt.cpu_mode

2015-03-04 Thread Jiang, Yunhong
Hi, Daniel
I'm a bit confused about the None and 'none' values for CONF.libvirt.cpu_mode. Per my 
understanding, None means no configuration was provided and libvirt will 
select the default value based on the virt_type, while 'none' means no cpu_mode 
information should be specified for the guest. Am I right?
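In other words, my reading is roughly the following (an illustrative sketch of the 
semantics only, not the actual driver code):

def effective_cpu_mode(conf_cpu_mode, virt_type):
    # My understanding of the two values, for illustration only.
    if conf_cpu_mode is None:
        # Nothing configured: a virt_type-dependent default is chosen
        # (e.g. host-model for kvm/qemu).
        return 'host-model' if virt_type in ('kvm', 'qemu') else None
    if conf_cpu_mode == 'none':
        # Explicitly configured: do not specify any CPU model for the guest.
        return None
    return conf_cpu_mode  # 'host-model', 'host-passthrough' or 'custom'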

  In _get_guest_cpu_model_config() in virt/libvirt/driver.py, if 
mode is 'none', the kvm/qemu virt_type will return a 
vconfig.LibvirtConfigGuestCPU() while other virt types will return None. What is 
the reason for this difference in return values?

Thanks
--jyh

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] The libvirt.cpu_mode and libvirt.cpu_model

2015-01-29 Thread Jiang, Yunhong

> -Original Message-
> From: Daniel P. Berrange [mailto:berra...@redhat.com]
> Sent: Thursday, January 29, 2015 2:34 AM
> To: Jiang, Yunhong
> Cc: openstack-dev@lists.openstack.org
> Subject: Re: [nova] The libvirt.cpu_mode and libvirt.cpu_model
> 
> On Wed, Jan 28, 2015 at 10:10:29PM +, Jiang, Yunhong wrote:
> > Hi, Daniel
> > I recently tried the libvirt.cpu_mode and libvirt.cpu_model
> > when I was working on cpu_info related code and found bug
> > https://bugs.launchpad.net/nova/+bug/1412994 .  The reason is because
> > with these two flags, all guests launched on the host will use them,
> > while when host report back the compute capability, they report the
> > real-hardware compute capability, instead of the compute capabilities
> > masked by these two configs.
> >
> > I think the key thing is, these two flags are per-instance properties
> > instead of per-host properties.
> 
> No, these are intended to be per host properties. The idea is that all
> hosts should be configured with a consistent CPU model so you can live
> migrate between all hosts without hitting compatibility problems. There
> is however currently a bug in the live migration CPU compat checking
> but I have a fix for that in progress.

Although configuring all hosts with a consistent CPU model avoids the nova live 
migration issue, does it also mean the cloud can only present features based 
on the oldest machine in the cloud, so that newer CPU features like SSE4.1 etc. can't 
be utilized? 

Also, if we want to expose host features to the guest (please check 
https://bugs.launchpad.net/nova/+bug/1412930 for a related issue), we have to 
use a per-instance cpu_model configuration, which breaks all-host live migration 
anyway.

For your live migration fix, is it https://review.openstack.org/#/c/53746/ ? In 
your patch, check_can_live_migrate_destination() will compare the guest CPU 
info with the target host CPU info, instead of comparing the 
source and target host cpu_model. 

Thanks
--jyh

> 
> > How about remove these two config items? And I don't think we
> > should present cpu_mode/model option to end user, instead, we should
> > only expose the feature request like disable/force some cpu_features,
> > and the libvirt driver select the cpu_mode/model based on user's
> > feature requirement.
> 
> I don't see any reason to remove these config items
> 
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] The libvirt.cpu_mode and libvirt.cpu_model

2015-01-28 Thread Jiang, Yunhong
Hi, Daniel
I recently tried the libvirt.cpu_mode and libvirt.cpu_model options when I was 
working on cpu_info related code and found bug 
https://bugs.launchpad.net/nova/+bug/1412994 .  The reason is that with 
these two flags, all guests launched on the host will use them, while when the host 
reports back its compute capability, it reports the real hardware compute 
capability, instead of the compute capabilities masked by these two configs.

I think the key thing is that these two flags are per-instance properties 
rather than per-host properties. 

If we do want to keep them as per-host properties, the libvirt driver should 
return the capabilities as altered by these two items, but the 
problem is that I checked the libvirt docs and it seems we can't get the cpu_info for a 
custom cpu_mode.

How about removing these two config items? I don't think we should 
present the cpu_mode/model options to end users; instead, we should only expose 
feature requests such as disabling/forcing certain cpu_features, and let the libvirt driver 
select the cpu_mode/model based on the user's feature requirements.

Your opinion?

Thanks
--jyh

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] The constraints from flavor and image metadata

2015-01-27 Thread Jiang, Yunhong
Hi, Travis
Thanks for your reply. I think I'm talking more about resource 
constraints than behavior constraints. My examples, like serial_port_count, 
memory_pagesize, hw_numa_nodes etc., all require resources. Sorry for the 
confusion. If an image property requires a resource that's not specified in the flavor, 
what should the result be?

I'm not sure whether there is a blanket rule, but maybe there are some 
generic factors to consider?

Thanks
--jyh

> -Original Message-
> From: Tripp, Travis S [mailto:travis.tr...@hp.com]
> Sent: Wednesday, January 21, 2015 6:00 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] The constraints from flavor and image
> metadata
> 
> JYH,
> 
> Are you asking for this to be a blanket rule?  It seems to me that this
> could be a case by case basis, but I question making it a blanket rule.
> 
> For example, the os_shutdown_timeout property [1] seems very workload
> specific. In your proposal, this would mean that the operator would have
> to add that property with a max value to every single flavor in order for
> it to be taken advantage of, right? Is that really the desired behavior?
> 
> Or what about the watchdog behavior (hw_watchdog_action)?  It supports an
> enum of possibilities:
> 
> "disabled", "reset", "poweroff", "pause", "none"
> 
> If the flavor provides a default value what would it even mean for an
> image to specify something different?
> 
> [1](https://review.openstack.org/#/c/89650/12)
> 
> 
> George,
> 
> Regarding constraints, you should take a look at this:
> http://docs.openstack.org/developer/glance/metadefs-concepts.html
> 
> Almost all of the available nova properties with constraint enforcement
> can be viewed by getting a current devstack, going into horizon, and
> launching the Update Metadata action on flavors / Host Aggregates, or
> Images.
> 
> -Travis
> 
> 
> On 1/17/15, 7:41 AM, "George Shuklin"  wrote:
> 
>When I played with metadata, I had a constant feeling that it mixed
>together a few things:
> >
> >1. H/W requirements for images.
> >2. Accounting requirements (good CPU for good price, HDD for cheap)
> >3. Licensing restrictions (run this one only on the hosts with licenses)
> >4. Administrative management (like 'flavors of tenant X should be run
> >only on hosts Y')
> >5. OS information (like inherited metadata on images)
> >
> >All that together is called 'metadata'. Some metadata have special
> >meaning in one context (like 'availability_zone' for hosts, or CPU
> >limitation), some is used by administrator in other context.
> >
> >All together it looks like pre-datastructure code (if someone remembers
> >that). No data types, no type restrictions, you can assign letter to
> >instruction address and pointer to string to float.
> >
> >Same with current metadata in nova/glance. Raw namespace of key-value
> >items without any meaningful restriction and specific expression. It
>gives flexibility, but causes a huge strain on operators.
> >
> >I think it needs more expressive representation.
> >
> >On 01/13/2015 11:39 PM, Jiang, Yunhong wrote:
> >> Hi,
> >>There are some discussion and disagreement on the requirement
> from
> >>flavor and image metadata at nova spec
> >>https://review.openstack.org/#/c/138937/ and I want to get more input
> >>from the community.
> >>When launch a VM, some requirements may come from image
> metadata and
> >>flavor. There are a lot of such cases like serial_port_count,
> >>memory_pagesize, hw_numa_nodes, hw:cpu_max_sockets etc. Most of
> them are
> >>done in nova/virt/hardware.py.
> >>
> >>Both the nova-spec and the current implementation seems agree
> that if
> >>flavor has the requirement, the image's metadata should not require
> more
> >>than the flavor requirement.
> >>
> >>However, the disagreement comes when no requirement from
> flavor, i.e.
> >>only image has the resource requirement. For example, for
> >>serial_port_count, "If flavor extra specs is not set, then any image
> >>meta value is permitted". For hw_mem_page_size, it's forbidden if only
> >>image request and no flavor request
> >>(https://github.com/openstack/nova/blob/master/nova/virt/hardware.p
> y#L873
> >> ), and hw_numa_nodes will fail if both flavor and image metadata are
>

Re: [openstack-dev] [nova] The constraints from flavor and image metadata

2015-01-27 Thread Jiang, Yunhong
Sorry for the slow response; this mail got lost in the mail flood.

Regarding the 'pre-datastructure code' point, I think there has been some discussion of it, 
like https://wiki.openstack.org/wiki/VirtDriverImageProperties or 
https://bugs.launchpad.net/nova/+bug/1275875 . But I do agree that we should 
enhance it to be more formal, if we do treat it as something like an API.

Thanks
--jyh

> -Original Message-
> From: George Shuklin [mailto:george.shuk...@gmail.com]
> Sent: Saturday, January 17, 2015 6:41 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] The constraints from flavor and image
> metadata
> 
> When I played with metadata, I had a constant feeling that it mixed
> together a few things:
> 
> 1. H/W requirements for images.
> 2. Accounting requirements (good CPU for good price, HDD for cheap)
> 3. Licensing restrictions (run this one only on the hosts with licenses)
> 4. Administrative management (like 'flavors of tenant X should be run
> only on hosts Y')
> 5. OS information (like inherited metadata on images)
> 
> All that together is called 'metadata'. Some metadata have special
> meaning in one context (like 'availability_zone' for hosts, or CPU
> limitation), some is used by administrator in other context.
> 
> All together it looks like pre-datastructure code (if someone remembers
> that). No data types, no type restrictions, you can assign letter to
> instruction address and pointer to string to float.
> 
> Same with current metadata in nova/glance. Raw namespace of key-value
> items without any meaningful restriction and specific expression. It
> gives flexibility, but causes a huge strain on operators.
> 
> I think it needs more expressive representation.
> 
> On 01/13/2015 11:39 PM, Jiang, Yunhong wrote:
> > Hi,
> > There are some discussion and disagreement on the requirement
> from flavor and image metadata at nova spec
> https://review.openstack.org/#/c/138937/ and I want to get more input
> from the community.
> > When launch a VM, some requirements may come from image
> metadata and flavor. There are a lot of such cases like serial_port_count,
> memory_pagesize, hw_numa_nodes, hw:cpu_max_sockets etc. Most of
> them are done in nova/virt/hardware.py.
> >
> > Both the nova-spec and the current implementation seems agree
> that if flavor has the requirement, the image's metadata should not require
> more than the flavor requirement.
> >
> > However, the disagreement comes when no requirement from
> flavor, i.e. only image has the resource requirement. For example, for
> serial_port_count, "If flavor extra specs is not set, then any image meta
> value is permitted". For hw_mem_page_size, it's forbidden if only image
> request and no flavor request
> (https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L
> 873 ), and hw_numa_nodes will fail if both flavor and image metadata are
> specified
> (https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L
> 852 ).
> >
> > As to this nova spec at https://review.openstack.org/#/c/138937/ ,
> someone (Don, Malini) think if image requires some feature/resource that is
> not specified in flavor, it should be ok, while I think it should be 
> forbidden.
> >
> > I discussed with Jay Pipe on IRC before and he thought if flavor has
> no requirement, image requirement should be failed, and I created a bug at
> https://bugs.launchpad.net/nova/+bug/1403276 at that time.  But according
> to the discussion on this BP, seems this is not always accepted by others.
> >
> > I hope to get feedback from the mailing list on the relationship of
> requirement from image/flavor. Possibly we should take different policy for
> different resource requirement, but some general rule and the reason for
> those rules will be helpful.
> >
> > BTW, This topic was sent to the operator ML yesterday by Malini at
> >   This http://lists.openstack.org/pipermail/openstack-operators/2015-
> January/005882.html and I raise it here to cover both lists.
> >
> > Thanks
> > --jyh
> >
> >
> __
> 
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-
> requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> __
> 
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-
> requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] The constraints from flavor and image metadata

2015-01-14 Thread Jiang, Yunhong
Resend because I forgot the [nova] in subject.

Hi, 
There is some discussion and disagreement about the requirements coming from 
flavor and image metadata in the nova spec https://review.openstack.org/#/c/138937/ 
and I want to get more input from the community. 
When launching a VM, some requirements may come from image metadata and 
some from the flavor. There are many such cases, like serial_port_count, memory_pagesize, 
hw_numa_nodes, hw:cpu_max_sockets etc. Most of them are handled in 
nova/virt/hardware.py.

Both the nova spec and the current implementation seem to agree that if the 
flavor has a requirement, the image's metadata should not require more than 
the flavor allows.

However, the disagreement comes when there is no requirement from the flavor, i.e. 
only the image has the resource requirement. For example, for serial_port_count, 
"If flavor extra specs is not set, then any image meta value is permitted". For 
hw_mem_page_size, it is forbidden if only the image requests it and the flavor does not 
(https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L873 ), 
and hw_numa_nodes will fail if both flavor and image metadata are specified 
(https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L852 ). 

As for the nova spec at https://review.openstack.org/#/c/138937/ , 
some people (Don, Malini) think that if the image requires a feature/resource that is not 
specified in the flavor, it should be OK, while I think it should be forbidden.
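To make the question concrete, the rule I am arguing for could be sketched roughly 
like this (illustrative only; the helper and exception names are invented and this is 
not the hardware.py code):

class ForbiddenImageRequirement(Exception):
    pass

def combine_requirement(flavor_value, image_value):
    # Sketch of "the flavor bounds the image" for one numeric property.
    if flavor_value is None and image_value is None:
        return None                      # nobody asked for anything
    if flavor_value is None:
        # The case under debate: the image asks for a resource the
        # operator never granted through the flavor.
        raise ForbiddenImageRequirement()
    if image_value is None:
        return flavor_value
    if image_value > flavor_value:
        # The image must not ask for more than the flavor allows.
        raise ForbiddenImageRequirement()
    return image_value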

I discussed this with Jay Pipes on IRC before, and he thought that if the flavor has no 
requirement, an image requirement should fail; I created a bug at 
https://bugs.launchpad.net/nova/+bug/1403276 at that time.  But judging from 
the discussion on this BP, this view is not always accepted by others.

I hope to get feedback from the mailing list on the relationship between the 
requirements from image and flavor. Possibly we should take different policies for 
different resource requirements, but some general rules and the reasoning behind those 
rules would be helpful.

Thanks
--jyh

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] The constraints from flavor and image metadata

2015-01-13 Thread Jiang, Yunhong
Hi, 
There is some discussion and disagreement about the requirements coming from 
flavor and image metadata in the nova spec https://review.openstack.org/#/c/138937/ 
and I want to get more input from the community. 
When launching a VM, some requirements may come from image metadata and 
some from the flavor. There are many such cases, like serial_port_count, memory_pagesize, 
hw_numa_nodes, hw:cpu_max_sockets etc. Most of them are handled in 
nova/virt/hardware.py.

Both the nova spec and the current implementation seem to agree that if the 
flavor has a requirement, the image's metadata should not require more than 
the flavor allows.

However, the disagreement comes when there is no requirement from the flavor, i.e. 
only the image has the resource requirement. For example, for serial_port_count, 
"If flavor extra specs is not set, then any image meta value is permitted". For 
hw_mem_page_size, it is forbidden if only the image requests it and the flavor does not 
(https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L873 ), 
and hw_numa_nodes will fail if both flavor and image metadata are specified 
(https://github.com/openstack/nova/blob/master/nova/virt/hardware.py#L852 ). 

As for the nova spec at https://review.openstack.org/#/c/138937/ , 
some people (Don, Malini) think that if the image requires a feature/resource that is not 
specified in the flavor, it should be OK, while I think it should be forbidden.

I discussed this with Jay Pipes on IRC before, and he thought that if the flavor has no 
requirement, an image requirement should fail; I created a bug at 
https://bugs.launchpad.net/nova/+bug/1403276 at that time.  But judging from 
the discussion on this BP, this view is not always accepted by others.

I hope to get feedback from the mailing list on the relationship between the 
requirements from image and flavor. Possibly we should take different policies for 
different resource requirements, but some general rules and the reasoning behind those 
rules would be helpful.

BTW, this topic was sent to the operator ML yesterday by Malini at 
http://lists.openstack.org/pipermail/openstack-operators/2015-January/005882.html
 and I raise it here to cover both lists.

Thanks
--jyh

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] Request Spec Freeze Exception For More Image Properties Support

2015-01-09 Thread Jiang, Yunhong
Hello Nova Community,
Please grant a freeze exception for the nova spec "more image 
properties support" at https://review.openstack.org/#/c/138937/ . 

The potential changes in nova are limited, affecting only the 
corresponding scheduler filters. Its purpose is to ensure and enforce image 
provider hints/recommendations, something the image provider knows best, in order to 
achieve optimal performance and/or meet compliance or other constraints. 

Thanks

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] Propose to define the compute capability clearly

2014-10-21 Thread Jiang, Yunhong
Hi, Daniel's & all,
This is a follow-up to Daniel's 
http://osdir.com/ml/openstack-dev/2014-10/msg00557.html , "Info on XenAPI data 
format for 'host_data' call". 
I'm considering changing the compute capability into a nova object 
with well-defined fields. The reasons are: a) currently the compute capability 
is a dict returned from the hypervisor, and different hypervisors may return 
different values; b) currently the compute capability filter makes its 
decision by simply matching the flavor extra_specs against this not-well-defined dict, 
which is not good IMHO. 
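Something along these lines is what I have in mind; this is a very rough sketch only, 
the field list is purely illustrative, and the registration/versioning details are 
omitted:

from nova.objects import base
from nova.objects import fields

class ComputeCapability(base.NovaObject):
    # Rough sketch only; the real fields would have to be whatever the
    # virt drivers can report in a common, well-defined way.
    VERSION = '1.0'
    fields = {
        'vcpus': fields.IntegerField(),
        'memory_mb': fields.IntegerField(),
        'hypervisor_type': fields.StringField(),
        'hypervisor_version': fields.IntegerField(),
        'cpu_features': fields.ListOfStringsField(),
    }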
Just want to get some feedback from the mailing list before I try to 
create a BP and spec for it.

Thanks
--jyh

> -Original Message-
> From: Daniel P. Berrange [mailto:berra...@redhat.com]
> Sent: Wednesday, October 8, 2014 8:56 AM
> To: Bob Ball
> Cc: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] Info on XenAPI data format for 'host_data' call
> 
> On Wed, Oct 08, 2014 at 03:53:25PM +, Bob Ball wrote:
> > Hi Daniel,
> >
> > The following is an example return value from one of my hosts
> >
> > {"host_name-description": "Default install of XenServer",
> "host_hostname": "ciceronicus", "host_memory": {"total": 17169604608,
> "overhead": 266592256, "free": 16132087808, "free-computed":
> 16111337472}, "enabled": "true", "host_capabilities": ["xen-3.0-x86_64",
> "xen-3.0-x86_32p", "hvm-3.0-x86_32", "hvm-3.0-x86_32p", "hvm-3.0-
> x86_64"], "host_other-config": {"agent_start_time": "1412774967.",
> "iscsi_iqn": "iqn.2014-10.com.xensource.hq.eng:587b598c", "boot_time":
> "1412774885."}, "host_ip_address": "10.219.10.24", "host_cpu_info":
> {"physical_features": "0098e3fd-bfebfbff-0001-28100800",
> "modelname": "Intel(R) Xeon(R) CPU   X3430  @ 2.40GHz", "vendor":
> "GenuineIntel", "features": "0098e3fd-bfebfbff-0001-28100800",
> "family": 6, "maskable": "full", "cpu_count": 4, "socket_count": "1", "flags":
> "fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr
> sse sse2 ss ht nx constant_tsc nonstop_tsc aperfmperf pni vmx est ssse3
> sse4_1 sse4_2 popcnt hypervisor ida tpr_shadow vnmi flexpriority ept vpid",
> "stepping": 5, "model": 30, "features_after_reboot": "0098e3fd-bfebfbff-
> 0001-28100800", "speed": "2394.086"}, "host_uuid": "ec54eebe-b14b-
> 4b0a-aa89-d2c468771cd3", "host_name-label": "ciceronicus"}
> >
> > Is that enough for what you're looking at?  If there is anything
> > I can help with let me know on IRC.
> 
> Yes, that is perfect, thank you.
> 
> Regards,
> Daniel
> --
> |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org  -o- http://virt-manager.org :|
> |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] [feature freeze exception] Feature freeze exception for config-drive-image-property

2014-09-04 Thread Jiang, Yunhong
Hi,
I'd like to ask for a feature freeze exception for the 
config-drive-image-property.

The spec has been approved, and the corresponding patch 
(https://review.openstack.org/#/c/77027/ ) has been +W'd three times, but failed 
to be merged in the end because of gate issues and a conflict in 
nova/exception.py. 

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

2014-08-13 Thread Jiang, Yunhong


> -Original Message-
> From: Nikola Đipanov [mailto:ndipa...@redhat.com]
> Sent: Tuesday, August 12, 2014 3:22 AM
> To: OpenStack Development Mailing List
> Subject: [openstack-dev] [Nova] Concerns around the Extensible Resource
> Tracker design - revert maybe?
> 
> Hey Nova-istas,
> 
> While I was hacking on [1] I was considering how to approach the fact
> that we now need to track one more thing (NUMA node utilization) in our
> resources. I went with - "I'll add it to compute nodes table" thinking
> it's a fundamental enough property of a compute host that it deserves to
> be there, although I was considering  Extensible Resource Tracker at one
> point (ERT from now on - see [2]) but looking at the code - it did not
> seem to provide anything I desperately needed, so I went with keeping it
> simple.
> 
> So fast-forward a few days, and I caught myself solving a problem that I
> kept thinking ERT should have solved - but apparently hasn't, and I
> think it is fundamentally a broken design without it - so I'd really
> like to see it re-visited.
> 
> The problem can be described by the following lemma (if you take 'lemma'
> to mean 'a sentence I came up with just now' :)):
> 
> """
> Due to the way scheduling works in Nova (roughly: pick a host based on
> stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
> information that scheduling service used when making a placement
> decision, needs to be available to the compute service when testing the
> placement.
> """
> 
> This is not the case right now, and the ERT does not propose any way to
> solve it - (see how I hacked around needing to be able to get
> extra_specs when making claims in [3], without hammering the DB). The
> result will be that any resource that we add and needs user supplied
> info for scheduling an instance against it, will need a buggy
> re-implementation of gathering all the bits from the request that
> scheduler sees, to be able to work properly.
> 
> This is obviously a bigger concern when we want to allow users to pass
> data (through image or flavor) that can affect scheduling, but still a
> huge concern IMHO.

I'd think this is not an issue with ERT itself, but more an RT issue. The same issue happens with
PCI also, which has to save the PCI request in the system metadata (there was no ERT at 
that time).
It would be great to have a more generic solution.

Thanks
-jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][pci] A couple of questions

2014-06-10 Thread Jiang, Yunhong
Hi, Robert
For your first question, I suspect something is wrong there and it should be 
'dev_id', which is the hypervisor's identifier for the device. I will 
leave it to Yongli to comment further on it.
For the second one, thanks for pointing the issue out. Yes, I'm working 
on fixing it.

--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Tuesday, June 10, 2014 1:46 PM
To: Jiang, Yunhong; He, Yongli
Cc: OpenStack Development Mailing List (not for usage questions)
Subject: [openstack-dev][nova][pci] A couple of questions

Hi Yunhong & Yongli,

In the routine _prepare_pci_devices_for_use(), it refers to 
dev['hypervisor_name']. I didn't see any code that sets it up, nor does the libvirt 
nodedev XML include hypervisor_name. Is this specific to Xen?

Another question is about the issue that was raised in this review: 
https://review.openstack.org/#/c/82206/. It's about the use of node id or host 
name in the PCI device table. I'd like to know you guys' thoughts on that.

thanks,
Robert
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [NFV][QA] Mission statement proposal

2014-05-19 Thread Jiang, Yunhong
Hi, Nick,
Regarding “have a test case which can be verified using an OpenSource 
implementation ... ensure that tests can be done without any special hardware 
or proprietary software”, I totally agree with the requirement to avoid 
proprietary software; however, I'm not sure about your exact meaning of 
“special hardware”.

I had a quick chat with Daniel at the summit about this as well. Several NFV tasks, 
like large pages, guest NUMA, and SR-IOV, require hardware support. Those features 
have already been widely supported in volume servers for a long time, but can't be 
achieved, or can't be achieved well, in a VM yet, and thus can't be verified in the 
current gate. IMHO, even if a VM can support/emulate such features, it's not so good 
to use a VM to verify them.

How about having a standard 3rd party CI test for hardware-based feature testing 
and making it an extensible framework? I think there are requirements for this at least 
from both Ironic and NFV.

Our team has 3rd party CI tests for PCI pass-through and OAT trusted 
computing, which can't be achieved through the upstream CI now. These tests are based on 
real hardware environments instead of VMs. We haven't published results yet because 
of some pending IT logistics support.

Thanks
--jyh

From: Nicolas Barcet [mailto:nico...@barcet.com]
Sent: Monday, May 19, 2014 10:19 AM
To: openstack-dev
Subject: [openstack-dev] [NFV] Mission statement proposal

Hello,

As promised during the second BoF session (thanks a lot to Chris Wright for 
leading this), here is a first try at defining the purpose of our special 
interest group.

---
Mission statement for the OpenStack NFV Special Interest Group:

The SIG aims to define and prioritize the use cases which are required to run 
Network Function Virtualization (NFV) instances on top of OpenStack. The 
requirements are to be passed on to various projects within OpenStack to 
promote their implementation.

The requirements expressed by this group should be made so that each of them 
has a test case which can be verified using an OpenSource implementation. 
This is to ensure that tests can be done without any special hardware or 
proprietary software, which is key for continuous integration tests in the 
OpenStack gate.
---

Comments, suggestions and fixes are obviously welcome!

Best,
Nick

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Heat] Custom Nova Flavor creation through Heat (pt.2)

2014-05-09 Thread Jiang, Yunhong
> 
> This is why there is a distinction between properties set on images
> vs properties set on flavours. Image properties, which a normal user
> can set, are restricted to aspects of the VM which don't involve
> consumption of compute host resources. Flavour properties, which
> only a user with 'flavourmanage' permission can change, control
> aspects of the VM  config which consume finite compute resources.

I think the VM properties should express resource requirements, including CPU 
features like AES-NI, HVM/PV vCPU type, or PCI device type, because the image may 
have some special requirements for resources, or a minimum RAM size, 
etc. But IMHO it's not easy to say that they "don't involve
consumption of compute host resources". After all, they define the type of 
resources. 
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] [Heat] Custom Nova Flavor creation through Heat (pt.2)

2014-05-06 Thread Jiang, Yunhong


> -Original Message-
> From: Solly Ross [mailto:sr...@redhat.com]
> Sent: Tuesday, May 06, 2014 10:16 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Nova] [Heat] Custom Nova Flavor creation
> through Heat (pt.2)
> 
> For your first question, I'll probably create a BP sometime today.
> 
> For your second question, allowing tenants to create flavors
> prevents one of the main parts of the flavor idea from working --
> having flavors that nicely fit together to prevent "wasted" host
> resources.  For instance suppose the normal system flavors used
> memory in powers of 2GB (2, 4, 8, 16, 32).  Now suppose someone
> came in, created a private flavor that used 3GB of RAM.  We now
> have 1GB of RAM that can never be used, unless someone decides
> to come along and create a 1GB flavor (actually, since RAM has
> even more granularity than that, you could have someone specify
> that they wanted 1.34GB of RAM, for instance, and then you have
> all sorts of weird stuff going on).

Hi Solly, I don't think the 3G example is really that important, since it has no 
alignment requirement and will not cause fragmentation, at least not enough 
to argue against the previous suggestion. After all, 3G + 3G + 2G can sit in an 8G 
host quite well; maybe a multiple of 64M is fair enough and should work in a 
large-scale system. Of course, 1.34G is strange for an engineer :)

--jyh

> 
> Best Regards,
> Solly Ross
> 
> - Original Message -
> From: "Dimitri Mazmanov" 
> To: "OpenStack Development Mailing List (not for usage questions)"
> 
> Sent: Monday, May 5, 2014 3:40:08 PM
> Subject: Re: [openstack-dev] [Nova] [Heat] Custom Nova Flavor creation
> through Heat (pt.2)
> 
> This is good! Is there a blueprint describing this idea? Or any plans
> describing it in a blueprint?
> Would happily share the work.
> 
> Should we mix it with flavors in horizon though? I'm thinking of having a
> separate "Resources" page,
> wherein the user can "define" resources. I'm not a UX expert though.
> 
> But let me come back to the project-scoped flavor creation issues.
> Why do you think it's such a bad idea to let tenants create flavors for
> their project specific needs?
> 
> I'll refer again to Steve Hardy's proposal:
> - Normal user : Can create a private flavor in a tenant where they
>   have the Member role (invisible to any other users)
> - Tenant Admin user : Can create public flavors in the tenants where they
>   have the admin role (visible to all users in the tenant)
> - Domain admin user : Can create public flavors in the domains where they
>   have the admin role (visible to all users in all tenants in that domain)
> 
> 
> > If you actually have 64 flavors, though, and it's overwhelming
> > your users, ...
> 
> The users won't see all 64 flavors, only those they have defined plus the public ones.
> 
> -
> 
> Dimitri
> 
> On 05/05/14 20:18, "Chris Friesen" 
> wrote:
> 
> >On 05/05/2014 11:40 AM, Solly Ross wrote:
> >> One thing that I was discussing with @jaypipes and @dansmith over
> >> on IRC was the possibility of breaking flavors down into separate
> >> components -- i.e have a disk flavor, a CPU flavor, and a RAM flavor.
> >> This way, you still get the control of the size of your building blocks
> >> (e.g. you could restrict RAM to only 2GB, 4GB, or 16GB), but you avoid
> >> exponential flavor explosion by separating out the axes.
> >
> >I like this idea because it allows for greater flexibility, but I think
> >we'd need to think carefully about how to expose it via horizon--maybe
> >separate tabs within the overall "flavors" page?
> >
> >As a simplifying view you could keep the existing flavors which group
> >all of them, while still allowing instances to specify each one
> >separately if desired.
> >
> >Chris
> >
> >___
> >OpenStack-dev mailing list
> >OpenStack-dev@lists.openstack.org
> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] about pci device filter

2014-05-06 Thread Jiang, Yunhong
Hi Ricky, could you please describe the specific requirement you have in mind?

--jyh

> -Original Message-
> From: Bohai (ricky) [mailto:bo...@huawei.com]
> Sent: Monday, May 05, 2014 11:45 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] about pci device filter
> 
> Hi jiang,
> 
> Maybe what I said in the last mail has misled you.
> 
> It's not to provide a instance scheduler filter such as
> "PciPassthroughFilter" to filter which host
> to boot the instance.
> 
> I hope to add our special filter like "nova/pci/PciHostDevicesWhiteList.py".
> So we hope to add a mechanism to specify which filter to use for getting
> the
> pci devices from host.
> 
> Best regards to you.
> Ricky
> 
> > -Original Message-
> > From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> > Sent: Tuesday, May 06, 2014 1:54 AM
> > To: OpenStack Development Mailing List (not for usage questions)
> > Subject: Re: [openstack-dev] [nova] about pci device filter
> >
> > Hi, Bohai, are you talking about the scheduler filter for PCI, right?
> >
> > I think the scheduler filters can be changed by nova options already, so I
> don't
> > think we need another mechanism and just create another filter to
> replace the
> > default pci filter?
> >
> > --jyh
> >
> > > -Original Message-
> > > From: Bohai (ricky) [mailto:bo...@huawei.com]
> > > Sent: Monday, May 05, 2014 1:32 AM
> > > To: OpenStack-dev@lists.openstack.org
> > > Subject: [openstack-dev] [nova] about pci device filter
> > >
> > > Hi, stackers:
> > >
> > > Now there is a default white list filter for PCI devices.
> > > But maybe it's not enough in some scenario.
> > >
> > > Maybe it's better if we provide a mechanism to specify a customize
> filter.
> > >
> > > For example:
> > > So user can make a special filter , then specify which filter to use
> > > in configure files.
> > >
> > > Any advices?
> > >
> > > Best regards to you.
> > > Ricky
> > >
> > >
> > > ___
> > > OpenStack-dev mailing list
> > > OpenStack-dev@lists.openstack.org
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Question about addit log in nova-compute.log

2014-05-06 Thread Jiang, Yunhong


> -Original Message-
> From: Jay Pipes [mailto:jaypi...@gmail.com]
> Sent: Tuesday, May 06, 2014 10:44 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] Question about addit log in
> nova-compute.log
> 
> On 05/06/2014 01:37 PM, Jiang, Yunhong wrote:
> >> -Original Message-
> >> From: Jay Pipes [mailto:jaypi...@gmail.com]
> >> Sent: Monday, May 05, 2014 6:19 PM
> >> To: openstack-dev@lists.openstack.org
> >> Subject: Re: [openstack-dev] [nova] Question about addit log in
> >> nova-compute.log
> >>
> >> On 05/05/2014 04:19 PM, Jiang, Yunhong wrote:
> >>>> -Original Message-
> >>>> From: Jay Pipes [mailto:jaypi...@gmail.com]
> >>>> Sent: Monday, May 05, 2014 9:50 AM
> >>>> To: openstack-dev@lists.openstack.org
> >>>> Subject: Re: [openstack-dev] [nova] Question about addit log in
> >>>> nova-compute.log
> >>>>
> >>>> On 05/04/2014 11:09 PM, Chen CH Ji wrote:
> >>>>> Hi
> >>>>>   I saw in my compute.log has following logs
> which
> >> looks
> >>>>> to me strange at first, Free resource is negative make me confused
> >> and I
> >>>>> take a look at the existing code
> >>>>>   looks to me the logic is correct and calculation
> >> doesn't
> >>>>> have problem ,but the output 'Free' is confusing
> >>>>>
> >>>>>   Is this on purpose or might need to be
> enhanced?
> >>>>>
> >>>>> 2014-05-05 10:51:33.732 4992 AUDIT
> >> nova.compute.resource_tracker
> >>>> [-]
> >>>>> Free ram (MB): -1559
> >>>>> 2014-05-05 10:51:33.732 4992 AUDIT
> >> nova.compute.resource_tracker
> >>>> [-]
> >>>>> Free disk (GB): 29
> >>>>> 2014-05-05 10:51:33.732 4992 AUDIT
> >> nova.compute.resource_tracker
> >>>> [-]
> >>>>> Free VCPUS: -3
> >>>>
> >>>> Hi Kevin,
> >>>>
> >>>> I think changing "free" to "available" might make things a little more
> >>>> clear. In the above case, it may be that your compute worker has
> both
> >>>> CPU and RAM overcommit enabled.
> >>>>
> >>>> Best,
> >>>> -jay
> >>>
> >>> HI, Jay,
> >>>   I don't think change 'free' to 'available' will make it clearer.
> >>>   IMHO, the calculation of the 'free' is bogus. When report the
> status in
> >> the periodic task, the resource tracker has no idea of the over-commit
> >> ration at all, thus it simply subtract the total RAM number assigned to
> >> instances from the RAM number provided by hypervisor w/o
> considering
> >> the over-commitment at all. So this number really have meaningless.
> >>
> >> Agreed that in it's current state, it's meaningless. But... that said,
> >> the numbers *could* be used to show oversubscription percentage,
> and
> >> you
> >> don't need to know the max overcommit ratio in order to calculate that
> >> with the numbers already known.
> >
> > I don't think user can use these number to calculate the 'available'. User
> has to know the max overcommit ratio to know the 'available'. Also, it's
> really ironic to provide some meaningless information and have the user
> to calculate to get meaningful.
> >
> > This is related to https://bugs.launchpad.net/nova/+bug/1300775 . I
> think it will be better if we can have the resource tracker to knows about
> the ratio.
> 
> Sorry, you misunderstood me... I was referring to the resource tracker
> above, not a regular user. The resource tracker already knows the total
> amount of physical resources available on each compute node, and it
> knows the resource usage reported by each compute node. Therefore,
> the
> resource tracker already has all the information it needs to understand
> the *actual* overcommit ratio of CPU and memory on each compute
> node,
> regardless of the settings of the *maximum* overcommit ratio on a
> compute node (which is in each compute node's nova.conf).
> 
> Hope that makes things a bit clearer! Sorry for the confusion :)

Aha, the 'actual' vs. 'maximum' distinction makes it quite clear now. Thanks for the 
clarification.

--jyh
> 
> Best,
> -jay
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Question about addit log in nova-compute.log

2014-05-06 Thread Jiang, Yunhong


> -Original Message-
> From: Jay Pipes [mailto:jaypi...@gmail.com]
> Sent: Monday, May 05, 2014 6:19 PM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] Question about addit log in
> nova-compute.log
> 
> On 05/05/2014 04:19 PM, Jiang, Yunhong wrote:
> >> -Original Message-
> >> From: Jay Pipes [mailto:jaypi...@gmail.com]
> >> Sent: Monday, May 05, 2014 9:50 AM
> >> To: openstack-dev@lists.openstack.org
> >> Subject: Re: [openstack-dev] [nova] Question about addit log in
> >> nova-compute.log
> >>
> >> On 05/04/2014 11:09 PM, Chen CH Ji wrote:
> >>> Hi
> >>>  I saw in my compute.log has following logs which
> looks
> >>> to me strange at first, Free resource is negative make me confused
> and I
> >>> take a look at the existing code
> >>>  looks to me the logic is correct and calculation
> doesn't
> >>> have problem ,but the output 'Free' is confusing
> >>>
> >>>  Is this on purpose or might need to be enhanced?
> >>>
> >>> 2014-05-05 10:51:33.732 4992 AUDIT
> nova.compute.resource_tracker
> >> [-]
> >>> Free ram (MB): -1559
> >>> 2014-05-05 10:51:33.732 4992 AUDIT
> nova.compute.resource_tracker
> >> [-]
> >>> Free disk (GB): 29
> >>> 2014-05-05 10:51:33.732 4992 AUDIT
> nova.compute.resource_tracker
> >> [-]
> >>> Free VCPUS: -3
> >>
> >> Hi Kevin,
> >>
> >> I think changing "free" to "available" might make things a little more
> >> clear. In the above case, it may be that your compute worker has both
> >> CPU and RAM overcommit enabled.
> >>
> >> Best,
> >> -jay
> >
> > HI, Jay,
> > I don't think change 'free' to 'available' will make it clearer.
> > IMHO, the calculation of the 'free' is bogus. When report the status in
> the periodic task, the resource tracker has no idea of the over-commit
> ration at all, thus it simply subtract the total RAM number assigned to
> instances from the RAM number provided by hypervisor w/o considering
> the over-commitment at all. So this number really have meaningless.
> 
> Agreed that in it's current state, it's meaningless. But... that said,
> the numbers *could* be used to show oversubscription percentage, and
> you
> don't need to know the max overcommit ratio in order to calculate that
> with the numbers already known.

I don't think the user can use these numbers to calculate what is 'available'. The user has 
to know the max overcommit ratio to know the 'available' amount. Also, it's really 
ironic to provide meaningless information and then expect the user to do a calculation 
to get something meaningful.

This is related to https://bugs.launchpad.net/nova/+bug/1300775 . I think it 
would be better if the resource tracker knew about the ratio.

--jyh

> 
> Best,
> -jay
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][olso] How is the trusted message going on?

2014-05-05 Thread Jiang, Yunhong
Hi, all
Trusted messaging 
(https://blueprints.launchpad.net/oslo.messaging/+spec/trusted-messaging) has 
been removed from Icehouse; does anyone know its current status? I noticed a 
summit session may cover it ( 
http://junodesignsummit.sched.org/event/9a6b59be11cdeaacfea70fef34328931#.U2gMo_ldWrQ
 ), but I would really appreciate it if anyone could provide some information.

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova]The status of the hypervisor node and the service

2014-05-05 Thread Jiang, Yunhong
Hi, all
Currently I'm working on a spec at https://review.openstack.org/#/c/90172/4 
which returns the status of the hypervisor node. Per the comments, including 
comments from operators, this is a welcome feature. As in 
https://review.openstack.org/#/c/90172/4/specs/juno/return-status-for-hypervisor-node.rst#77
 , I try to return the status as "up", "down", or "disabled", which is in fact a 
mix of the corresponding service's status and state.
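To be concrete, the single value I have in mind would be derived roughly like this 
(sketch only):

def hypervisor_node_status(service_disabled, service_alive):
    # One combined value derived from the corresponding service record.
    if service_disabled:
        return 'disabled'
    return 'up' if service_alive else 'down'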

However, there is some disagreement on how to return the status. For 
example, should we return both the 'status' and the 'state', and should we return 
the 'disabled reason' if the corresponding service is disabled? 

I have several questions on which I want feedback from the community:
a) Why do we distinguish the service 'status' and 'state'? What's the exact 
difference between 'state' and 'status' in English? IMHO, a service is 'up' when 
enabled, and is 'down' when it is either disabled or has temporarily lost its 
heartbeat. 

b) The difference between the hypervisor node status and the service status. I know the 
relationship between 'node' and 'host' is still under discussion 
(http://junodesignsummit.sched.org/event/a0d38e1278182eb09f06e22457d94c0c#.U2fy3PldWrQ
 ). Do you think the node status and the service status are in scope for that 
discussion? Or should I simply copy the status/state/disabled_reason into the 
hypervisor node status? Would it be possible that in some virt driver, one 
hypervisor node could have its own status value different from the service value?

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Question about addit log in nova-compute.log

2014-05-05 Thread Jiang, Yunhong


> -Original Message-
> From: Jay Pipes [mailto:jaypi...@gmail.com]
> Sent: Monday, May 05, 2014 9:50 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] Question about addit log in
> nova-compute.log
> 
> On 05/04/2014 11:09 PM, Chen CH Ji wrote:
> > Hi
> > I saw the following entries in my compute.log, which looked
> > strange to me at first; a negative free resource value confused me, so I
> > took a look at the existing code.
> > It looks to me like the logic is correct and the calculation is
> > fine, but the output 'Free' is confusing.
> >
> > Is this on purpose or might need to be enhanced?
> >
> > 2014-05-05 10:51:33.732 4992 AUDIT nova.compute.resource_tracker
> [-]
> > Free ram (MB): -1559
> > 2014-05-05 10:51:33.732 4992 AUDIT nova.compute.resource_tracker
> [-]
> > Free disk (GB): 29
> > 2014-05-05 10:51:33.732 4992 AUDIT nova.compute.resource_tracker
> [-]
> > Free VCPUS: -3
> 
> Hi Kevin,
> 
> I think changing "free" to "available" might make things a little more
> clear. In the above case, it may be that your compute worker has both
> CPU and RAM overcommit enabled.
> 
> Best,
> -jay

Hi Jay,
I don't think changing 'free' to 'available' will make it clearer. 
IMHO, the calculation of 'free' is bogus. When reporting the status in 
the periodic task, the resource tracker has no idea of the overcommit ratio 
at all; it simply subtracts the total RAM assigned to instances from 
the RAM reported by the hypervisor, without considering overcommitment at 
all. So this number is really meaningless.
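For example, with made-up numbers chosen to reproduce the -1559 from the log above:

# Illustrative numbers only: why 'free' can go negative today, and what
# an overcommit-aware 'available' would look like instead.
total_ram_mb = 8192          # RAM reported by the hypervisor
assigned_ram_mb = 9751       # total RAM assigned to instances
ram_allocation_ratio = 1.5   # nova.conf setting, not known in this code path

free_mb = total_ram_mb - assigned_ram_mb                              # -1559
available_mb = total_ram_mb * ram_allocation_ratio - assigned_ram_mb  # 2537.0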

--jyh

> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] about pci device filter

2014-05-05 Thread Jiang, Yunhong
Hi, Bohai, are you talking about the scheduler filter for PCI, right?

I think the scheduler filters can already be changed via nova config options, so I 
don't think we need another mechanism; you can just create another filter to replace 
the default PCI filter.
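
For example, something along these lines should already be possible (just a sketch; 
I'm writing the class and attribute names from memory, so treat them as assumptions):

    # my_pci_filter.py -- a custom replacement for the default PCI filter
    from nova.scheduler import filters

    class MyPciFilter(filters.BaseHostFilter):
        """Pass only hosts whose pci_stats can satisfy the PCI requests."""

        def host_passes(self, host_state, filter_properties):
            pci_requests = filter_properties.get('pci_requests')
            if not pci_requests:
                return True
            return host_state.pci_stats.support_requests(pci_requests)

Then list this filter in the existing scheduler filter option in nova.conf 
(scheduler_default_filters) instead of the stock PCI filter.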

--jyh

> -Original Message-
> From: Bohai (ricky) [mailto:bo...@huawei.com]
> Sent: Monday, May 05, 2014 1:32 AM
> To: OpenStack-dev@lists.openstack.org
> Subject: [openstack-dev] [nova] about pci device filter
> 
> Hi, stackers:
> 
> Now there is an default while list filter for PCI device.
> But maybe it's not enough in some scenario.
> 
> Maybe it's better if we provide a mechanism to specify a customize filter.
> 
> For example:
> So user can make a special filter , then specify which filter to use in
> configure files.
> 
> Any advices?
> 
> Best regards to you.
> Ricky
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] How to add a property to the API extension?

2014-04-21 Thread Jiang, Yunhong
Alex, thanks very much for your answer.

--jyh

> -Original Message-
> From: Alex Xu [mailto:x...@linux.vnet.ibm.com]
> Sent: Sunday, April 20, 2014 7:06 PM
> To: OpenStack Development Mailing List (not for usage questions);
> Christopher Yeoh
> Subject: Re: [openstack-dev] How to add a property to the API extension?
> 
> On 2014年04月17日 05:25, Jiang, Yunhong wrote:
> > Hi, Christopher,
> > I have some question to the API changes related to
> https://review.openstack.org/#/c/80707/4/nova/api/openstack/compute/
> plugins/v3/hypervisors.py , which adds a property to the hypervisor
> information.
> 
> Hi, Yunhong, Chris may be available for a while. Let me answer your
> question.
> 
> > a) I checked the https://wiki.openstack.org/wiki/APIChangeGuidelines
> but not sure if it's ok to "Adding a property to a resource representation"
> as I did in the patch, or I need another extension to add this property?
> Does "OK when conditionally added as a new API extension" means I need
> another extension?
> You can add a property for v3 api directly for now. Because v3 api
> didn't release yet. We needn't wrong about any back-compatibility
> problem. if you add a property for v2 api
> you need another extension.
> >
> > b) If we can simply add a property like the patch is doing, would it
> requires to bump the version number? If yes, how should the version
> number be? Would it be like 1/2/3 etc, or should it be something like
> 1.1/1.2/2.1 etc?
> 
> You needn't bump the version number for same reason v3 api didn't
> release yet. After v3 api released, we should bump the version, it would
> be like 1/2/3 etc.
> 
> > Thanks
> > --jyh
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >
> >
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] How to add a property to the API extension?

2014-04-16 Thread Jiang, Yunhong
Hi, Christopher,
I have some questions about the API changes related to 
https://review.openstack.org/#/c/80707/4/nova/api/openstack/compute/plugins/v3/hypervisors.py
, which adds a property to the hypervisor information.

a) I checked https://wiki.openstack.org/wiki/APIChangeGuidelines but I'm not 
sure if it's OK to do "Adding a property to a resource representation" as I did in 
the patch, or whether I need another extension to add this property. Does "OK when 
conditionally added as a new API extension" mean I need another extension?

b) If we can simply add a property as the patch is doing, would that require 
bumping the version number? If yes, what should the version number look like? Would 
it be 1/2/3 etc., or something like 1.1/1.2/2.1 etc.?

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] nova-specs

2014-04-16 Thread Jiang, Yunhong
> single comment they made was -1 worthy on its own. Often times I will -1
> for a spelling mistake and then make a bunch of other purely-opinion
> comments which don't necessarily need to change.

Do we really want to -1 for a spelling mistake in nova-specs? This is really bad 
news for a non-native speaker like me, because I'm really not sensitive to spelling 
even after checking again and again. (Of course I can point out any spelling mistake 
in Chinese very quickly.)

The major concern with a -1 on spelling is that nova-drivers and nova-cores seem 
not to check a patch/spec at all if it already has a -1. Thus a -1 on spelling will 
delay the review, especially if the nova-drivers member and the author are in 
different time zones.

Would it be better to -1 mainly for design issues, and comment on spelling 
errors without a -1? After all, a nova-spec is different from a nova patch.

Thanks
--jyh


> 
> --Dan
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova] Some changes on config drive

2014-02-28 Thread Jiang, Yunhong
Sorry, I forgot the nova prefix in the subject.

--jyh

> -Original Message-
> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> Sent: Friday, February 28, 2014 9:32 PM
> To: openstack-dev@lists.openstack.org
> Subject: [openstack-dev] Some changes on config drive
> 
> Hi, Michael and all,
> 
>   I created some changes to config_drive, and hope to get some
> feedback. The patches are at
> https://review.openstack.org/#/q/status:open+project:openstack/nova+b
> ranch:master+topic:config_drive_cleanup,n,z
> 
>   The basically ideas of the changes are:
>   1) Instead of using host based config option to decide config_drive
> and config_drive_format, fetch such information from image property.
> Accordingly to Michael, its image that decide if it need config drive, and I
> think it's image that decide what's the config drive format supported. (like
> cloudinit verion 1.0 does not support iso9660 format.
> (http://cloudinit.readthedocs.org/en/latest/topics/datasources.html#versi
> on-1)
> 
>   2) I noticed some virt drivers like VMWare/hyperv support only
> iso9660 format, , thus select the host based on image property, for
> example, if a host can't support vfat, don't try to schedule a server
> requires 'vfat' to that host.
> 
>   The implementation detais are:
> 
>   1) Image can provide two properties, 'config_drive' and
> 'config_drive_format'.
> 
>   2) There is a cloud wise force_config_drive option (in the api service)
> to decide if the config_drive will be forced applied.
> 
>   3) There is a host specific config_drive_format to set the default
> config_drive format if not specified in the image property.
> 
>   4) In the image property filter, we will select the host that support 
> the
> config_drive_format in image property
> 
>   Any feedback is welcome to these changes.
> 
> Thanks
> --jyh
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] inconsistent naming? node vs host vs vs hypervisor_hostname vs OS-EXT-SRV-ATTR:host

2014-02-28 Thread Jiang, Yunhong

> -Original Message-
> From: Chris Friesen [mailto:chris.frie...@windriver.com]
> Sent: Friday, February 28, 2014 10:07 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] inconsistent naming? node vs host vs vs
> hypervisor_hostname vs OS-EXT-SRV-ATTR:host
> 
> On 02/28/2014 11:38 AM, Jiang, Yunhong wrote:
> > One reason of the confusion is, in some virt driver (maybe xenapi or
> > vmwareapi), one compute service manages multiple node.
> 
> Okay, so in the scenario above, is the nova-compute service running on a

I think the nova compute service runs on a host, as you can see from 
compute/manager.py and manager.py. 

> "node" or a "host"?  (And if it's a "host", then what is the "compute
> node"?)

Check the update_available_resource() at compute/manager.py for the node idea.

> 
> What is the distinction between "OS-EXT-SRV-ATTR:host" and
> "OS-EXT-SRV-ATTR:hypervisor_hostname" in the above case?

According to _extend_server() in 
./api/openstack/compute/contrib/extended_server_attributes.py, 
"OS-EXT-SRV-ATTR:hypervisor_hostname" is the node and 
"OS-EXT-SRV-ATTR:host" is the host.

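Roughly, paraphrasing from memory (so don't take the exact code as authoritative), 
the extension does:

    # 'host' is the service host, 'node' is the hypervisor node
    server['OS-EXT-SRV-ATTR:host'] = instance['host']
    server['OS-EXT-SRV-ATTR:hypervisor_hostname'] = instance['node']
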
I agree this is a bit confusing, especially since the documentation is not clear. I'd 
call "OS-EXT-SRV-ATTR:hypervisor_hostname" something like 
"OS-EXT-SRV-ATTR:hypervisor_nodename", which makes more sense and is clearer. Per my 
understanding of xenapi, there is a hypervisor on each compute node, and XenAPI (or 
whatever name that software layer has) manages multiple (or, in the extreme case, 1) 
nodes; that XenAPI software layer interacts with the nova service and looks like a 
host from nova's point of view.

Dan had some interesting discussion at the Nova meetup on this and on cells 
(so-called cloud NUMA, IIRC?).

Thanks
--jyh

> 
> Chris
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] Some changes on config drive

2014-02-28 Thread Jiang, Yunhong
Hi, Michael and all,

I created some changes to config_drive, and hope to get some feedback. 
The patches are at 
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:config_drive_cleanup,n,z
 

The basic ideas of the changes are:
1) Instead of using host-based config options to decide config_drive and 
config_drive_format, fetch that information from image properties. According to 
Michael, it's the image that decides whether it needs a config drive, and I think 
it's the image that decides which config drive formats are supported (e.g. cloud-init 
version 1.0 does not support the iso9660 format: 
http://cloudinit.readthedocs.org/en/latest/topics/datasources.html#version-1).

2) I noticed some virt drivers like VMware/Hyper-V support only the iso9660 
format, thus select the host based on the image property; for example, if a host 
can't support vfat, don't try to schedule a server that requires 'vfat' to that host.

The implementation details are:

1) An image can provide two properties, 'config_drive' and 
'config_drive_format'.

2) There is a cloud-wide force_config_drive option (in the API service) 
to decide whether the config drive will be force-applied.

3) There is a host-specific config_drive_format option that sets the default 
config drive format if it is not specified in the image property.

4) In the image properties filter, we will select hosts that support 
the config_drive_format given in the image property.
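
To make 3) a bit more concrete, the selection could look roughly like this (a sketch 
only; the helper name is made up, although CONF.config_drive_format is the existing 
host-level option):

    # pick the config drive format for an instance
    def pick_config_drive_format(image_meta):
        props = image_meta.get('properties', {})
        # the image property wins; otherwise fall back to the per-host default
        return props.get('config_drive_format') or CONF.config_drive_format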

Any feedback on these changes is welcome.

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] inconsistent naming? node vs host vs vs hypervisor_hostname vs OS-EXT-SRV-ATTR:host

2014-02-28 Thread Jiang, Yunhong
One reason for the confusion is that in some virt drivers (maybe xenapi or 
vmwareapi), one compute service manages multiple nodes.

--jyh

> -Original Message-
> From: Chris Friesen [mailto:chris.frie...@windriver.com]
> Sent: Friday, February 28, 2014 7:40 AM
> To: openstack-dev@lists.openstack.org
> Subject: [openstack-dev] inconsistent naming? node vs host vs vs
> hypervisor_hostname vs OS-EXT-SRV-ATTR:host
> 
> Hi,
> 
> I've been working with OpenStack for a while now but I'm still a bit
> fuzzy on the precise meaning of some of the terminology.
> 
> It seems reasonably clear that a "node" is a computer running at least
> one component of an Openstack system.
> 
> However, "nova service-list" talks about the "host" that a given service
> runs on.  Shouldn't that be "node"?  Normally "host" is used to
> distinguish from "guest", but that doesn't really make sense for a
> dedicated controller node.
> 
> "nova show" reports "OS-EXT-SRV-ATTR:host" and
> "OS-EXT-SRV-ATTR:hypervisor_hostname" for an instance.  What is the
> distinction between the two and how do they relate to OpenStack "nodes"
> or the "host" names in "nova service-list"?
> 
> "nova hypervisor-list" uses the term "hypervisor hostname", but "nova
> hypervisor-stats" talks about "compute nodes".  Is this distinction
> accurate or should they both use the hypervisor terminology?  What is
> the distinction between hypervisor/host/node?
> 
> "nova host-list" reports "host_name", but seems to include all services.
>   Does "host_name" correspond to host, hypervisor_host, or node?  And
> just to make things interesting, the other "nova host-*" commands only
> work on compute hosts, so maybe "nova host-list" should only output info
> for systems running nova-compute?
> 
> 
> Thanks,
> Chris
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] why force_config_drive is a per comptue node config

2014-02-28 Thread Jiang, Yunhong
Hi, Michael, I cooked up a patch at https://review.openstack.org/#/c/77027/ ; 
please have a look.

Another thing I'm not sure about: currently "nova show" only shows whether the user 
specified 'config_drive', according to the DB; however, the user has no idea whether 
the config drive actually succeeded, which format was used, etc. Do you think we 
should extend this information to make it more useful?

Also, what do you think about making config_drive_format also based on an image 
property instead of compute node config? IIRC, vfat/cdrom is mostly driven by image 
requirements, right? Or the image property could take precedence.

Thanks
--jyh

> -Original Message-
> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> Sent: Thursday, February 27, 2014 1:55 PM
> To: OpenStack Development Mailing List (not for usage questions);
> yunhong jiang
> Subject: Re: [openstack-dev] [Nova] why force_config_drive is a per
> comptue node config
> 
> Hi, Michael, I created a bug at
> https://bugs.launchpad.net/nova/+bug/1285880 and please have a look.
> 
> Thanks
> --jyh
> 
> > -Original Message-
> > From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> > Sent: Thursday, February 27, 2014 1:35 PM
> > To: OpenStack Development Mailing List (not for usage questions);
> > yunhong jiang
> > Subject: Re: [openstack-dev] [Nova] why force_config_drive is a per
> > comptue node config
> >
> >
> >
> > > -Original Message-
> > > From: Michael Still [mailto:mi...@stillhq.com]
> > > Sent: Thursday, February 27, 2014 1:04 PM
> > > To: yunhong jiang
> > > Cc: OpenStack Development Mailing List
> > > Subject: Re: [openstack-dev] [Nova] why force_config_drive is a per
> > > comptue node config
> > >
> > > On Fri, Feb 28, 2014 at 6:34 AM, yunhong jiang
> > >  wrote:
> > > > Greeting,
> > > > I have some questions on the force_config_drive
> > configuration
> > > options
> > > > and hope get some hints.
> > > > a) Why do we want this? Per my understanding, if the user
> > > want to use
> > > > the config drive, they need specify it in the nova boot. Or is it
> > > > because possibly user have no idea of the cloudinit installation in the
> > > > image?
> > >
> > > It is possible for a cloud admin to have only provided images which
> > > work with config drive. In that case the admin would want to force
> > > config drive on, to ensure that instances always boot correctly.
> >
> > So would it make sense to keep it as image property, instead of compute
> > node config?
> >
> > >
> > > > b) even if we want to force config drive, why it's a compute
> > > node
> > > > config instead of cloud wise config? Compute-node config will have
> > > some
> > > > migration issue per my understanding.
> > >
> > > That's a fair point. It should probably have been a flag on the api
> > > servers. I'd file a bug for that one.
> >
> > Thanks, and I can cook a patch for it. Still I think it will be better if 
> > we use
> > image property?
> >
> > --jyh
> >
> > >
> > > Michael
> > >
> > > --
> > > Rackspace Australia
> > >
> > > ___
> > > OpenStack-dev mailing list
> > > OpenStack-dev@lists.openstack.org
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] why force_config_drive is a per comptue node config

2014-02-27 Thread Jiang, Yunhong
Hi, Michael, I created a bug at https://bugs.launchpad.net/nova/+bug/1285880 ; 
please have a look.

Thanks
--jyh

> -Original Message-
> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> Sent: Thursday, February 27, 2014 1:35 PM
> To: OpenStack Development Mailing List (not for usage questions);
> yunhong jiang
> Subject: Re: [openstack-dev] [Nova] why force_config_drive is a per
> comptue node config
> 
> 
> 
> > -Original Message-
> > From: Michael Still [mailto:mi...@stillhq.com]
> > Sent: Thursday, February 27, 2014 1:04 PM
> > To: yunhong jiang
> > Cc: OpenStack Development Mailing List
> > Subject: Re: [openstack-dev] [Nova] why force_config_drive is a per
> > comptue node config
> >
> > On Fri, Feb 28, 2014 at 6:34 AM, yunhong jiang
> >  wrote:
> > > Greeting,
> > > I have some questions on the force_config_drive
> configuration
> > options
> > > and hope get some hints.
> > > a) Why do we want this? Per my understanding, if the user
> > want to use
> > > the config drive, they need specify it in the nova boot. Or is it
> > > because possibly user have no idea of the cloudinit installation in the
> > > image?
> >
> > It is possible for a cloud admin to have only provided images which
> > work with config drive. In that case the admin would want to force
> > config drive on, to ensure that instances always boot correctly.
> 
> So would it make sense to keep it as image property, instead of compute
> node config?
> 
> >
> > > b) even if we want to force config drive, why it's a compute
> > node
> > > config instead of cloud wise config? Compute-node config will have
> > some
> > > migration issue per my understanding.
> >
> > That's a fair point. It should probably have been a flag on the api
> > servers. I'd file a bug for that one.
> 
> Thanks, and I can cook a patch for it. Still I think it will be better if we 
> use
> image property?
> 
> --jyh
> 
> >
> > Michael
> >
> > --
> > Rackspace Australia
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] why force_config_drive is a per comptue node config

2014-02-27 Thread Jiang, Yunhong


> -Original Message-
> From: Michael Still [mailto:mi...@stillhq.com]
> Sent: Thursday, February 27, 2014 1:04 PM
> To: yunhong jiang
> Cc: OpenStack Development Mailing List
> Subject: Re: [openstack-dev] [Nova] why force_config_drive is a per
> comptue node config
> 
> On Fri, Feb 28, 2014 at 6:34 AM, yunhong jiang
>  wrote:
> > Greeting,
> > I have some questions on the force_config_drive configuration
> options
> > and hope get some hints.
> > a) Why do we want this? Per my understanding, if the user
> want to use
> > the config drive, they need specify it in the nova boot. Or is it
> > because possibly user have no idea of the cloudinit installation in the
> > image?
> 
> It is possible for a cloud admin to have only provided images which
> work with config drive. In that case the admin would want to force
> config drive on, to ensure that instances always boot correctly.

So would it make sense to keep it as an image property, instead of a compute node
config?

> 
> > b) even if we want to force config drive, why it's a compute
> node
> > config instead of cloud wise config? Compute-node config will have
> some
> > migration issue per my understanding.
> 
> That's a fair point. It should probably have been a flag on the api
> servers. I'd file a bug for that one.

Thanks, I can cook up a patch for it. Still, I think it would be better if we
used an image property?

--jyh

> 
> Michael
> 
> --
> Rackspace Australia
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] The BP for the persistent resource claim

2014-02-14 Thread Jiang, Yunhong
Hi, Brian
I created the BP for the persistent resource claim at 
https://blueprints.launchpad.net/nova/+spec/persistent-resource-claim , based 
on the discussion on IRC. Can you please have a look at it to see if there is any 
potential issue? 

Thanks
--jyh 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] The simplified blueprint for PCI extra attributes

2014-02-03 Thread Jiang, Yunhong
Hi, John and all,
I updated the blueprint 
https://blueprints.launchpad.net/nova/+spec/pci-extra-info-icehouse  according 
to your feedback, to add the backward compatibility/upgrade issue/examples.

I try to separate this BP from the SR-IOV NIC support as a standalone 
enhancement, because this requirement is more of a generic PCI pass-through 
feature and will benefit other usage scenarios as well.

And the reasons I want to finish this BP in the I release are:

a) it's a generic requirement, and pushing it into the I release is helpful 
to other scenarios.
b) I don't see an upgrade issue, and the only thing that will be discarded in 
the future is the PCI alias, if we all agree to use the PCI flavor. But that 
effort will be small, and there is no conclusion on the PCI flavor yet.
c) SR-IOV NIC support is complex; it will be really helpful if we can keep 
the ball rolling and push the already-agreed items forward. 

Considering the big patch list for the I-3 release, I'm not optimistic about 
merging this in the I release, but as said, we should keep the ball rolling and 
move forward.

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-17 Thread Jiang, Yunhong
Robert, thanks for your long reply. Personally I'd prefer option 2/3, as it keeps 
Nova the only entity for PCI management.

Glad you are OK with Ian's proposal and that we have a solution for the libvirt 
network scenario within that framework.

Thanks
--jyh

> -Original Message-
> From: Robert Li (baoli) [mailto:ba...@cisco.com]
> Sent: Friday, January 17, 2014 7:08 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
> 
> Yunhong,
> 
> Thank you for bringing that up on the live migration support. In addition
> to the two solutions you mentioned, Irena has a different solution. Let me
> put all the them here again:
> 1. network xml/group based solution.
>In this solution, each host that supports a provider net/physical
> net can define a SRIOV group (it's hard to avoid the term as you can see
> from the suggestion you made based on the PCI flavor proposal). For each
> SRIOV group supported on a compute node, A network XML will be
> created the
> first time the nova compute service is running on that node.
> * nova will conduct scheduling, but not PCI device allocation
> * it's a simple and clean solution, documented in libvirt as the
> way to support live migration with SRIOV. In addition, a network xml is
> nicely mapped into a provider net.
> 2. network xml per PCI device based solution
>This is the solution you brought up in this email, and Ian
> mentioned this to me as well. In this solution, a network xml is created
> when A VM is created. the network xml needs to be removed once the
> VM is
> removed. This hasn't been tried out as far as I  know.
> 3. interface xml/interface rename based solution
>Irena brought this up. In this solution, the ethernet interface
> name corresponding to the PCI device attached to the VM needs to be
> renamed. One way to do so without requiring system reboot is to change
> the
> udev rule's file for interface renaming, followed by a udev reload.
> 
> Now, with the first solution, Nova doesn't seem to have control over or
> visibility of the PCI device allocated for the VM before the VM is
> launched. This needs to be confirmed with the libvirt support and see if
> such capability can be provided. This may be a potential drawback if a
> neutron plugin requires detailed PCI device information for operation.
> Irena may provide more insight into this. Ideally, neutron shouldn't need
> this information because the device configuration can be done by libvirt
> invoking the PCI device driver.
> 
> The other two solutions are similar. For example, you can view the second
> solution as one way to rename an interface, or camouflage an interface
> under a network name. They all require additional works before the VM is
> created and after the VM is removed.
> 
> I also agree with you that we should take a look at XenAPI on this.
> 
> 
> With regard to your suggestion on how to implement the first solution with
> some predefined group attribute, I think it definitely can be done. As I
> have pointed it out earlier, the PCI flavor proposal is actually a
> generalized version of the PCI group. In other words, in the PCI group
> proposal, we have one predefined attribute called PCI group, and
> everything else works on top of that. In the PCI flavor proposal,
> attribute is arbitrary. So certainly we can define a particular attribute
> for networking, which let's temporarily call sriov_group. But I can see
> with this idea of predefined attributes, more of them will be required by
> different types of devices in the future. I'm sure it will keep us busy
> although I'm not sure it's in a good way.
> 
> I was expecting you or someone else can provide a practical deployment
> scenario that would justify the flexibilities and the complexities.
> Although I'd prefer to keep it simple and generalize it later once a
> particular requirement is clearly identified, I'm fine to go with it if
> that's most of the folks want to do.
> 
> --Robert
> 
> 
> 
> On 1/16/14 8:36 PM, "yunhong jiang" 
> wrote:
> 
> >On Thu, 2014-01-16 at 01:28 +0100, Ian Wells wrote:
> >> To clarify a couple of Robert's points, since we had a conversation
> >> earlier:
> >> On 15 January 2014 23:47, Robert Li (baoli)  wrote:
> >>   ---  do we agree that BDF address (or device id, whatever
> >> you call it), and node id shouldn't be used as attributes in
> >> defining a PCI flavor?
> >>
> >>
> >> Note that the current spec doesn't actually exclude it as an option.
> >> It's just an unwise thing to do.  In theory, you could elect to define
> >> your flavors using the BDF attribute but determining 'the card in this
> >> slot is equivalent to all the other cards in the same slot in other
> >> machines' is probably not the best idea...  We could lock it out as an
> >> option or we could just assume that administrators wouldn't be daft
> >> enough to 

Re: [openstack-dev] [nova] how is resource tracking supposed to work for live migration and evacuation?

2014-01-17 Thread Jiang, Yunhong
Paul, thanks for the clarification.

--jyh

> -Original Message-
> From: Murray, Paul (HP Cloud Services) [mailto:pmur...@hp.com]
> Sent: Friday, January 17, 2014 7:02 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] how is resource tracking supposed to
> work for live migration and evacuation?
> 
> To be clear - the changes that Yunhong describes below are not part of the
> extensible-resource-tracking blueprint. Extensible-resource-tracking has
> the more modest aim to provide plugins to track additional resource data.
> 
> Paul.
> 
> -Original Message-
> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> Sent: 17 January 2014 05:54
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] how is resource tracking supposed to
> work for live migration and evacuation?
> 
> There are some related discussion on this before.
> 
> There is a BP at
> https://blueprints.launchpad.net/nova/+spec/extensible-resource-trackin
> g which try to support more resources.
> 
> And I have a documentation at
> https://docs.google.com/document/d/1gI_GE0-H637lTRIyn2UPfQVebfk5Qj
> Di6ohObt6MIc0 . My idea is to keep the claim as an object which can be
> invoked remotely, and the claim result is kept in DB as the instance's usage.
> I'm working on it now.
> 
> Thanks
> --jyh
> 
> > -Original Message-
> > From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
> > Sent: Thursday, January 16, 2014 2:27 PM
> > To: OpenStack Development Mailing List (not for usage questions)
> > Subject: Re: [openstack-dev] [nova] how is resource tracking supposed
> > to work for live migration and evacuation?
> >
> >
> > On Jan 16, 2014, at 1:12 PM, Chris Friesen
> > 
> > wrote:
> >
> > > Hi,
> > >
> > > I'm trying to figure out how resource tracking is intended to work
> > > for live
> > migration and evacuation.
> > >
> > > For a while I thought that maybe we were relying on the call to
> > ComputeManager._instance_update() in
> > ComputeManager.post_live_migration_at_destination().  However, in
> > > ResourceTracker.update_usage() we see that on a live migration the
> > instance that has just migrated over isn't listed in
> > self.tracked_instances and so we don't actually update its usage.
> > >
> > > As far as I can see, the current code will just wait for the audit
> > > to run at
> > some unknown time in the future and call update_available_resource(),
> > which will add the newly-migrated instance to self.tracked_instances
> > and update the resource usage.
> > >
> > > From my poking around so far the same thing holds true for
> > > evacuation
> > as well.
> > >
> > > In either case, just waiting for the audit seems somewhat haphazard.
> > >
> > > Would it make sense to do something like
> > ResourceTracker.instance_claim() during the migration/evacuate and
> > properly track the resources rather than wait for the audit?
> >
> > Yes that makes sense to me. Live migration was around before we had a
> > resource tracker so it probably was just never updated.
> >
> > Vish
> >
> > >
> > > Chris
> > >
> > > ___
> > > OpenStack-dev mailing list
> > > OpenStack-dev@lists.openstack.org
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] how is resource tracking supposed to work for live migration and evacuation?

2014-01-16 Thread Jiang, Yunhong
There was some related discussion on this before. 

There is a BP at 
https://blueprints.launchpad.net/nova/+spec/extensible-resource-tracking which 
tries to support more resources.

And I have a document at 
https://docs.google.com/document/d/1gI_GE0-H637lTRIyn2UPfQVebfk5QjDi6ohObt6MIc0 
. My idea is to make the claim an object that can be invoked remotely, with the 
claim result kept in the DB as the instance's usage. I'm working on it now.
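
The rough shape I have in mind is something like the following (all names are 
placeholders, not the final design):

    # a claim that can be created remotely and whose result is persisted
    class PersistentClaim(object):
        def __init__(self, instance_uuid, memory_mb, vcpus):
            self.instance_uuid = instance_uuid
            self.memory_mb = memory_mb
            self.vcpus = vcpus

        def apply(self, compute_node):
            # test against the node's limits, then record the usage in the DB
            pass

        def abort(self, compute_node):
            # release the recorded usage if the build/migration fails
            pass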

Thanks
--jyh

> -Original Message-
> From: Vishvananda Ishaya [mailto:vishvana...@gmail.com]
> Sent: Thursday, January 16, 2014 2:27 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] how is resource tracking supposed to
> work for live migration and evacuation?
> 
> 
> On Jan 16, 2014, at 1:12 PM, Chris Friesen 
> wrote:
> 
> > Hi,
> >
> > I'm trying to figure out how resource tracking is intended to work for live
> migration and evacuation.
> >
> > For a while I thought that maybe we were relying on the call to
> ComputeManager._instance_update() in
> ComputeManager.post_live_migration_at_destination().  However, in
> > ResourceTracker.update_usage() we see that on a live migration the
> instance that has just migrated over isn't listed in self.tracked_instances
> and so we don't actually update its usage.
> >
> > As far as I can see, the current code will just wait for the audit to run at
> some unknown time in the future and call update_available_resource(),
> which will add the newly-migrated instance to self.tracked_instances and
> update the resource usage.
> >
> > From my poking around so far the same thing holds true for evacuation
> as well.
> >
> > In either case, just waiting for the audit seems somewhat haphazard.
> >
> > Would it make sense to do something like
> ResourceTracker.instance_claim() during the migration/evacuate and
> properly track the resources rather than wait for the audit?
> 
> Yes that makes sense to me. Live migration was around before we had a
> resource tracker so it probably was just never updated.
> 
> Vish
> 
> >
> > Chris
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] why don't we deal with "claims" when live migrating an instance?

2014-01-16 Thread Jiang, Yunhong
I noticed the BP has been approved, but I really want to understand the reasoning 
better; can anyone give me some hints?

In the BP, it states that "For resize, we need to confirm, as we want to give 
end user an opportunity to rollback". But why do we want to give the user an 
opportunity to roll back a resize? And why does that reason not apply to cold 
migration and live migration?

Thanks
--jyh

From: Jay Lau [mailto:jay.lau@gmail.com]
Sent: Thursday, January 16, 2014 3:27 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Nova] why don't we deal with "claims" when live 
migrating an instance?

Hi Scott,
I'm now trying to fix this issue at 
https://blueprints.launchpad.net/nova/+spec/auto-confirm-cold-migration
After the fix, we do not need to "confirm" the cold migration.

http://lists.openstack.org/pipermail/openstack-dev/2014-January/023726.html
Thanks,
Jay

2014/1/17 Scott Devoid mailto:dev...@anl.gov>>
Related question: Why does resize get called (and the VM put in "RESIZE VERIFY" 
state) when migrating from one machine to another, keeping the same flavor?

On Thu, Jan 16, 2014 at 9:54 AM, Brian Elliott 
mailto:bdelli...@gmail.com>> wrote:

On Jan 15, 2014, at 4:34 PM, Clint Byrum 
mailto:cl...@fewbar.com>> wrote:

> Hi Chris. Your thread may have gone unnoticed as it lacked the Nova tag.
> I've added it to the subject of this reply... that might attract them.  :)
>
> Excerpts from Chris Friesen's message of 2014-01-15 12:32:36 -0800:
>> When we create a new instance via _build_instance() or
>> _build_and_run_instance(), in both cases we call instance_claim() to
>> reserve and test for resources.
>>
>> During a cold migration I see us calling prep_resize() which calls
>> resize_claim().
>>
>> How come we don't need to do something like this when we live migrate an
>> instance?  Do we track the hypervisor overhead somewhere in the instance?
>>
>> Chris
>>
It is a good point and it should be done.  It is effectively a bug.

Brian

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] The extra_resource in compute node object

2014-01-13 Thread Jiang, Yunhong


> -Original Message-
> From: Dan Smith [mailto:d...@danplanet.com]
> Sent: Monday, January 13, 2014 6:35 AM
> To: Jiang, Yunhong; Murray, Paul (HP Cloud Services) (pmur...@hp.com)
> Cc: openstack-dev@lists.openstack.org
> Subject: Re: The extra_resource in compute node object
> 
> > This patch set makes the extra_resources a list of object, instead of
> > opaque json string. How do you think about that?
> 
> Sounds better to me, I'll go have a look.
> 
> > However, the compute resource object is different with current
> > NovaObject, a) it has no corresponding table, but just a field in
> > another table, and I assume it will have no save/update functions. b)
> > it defines the functions for the object like alloc/free etc. Not sure
> > if this is correct direction.
> 
> Having a NovaObject that isn't backed by a conventional SQLAlchemy
> model
> is fine with me, FWIW.

Thanks; please give any feedback on the Gerrit review. If it's OK, I will update the 
patch.

--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Jiang, Yunhong
Ian, I'm not sure I get your question. Why should the scheduler care about the number 
of flavor types requested? The scheduler will only translate the PCI flavor into a 
PCI property match requirement, as it does now (either vendor_id, device_id, or an 
item in extra_info), and then match the translated PCI flavor, i.e. the PCI requests, 
against the pci_stats.

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Monday, January 13, 2014 10:57 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

It's worth noting that this makes the scheduling a computationally hard 
problem. The answer to that in this scheme is to reduce the number of inputs to 
trivialise the problem.  It's going to be O(f(number of flavor types requested, 
number of pci_stats pools)) and if you group appropriately there shouldn't be 
an excessive number of pci_stats pools.  I am not going to stand up and say 
this makes it achievable - and if it doesn't them I'm not sure that anything 
would make overlapping flavors achievable - but I think it gives us some hope.
--
Ian.

On 13 January 2014 19:27, Jiang, Yunhong 
mailto:yunhong.ji...@intel.com>> wrote:
Hi, Robert, the scheduler keeps counts based on pci_stats instead of the PCI flavor.

As Ian already stated at 
https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html , 
the flavor will only use the tags used by pci_stats.

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com<mailto:ba...@cisco.com>]
Sent: Monday, January 13, 2014 8:22 AM

To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

As I have responded in the other email, and If I understand PCI flavor 
correctly, then the issue that we need to deal with is the overlapping issue. A 
simplest case of this overlapping is that you can define a flavor F1 as 
[vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v'] .  Let's 
assume that only the admin can define the flavors. It's not hard to see that a 
device can belong to the two different flavors in the same time. This 
introduces an issue in the scheduler. Suppose the scheduler (counts or stats 
based) maintains counts based on flavors (or the keys corresponding to the 
flavors). To request a device with the flavor F1,  counts in F2 needs to be 
subtracted by one as well. There may be several ways to achieve that. But 
regardless, it introduces tremendous overhead in terms of system processing and 
administrative costs.

What are the use cases for that? How practical are those use cases?

thanks,
Robert

On 1/10/14 9:34 PM, "Ian Wells" 
mailto:ijw.ubu...@cack.org.uk>> wrote:


>
> OK - so if this is good then I think the question is how we could change the 
> 'pci_whitelist' parameter we have - which, as you say, should either *only* 
> do whitelisting or be renamed - to allow us to add information.  Yongli has 
> something along those lines but it's not flexible and it distinguishes poorly 
> between which bits are extra information and which bits are matching 
> expressions (and it's still called pci_whitelist) - but even with those 
> criticisms it's very close to what we're talking about.  When we have that I 
> think a lot of the rest of the arguments should simply resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
> change to pci_whitelist is because it combined two things. So a stupid/naive 
> solution in my head is, change it to VERY generic name, 
> 'pci_devices_information',
>
> and change schema as an array of {'devices_property'=regex exp, 'group_name' 
> = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
> vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
> more into the "pci_devices_information" in future, like 'network_information' 
> = xxx or "Neutron specific information" you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we like 
teh groups", volume = "11" } }

>
> All keys other than 'device_property' becomes extra information, i.e. 
> software defined property. These extra information will be carried with the 
> PCI devices,. Some implementation details, A)we can limit the acceptable 
> keys, like we only support 'group_name', 'network_id', or we can accept any 
> keys other than reserved (ven

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Jiang, Yunhong
I'm not a network engineer and I always get lost in the 802.1Qbh/802.1BR specs :(  So 
I'd wait for the requirements from Neutron. A quick check suggests my discussion with 
Ian already meets the requirement?

Thanks
--jyh

From: Irena Berezovsky [mailto:ire...@mellanox.com]
Sent: Monday, January 13, 2014 12:51 AM
To: OpenStack Development Mailing List (not for usage questions)
Cc: Jiang, Yunhong; He, Yongli; Robert Li (baoli) (ba...@cisco.com); Sandhya 
Dasu (sadasu) (sad...@cisco.com); ijw.ubu...@cack.org.uk; j...@johngarbutt.com
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi,
After having a lot of discussions both on IRC and mailing list, I would like to 
suggest to define basic use cases for PCI pass-through network support with 
agreed list of limitations and assumptions  and implement it.  By doing this 
Proof of Concept we will be able to deliver basic PCI pass-through network 
support in Icehouse timeframe and understand better how to provide complete 
solution starting from  tenant /admin API enhancement, enhancing nova-neutron 
communication and eventually provide neutron plugin  supporting the PCI 
pass-through networking.
We can try to split tasks between currently involved participants and bring up 
the basic case. Then we can enhance the implementation.
Having more knowledge and experience with neutron parts, I would like  to start 
working on neutron mechanism driver support.  I have already started to arrange 
the following blueprint doc based on everyone's ideas:
https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit#<https://docs.google.com/document/d/1RfxfXBNB0mD_kH9SamwqPI8ZM-jg797ky_Fze7SakRc/edit>

For the basic PCI pass-through networking case we can assume the following:

1.   Single provider network (PN1)

2.   White list of available SRIOV PCI devices for allocation as NIC for 
neutron networks on provider network  (PN1) is defined on each compute node

3.   Support directly assigned SRIOV PCI pass-through device as vNIC. (This 
will limit the number of tests)

4.   More 


If my suggestion seems reasonable to you, let's try to reach an agreement and 
split the work during our Monday IRC meeting.

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Saturday, January 11, 2014 8:36 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Comments with prefix [yjiang5_2], including the double confirmation.

I think we (you and me) are mostly on the same page. Would you please give a summary, 
and then we can have the community, including Irena/Robert, check it? We need cores 
to sponsor it. We should check with John to see if this is different from his mental 
picture, and we may need a Neutron core (I assume Cisco has a bunch of Neutron 
cores :) ) to sponsor it.

And will anyone from Cisco be able to help with the implementation? After this long 
discussion, we are in the second half of the I release and I'm not sure Yongli and I 
alone can finish it in the I release.

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 6:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support


>
> OK - so if this is good then I think the question is how we could change the 
> 'pci_whitelist' parameter we have - which, as you say, should either *only* 
> do whitelisting or be renamed - to allow us to add information.  Yongli has 
> something along those lines but it's not flexible and it distinguishes poorly 
> between which bits are extra information and which bits are matching 
> expressions (and it's still called pci_whitelist) - but even with those 
> criticisms it's very close to what we're talking about.  When we have that I 
> think a lot of the rest of the arguments should simply resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
> change to pci_whitelist is because it combined two things. So a stupid/naive 
> solution in my head is, change it to VERY generic name, 
> 'pci_devices_information',
>
> and change schema as an array of {'devices_property'=regex exp, 'group_name' 
> = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
> vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
> more into the "pci_devices_information" in future, like 'network_information' 
> = xxx or "Neutron specific information" you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ matc

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-13 Thread Jiang, Yunhong
Hi, Robert, the scheduler keeps counts based on pci_stats instead of the PCI flavor.

As Ian already stated at 
https://www.mail-archive.com/openstack-dev@lists.openstack.org/msg13455.html , 
the flavor will only use the tags used by pci_stats.
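
To illustrate why counting against pci_stats pools (rather than against flavors) 
avoids the double-accounting problem you describe, here is a toy example (purely 
illustrative, not the actual nova code):

    # each pool is a set of identical devices plus a free count
    pools = [
        {'vendor_id': 'v', 'product_id': 'p', 'count': 3},
        {'vendor_id': 'v', 'product_id': 'q', 'count': 2},
    ]

    # a translated flavor is just a property-match spec consumed against the pools
    def consume(pools, spec):
        for pool in pools:
            if pool['count'] > 0 and all(pool.get(k) == v for k, v in spec.items()):
                pool['count'] -= 1
                return True
        return False

Both F1 ({'vendor_id': 'v', 'product_id': 'p'}) and F2 ({'vendor_id': 'v'}) draw from 
the same pools, so there is only one set of counts to decrement.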

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, January 13, 2014 8:22 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

As I have responded in the other email, and If I understand PCI flavor 
correctly, then the issue that we need to deal with is the overlapping issue. A 
simplest case of this overlapping is that you can define a flavor F1 as 
[vendor_id='v', product_id='p'], and a flavor F2 as [vendor_id = 'v'] .  Let's 
assume that only the admin can define the flavors. It's not hard to see that a 
device can belong to the two different flavors in the same time. This 
introduces an issue in the scheduler. Suppose the scheduler (counts or stats 
based) maintains counts based on flavors (or the keys corresponding to the 
flavors). To request a device with the flavor F1,  counts in F2 needs to be 
subtracted by one as well. There may be several ways to achieve that. But 
regardless, it introduces tremendous overhead in terms of system processing and 
administrative costs.

What are the use cases for that? How practical are those use cases?

thanks,
Robert

On 1/10/14 9:34 PM, "Ian Wells" 
mailto:ijw.ubu...@cack.org.uk>> wrote:


>
> OK - so if this is good then I think the question is how we could change the 
> 'pci_whitelist' parameter we have - which, as you say, should either *only* 
> do whitelisting or be renamed - to allow us to add information.  Yongli has 
> something along those lines but it's not flexible and it distinguishes poorly 
> between which bits are extra information and which bits are matching 
> expressions (and it's still called pci_whitelist) - but even with those 
> criticisms it's very close to what we're talking about.  When we have that I 
> think a lot of the rest of the arguments should simply resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
> change to pci_whitelist is because it combined two things. So a stupid/naive 
> solution in my head is, change it to VERY generic name, 
> 'pci_devices_information',
>
> and change schema as an array of {'devices_property'=regex exp, 'group_name' 
> = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
> vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
> more into the "pci_devices_information" in future, like 'network_information' 
> = xxx or "Neutron specific information" you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we like 
teh groups", volume = "11" } }

>
> All keys other than 'device_property' becomes extra information, i.e. 
> software defined property. These extra information will be carried with the 
> PCI devices,. Some implementation details, A)we can limit the acceptable 
> keys, like we only support 'group_name', 'network_id', or we can accept any 
> keys other than reserved (vendor_id, device_id etc) one.


Not sure we have a good list of reserved keys at the moment, and with two dicts 
it isn't really necessary, I guess.  I would say that we have one match parser 
which looks something like this:

import re

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for (k, v) in expression.items():
        if k.startswith('e.'):
            mv = extra_specs.get(k[2:])
        else:
            mv = pci_details.get(k)
        # v is the matching expression (e.g. a regex); mv is the device's value
        if mv is None or not re.match(str(v), str(mv)):
            return False
    return True

Usable in this matching (where 'e.' just won't work) and also for flavor 
assignment (where e. will indeed match the extra values).

> B) if a device match 'device_property' in several entries, raise exception, 
> or use the first one.

Use the first one, I think.  It's easier, and potentially more useful.

> [yjiang5_1] Another thing need discussed is, as you pointed out, "we would 
> need to add a config param on the control host to decide which flags to group 
> on when doing the stats".  I agree with the design, but some details need 
> decided.

This is a patch that can come at any point after we do the above stuff (which 
we need for Neutron), clearly.

> Where should it defined. If we a) define it in both control node and compute 
> node, then it should be static defined (just change pool_keys in 
> "/opt/stack/nova/nova/pci/pci_stats.py" to a configuration parameter) . Or b) 
> define only in control node, then I assume the control node should be the 
> scheduler node, and the scheduler manager need save such information, present 
> a A

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Comments with prefix [yjiang5_2], including the double confirmation.

I think we (you and me) are mostly on the same page. Would you please give a summary, 
and then we can have the community, including Irena/Robert, check it? We need cores 
to sponsor it. We should check with John to see if this is different from his mental 
picture, and we may need a Neutron core (I assume Cisco has a bunch of Neutron 
cores :) ) to sponsor it.

And will anyone from Cisco be able to help with the implementation? After this long 
discussion, we are in the second half of the I release and I'm not sure Yongli and I 
alone can finish it in the I release.

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 6:34 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support


>
> OK - so if this is good then I think the question is how we could change the 
> 'pci_whitelist' parameter we have - which, as you say, should either *only* 
> do whitelisting or be renamed - to allow us to add information.  Yongli has 
> something along those lines but it's not flexible and it distinguishes poorly 
> between which bits are extra information and which bits are matching 
> expressions (and it's still called pci_whitelist) - but even with those 
> criticisms it's very close to what we're talking about.  When we have that I 
> think a lot of the rest of the arguments should simply resolve themselves.
>
>
>
> [yjiang5_1] The reason that not easy to find a flexible/distinguishable 
> change to pci_whitelist is because it combined two things. So a stupid/naive 
> solution in my head is, change it to VERY generic name, 
> 'pci_devices_information',
>
> and change schema as an array of {'devices_property'=regex exp, 'group_name' 
> = 'g1'} dictionary, and the device_property expression can be 'address ==xxx, 
> vendor_id == xxx' (i.e. similar with current white list),  and we can squeeze 
> more into the "pci_devices_information" in future, like 'network_information' 
> = xxx or "Neutron specific information" you required in previous mail.


We're getting to the stage that an expression parser would be useful, 
annoyingly, but if we are going to try and squeeze it into JSON can I suggest:

{ match = { class = "Acme inc. discombobulator" }, info = { group = "we like 
teh groups", volume = "11" } }

[yjiang5_2] Just to double-confirm: 'match' is the whitelist and 'info' is the 'extra 
info', right? Can the keys be more meaningful, for example 
s/match/pci_device_property, s/info/pci_device_info, or s/match/pci_devices/, etc.?
Also, I assume the class should be the class code from the PCI configuration space, 
and be numeric, am I right? Otherwise it's not easy to get the 'Acme inc. 
discombobulator' information.
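
[yjiang5_2] For example, expressed as a Python dict it might look like this (purely 
illustrative; 0x0200 is the class/subclass code for an Ethernet controller):

    entry = {
        'match': {'vendor_id': '8086', 'class': '0x0200'},
        'info':  {'group': 'phynet1'},
    }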


>
> All keys other than 'device_property' becomes extra information, i.e. 
> software defined property. These extra information will be carried with the 
> PCI devices,. Some implementation details, A)we can limit the acceptable 
> keys, like we only support 'group_name', 'network_id', or we can accept any 
> keys other than reserved (vendor_id, device_id etc) one.


Not sure we have a good list of reserved keys at the moment, and with two dicts 
it isn't really necessary, I guess.  I would say that we have one match parser 
which looks something like this:

import re

# does this PCI device match the expression given?
def match(expression, pci_details, extra_specs):
    for (k, v) in expression.items():
        if k.startswith('e.'):
            mv = extra_specs.get(k[2:])
        else:
            mv = pci_details.get(k)
        # v is the matching expression (e.g. a regex); mv is the device's value
        if mv is None or not re.match(str(v), str(mv)):
            return False
    return True

Usable in this matching (where 'e.' just won't work) and also for flavor 
assignment (where e. will indeed match the extra values).
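
To make that concrete, a small illustrative sketch built on the match() pseudocode
above (the device record, the extra info and the value_match helper are all made up;
value_match is just the per-value equality/regex comparison the pseudocode assumes):

def value_match(expected, actual):
    # plain equality here; a regex comparison would also fit
    return actual is not None and str(actual) == str(expected)

pci_details = {'vendor_id': '8086', 'product_id': '1520', 'address': '0000:02:00.1'}
extra_specs = {'group': 'fast-nics', 'physical_network': 'physnet1'}

# whitelist-style matching: only plain device properties
print(match({'vendor_id': '8086'}, pci_details, extra_specs))                   # True

# flavor-style matching: 'e.' keys are looked up in the extra information
print(match({'vendor_id': '8086', 'e.group': 'fast-nics'},
            pci_details, extra_specs))                                          # True
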
[yjiang5_2] I think whether we use the same function for both checks or two separate 
functions for match/flavor is an implementation detail and can be discussed in the next 
step. Of course, we should always avoid code duplication.


> B) if a device matches 'device_property' in several entries, raise an exception, 
> or use the first one.

Use the first one, I think.  It's easier, and potentially more useful.
[yjiang5] good.


> [yjiang5_1] Another thing that needs to be discussed is, as you pointed out, "we would 
> need to add a config param on the control host to decide which flags to group 
> on when doing the stats". I agree with the design, but some details need to be 
> decided.

This is a patch that can come at any point after we do the above stuff (which 
we need for Neutron), clearly.

> Where should it be defined? If we a) define it on both the control node and the compute 
> node, then it should be statically defined (just change pool_keys in 
> "/opt/stack/nova/nova/pci/pci_stats.py" to a configuration parameter). Or b) 
> define it only on the control node; then I assume the control node should be the 
> scheduler node, and the scheduler manager needs to save such information and present 
> an API to fetch 
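
For option a), a minimal sketch of what such a static option might look like (the
option name and defaults are invented; the point is only that the hard-coded
pool_keys list in pci_stats.py becomes a configuration value):

from oslo.config import cfg    # oslo_config in later releases

pci_opts = [
    cfg.ListOpt('pci_stats_pool_keys',
                default=['vendor_id', 'product_id'],
                help='Device properties used to group PCI devices into the '
                     'pools reported to the scheduler.'),
]

CONF = cfg.CONF
CONF.register_opts(pci_opts)

def pool_key_for(dev):
    # dev is a dict of PCI device properties
    return tuple((k, dev.get(k)) for k in CONF.pci_stats_pool_keys)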

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
I have to use [yjiang5_1] prefix now :)

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 3:55 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

On 11 January 2014 00:04, Jiang, Yunhong <yunhong.ji...@intel.com> wrote:
[yjiang5] Really, thanks for the summary; it is quite clear. So what's the 
object of "equivalent devices at host level"? Because 'equivalent device *to 
an end user*' is the flavor, is it 'equivalent to the *scheduler*' or 'equivalent 
to *xxx*'? If equivalent to the scheduler, then I'd take the pci_stats as a 
flexible group for the scheduler

To the scheduler, indeed.  And with the group proposal the scheduler and end 
user equivalences are one and the same.
[yjiang5_1] Once we use the proposal, we lose the flexibility for 'end user 
equivalences', and that's the reason I'm against the group :)


Secondly, for your definition of 'whitelist', I hesitate over your '*and*' 
because IMHO 'and' means mixing two things together; otherwise we could state it 
in a single sentence. For example, I prefer to have another configuration option 
to define the 'put devices in the group' part, or, if we extend it, "define 
extra information like 'group name' for devices".

I'm not stating what we should do, or what the definitions should mean; I'm 
saying how they've been interpreted as we've discussed this in the past.  We've 
had issues in the past where we've had continuing difficulties in describing 
anything without coming back to a 'whitelist' (generally meaning 'matching 
expression'), as an actual 'whitelist' is implied, rather than separately 
required, in a grouping system.
Bearing in mind what you said about scheduling, and if we skip 'group' for a 
moment, then can I suggest (or possibly restate, because your comments are 
pointing in this direction):
- we allow extra information to be added at what is now the whitelisting stage, 
that just gets carried around with the device
[yjiang5] For 'added at ... whitelisting stage', see my above statement about 
the configuration. However, if you do want to use the whitelist, I'm OK with that, 
but please keep in mind that it combines two functionalities: the devices you may 
assign *and* the group name for those devices.

Indeed - which is in fact what we've been proposing all along.


- when we're turning devices into flavors, we can also match on that extra 
information if we want (which means we can tag up the devices on the compute 
node if we like, according to taste, and then bundle them up by tag to make 
flavors; or we can add Neutron specific information and ignore it when making 
flavors)
[yjiang5] Agree. Currently we can only use vendor_id and device_id for 
flavor/alias, but we can extend it to cover such extra information since it's 
now an API.

- we would need to add a config param on the control host to decide which flags 
to group on when doing the stats (and they would additionally be the only 
params that would work for flavors, I think)
[yjiang5] Agree. And this is achievable because we switch the flavor to be an API; 
then we can control the flavor creation process.

OK - so if this is good then I think the question is how we could change the 
'pci_whitelist' parameter we have - which, as you say, should either *only* do 
whitelisting or be renamed - to allow us to add information.  Yongli has 
something along those lines but it's not flexible and it distinguishes poorly 
between which bits are extra information and which bits are matching 
expressions (and it's still called pci_whitelist) - but even with those 
criticisms it's very close to what we're talking about.  When we have that I 
think a lot of the rest of the arguments should simply resolve themselves.

[yjiang5_1] The reason it's not easy to find a flexible/distinguishable change 
to pci_whitelist is that it combines two things. So a stupid/naive solution 
in my head is to change it to a VERY generic name, 'pci_devices_information', and 
change the schema to an array of {'devices_property'=regex exp, 'group_name' = 
'g1'} dictionaries, where the device_property expression can be 'address ==xxx, 
vendor_id == xxx' (i.e. similar to the current whitelist), and we can squeeze 
more into "pci_devices_information" in future, like 'network_information' = 
xxx or the "Neutron specific information" you required in the previous mail. All keys 
other than 'device_property' become extra information, i.e. software-defined 
properties. This extra information will be carried with the PCI devices. Some 
implementation details: A) we can limit

[openstack-dev] The extra_resource in compute node object

2014-01-10 Thread Jiang, Yunhong
Hi, Paul/Dan
For the extra_resource (refer to Dan's comments in 
https://review.openstack.org/#/c/60258/ for more information), I created a 
patch set 
https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:extra_resources,n,z
 and want to get some feedback.

This patch set makes extra_resources a list of objects, instead of an 
opaque JSON string. What do you think about that?

However, the compute resource object is different from the current 
NovaObjects: a) it has no corresponding table, just a field in another 
table, and I assume it will have no save/update functions; b) it defines 
functions for the object like alloc/free, etc. Not sure if this is the correct 
direction.
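
A minimal sketch of the idea in plain Python (deliberately not the real NovaObject
machinery, and the field names are invented), just to show a list of typed objects
that still serializes into the existing single JSON column:

import json

class ComputeResource(object):
    """One trackable resource carried in the compute node's extra_resources."""
    def __init__(self, name, total, used=0):
        self.name, self.total, self.used = name, total, used

    def alloc(self, amount):
        if self.used + amount > self.total:
            raise ValueError('not enough %s' % self.name)
        self.used += amount

    def free(self, amount):
        self.used = max(0, self.used - amount)

    def to_primitive(self):
        return {'name': self.name, 'total': self.total, 'used': self.used}

def dump_extra_resources(resources):
    # stored in the existing extra_resources column as a JSON string
    return json.dumps([r.to_primitive() for r in resources])

def load_extra_resources(raw):
    return [ComputeResource(**d) for d in json.loads(raw)]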

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Ian, thanks for your reply. Please check comments prefix with [yjiang5].

Thanks
--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 12:17 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hey Yunhong,

The thing about 'group' and 'flavor' and 'whitelist' is that they once meant 
distinct things (and I think we've been trying to reduce them back from three 
things to two or one):

- group: equivalent devices at a host level - use any one, no-one will care, 
because they're either identical or as near as makes no difference
- flavor: equivalent devices to an end user - we may re-evaluate our offerings 
and group them differently on the fly
- whitelist: either 'something to match the devices you may assign' 
(originally) or 'something to match the devices you may assign *and* put them 
in the group (in the group proposal)

[yjiang5] Really, thanks for the summary; it is quite clear. So what's the 
object of "equivalent devices at host level"? Because 'equivalent device *to 
an end user*' is the flavor, is it 'equivalent to the *scheduler*' or 'equivalent 
to *xxx*'? If equivalent to the scheduler, then I'd take the pci_stats as a 
flexible group for the scheduler, and I'd treat 'equivalent for the scheduler' as a 
restriction on 'equivalent to the end user' because of performance issues; 
otherwise it's needless. Secondly, for your definition of 'whitelist', I 
hesitate over your '*and*' because IMHO 'and' means mixing two things together; 
otherwise we could state it in a single sentence. For example, I prefer to have 
another configuration option to define the 'put devices in the group' part, or, if 
we extend it, "define extra information like 'group name' for devices".

Bearing in mind what you said about scheduling, and if we skip 'group' for a 
moment, then can I suggest (or possibly restate, because your comments are 
pointing in this direction):
- we allow extra information to be added at what is now the whitelisting stage, 
that just gets carried around with the device
[yjiang5] For 'added at ... whitelisting stage', see my above statement about 
the configuration. However, if you do want to use the whitelist, I'm OK with that, 
but please keep in mind that it combines two functionalities: the devices you may 
assign *and* the group name for those devices.

- when we're turning devices into flavors, we can also match on that extra 
information if we want (which means we can tag up the devices on the compute 
node if we like, according to taste, and then bundle them up by tag to make 
flavors; or we can add Neutron specific information and ignore it when making 
flavors)
[yjiang5] Agree. Currently we can only use vendor_id and device_id for 
flavor/alias, but we can extend it to cover such extra information since it's 
now an API.

- we would need to add a config param on the control host to decide which flags 
to group on when doing the stats (and they would additionally be the only 
params that would work for flavors, I think)
[yjiang5] Agree. And this is achievable because we switch the flavor to be an API; 
then we can control the flavor creation process.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Brian, the issue with 'class name' is that currently libvirt does not 
provide such information; otherwise we would be glad to add that :(
But this is a good point and we have considered it already. One solution is to 
retrieve it through some code that reads the configuration space directly. But 
that's not so easy, especially considering that different platforms have different 
methods to get at the configuration space. A workaround (at least as a first step) is 
to use a user-defined property, so that the user can define it through the 
configuration.

The issue with udev is that it's Linux-specific, and it may even vary across 
distributions.

Thanks
--jyh

From: Brian Schott [mailto:brian.sch...@nimbisservices.com]
Sent: Thursday, January 09, 2014 11:19 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Ian,

The idea of pci flavors is a great one, and using vendor_id and product_id makes 
sense, but I could see a case for adding the class name such as 'VGA compatible 
controller'. Otherwise, slightly different generations of hardware will mean 
custom whitelist setups on each compute node.

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)

On the flip side, vendor_id and product_id might not be sufficient.  Suppose I 
have two identical NICs, one for nova internal use and the second for guest 
tenants?  So, bus numbering may be required.

01:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation G71 [GeForce 7900 GTX] 
(rev a1)

Some possible combinations:

# take 2 gpus
pci_passthrough_whitelist=[
 { "vendor_id":"NVIDIA Corporation G71","product_id":"GeForce 7900 GTX", 
"name":"GPU"},
]

# only take the GPU on PCI 2
pci_passthrough_whitelist=[
 { "vendor_id":"NVIDIA Corporation G71","product_id":"GeForce 7900 GTX", 
'bus_id': '02:', "name":"GPU"},
]
pci_passthrough_whitelist=[
 {"bus_id": "01:00.0", "name": "GPU"},
 {"bus_id": "02:00.0", "name": "GPU"},
]

pci_passthrough_whitelist=[
 {"class": "VGA compatible controller", "name": "GPU"},
]

pci_passthrough_whitelist=[
 { "product_id":"GeForce 7900 GTX", "name":"GPU"},
]
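
A small sketch of how entries of the forms above could be matched against a
discovered device (the device dict and the prefix-matching rule are illustrative
only, not how nova's whitelist code actually works):

def whitelist_match(entry, dev):
    # every key except 'name' (the group/label) is a match criterion; prefix
    # matching lets a partial 'bus_id' like '02:' match '02:00.0'
    for key, wanted in entry.items():
        if key == 'name':
            continue
        if not str(dev.get(key, '')).startswith(str(wanted)):
            return False
    return True

dev = {'vendor_id': 'NVIDIA Corporation G71',
       'product_id': 'GeForce 7900 GTX',
       'bus_id': '02:00.0',
       'class': 'VGA compatible controller'}

whitelist = [{'class': 'VGA compatible controller', 'name': 'GPU'}]
print([e['name'] for e in whitelist if whitelist_match(e, dev)])    # ['GPU']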

I know you guys are thinking of PCI devices, but any thought of mapping to 
something like udev rather than pci?  Supporting udev rules might be easier and 
more robust than making something up.

Brian

-
Brian Schott, CTO
Nimbis Services, Inc.
brian.sch...@nimbisservices.com
ph: 443-274-6064  fx: 443-274-6060



On Jan 9, 2014, at 12:47 PM, Ian Wells <ijw.ubu...@cack.org.uk> wrote:


I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) <ba...@cisco.com> wrote:

Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as comb

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-10 Thread Jiang, Yunhong
Ian, thanks for your reply. Please check my response prefix with 'yjiang5'.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Friday, January 10, 2014 4:08 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

On 10 January 2014 07:40, Jiang, Yunhong <yunhong.ji...@intel.com> wrote:
Robert, sorry that I'm not a fan of the *group* term. To me, *your group* mixes 
two things: it's an extra property provided by configuration, and it's also 
a very inflexible mechanism to select devices (you can only select 
devices based on the 'group name' property).

It is exactly that.  It's 0 new config items, 0 new APIs, just an extra tag on 
the whitelists that are already there (although the proposal suggests changing 
the name of them to be more descriptive of what they now do).  And you talk 
about flexibility as if this changes frequently, but in fact the grouping / 
aliasing of devices almost never changes after installation, which is, not 
coincidentally, when the config on the compute nodes gets set up.

1) A dynamic group is much better. For example, a user may want to select a 
GPU device based on vendor id, or based on vendor_id+device_id. In other 
words, the user wants to create groups based on vendor_id, or vendor_id+device_id, and 
select devices from these groups. John's proposal is very good: provide an 
API to create the PCI flavor (or alias). I prefer flavor because it's more 
OpenStack style.
I disagree with this.  I agree that what you're saying offers a more 
flexibilibility after initial installation but I have various issues with it.
[yjiang5] I think what you're talking about is mostly the whitelist, instead of the PCI 
flavor. The PCI flavor is more about the PCI request, like "I want to have a device with 
vendor_id = cisco, device_id = 15454E", or 'vendor_id=intel, device_class=nic' 
(because the image has the driver for all Intel NIC cards :) ). The 
whitelist, by contrast, decides which devices are assignable on a host.

This is directly related to the hardware configuation on each compute node.  
For (some) other things of this nature, like provider networks, the compute 
node is the only thing that knows what it has attached to it, and it is the 
store (in configuration) of that information.  If I add a new compute node then 
it's my responsibility to configure it correctly on attachment, but when I add 
a compute node (when I'm setting the cluster up, or sometime later on) then 
it's at that precise point that I know how I've attached it and what hardware 
it's got on it.  Also, it's at this point in time that I write out the 
configuration file (not by hand, note; there's almost certainly automation when 
configuring hundreds of nodes so arguments that 'if I'm writing hundreds of 
config files one will be wrong' are moot).

I'm also not sure there's much reason to change the available devices 
dynamically after that, since that's normally an activity that results from 
changing the physical setup of the machine which implies that actually you're 
going to have access to and be able to change the config as you do it.  John 
did come up with one case where you might be trying to remove old GPUs from 
circulation, but it's a very uncommon case that doesn't seem worth coding for, 
and it's still achievable by changing the config and restarting the compute 
processes.
[yjiang5] I totally agree with you that the whitelist is statically defined at 
provisioning time. I just want to separate the 'provider network' information into 
another configuration (like extra information). The whitelist is just a whitelist to 
decide which devices are assignable. The provider network is information about the 
device; it's not in the scope of the whitelist.
This also reduces the autonomy of the compute node in favour of centralised 
tracking, which goes against the 'distributed where possible' philosophy of 
Openstack.
Finally, you're not actually removing configuration from the compute node.  You 
still have to configure a whitelist there; in the grouping design you also have 
to configure grouping (flavouring) on the control node as well.  The groups 
proposal adds one extra piece of information to the whitelists that are already 
there to mark groups, not a whole new set of config lines.
[yjiang5] Still, the whitelist is there to decide which devices are assignable, not to provide 
device information. We shouldn't mix functionality into the configuration. If it's 
OK, I simply want to discard the 'group' term :) The nova PCI flow is simple: the 
compute node provides PCI devices (based on the whitelist), the scheduler tracks the 
PCI device information (abstracted as pci_stats for performance reasons), and the API 
provides a method for the user to specify the de

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Jiang, Yunhong
BTW, I like the PCI flavor :)

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Thursday, January 09, 2014 10:41 PM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Ian, when you say you're in agreement with all of this, do you agree with the 'group 
name', or with John's pci flavor?
I'm against the PCI group and will send out a reply later.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 09, 2014 9:47 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) <ba...@cisco.com> wrote:
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the -nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connecte

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Jiang, Yunhong
Hi Ian, when you say you're in agreement with all of this, do you agree with the 'group 
name', or with John's pci flavor?
I'm against the PCI group and will send out a reply later.

--jyh

From: Ian Wells [mailto:ijw.ubu...@cack.org.uk]
Sent: Thursday, January 09, 2014 9:47 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

I think I'm in agreement with all of this.  Nice summary, Robert.
It may not be where the work ends, but if we could get this done the rest is 
just refinement.

On 9 January 2014 17:49, Robert Li (baoli) <ba...@cisco.com> wrote:
Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simplified
 * provisioning is simplified by eliminating the PCI alias
 * a compute node only needs to report stats with something like: PCI 
group name:count. A compute node processes all the PCI passthrough devices 
against the whitelist, and assign a PCI group based on the whitelist definition.
 * on the controller, we may only need to define the PCI group names. 
if we use a nova api to define PCI groups (could be private or public, for 
example), one potential benefit, among other things (validation, etc),  they 
can be owned by the tenant that creates them. And thus a wholesale of PCI 
passthrough devices is also possible.
 * scheduler only works with PCI group names.
 * request for PCI passthrough device is based on PCI-group
 * deployers can provision the cloud based on the PCI groups
 * Particularly for SRIOV, deployers can design SRIOV PCI groups based 
on network connectivities.

Further, to support SRIOV, we are saying that PCI group names not only can be 
used in the extra specs, it can also be used in the -nic option and the neutron 
commands. This allows the most flexibilities and functionalities afforded by 
SRIOV.

Further, we are saying that we can define default PCI groups based on the PCI 
device's class.
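
As a purely illustrative sketch of the "PCI group name:count" stats and the
class-based default group described above (the whitelist entry, the device dicts and
the default-group naming are all invented):

from collections import Counter

whitelist = [
    {'vendor_id': '8086', 'product_id': '1520', 'name': 'fast-nics'},
]

def group_for(dev):
    for entry in whitelist:
        if all(dev.get(k) == v for k, v in entry.items() if k != 'name'):
            return entry['name']
    # fall back to a default group derived from the device class
    return 'class:%s' % dev.get('device_class', 'unknown')

def pci_stats(devices):
    # what a compute node would report to the scheduler: {group name: count}
    return Counter(group_for(d) for d in devices)

devices = [
    {'vendor_id': '8086', 'product_id': '1520', 'device_class': '0x0200'},
    {'vendor_id': '8086', 'product_id': '1521', 'device_class': '0x0200'},
]
print(pci_stats(devices))    # Counter({'fast-nics': 1, 'class:0x0200': 1})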

For vnic-type (or nic-type), we are saying that it defines the link 
characteristics of the nic that is attached to a VM: a nic that's connected to 
a virtual switch, a nic that is connected to a physical switch, or a nic that 
is connected to a physical switch, but has a host macvtap device in between. 
The actual names of the choices are not important here, and can be debated.

I'm hoping that we can go over the above on Monday. But any comments are 
welcome by email.

Thanks,
Robert


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.open

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2014-01-09 Thread Jiang, Yunhong
Robert, sorry that I'm not a fan of the *group* term. To me, *your group* mixes 
two things: it's an extra property provided by configuration, and it's also 
a very inflexible mechanism to select devices (you can only select 
devices based on the 'group name' property).


1) A dynamic group is much better. For example, a user may want to select a 
GPU device based on vendor id, or based on vendor_id+device_id. In other 
words, the user wants to create groups based on vendor_id, or vendor_id+device_id, and 
select devices from these groups. John's proposal is very good: provide an 
API to create the PCI flavor (or alias). I prefer flavor because it's more 
OpenStack style.



2) As for the second aspect of your 'group', I'd understand it as an extra 
property provided by configuration. I don't think we should put it into the 
whitelist, which is there to configure which devices are assignable. I'd add another 
configuration option to provide extra attributes for devices. When nova-compute 
starts up, it will parse this configuration and add the attributes to the corresponding 
PCI devices. I don't think adding another configuration option will cause too much 
trouble for deployment; OpenStack already has a lot of configuration items :)



3) I think currently we are mixing the neutron and nova designs. To me, Neutron 
SRIOV support is a user of nova PCI support. Thus we should first analyze 
the requirements from neutron PCI support on nova PCI support in a more generic 
way, and then we can discuss how to enhance the nova PCI support or, if you 
want, redesign it. IMHO, if we don't consider networking, the current 
implementation should be OK.



4) IMHO, the core of nova PCI support is the *PCI property*. Properties include 
not only generic PCI device properties like vendor id, device id and device type, 
and compute-specific properties like the BDF address or the adjacent switch IP address, 
but also user-defined properties like neutron's physical net name, etc. And then 
it's about how to get these properties, how to select/group devices based on 
them, and how to store/fetch them.
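
To illustrate 4), a tiny sketch of a property-centric device record (all names and
values are invented) that merges the generic, platform-specific and user-defined
properties into one dict and selects devices by arbitrary properties:

def build_device(standard, platform, user_defined):
    props = dict(standard)          # vendor id, device id, device type, ...
    props.update(platform)          # BDF address, adjacent switch IP, ...
    props.update(user_defined)      # e.g. neutron's physical network name
    return props

def select(devices, **criteria):
    return [d for d in devices
            if all(d.get(k) == v for k, v in criteria.items())]

dev = build_device({'vendor_id': '8086', 'device_id': '1520'},
                   {'address': '0000:04:00.1'},
                   {'physical_network': 'physnet1'})
print(select([dev], vendor_id='8086', physical_network='physnet1'))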



Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Thursday, January 09, 2014 8:49 AM
To: OpenStack Development Mailing List (not for usage questions); Irena 
Berezovsky; Sandhya Dasu (sadasu); Jiang, Yunhong; Itzik Brown; 
j...@johngarbutt.com; He, Yongli
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Folks,

With John joining the IRC, so far, we had a couple of productive meetings in an 
effort to come to consensus and move forward. Thanks John for doing that, and I 
appreciate everyone's effort to make it to the daily meeting. Let's reconvene 
on Monday.

But before that, and based on our today's conversation on IRC, I'd like to say 
a few things. I think that first of all, we need to get agreement on the 
terminologies that we are using so far. With the current nova PCI passthrough

PCI whitelist: defines all the available PCI passthrough devices on a 
compute node. pci_passthrough_whitelist=[{ 
"vendor_id":"","product_id":""}]
PCI Alias: criteria defined on the controller node with which requested 
PCI passthrough devices can be selected from all the PCI passthrough devices 
available in a cloud.
Currently it has the following format: 
pci_alias={"vendor_id":"", "product_id":"", "name":"str"}

nova flavor extra_specs: request for PCI passthrough devices can be 
specified with extra_specs in the format for 
example:"pci_passthrough:alias"="name:count"

As you can see, currently a PCI alias has a name and is defined on the 
controller. The implications for it is that when matching it against the PCI 
devices, it has to match the vendor_id and product_id against all the available 
PCI devices until one is found. The name is only used for reference in the 
extra_specs. On the other hand, the whitelist is basically the same as the 
alias without a name.

What we have discussed so far is based on something called PCI groups (or PCI 
flavors as Yongli puts it). Without introducing other complexities, and with a 
little change of the above representation, we will have something like:

pci_passthrough_whitelist=[{ "vendor_id":"","product_id":"", 
"name":"str"}]

By doing so, we eliminated the PCI alias. And we call the "name" in above as a 
PCI group name. You can think of it as combining the definitions of the 
existing whitelist and PCI alias. And believe it or not, a PCI group is 
actually a PCI alias. However, with that change of thinking, a lot of benefits 
can be harvested:

 * the implementation is significantly simpli

Re: [openstack-dev] [nova][object] One question to the resource tracker session

2013-11-15 Thread Jiang, Yunhong
I have no particular part in mind, so any part that needs help is OK for me.

My own purpose is to support dynamic resource claims, to support live migration 
of instances with hardware allocated. For an instance with a device assigned (PCI 
device, USB or anything else), we can't migrate it unless it's unplugged.

Also, currently the resource tracker is more than a tracker: it in fact also 
updates the host on the instance, which I assume should be done by the conductor. It 
even creates the migration object, which I think should also be created by the 
conductor. IIUC, the reason is to keep the operation atomic to avoid a race with 
the audit.

Combining the above two, I'm considering whether we can change the current resource 
tracker. Instead of passing the instance/flavor, a new object, the resource 
requirement, is used. The process is: the conductor calculates the resource 
requirement for the instance on both build and resize, and passes it to 
the resource tracker; the resource tracker adds the hypervisor overhead 
and then claims it. If the claim succeeds, the resource tracker saves this 
resource requirement to the database.

For build time, this should work well. For resize, this should also work, except 
that the instance will have two resource requirements. And there will be no race 
condition here. Other than that, another benefit is that the resource tracker doesn't 
need to care about the flavor anymore. I'm not sure if we can totally remove the 
new_/old_ flavor information in system_metadata, but we can move the key user 
of it.
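
A rough sketch of that flow (all class and field names are invented; the point is
only that the conductor computes a requirement object, and the tracker adds the
hypervisor overhead and claims it atomically):

import threading

class ResourceRequirement(object):
    def __init__(self, vcpus, memory_mb):
        self.vcpus = vcpus
        self.memory_mb = memory_mb

class ResourceTracker(object):
    def __init__(self, free_vcpus, free_memory_mb):
        self.free_vcpus = free_vcpus
        self.free_memory_mb = free_memory_mb
        self._lock = threading.Lock()

    def claim(self, req, memory_overhead_mb=0):
        # add the hypervisor overhead, then claim atomically
        needed_mem = req.memory_mb + memory_overhead_mb
        with self._lock:
            if req.vcpus > self.free_vcpus or needed_mem > self.free_memory_mb:
                return False
            self.free_vcpus -= req.vcpus
            self.free_memory_mb -= needed_mem
            # here the accepted requirement would be persisted for the instance
            return True

# conductor side: compute the requirement and hand it to the tracker
req = ResourceRequirement(vcpus=2, memory_mb=2048)
rt = ResourceTracker(free_vcpus=8, free_memory_mb=16384)
print(rt.claim(req, memory_overhead_mb=64))    # True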

Your opinion?

Thanks
--jyh

> -Original Message-
> From: Murray, Paul (HP Cloud Services) [mailto:pmur...@hp.com]
> Sent: Friday, November 15, 2013 6:55 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][object] One question to the resource
> tracker session
> 
> I was leading that session and put the comment there - sorry it has lead to
> confusion - I'll add something to make it clear.
> 
> I'm actually drafting the bp at the moment - probably going to split some of
> the tasks up into different bps (at the suggestion of Dan and Russell).
> 
> Is there a particular part you were interested in?
> 
> -Original Message-
> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> Sent: 14 November 2013 18:20
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][object] One question to the resource
> tracker session
> 
> 
> 
> > -Original Message-
> > From: Andrew Laski [mailto:andrew.la...@rackspace.com]
> > Sent: Thursday, November 14, 2013 10:02 AM
> > To: OpenStack Development Mailing List (not for usage questions)
> > Subject: Re: [openstack-dev] [nova][object] One question to the
> > resource tracker session
> >
> > On 11/14/13 at 05:37pm, Jiang, Yunhong wrote:
> > >
> > >> -Original Message-
> > >> From: Andrew Laski [mailto:andrew.la...@rackspace.com]
> > >> Sent: Wednesday, November 13, 2013 3:22 PM
> > >> To: OpenStack Development Mailing List (not for usage questions)
> > >> Subject: Re: [openstack-dev] [nova][object] One question to the
> > resource
> > >> tracker session
> > >>
> > >> On 11/13/13 at 11:12pm, Jiang, Yunhong wrote:
> > >> >Hi, Dan Smith and all,
> > >> >I noticed followed statement in 'Icehouse tasks' in
> > >>
> >
> https://etherpad.openstack.org/p/IcehouseNovaExtensibleSchedulerMetr
> > >> ics
> > >> >
> > >> >convert resource tracker to objects
> > >> >make resoruce tracker extensible
> > >> >no db migrations ever again!!
> > >> >extra specs to cover resources - use a name space
> > >> >
> > >> >How is it planned to achieve the 'no db migrations ever again'?
> > Even
> > >> with the object, we still need keep resource information in database.
> > And
> > >> when new resource type added, we either add a new column to the
> > table.
> > >> Or it means we merge all resource information into a single column
> > >> as
> > json
> > >> string and parse it in the resource tracker object?.
> > >>
> > >> You're right, it's not really achievable without moving to a
> > >> schemaless persistence model.  I'm fairly certain it was added to
> > >> be humorous and should not be considered an outcome of that
> session.
> > >
> > >Andrew, thanks for the explanation. Not sure anyo

Re: [openstack-dev] [nova][api] Is this a potential issue

2013-11-15 Thread Jiang, Yunhong


> -Original Message-
> From: Dan Smith [mailto:d...@danplanet.com]
> Sent: Friday, November 15, 2013 7:30 AM
> To: OpenStack Development Mailing List (not for usage questions);
> isaku.yamah...@gmail.com
> Subject: Re: [openstack-dev] [nova][api] Is this a potential issue
> 
> > You're not missing anything.  But I think that's a bug, or at least an
> > unexpected change in behaviour from how it used to work.  If you
> follow
> > instance_update() in nova.db.sqlalchemy.api just the presence of
> > expected_task_state triggers the check.  So we may need to find a way
> to
> > pass that through with the save method.
> 
> This came up recently. We decided that since we no longer have a kwargs
> dictionary to test for the presence or absence of that flag, that we
> would require setting it to a tuple, which is already supported for
> allowing multiple state possibilities. So, if you pass
> expected_task_state=(None,) then it will do the right thing.
> 
> Make sense?

This should work, although I'm not sure if it's so clean. I will cook a patch 
for it.

--jyh

> 
> --Dan
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][api] Is this a potential issue

2013-11-14 Thread Jiang, Yunhong
> 
> The migration shouldn't end up being set to 'reverting' twice because of
> the expected_task_state set and check in
> instance.save(expected_task_state=None).  The quota reservation could
> happen twice, so a rollback in the case of a failure in instance.save
> could be good.

A patch uploaded to fix this small windows at 
https://review.openstack.org/#/c/56288/ .

--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][object] One question to the resource tracker session

2013-11-14 Thread Jiang, Yunhong


> -Original Message-
> From: Dan Smith [mailto:d...@danplanet.com]
> Sent: Thursday, November 14, 2013 10:43 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][object] One question to the resource
> tracker session
> 
> >> You're right, it's not really achievable without moving to a schemaless
> >> persistence model.  I'm fairly certain it was added to be humorous and
> >> should not be considered an outcome of that session.
> >
> > But we can avoid most data migrations by adding any required
> > conversion code into the objects DB layer, once we start using it. But
> > it might not be what we want.
> 
> Right, I'm sure it was added to the notes in response to discussion in
> the room about hating migrations in general. We can't avoid them
> entirely, but we do need to start moving away from making them
> monolithic and scary. This is important not only for performance when
> running migrations on large databases, but also for live upgrade. The
> general agreement from the summit sessions was that we need to make
> conductor able to tolerate all the schema versions from N-1 to N (where
> N is a release) so that it can be upgraded first, the schema next, and
> then (when possible) data migrations happen live instead of in bulk.

Thanks for the clarification; this seems very promising.
I assume this backwards support also includes objects, and that N means a major release 
like H or I, right?

--jyh



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][object] One question to the resource tracker session

2013-11-14 Thread Jiang, Yunhong


> -Original Message-
> From: Andrew Laski [mailto:andrew.la...@rackspace.com]
> Sent: Thursday, November 14, 2013 10:02 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][object] One question to the resource
> tracker session
> 
> On 11/14/13 at 05:37pm, Jiang, Yunhong wrote:
> >
> >> -Original Message-
> >> From: Andrew Laski [mailto:andrew.la...@rackspace.com]
> >> Sent: Wednesday, November 13, 2013 3:22 PM
> >> To: OpenStack Development Mailing List (not for usage questions)
> >> Subject: Re: [openstack-dev] [nova][object] One question to the
> resource
> >> tracker session
> >>
> >> On 11/13/13 at 11:12pm, Jiang, Yunhong wrote:
> >> >Hi, Dan Smith and all,
> >> >  I noticed followed statement in 'Icehouse tasks' in
> >>
> https://etherpad.openstack.org/p/IcehouseNovaExtensibleSchedulerMetr
> >> ics
> >> >
> >> >  convert resource tracker to objects
> >> >  make resoruce tracker extensible
> >> >  no db migrations ever again!!
> >> >  extra specs to cover resources - use a name space
> >> >
> >> >  How is it planned to achieve the 'no db migrations ever again'?
> Even
> >> with the object, we still need keep resource information in database.
> And
> >> when new resource type added, we either add a new column to the
> table.
> >> Or it means we merge all resource information into a single column as
> json
> >> string and parse it in the resource tracker object?.
> >>
> >> You're right, it's not really achievable without moving to a schemaless
> >> persistence model.  I'm fairly certain it was added to be humorous and
> >> should not be considered an outcome of that session.
> >
> >Andrew, thanks for the explanation. Not sure anyone have interests on
> this task, otherwise I will take it.
> 
> There is a blueprint for part of this from Paul Murray,
> https://blueprints.launchpad.net/nova/+spec/make-resource-tracker-use-
> objects.
> So you could coordinate the work if you're interested.

Yes, I just noticed it and the first two sponsors. I will keep an eye on it.

--jyh

> 
> >
> >--jyh
> >
> >>
> >> >
> >> >Thanks
> >> >--jyh
> >> >
> >> >___
> >> >OpenStack-dev mailing list
> >> >OpenStack-dev@lists.openstack.org
> >> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >>
> >> ___
> >> OpenStack-dev mailing list
> >> OpenStack-dev@lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >___
> >OpenStack-dev mailing list
> >OpenStack-dev@lists.openstack.org
> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][object] One question to the resource tracker session

2013-11-14 Thread Jiang, Yunhong

> -Original Message-
> From: Andrew Laski [mailto:andrew.la...@rackspace.com]
> Sent: Wednesday, November 13, 2013 3:22 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][object] One question to the resource
> tracker session
> 
> On 11/13/13 at 11:12pm, Jiang, Yunhong wrote:
> >Hi, Dan Smith and all,
> > I noticed followed statement in 'Icehouse tasks' in
> https://etherpad.openstack.org/p/IcehouseNovaExtensibleSchedulerMetr
> ics
> >
> > convert resource tracker to objects
> > make resoruce tracker extensible
> > no db migrations ever again!!
> > extra specs to cover resources - use a name space
> >
> > How is it planned to achieve the 'no db migrations ever again'? Even
> with the object, we still need keep resource information in database. And
> when new resource type added, we either add a new column to the table.
> Or it means we merge all resource information into a single column as json
> string and parse it in the resource tracker object?.
> 
> You're right, it's not really achievable without moving to a schemaless
> persistence model.  I'm fairly certain it was added to be humorous and
> should not be considered an outcome of that session.

Andrew, thanks for the explanation. Not sure whether anyone has an interest in this 
task; otherwise I will take it.

--jyh

> 
> >
> >Thanks
> >--jyh
> >
> >___
> >OpenStack-dev mailing list
> >OpenStack-dev@lists.openstack.org
> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][object] One question to the resource tracker session

2013-11-13 Thread Jiang, Yunhong
Hi, Dan Smith and all, 
I noticed followed statement in 'Icehouse tasks' in  
https://etherpad.openstack.org/p/IcehouseNovaExtensibleSchedulerMetrics 

convert resource tracker to objects
make resoruce tracker extensible
no db migrations ever again!!
extra specs to cover resources - use a name space

How is it planned to achieve the 'no db migrations ever again'? Even 
with the object, we still need keep resource information in database. And when 
new resource type added, we either add a new column to the table. Or it means 
we merge all resource information into a single column as json string and parse 
it in the resource tracker object?.

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Core pinning

2013-11-13 Thread Jiang, Yunhong


> -Original Message-
> From: Chris Friesen [mailto:chris.frie...@windriver.com]
> Sent: Wednesday, November 13, 2013 9:57 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] Core pinning
> 
> On 11/13/2013 11:40 AM, Jiang, Yunhong wrote:
> 
> >> But, from performance point of view it is better to exclusively
> >> dedicate PCPUs for VCPUs and emulator. In some cases you may want
> >> to guarantee that only one instance(and its VCPUs) is using certain
> >> PCPUs.  By using core pinning you can optimize instance performance
> >> based on e.g. cache sharing, NUMA topology, interrupt handling, pci
> >> pass through(SR-IOV) in multi socket hosts etc.
> >
> > My 2 cents. When you talking about " performance point of view", are
> > you talking about guest performance, or overall performance? Pin PCPU
> > is sure to benefit guest performance, but possibly not for overall
> > performance, especially if the vCPU is not consume 100% of the CPU
> > resources.
> 
> It can actually be both.  If a guest has several virtual cores that both
> access the same memory, it can be highly beneficial all around if all
> the memory/cpus for that guest come from a single NUMA node on the
> host.
>   That way you reduce the cross-NUMA-node memory traffic, increasing
> overall efficiency.  Alternately, if a guest has several cores that use
> lots of memory bandwidth but don't access the same data, you might want
> to ensure that the cores are on different NUMA nodes to equalize
> utilization of the different NUMA nodes.

I think Tuomas is talking about "exclusively dedicating PCPUs for VCPUs"; in 
that situation, the pCPU can't be shared by other vCPUs anymore. If the vCPU 
only consumes, say, 50% of the pCPU, that is surely a waste of overall 
capacity.

As to cross-NUMA-node access, I'd let the hypervisor, instead of the cloud OS, 
reduce cross-NUMA access as much as possible.

I'm not against such usage; it's sure to be used in data center virtualization. 
I just question whether it's for the cloud.


> 
> Similarly, once you start talking about doing SR-IOV networking I/O
> passthrough into a guest (for SDN/NFV stuff) for optimum efficiency it
> is beneficial to be able to steer interrupts on the physical host to the
> specific cpus on which the guest will be running.  This implies some
> form of pinning.

Still, I think the hypervisor should achieve this, instead of OpenStack.


> 
> > I think pin CPU is common to data center virtualization, but not sure
> > if it's in scope of cloud, which provide computing power, not
> > hardware resources.
> >
> > And I think part of your purpose can be achieved through
> > https://wiki.openstack.org/wiki/CPUEntitlement and
> > https://wiki.openstack.org/wiki/InstanceResourceQuota . Especially I
> > hope a well implemented hypervisor will avoid needless vcpu migration
> > if the vcpu is very busy and required most of the pCPU's computing
> > capability (I knew Xen used to have some issue in the scheduler to
> > cause frequent vCPU migration long before).
> 
> I'm not sure the above stuff can be done with those.  It's not just
> about quantity of resources, but also about which specific resources
> will be used so that other things can be done based on that knowledge.

With the above stuff, the QoS and the compute capability for the guest are 
ensured, I think.

--jyh
 
> 
> Chris
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Core pinning

2013-11-13 Thread Jiang, Yunhong


> -Original Message-
> From: Tuomas Paappanen [mailto:tuomas.paappa...@tieto.com]
> Sent: Wednesday, November 13, 2013 4:46 AM
> To: openstack-dev@lists.openstack.org
> Subject: [openstack-dev] [nova] Core pinning
> 
> Hi all,
> 
> I would like to hear your thoughts about core pinning in Openstack.
> Currently nova(with qemu-kvm) supports usage of cpu set of PCPUs what
> can be used by instances. I didn't find blueprint, but I think this
> feature is for isolate cpus used by host from cpus used by
> instances(VCPUs).
> 
> But, from performance point of view it is better to exclusively dedicate
> PCPUs for VCPUs and emulator. In some cases you may want to guarantee
> that only one instance(and its VCPUs) is using certain PCPUs.  By using
> core pinning you can optimize instance performance based on e.g. cache
> sharing, NUMA topology, interrupt handling, pci pass through(SR-IOV) in
> multi socket hosts etc.

My 2 cents.
When you talk about the "performance point of view", are you talking about 
guest performance, or overall performance? Pinning pCPUs is sure to benefit guest 
performance, but possibly not overall performance, especially if the vCPU 
does not consume 100% of the CPU resources.

I think CPU pinning is common in data center virtualization, but I'm not sure it's 
in scope for the cloud, which provides computing power, not hardware resources.

And I think part of your purpose can be achieved through 
https://wiki.openstack.org/wiki/CPUEntitlement and 
https://wiki.openstack.org/wiki/InstanceResourceQuota . In particular, I'd hope a 
well-implemented hypervisor will avoid needless vCPU migration if the vCPU is 
very busy and requires most of the pCPU's computing capability (I know Xen used 
to have an issue in its scheduler that caused frequent vCPU migration long 
ago).

--jyh


> 
> We have already implemented feature like this(PoC with limitations) to
> Nova Grizzly version and would like to hear your opinion about it.
> 
> The current implementation consists of three main parts:
> - Definition of pcpu-vcpu maps for instances and instance spawning
> - (optional) Compute resource and capability advertising including free
> pcpus and NUMA topology.
> - (optional) Scheduling based on free cpus and NUMA topology.
> 
> The implementation is quite simple:
> 
> (additional/optional parts)
> Nova-computes are advertising free pcpus and NUMA topology in same
> manner than host capabilities. Instances are scheduled based on this
> information.
> 
> (core pinning)
> admin can set PCPUs for VCPUs and for emulator process, or select NUMA
> cell for instance vcpus, by adding key:value pairs to flavor's extra specs.
> 
> EXAMPLE:
> instance has 4 vcpus
> :
> vcpus:1,2,3,4 --> vcpu0 pinned to pcpu1, vcpu1 pinned to pcpu2...
> emulator:5 --> emulator pinned to pcpu5
> or
> numacell:0 --> all vcpus are pinned to pcpus in numa cell 0.
> 
> In nova-compute, core pinning information is read from extra specs and
> added to domain xml same way as cpu quota values(cputune).
> 
> 
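A minimal illustrative sketch (element and attribute names follow libvirt's
cputune/vcpupin/emulatorpin convention; the pinning values are the ones from the
example above) of the kind of element that would be added to the domain XML:

import xml.etree.ElementTree as ET

def build_cputune(vcpu_pins, emulator_pcpu):
    # vcpu_pins: {vcpu index: pcpu}, e.g. the "vcpus:1,2,3,4" example above
    cputune = ET.Element('cputune')
    for vcpu, pcpu in sorted(vcpu_pins.items()):
        ET.SubElement(cputune, 'vcpupin', vcpu=str(vcpu), cpuset=str(pcpu))
    ET.SubElement(cputune, 'emulatorpin', cpuset=str(emulator_pcpu))
    return ET.tostring(cputune, encoding='unicode')

print(build_cputune({0: 1, 1: 2, 2: 3, 3: 4}, emulator_pcpu=5))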
> 
> What do you think? Implementation alternatives? Is this worth of
> blueprint? All related comments are welcome!
> 
> Regards,
> Tuomas
> 
> 
> 
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] Custom Flavor creation through Heat

2013-11-12 Thread Jiang, Yunhong


> -Original Message-
> From: Shawn Hartsock [mailto:hartso...@vmware.com]
> Sent: Tuesday, November 12, 2013 12:56 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [heat] Custom Flavor creation through
> Heat
> 
> My concern with proliferating custom flavors is that it might play havoc
> with the underlying root-case for flavors.
> 
> My understanding of flavors is that they are used to solve the resource
> packing problem in elastic cloud scenarios. That way you know that 256
> "tiny" VMs fit cleanly into your hardware layout and so do 128 "medium"
> VMs and 64 large VMs. If you allow "flavor of the week" then the packing
> problem re-asserts itself and scheduling becomes harder.

I'm a bit surprised that the flavor is used to resolve the packing problem. I 
thought it should be handled by scheduler, although it's a NP problem.

As for custom flavors, I think it at least goes against the current Nova 
assumption. Currently Nova assumes flavors are only created by the admin, who 
knows the cloud quite well.
One example is that a flavor may contain extra specs, so if an extra-spec value 
is specified in the flavor while the corresponding scheduler filter is not 
enabled, then the extra spec has no effect and may cause issues.
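For illustration, here is a minimal sketch (not the real
AggregateInstanceExtraSpecsFilter, just the shape of the problem): an extra
spec on a flavor only has an effect if a filter along these lines is actually
enabled in the scheduler's filter list.

    class ExtraSpecsFilterSketch(object):
        """Toy filter: a host passes only if its aggregate metadata matches
        every extra spec on the flavor. If no such filter is enabled in the
        scheduler, the extra specs are simply never consulted."""

        def host_passes(self, host_state, filter_properties):
            instance_type = filter_properties.get('instance_type', {})
            extra_specs = instance_type.get('extra_specs', {})
            # 'aggregate_metadata' is an assumed attribute for this sketch
            metadata = getattr(host_state, 'aggregate_metadata', {})
            return all(metadata.get(k) == v for k, v in extra_specs.items())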

--jyh

> 
> Do I understand this right?
> 
> Given the root-use-case is to help solve VM packing problems, I would
> think that you could allow a "nonsense flavor" that would say: the Image
> provides sizing hints beyond flavors. So you would toggle a VM to be
> "nonsense flavor" and trigger different scheduling, packing, allocation
> behaviors.
> 
> tl;dr - I think that breaks flavors, but I think you should allow it by 
> allowing
> a cloud to escape flavors all together if they want.
> 
> # Shawn Hartsock
> 
> 
> - Original Message -
> > From: "Steve Baker" 
> > To: openstack-dev@lists.openstack.org
> > Sent: Tuesday, November 12, 2013 2:25:23 PM
> > Subject: Re: [openstack-dev] [nova] [heat] Custom Flavor creation
> through Heat
> >
> > On 11/13/2013 07:50 AM, Steven Dake wrote:
> > > On 11/12/2013 10:25 AM, Kodam, Vijayakumar (EXT-Tata Consultancy
> Ser -
> > > FI/Espoo) wrote:
> > >> Hi,
> > >>
> > >> In Telecom Cloud applications, the requirements for every application
> > >> are different. One application might need 10 CPUs, 10GB RAM and no
> > >> disk. Another application might need 1 CPU, 512MB RAM and 100GB
> Disk.
> > >> This varied requirements directly affects the flavors which need to
> > >> be created for different applications (virtual instances). Customer
> > >> has his own custom requirements for CPU, RAM and other hardware
> > >> requirements. So, based on the requests from the customers, we
> > >> believe that the flavor creation should be done along with the
> > >> instance creation, just before the instance is created. Most of the
> > >> flavors will be specific to that application and therefore will not
> > >> be suitable by other instances.
> > >>
> > >> The obvious way is to allow users to create flavors and boot
> > >> customized instances through Heat. As of now, users can launch
> > >> instances through heat along with predefined nova flavors only. We
> > >> have made some changes in our setup and tested it. This change
> allows
> > >> creation of customized nova flavors using heat templates. We are also
> > >> using extra-specs in the flavors for use in our private cloud
> deployment.
> > >> This gives an option to the user to mention custom requirements for
> > >> the flavor in the heat template directly along with the instance
> > >> details. There is one problem in the nova flavor creation using heat
> > >> templates. Admin privileges are required to create nova flavors.
> > >> There should be a way to allow a normal user to create flavors.
> > >>
> > >> Your comments and suggestions are most welcome on how to handle
> this
> > >> problem !!!
> > >>
> > >> Regards,
> > >> Vijaykumar Kodam
> > >>
> > > Vjaykumar,
> > >
> > > I have long believed that an OS::Nova::Flavor resource would make a
> > > good addition to Heat, but as you pointed out, this type of resource
> > > requires administrative priveleges.  I generally also believe it is
> > > bad policy to implement resources that *require* admin privs to
> > > operate, because that results in yet more resources that require
> > > admin.  We are currently solving the IAM user cases (keystone
> doesn't
> > > allow the creation of users without admin privs).
> > >
> > > It makes sense that cloud deployers would want to control who could
> > > create flavors to avoid DOS attacks against their inrastructure or
> > > prevent trusted users from creating a wacky flavor that the physical
> > > infrastructure can't support.  I'm unclear if nova offers a way to
> > > reduce permissions required for flavor creation.  One option that
> may
> > > be possible is via the keystone trusts mechanism.
> > >
> > > Steve Hardy did most of the work integrating He

Re: [openstack-dev] [nova][api] Is this a potential issue

2013-11-12 Thread Jiang, Yunhong


> -Original Message-
> From: Andrew Laski [mailto:andrew.la...@rackspace.com]
> Sent: Tuesday, November 12, 2013 12:07 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][api] Is this a potential issue
> 
> On 11/11/13 at 05:27pm, Jiang, Yunhong wrote:
> >Resend after the HK summit, hope someone can give me hint on it.
> >
> >Thanks
> >--jyh
> >
> >> -Original Message-
> >> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> >> Sent: Thursday, November 07, 2013 5:39 PM
> >> To: openstack-dev@lists.openstack.org
> >> Subject: [openstack-dev] [nova][api] Is this a potential issue
> >>
> >> Hi, all
> >>I'm a bit confused of followed code in ./compute/api.py, which will be
> >> invoked by api/openstack/compute/servers.py,
> _action_revert_resize().
> >>From the code seems there is a small windows between get the
> >> migration object and update migration.status. If another API request
> >> comes at this small window, it means two utility will try to revert resize
> at
> >> same time. Is this a potential issue?
> >>Currently implementation already roll back the reservation if
> >> something wrong, but not sure if we should update state to "reverting"
> as
> >> a transaction in get_by_instance_and_status()?
> 
> The migration shouldn't end up being set to 'reverting' twice because of
> the expected_task_state set and check in
> instance.save(expected_task_state=None).  The quota reservation could
> happen twice, so a rollback in the case of a failure in instance.save
> could be good.

Aha, I didn't notice that's a guard. It's really cool.
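
To spell out how I now read that guard (reusing the names from the quoted
snippet; the rollback handling below is my assumption, not the actual code):

    # Sketch only: instance.save(expected_task_state=...) acts like a
    # compare-and-swap, so only one of two concurrent reverts can move the
    # instance into RESIZE_REVERTING; the loser gets an exception and can roll
    # back its quota reservation (the exception name below is my assumption).
    instance.task_state = task_states.RESIZE_REVERTING
    try:
        instance.save(expected_task_state=None)
    except exception.UnexpectedTaskStateError:
        QUOTAS.rollback(context, reservations)
        raise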

--jyh

> 
> >>
> >> --jyh
> >>
> >> def revert_resize(self, context, instance):
> >> """Reverts a resize, deleting the 'new' instance in the
> process."""
> >> elevated = context.elevated()
> >> migration =
> >> migration_obj.Migration.get_by_instance_and_status(
> >> elevated, instance.uuid, 'finished')
> >>>>>>>>>>>>>>>>>>>>>>>>Here we get the migration object
> >>
> >> # reverse quota reservation for increased resource usage
> >> deltas = self._reverse_upsize_quota_delta(context, migration)
> >> reservations = self._reserve_quota_delta(context, deltas)
> >>
> >> instance.task_state = task_states.RESIZE_REVERTING
> >> instance.save(expected_task_state=None)
> >>
> >> migration.status = 'reverting'  >>>>>>>>>>>>>> Here we update the status.
> >> migration.save()
> >>
> >> ___
> >> OpenStack-dev mailing list
> >> OpenStack-dev@lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> >___
> >OpenStack-dev mailing list
> >OpenStack-dev@lists.openstack.org
> >http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][api] Is this a potential issue

2013-11-11 Thread Jiang, Yunhong
Resending after the HK summit; hope someone can give me a hint on it.

Thanks
--jyh

> -Original Message-
> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> Sent: Thursday, November 07, 2013 5:39 PM
> To: openstack-dev@lists.openstack.org
> Subject: [openstack-dev] [nova][api] Is this a potential issue
> 
> Hi, all
>   I'm a bit confused of followed code in ./compute/api.py, which will be
> invoked by api/openstack/compute/servers.py, _action_revert_resize().
>   From the code seems there is a small windows between get the
> migration object and update migration.status. If another API request
> comes at this small window, it means two utility will try to revert resize at
> same time. Is this a potential issue?
>   Currently implementation already roll back the reservation if
> something wrong, but not sure if we should update state to "reverting" as
> a transaction in get_by_instance_and_status()?
> 
> --jyh
> 
> def revert_resize(self, context, instance):
> """Reverts a resize, deleting the 'new' instance in the process."""
> elevated = context.elevated()
> migration =
> migration_obj.Migration.get_by_instance_and_status(
> elevated, instance.uuid, 'finished')
>   >>>>>>>>>>>>>>>>>>>>>>Here we get the migration object
> 
> # reverse quota reservation for increased resource usage
> deltas = self._reverse_upsize_quota_delta(context, migration)
> reservations = self._reserve_quota_delta(context, deltas)
> 
> instance.task_state = task_states.RESIZE_REVERTING
> instance.save(expected_task_state=None)
> 
> migration.status = 'reverting'  >>>>>>>>>>>>>> Here we update the status.
> migration.save()
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][api] Is this a potential issue

2013-11-07 Thread Jiang, Yunhong
Hi, all
I'm a bit confused by the following code in ./compute/api.py, which is invoked 
by api/openstack/compute/servers.py, _action_revert_resize().
From the code it seems there is a small window between getting the migration 
object and updating migration.status. If another API request comes in during 
this small window, two callers will try to revert the resize at the same time. 
Is this a potential issue?
The current implementation already rolls back the reservation if something goes 
wrong, but I'm not sure whether we should update the state to "reverting" as a 
transaction in get_by_instance_and_status()?

--jyh

    def revert_resize(self, context, instance):
        """Reverts a resize, deleting the 'new' instance in the process."""
        elevated = context.elevated()
        migration = migration_obj.Migration.get_by_instance_and_status(
            elevated, instance.uuid, 'finished')
        # >> Here we get the migration object

        # reverse quota reservation for increased resource usage
        deltas = self._reverse_upsize_quota_delta(context, migration)
        reservations = self._reserve_quota_delta(context, deltas)

        instance.task_state = task_states.RESIZE_REVERTING
        instance.save(expected_task_state=None)

        migration.status = 'reverting'
        # >> Here we update the status.
        migration.save()

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][scheduler]The database access in the scheduler filters

2013-11-04 Thread Jiang, Yunhong
I agree that this will have benefits, but how much benefit may depend highly 
on the type of instances created. If most of the instances are normal instances 
without any special requirements, we will see no benefit at all.

Thanks
--jyh

> -Original Message-
> From: Russell Bryant [mailto:rbry...@redhat.com]
> Sent: Sunday, November 03, 2013 12:12 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova][scheduler]The database access in the
> scheduler filters
> 
> On 11/01/2013 06:39 AM, Jiang, Yunhong wrote:
> > I noticed several filters (AggregateMultiTenancyIsoaltion, ram_filter,
> type_filter, AggregateInstanceExtraSpecsFilter) have DB access in the
> host_passes(). Some will even access for each invocation.
> >
> > Just curios if this is considered a performance issue? With a 10k nodes,
> 60 VM per node, and 3 hours VM life cycle cloud, it will have more than 1
> million DB access per second. Not a small number IMHO.
> 
> On a somewhat related note, here's an idea that would be pretty easy to
> implement.
> 
> What if we added some optional metadata to scheduler filters to let them
> indicate where in the order of filters they should run?
> 
> The filters you're talking about here we would probably want to run
> last.  Other filters that could potentially efficiently eliminate a
> large number of hosts should be run first.
> 
> --
> Russell Bryant
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][scheduler]The database access in the scheduler filters

2013-11-01 Thread Jiang, Yunhong
Aha, right after replying to Hartsock's mail, I realized I'm still correct. Glad 
that I did graduate from school after all :)

--jyh

> -Original Message-
> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> Sent: Friday, November 01, 2013 10:32 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][scheduler]The database access in the
> scheduler filters
> 
> As Shawn Hartsock pointed out in the reply, I made a stupid error in the
> calculation. It's in fact 55 access per second, not that big number I
> calculated.
> I thought I graduated from elementary school but seems I'm wrong. Really
> sorry for the stupid error.
> 
> --jyh
> 
> > -Original Message-
> > From: Russell Bryant [mailto:rbry...@redhat.com]
> > Sent: Friday, November 01, 2013 9:18 AM
> > To: openstack-dev@lists.openstack.org
> > Subject: Re: [openstack-dev] [nova][scheduler]The database access in
> the
> > scheduler filters
> >
> > On 11/01/2013 09:09 AM, Andrew Laski wrote:
> > > On 11/01/13 at 10:16am, John Garbutt wrote:
> > >> Its intentional. Cells is there to split up your nodes into more
> > >> manageable chunks.
> > >
> > > I don't think you mean to say that there's intentionally a performance
> > > issue.  But yes there are performance issues with the filter scheduler.
> > > Because I work on a deployment that uses cells to partition the
> workload
> > > I haven't seen them myself, but there are plenty of reports from others
> > > who have encountered them.  And it's easy to run some back of the
> > napkin
> > > calculations like was done below and see that scheduling will require a
> > > lot of resources if there's no partitioning.
> > >
> > >>
> > >> There are quite a few design summit sessions on looking into
> > >> alternative approaches to our current scheduler.
> > >>
> > >> While I would love a single scheduler to make everyone happy, I am
> > >> thinking we might end up with several scheduler, each with slightly
> > >> different properties, and you pick one depending on what you want
> to
> > >> do with your cloud.
> > >
> > > +1.  We have the ability to drop in different schedulers right now, but
> > > there's only one really useful scheduler in the tree.  There has been
> > > talk of making a more performant scheduler which schedules in a
> 'good
> > > enough' fashion through some approximation algorithm.  I would love
> > to
> > > see that get introduced as another scheduler and not as a rework of
> the
> > > filter scheduler.  I suppose the chance scheduler could technically
> > > count for that, but I'm under the impression that it isn't used beyond
> > > testing.
> >
> > Agreed.
> >
> > There's a lot of discussion happening in two different directions, it
> > seems.  One group is very interested in improving the scheduler's
> > ability to make the best decision possible using various policies.
> > Another group is concerned with massive scale and is willing to accept
> > "good enough" scheduling to get there.
> >
> > I think the filter scheduler is pretty reasonable for the best possible
> > decision approach today.  There's some stuff that could perform better.
> >  There's more policy knobs that could be added.  There's the cross
> > service issue to figure out ... but it's not bad.
> >
> > I'm very interested in a new "good enough" scheduler.  I liked the idea
> > of running a bunch of schedulers that each only look at a subset of your
> > infrastructure and pick something that's good enough.  I'm interested to
> > hear other ideas in the session we have on this topic (rethinking
> > scheduler design).
> >
> > Of course, you get a lot of the massive scale benefits by going to
> > cells, too.  If cells is our answer here, I really want to see more
> > people stepping up to help with the cells code.  There are still some
> > feature gaps to fill.  We should also be looking at the road to getting
> > back to only having one way to deploy nova (cells).  Having both cells
> > vs non-cells options really isn't ideal long term.
> >
> > --
> > Russell Bryant
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][scheduler]The database access in the scheduler filters

2013-11-01 Thread Jiang, Yunhong
Shawn, yes, there are about 56 VM requests every second, and for each request 
the scheduler invokes the filter for every host; that means, for each request, 
the filter function is invoked 10k times.
So 56 * 10k = 560k, yes, roughly half of 1M, but still a big number.
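
Spelling the back-of-the-napkin numbers out (rough figures only):

    nodes = 10000
    vms_per_node = 60
    vm_lifetime_sec = 3 * 3600.0
    requests_per_sec = nodes * vms_per_node / vm_lifetime_sec  # ~55.6 creates/s
    filter_calls_per_sec = requests_per_sec * nodes            # ~556,000 host_passes()/s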

--jyh

> -Original Message-
> From: Shawn Hartsock [mailto:hartso...@vmware.com]
> Sent: Friday, November 01, 2013 8:20 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][scheduler]The database access in the
> scheduler filters
> 
> 
> 
> - Original Message -
> > From: "Yunhong Jiang" 
> > To: openstack-dev@lists.openstack.org
> > Sent: Thursday, October 31, 2013 6:39:29 PM
> > Subject: [openstack-dev] [nova][scheduler]The database access in the
>   scheduler filters
> >
> > I noticed several filters (AggregateMultiTenancyIsoaltion, ram_filter,
> > type_filter, AggregateInstanceExtraSpecsFilter) have DB access in the
> > host_passes(). Some will even access for each invocation.
> >
> > Just curios if this is considered a performance issue? With a 10k nodes,
> 60
> > VM per node, and 3 hours VM life cycle cloud, it will have more than 1
> > million DB access per second. Not a small number IMHO.
> >
> > Thanks
> > --jyh
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> 
> Sorry if I'm dumb, but please try to explain things to me. I don't think I
> follow...
> 
> 10k nodes, 60 VM per node... is 600k VM in the whole cloud. A 3 hour life
> cycle for a VM means every hour 1/3 the nodes turn over so 200k VM
> are created/deleted per hour ... divide by 60 for ... 3,333.333 per minute
> or ... divide by 60 for ... 55.5 VM creations/deletions per second ...
> 
> ... did I do that math right? So where's the million DB accesses per second
> come from? Are the rules fired for every VM on every access so that 600k
> VM + 1 new VM means the rules fire 600k + 1 times? What? Sorry... really
> confused.
> 
> # Shawn Hartsock
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][scheduler]The database access in the scheduler filters

2013-11-01 Thread Jiang, Yunhong
Yes, you are right .. :(

> -Original Message-
> From: Shawn Hartsock [mailto:hartso...@vmware.com]
> Sent: Friday, November 01, 2013 8:20 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova][scheduler]The database access in the
> scheduler filters
> 
> 
> 
> - Original Message -
> > From: "Yunhong Jiang" 
> > To: openstack-dev@lists.openstack.org
> > Sent: Thursday, October 31, 2013 6:39:29 PM
> > Subject: [openstack-dev] [nova][scheduler]The database access in the
>   scheduler filters
> >
> > I noticed several filters (AggregateMultiTenancyIsoaltion, ram_filter,
> > type_filter, AggregateInstanceExtraSpecsFilter) have DB access in the
> > host_passes(). Some will even access for each invocation.
> >
> > Just curios if this is considered a performance issue? With a 10k nodes,
> 60
> > VM per node, and 3 hours VM life cycle cloud, it will have more than 1
> > million DB access per second. Not a small number IMHO.
> >
> > Thanks
> > --jyh
> >
> > ___
> > OpenStack-dev mailing list
> > OpenStack-dev@lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
> 
> Sorry if I'm dumb, but please try to explain things to me. I don't think I
> follow...
> 
> 10k nodes, 60 VM per node... is 600k VM in the whole cloud. A 3 hour life
> cycle for a VM means every hour 1/3 the nodes turn over so 200k VM
> are created/deleted per hour ... divide by 60 for ... 3,333.333 per minute
> or ... divide by 60 for ... 55.5 VM creations/deletions per second ...
> 
> ... did I do that math right? So where's the million DB accesses per second
> come from? Are the rules fired for every VM on every access so that 600k
> VM + 1 new VM means the rules fire 600k + 1 times? What? Sorry... really
> confused.
> 
> # Shawn Hartsock
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova][scheduler]The database access in the scheduler filters

2013-11-01 Thread Jiang, Yunhong
As Shawn Hartsock pointed out in his reply, I made a stupid error in the 
calculation. It's in fact about 55 accesses per second, not the big number I 
calculated. 
I thought I had graduated from elementary school, but it seems I was wrong. 
Really sorry for the stupid error.

--jyh

> -Original Message-
> From: Russell Bryant [mailto:rbry...@redhat.com]
> Sent: Friday, November 01, 2013 9:18 AM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova][scheduler]The database access in the
> scheduler filters
> 
> On 11/01/2013 09:09 AM, Andrew Laski wrote:
> > On 11/01/13 at 10:16am, John Garbutt wrote:
> >> Its intentional. Cells is there to split up your nodes into more
> >> manageable chunks.
> >
> > I don't think you mean to say that there's intentionally a performance
> > issue.  But yes there are performance issues with the filter scheduler.
> > Because I work on a deployment that uses cells to partition the workload
> > I haven't seen them myself, but there are plenty of reports from others
> > who have encountered them.  And it's easy to run some back of the
> napkin
> > calculations like was done below and see that scheduling will require a
> > lot of resources if there's no partitioning.
> >
> >>
> >> There are quite a few design summit sessions on looking into
> >> alternative approaches to our current scheduler.
> >>
> >> While I would love a single scheduler to make everyone happy, I am
> >> thinking we might end up with several scheduler, each with slightly
> >> different properties, and you pick one depending on what you want to
> >> do with your cloud.
> >
> > +1.  We have the ability to drop in different schedulers right now, but
> > there's only one really useful scheduler in the tree.  There has been
> > talk of making a more performant scheduler which schedules in a 'good
> > enough' fashion through some approximation algorithm.  I would love
> to
> > see that get introduced as another scheduler and not as a rework of the
> > filter scheduler.  I suppose the chance scheduler could technically
> > count for that, but I'm under the impression that it isn't used beyond
> > testing.
> 
> Agreed.
> 
> There's a lot of discussion happening in two different directions, it
> seems.  One group is very interested in improving the scheduler's
> ability to make the best decision possible using various policies.
> Another group is concerned with massive scale and is willing to accept
> "good enough" scheduling to get there.
> 
> I think the filter scheduler is pretty reasonable for the best possible
> decision approach today.  There's some stuff that could perform better.
>  There's more policy knobs that could be added.  There's the cross
> service issue to figure out ... but it's not bad.
> 
> I'm very interested in a new "good enough" scheduler.  I liked the idea
> of running a bunch of schedulers that each only look at a subset of your
> infrastructure and pick something that's good enough.  I'm interested to
> hear other ideas in the session we have on this topic (rethinking
> scheduler design).
> 
> Of course, you get a lot of the massive scale benefits by going to
> cells, too.  If cells is our answer here, I really want to see more
> people stepping up to help with the cells code.  There are still some
> feature gaps to fill.  We should also be looking at the road to getting
> back to only having one way to deploy nova (cells).  Having both cells
> vs non-cells options really isn't ideal long term.
> 
> --
> Russell Bryant
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova][scheduler]The database access in the scheduler filters

2013-10-31 Thread Jiang, Yunhong
I noticed several filters (AggregateMultiTenancyIsolation, ram_filter, 
type_filter, AggregateInstanceExtraSpecsFilter) have DB access in 
host_passes(). Some will even access the DB on each invocation.

Just curious whether this is considered a performance issue? With a 10k-node, 
60-VM-per-node, 3-hour VM lifecycle cloud, it will have more than 1 million DB 
accesses per second. Not a small number IMHO.

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong


> -Original Message-
> From: Isaku Yamahata [mailto:isaku.yamah...@gmail.com]
> Sent: Tuesday, October 29, 2013 8:24 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Cc: isaku.yamah...@gmail.com; Itzik Brown
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
> 
> Hi Yunhong.
> 
> On Tue, Oct 29, 2013 at 08:22:40PM +,
> "Jiang, Yunhong"  wrote:
> 
> > > * describe resource external to nova that is attached to VM in the API
> > > (block device mapping and/or vif references)
> > > * ideally the nova scheduler needs to be aware of the local capacity,
> > > and how that relates to the above information (relates to the cross
> > > service scheduling issues)
> >
> > I think this possibly a bit different. For volume, it's sure managed by
> Cinder, but for PCI devices, currently
> > It ;s managed by nova. So we possibly need nova to translate the
> information (possibly before nova scheduler).
> >
> > > * state of the device should be stored by Neutron/Cinder
> > > (attached/detached, capacity, IP, etc), but still exposed to the
> > > "scheduler"
> >
> > I'm not sure if we can keep the state of the device in Neutron. Currently
> nova manage all PCI devices.
> 
> Yes, with the current implementation, nova manages PCI devices and it
> works.
> That's great. It will remain so in Icehouse cycle (maybe also J?).
> 
> But how about long term direction?
> Neutron should know/manage such network related resources on
> compute nodes?

So you mean the PCI device management will be split between Nova and Neutron? 
For example, non-NIC devices owned by Nova and NIC devices owned by Neutron?

There have been so many discussions of scheduler enhancement, like 
https://etherpad.openstack.org/p/grizzly-split-out-scheduling , so possibly 
that's the right direction? Let's wait for the summit discussion.

> The implementation in Nova will be moved into Neutron like what Cinder
> did?
> any opinions/thoughts?
> It seems that not so many Neutron developers are interested in PCI
> passthrough at the moment, though.
> 
> There are use cases for this, I think.
> For example, some compute nodes use OVS plugin, another nodes LB
> plugin.
> (Right now it may not possible easily, but it will be with ML2 plugin and
> mechanism driver). User wants their VMs to run on nodes with OVS plugin
> for
> some reason(e.g. performance difference).
> Such usage would be handled similarly.
> 
> Thanks,
> ---
> Isaku Yamahata
> 
> 
> >
> > Thanks
> > --jyh
> >
> >
> > > * connection params get given to Nova from Neutron/Cinder
> > > * nova still has the vif driver or volume driver to make the final
> connection
> > > * the disk should be formatted/expanded, and network info injected in
> > > the same way as before (cloud-init, config drive, DHCP, etc)
> > >
> > > John
> > >
> > > On 29 October 2013 10:17, Irena Berezovsky
> 
> > > wrote:
> > > > Hi Jiang, Robert,
> > > >
> > > > IRC meeting option works for me.
> > > >
> > > > If I understand your question below, you are looking for a way to tie
> up
> > > > between requested virtual network(s) and requested PCI device(s).
> The
> > > way we
> > > > did it in our solution  is to map a provider:physical_network to an
> > > > interface that represents the Physical Function. Every virtual
> network is
> > > > bound to the provider:physical_network, so the PCI device should
> be
> > > > allocated based on this mapping.  We can  map a PCI alias to the
> > > > provider:physical_network.
> > > >
> > > >
> > > >
> > > > Another topic to discuss is where the mapping between neutron
> port
> > > and PCI
> > > > device should be managed. One way to solve it, is to propagate the
> > > allocated
> > > > PCI device details to neutron on port creation.
> > > >
> > > > In case  there is no qbg/qbh support, VF networking configuration
> > > should be
> > > > applied locally on the Host.
> > > >
> > > > The question is when and how to apply networking configuration on
> the
> > > PCI
> > > > device?
> > > >
> > > > We see the following options:
> > > >
> > > > * it can be done on port creation.
> > > >
> > > > *

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong


> -Original Message-
> From: Henry Gessau [mailto:ges...@cisco.com]
> Sent: Tuesday, October 29, 2013 2:23 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
> 
> On Tue, Oct 29, at 4:31 pm, Jiang, Yunhong 
> wrote:
> 
> > Henry,why do you think the "service VM" need the entire PF instead of a
> > VF? I think the SR-IOV NIC should provide QoS and performance
> isolation.
> 
> I was speculating. I just thought it might be a good idea to leave open the
> possibility of assigning a PF to a VM if the need arises.
> 
> Neutron service VMs are a new thing. I will be following the discussions
> and
> there is a summit session for them. It remains to be seen if there is any
> desire/need for full PF ownership of NICs. But if a service VM owns the PF
> and has the right NIC driver it could do some advanced features with it.
> 
At least in the current PCI implementation, if a device does not have SR-IOV 
enabled, then that device will be exposed and can be assigned (is this your 
so-called PF?). If a device has SR-IOV enabled, then only the VFs are exposed 
and the PF is hidden from the resource tracker. The reason is that with SR-IOV 
enabled, the PF is mostly used to configure and manage the VFs, and it would be 
a security issue to expose the PF to a guest.
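
To put that exposure rule into pseudocode (illustrative only, not the actual
resource tracker code; the attribute names are made up):

    def is_assignable(dev):
        # A PF with SR-IOV enabled is kept for configuring/managing its VFs
        # and is hidden for security reasons; a plain PCI device or a VF can
        # be exposed to guests. ('is_pf' / 'sriov_enabled' are assumed names.)
        if dev.is_pf and dev.sriov_enabled:
            return False
        return True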

When you talk about the PF, I'm not sure whether you mean a PF with or without 
SR-IOV enabled. 

I totally agree that assigning a PCI NIC to a service VM has a lot of benefits 
from both the performance and isolation points of view.

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong
Henry, why do you think the "service VM" needs the entire PF instead of a VF? I 
think the SR-IOV NIC should provide QoS and performance isolation.

As for assigning an entire PCI device to a guest, that should be OK since the 
PF and VF usually have different device IDs. The tricky thing is that, at least 
for some PCI devices, you can't configure some NICs to have SR-IOV enabled 
while others do not.

Thanks
--jyh

> -Original Message-
> From: Henry Gessau [mailto:ges...@cisco.com]
> Sent: Tuesday, October 29, 2013 8:10 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network
> support
> 
> Lots of great info and discussion going on here.
> 
> One additional thing I would like to mention is regarding PF and VF usage.
> 
> Normally VFs will be assigned to instances, and the PF will either not be
> used at all, or maybe some agent in the host of the compute node might
> have
> access to the PF for something (management?).
> 
> There is a neutron design track around the development of "service VMs".
> These are dedicated instances that run neutron services like routers,
> firewalls, etc. It is plausible that a service VM would like to use PCI
> passthrough and get the entire PF. This would allow it to have complete
> control over a physical link, which I think will be wanted in some cases.
> 
> --
> Henry
> 
> On Tue, Oct 29, at 10:23 am, Irena Berezovsky 
> wrote:
> 
> > Hi,
> >
> > I would like to share some details regarding the support provided by
> > Mellanox plugin. It enables networking via SRIOV pass-through devices
> or
> > macvtap interfaces.  It plugin is available here:
> >
> https://github.com/openstack/neutron/tree/master/neutron/plugins/mln
> x.
> >
> > To support either PCI pass-through device and macvtap interface type of
> > vNICs, we set neutron port profile:vnic_type according to the required
> VIF
> > type and then use the created port to 'nova boot' the VM.
> >
> > To  overcome the missing scheduler awareness for PCI devices which
> was not
> > part of the Havana release yet, we
> >
> > have an additional service (embedded switch Daemon) that runs on each
> > compute node.
> >
> > This service manages the SRIOV resources allocation,  answers vNICs
> > discovery queries and applies VLAN/MAC configuration using standard
> Linux
> > APIs (code is here:
> https://github.com/mellanox-openstack/mellanox-eswitchd
> > ).  The embedded switch Daemon serves as a glue layer between VIF
> Driver and
> > Neutron Agent.
> >
> > In the Icehouse Release when SRIOV resources allocation is already part
> of
> > the Nova, we plan to eliminate the need in embedded switch daemon
> service.
> > So what is left to figure out is how to tie up between neutron port and
> PCI
> > device and invoke networking configuration.
> >
> >
> >
> > In our case what we have is actually the Hardware VEB that is not
> programmed
> > via either 802.1Qbg or 802.1Qbh, but configured locally by Neutron
> Agent. We
> > also support both Ethernet and InfiniBand physical network L2
> technology.
> > This means that we apply different configuration commands  to set
> > configuration on VF.
> >
> >
> >
> > I guess what we have to figure out is how to support the generic case for
> > the PCI device networking support, for HW VEB, 802.1Qbg and
> 802.1Qbh cases.
> >
> >
> >
> > BR,
> >
> > Irena
> >
> >
> >
> > *From:*Robert Li (baoli) [mailto:ba...@cisco.com]
> > *Sent:* Tuesday, October 29, 2013 3:31 PM
> > *To:* Jiang, Yunhong; Irena Berezovsky;
> prashant.upadhy...@aricent.com;
> > chris.frie...@windriver.com; He, Yongli; Itzik Brown
> > *Cc:* OpenStack Development Mailing List; Brian Bowen (brbowen);
> Kyle
> > Mestery (kmestery); Sandhya Dasu (sadasu)
> > *Subject:* Re: [openstack-dev] [nova] [neutron] PCI pass-through
> network support
> >
> >
> >
> > Hi Yunhong,
> >
> >
> >
> > I haven't looked at Mellanox in much detail. I think that we'll get more
> > details from Irena down the road. Regarding your question, I can only
> answer
> > based on my experience with Cisco's VM-FEX. In a nutshell:
> >
> >  -- a vNIC is connected to an external switch. Once the host is
> booted
> > up, all the PFs and VFs provisioned on the vNIC will be created, as well as
> > all the corresponding ethernet interfaces .
> >
> >  -- As far as Neutron is

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong
Your explanation of the virtual network and physical network is quite clear and 
should work well. We need to change Nova code to achieve it, including getting 
the physical network for the virtual network, passing the physical network 
requirement to the filter properties, etc.
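Roughly something like the sketch below; the function and field names are my
assumptions, purely to illustrate the virtual network -> physical network ->
PCI request flow:

    def build_pci_request_for_network(neutron, network_id, physnet_to_alias):
        net = neutron.show_network(network_id)['network']
        physnet = net.get('provider:physical_network')
        alias = physnet_to_alias.get(physnet)   # e.g. {'physnet1': 'sriov_vf'}
        if alias is None:
            return None                         # no SR-IOV NIC needed for this network
        # this would be added to the instance's PCI requests and end up in the
        # scheduler's filter properties
        return {'alias': alias, 'count': 1}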

For your port method, do you mean we are sure to pass a network id to 'nova 
boot' and Nova will create the port during VM boot, am I right? Also, how does 
Nova know that it needs to allocate a PCI device for the port? I'd suppose that 
in an SR-IOV NIC environment the user doesn't need to specify the PCI 
requirement. Instead, the PCI requirement should come from the network 
configuration and image properties. Or do you think the user still needs to 
pass a flavor with a PCI request?

--jyh


From: Irena Berezovsky [mailto:ire...@mellanox.com]
Sent: Tuesday, October 29, 2013 3:17 AM
To: Jiang, Yunhong; Robert Li (baoli); prashant.upadhy...@aricent.com; 
chris.frie...@windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Jiang, Robert,
IRC meeting option works for me.
If I understand your question below, you are looking for a way to tie up 
between requested virtual network(s) and requested PCI device(s). The way we 
did it in our solution  is to map a provider:physical_network to an interface 
that represents the Physical Function. Every virtual network is bound to the 
provider:physical_network, so the PCI device should be allocated based on this 
mapping.  We can  map a PCI alias to the provider:physical_network.

Another topic to discuss is where the mapping between neutron port and PCI 
device should be managed. One way to solve it, is to propagate the allocated 
PCI device details to neutron on port creation.
In case  there is no qbg/qbh support, VF networking configuration should be 
applied locally on the Host.
The question is when and how to apply networking configuration on the PCI 
device?
We see the following options:

* it can be done on port creation.

* It can be done when nova VIF driver is called for vNIC plugging. This 
will require to  have all networking configuration available to the VIF driver 
or send request to the neutron server to obtain it.

* It can be done by  having a dedicated L2 neutron agent on each Host 
that scans for allocated PCI devices  and then retrieves networking 
configuration from the server and configures the device. The agent will be also 
responsible for managing update requests coming from the neutron server.


For macvtap vNIC type assignment, the networking configuration can be applied 
by a dedicated L2 neutron agent.

BR,
Irena

From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
Sent: Tuesday, October 29, 2013 9:04 AM

To: Robert Li (baoli); Irena Berezovsky; 
prashant.upadhy...@aricent.com<mailto:prashant.upadhy...@aricent.com>; 
chris.frie...@windriver.com<mailto:chris.frie...@windriver.com>; He, Yongli; 
Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network support

Robert, is it possible to have a IRC meeting? I'd prefer to IRC meeting because 
it's more openstack style and also can keep the minutes clearly.

To your flow, can you give more detailed example. For example, I can consider 
user specify the instance with -nic option specify a network id, and then how 
nova device the requirement to the PCI device? I assume the network id should 
define the switches that the device can connect to , but how is that 
information translated to the PCI property requirement? Will this translation 
happen before the nova scheduler make host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; 
prashant.upadhy...@aricent.com<mailto:prashant.upadhy...@aricent.com>; Jiang, 
Yunhong; chris.frie...@windriver.com<mailto:chris.frie...@windriver.com>; He, 
Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, "Irena Berezovsky" 
mailto:ire...@mellanox.com>> wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end to end flow is actually encompassed in the blueprints in a nutshell. I 
will reiterate it in below. The binding between Nova and Neutron occurs with 
the neutron v2 API that nova invokes in order to provision the neutron 
servic

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong

> * describe resource external to nova that is attached to VM in the API
> (block device mapping and/or vif references)
> * ideally the nova scheduler needs to be aware of the local capacity,
> and how that relates to the above information (relates to the cross
> service scheduling issues)

I think this is possibly a bit different. Volumes are certainly managed by 
Cinder, but PCI devices are currently managed by Nova. So we possibly need Nova 
to translate the information (possibly before the Nova scheduler).

> * state of the device should be stored by Neutron/Cinder
> (attached/detached, capacity, IP, etc), but still exposed to the
> "scheduler"

I'm not sure we can keep the state of the device in Neutron. Currently Nova 
manages all PCI devices.

Thanks
--jyh


> * connection params get given to Nova from Neutron/Cinder
> * nova still has the vif driver or volume driver to make the final connection
> * the disk should be formatted/expanded, and network info injected in
> the same way as before (cloud-init, config drive, DHCP, etc)
> 
> John
> 
> On 29 October 2013 10:17, Irena Berezovsky 
> wrote:
> > Hi Jiang, Robert,
> >
> > IRC meeting option works for me.
> >
> > If I understand your question below, you are looking for a way to tie up
> > between requested virtual network(s) and requested PCI device(s). The
> way we
> > did it in our solution  is to map a provider:physical_network to an
> > interface that represents the Physical Function. Every virtual network is
> > bound to the provider:physical_network, so the PCI device should be
> > allocated based on this mapping.  We can  map a PCI alias to the
> > provider:physical_network.
> >
> >
> >
> > Another topic to discuss is where the mapping between neutron port
> and PCI
> > device should be managed. One way to solve it, is to propagate the
> allocated
> > PCI device details to neutron on port creation.
> >
> > In case  there is no qbg/qbh support, VF networking configuration
> should be
> > applied locally on the Host.
> >
> > The question is when and how to apply networking configuration on the
> PCI
> > device?
> >
> > We see the following options:
> >
> > * it can be done on port creation.
> >
> > * It can be done when nova VIF driver is called for vNIC
> plugging.
> > This will require to  have all networking configuration available to the
> VIF
> > driver or send request to the neutron server to obtain it.
> >
> > * It can be done by  having a dedicated L2 neutron agent on
> each
> > Host that scans for allocated PCI devices  and then retrieves networking
> > configuration from the server and configures the device. The agent will
> be
> > also responsible for managing update requests coming from the neutron
> > server.
> >
> >
> >
> > For macvtap vNIC type assignment, the networking configuration can be
> > applied by a dedicated L2 neutron agent.
> >
> >
> >
> > BR,
> >
> > Irena
> >
> >
> >
> > From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> > Sent: Tuesday, October 29, 2013 9:04 AM
> >
> >
> > To: Robert Li (baoli); Irena Berezovsky;
> prashant.upadhy...@aricent.com;
> > chris.frie...@windriver.com; He, Yongli; Itzik Brown
> >
> >
> > Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle
> Mestery
> > (kmestery); Sandhya Dasu (sadasu)
> > Subject: RE: [openstack-dev] [nova] [neutron] PCI pass-through network
> > support
> >
> >
> >
> > Robert, is it possible to have a IRC meeting? I'd prefer to IRC meeting
> > because it's more openstack style and also can keep the minutes
> clearly.
> >
> >
> >
> > To your flow, can you give more detailed example. For example, I can
> > consider user specify the instance with -nic option specify a network id,
> > and then how nova device the requirement to the PCI device? I assume
> the
> > network id should define the switches that the device can connect to ,
> but
> > how is that information translated to the PCI property requirement? Will
> > this translation happen before the nova scheduler make host decision?
> >
> >
> >
> > Thanks
> >
> > --jyh
> >
> >
> >
> > From: Robert Li (baoli) [mailto:ba...@cisco.com]
> > Sent: Monday, October 28, 2013 12:22 PM
> > To: Irena Berezovsky; prashant.upadhy...@aricent.com; Jiang, Yunhong;
> > chris.frie...@windriver.com; He, Yongli; Itzik Brown
> > Cc:

Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

2013-10-29 Thread Jiang, Yunhong
Robert, is it possible to have an IRC meeting? I'd prefer an IRC meeting because 
it's more OpenStack style and also keeps clear minutes.

For your flow, can you give a more detailed example? For instance, I can see 
the user specifying the instance with a -nic option that specifies a network 
id, but then how does Nova derive the requirement for the PCI device? I assume 
the network id should define the switches that the device can connect to, but 
how is that information translated into a PCI property requirement? Will this 
translation happen before the Nova scheduler makes the host decision?

Thanks
--jyh

From: Robert Li (baoli) [mailto:ba...@cisco.com]
Sent: Monday, October 28, 2013 12:22 PM
To: Irena Berezovsky; prashant.upadhy...@aricent.com; Jiang, Yunhong; 
chris.frie...@windriver.com; He, Yongli; Itzik Brown
Cc: OpenStack Development Mailing List; Brian Bowen (brbowen); Kyle Mestery 
(kmestery); Sandhya Dasu (sadasu)
Subject: Re: [openstack-dev] [nova] [neutron] PCI pass-through network support

Hi Irena,

Thank you very much for your comments. See inline.

--Robert

On 10/27/13 3:48 AM, "Irena Berezovsky" 
mailto:ire...@mellanox.com>> wrote:

Hi Robert,
Thank you very much for sharing the information regarding your efforts. Can you 
please share your idea of the end to end flow? How do you suggest  to bind Nova 
and Neutron?

The end to end flow is actually encompassed in the blueprints in a nutshell. I 
will reiterate it in below. The binding between Nova and Neutron occurs with 
the neutron v2 API that nova invokes in order to provision the neutron 
services. The vif driver is responsible for plugging in an instance onto the 
networking setup that neutron has created on the host.

Normally, one will invoke "nova boot" api with the -nic options to specify the 
nic with which the instance will be connected to the network. It currently 
allows net-id, fixed ip and/or port-id to be specified for the option. However, 
it doesn't allow one to specify special networking requirements for the 
instance. Thanks to the nova pci-passthrough work, one can specify PCI 
passthrough device(s) in the nova flavor. But it doesn't provide means to tie 
up these PCI devices in the case of ethernet adpators with networking services. 
Therefore the idea is actually simple as indicated by the blueprint titles, to 
provide means to tie up SRIOV devices with neutron services. A work flow would 
roughly look like this for 'nova boot':

  -- Specifies networking requirements in the -nic option. Specifically for 
SRIOV, allow the following to be specified in addition to the existing required 
information:
   . PCI alias
   . direct pci-passthrough/macvtap
   . port profileid that is compliant with 802.1Qbh

The above information is optional. In the absence of them, the existing 
behavior remains.

 -- if special networking requirements exist, Nova api creates PCI requests 
in the nova instance type for scheduling purpose

 -- Nova scheduler schedules the instance based on the requested flavor 
plus the PCI requests that are created for networking.

 -- Nova compute invokes neutron services with PCI passthrough information 
if any

 --  Neutron performs its normal operations based on the request, such as 
allocating a port, assigning ip addresses, etc. Specific to SRIOV, it should 
validate the information such as profileid, and stores them in its db. It's 
also possible to associate a port profileid with a neutron network so that port 
profileid becomes optional in the -nic option. Neutron returns  nova the port 
information, especially for PCI passthrough related information in the port 
binding object. Currently, the port binding object contains the following 
information:
  binding:vif_type
  binding:host_id
  binding:profile
  binding:capabilities

-- nova constructs the domain xml and plug in the instance by calling the 
vif driver. The vif driver can build up the interface xml based on the port 
binding information.
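
[Illustrative aside: the kind of libvirt interface XML a VIF driver might build
from such port binding information for a direct (hostdev) SR-IOV VF; the PCI
address and VLAN tag below are made-up values.]

    vf_interface_xml = """
    <interface type='hostdev' managed='yes'>
      <source>
        <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x1'/>
      </source>
      <vlan><tag id='100'/></vlan>
    </interface>
    """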




The blueprints you registered make sense. On Nova side, there is a need to bind 
between requested virtual network and PCI device/interface to be allocated as 
vNIC.
On the Neutron side, there is a need to  support networking configuration of 
the vNIC. Neutron should be able to identify the PCI device/macvtap interface 
in order to apply configuration. I think it makes sense to provide neutron 
integration via dedicated Modular Layer 2 Mechanism Driver to allow PCI 
pass-through vNIC support along with other networking technologies.

I haven't sorted through this yet. A neutron port could be associated with a 
PCI device or not, which is a common feature, IMHO. However, a ML2 driver may 
be needed specific to a particular SRIOV technology.


During the Havana Release, we introduced Mellanox Neutron plugin that enables 
networking via SRIOV pass-throug

[openstack-dev] One question to AggregateRamFilter

2013-10-25 Thread Jiang, Yunhong
Hi, stackers,

When reading code related to the resource tracker, I noticed AggregateRamFilter 
as in https://review.openstack.org/#/c/33828/. 

I'm not sure whether it would be better to use a per-node configuration of the 
RAM ratio instead of depending on the host aggregate. Currently we have to make 
a DB call for each scheduler call, which is really a performance issue. Also, 
if any instance is scheduled to a host before the host aggregate is 
created/set up, a wrong RAM ratio can cause trouble for the host, like OOM.

With a per-node configuration, I'd add a column in the DB to indicate 
memory_mb_limit, and this information would be provided by the resource 
tracker. The benefits of the change are:
a) The host has a better idea of the usable memory limit. And we can even 
provide other methods of calculation in the resource tracker besides a ratio.
b) It makes the flow cleaner. Currently the resource tracker makes the claims 
decision with 'limits' passed from the scheduler, which is a bit strange IMHO. 
I'd think the scheduler makes the scheduling decision, rather than the resource 
calculation, while the resource tracker provides the resource information.

I think the shortcoming of a per-node configuration is that it's not so easy to 
change. But such information is mostly related to host configuration, like swap 
size, and should be fairly static, so setting it at deployment time should be OK.
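
As a sketch of what I mean (the column name and the way the ratio is applied
are just illustrations, not an existing interface):

    def memory_mb_limit(total_memory_mb, ram_allocation_ratio=1.5):
        # The resource tracker, which knows the host, computes and reports the
        # limit itself (e.g. in a new 'memory_mb_limit' column) rather than
        # receiving a 'limits' dict from the scheduler via aggregate metadata.
        return int(total_memory_mb * ram_allocation_ratio)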

Any idea?

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] One question to the fakelibvirt in nova test

2013-10-15 Thread Jiang, Yunhong
I thought it would be used when running ./run_test.sh in a local environment, 
but ./run_test.sh -N has several failures and seems not to be supported anymore.

Thanks for any input/suggestions.

--jyh

> -Original Message-
> From: Jiang, Yunhong [mailto:yunhong.ji...@intel.com]
> Sent: Tuesday, October 15, 2013 11:04 AM
> To: openstack-dev@lists.openstack.org
> Subject: [openstack-dev] One question to the fakelibvirt in nova test
> 
> Hi, stackers,
>   I have a question to followed code in
> nova/tests/virt/libvirt/test_libvirt.py. My question is, when will the 'import
> libvirt' success and the fake libvirt not used?
> 
> try:
>     import libvirt
> except ImportError:
>     import nova.tests.virt.libvirt.fakelibvirt as libvirt
> libvirt_driver.libvirt = libvirt
> 
> Thanks
> --jyh
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] One question to the fakelibvirt in nova test

2013-10-15 Thread Jiang, Yunhong
Hi, stackers,
I have a question to followed code in 
nova/tests/virt/libvirt/test_libvirt.py. My question is, when will the 'import 
libvirt' success and the fake libvirt not used?

try:
    import libvirt
except ImportError:
    import nova.tests.virt.libvirt.fakelibvirt as libvirt
libvirt_driver.libvirt = libvirt

Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor location information

2013-10-02 Thread Jiang, Yunhong
IMHO, this seems to fit availability zones better, since location is more about 
availability, like power supply, network switch availability, etc.

Or, what's the boundary between host aggregates and availability zones? It 
seems the host aggregate is so magic that it can cover every requirement to 
separate/group hosts, but also, because it's so magic, it surely requires some 
other mechanism to help it do the real separation/grouping.

Thanks
--jyh

From: Mike Spreitzer [mailto:mspre...@us.ibm.com]
Sent: Tuesday, October 01, 2013 6:53 AM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] [nova] [scheduler] blueprint for host/hypervisor 
location information

Maybe the answer is hiding in plain sight: host aggregates.  This is a concept 
we already have, and it allows identification of arbitrary groupings for 
arbitrary purposes.
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [pci passthrough] how to fill instance_type_extra_specs for a pci passthrough?

2013-09-13 Thread Jiang, Yunhong
I created a wiki page at https://wiki.openstack.org/wiki/Pci_passthrough , and 
I think Irena has updated it also.

Thanks
--jyh

> -Original Message-
> From: David Kang [mailto:dk...@isi.edu]
> Sent: Friday, September 13, 2013 1:07 PM
> To: OpenStack Development Mailing List
> Subject: Re: [openstack-dev] [nova] [pci passthrough] how to fill
> instance_type_extra_specs for a pci passthrough?
> 
> > From: "David Kang" 
> > To: "OpenStack Development Mailing List"
> 
> > Sent: Friday, September 13, 2013 4:03:24 PM
> > Subject: [nova] [pci passthrough] how to fill instance_type_extra_specs
> for a pci passthrough?
> 
>  Sorry for the last empty mail.
> I cannot find a good document for how to describe pci passthrough in
> nova.conf.
> 
>  As an example, if I have the following entries in nova.conf, how should
> the instance_type_extra_specs must be?
> (The following entries are just for a test.)
> 
> pci_alias={"name":"test", "product_id":"7190", "vendor_id":"8086",
> "device_type":"ACCEL"}
> pci_passthrough_whitelist=[{"vendor_id":"8086","product_id":"7190"}]
> 
>  I'll appreciate any advice and/or pointer for the document.
> 
>  Thanks,
>  David
> 
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] [pci device passthrough] fails with "NameError: global name '_' is not defined"

2013-09-11 Thread Jiang, Yunhong
Sorry for the slow response, I'm out of the office at IDF; I will have a look at 
it today.

Thanks
--jyh

> -Original Message-
> From: David Kang [mailto:dk...@isi.edu]
> Sent: Wednesday, September 11, 2013 6:11 AM
> To: OpenStack Development Mailing List
> Subject: Re: [openstack-dev] [nova] [pci device passthrough] fails with
> "NameError: global name '_' is not defined"
> 
> 
> 
> - Original Message -
> > From: "yongli he" 
> > To: "OpenStack Development Mailing List"
> 
> > Sent: Wednesday, September 11, 2013 4:41:13 AM
> > Subject: Re: [openstack-dev] [nova] [pci device passthrough] fails with
> "NameError: global name '_' is not defined"
> > On 2013-09-11 05:38, David Kang wrote:
> > >
> > > - Original Message -
> > >> From: "Russell Bryant" 
> > >> To: "David Kang" 
> > >> Cc: "OpenStack Development Mailing List"
> > >> 
> > >> Sent: Tuesday, September 10, 2013 5:17:15 PM
> > >> Subject: Re: [openstack-dev] [nova] [pci device passthrough] fails
> > >> with "NameError: global name '_' is not defined"
> > >> On 09/10/2013 05:03 PM, David Kang wrote:
> > >>> - Original Message -
> >  From: "Russell Bryant" 
> >  To: "OpenStack Development Mailing List"
> >  
> >  Cc: "David Kang" 
> >  Sent: Tuesday, September 10, 2013 4:42:41 PM
> >  Subject: Re: [openstack-dev] [nova] [pci device passthrough]
> >  fails
> >  with "NameError: global name '_' is not defined"
> >  On 09/10/2013 03:56 PM, David Kang wrote:
> > >   Hi,
> > >
> > >I'm trying to test pci device passthrough feature.
> > > Havana3 is installed using Packstack on CentOS 6.4.
> > > Nova-compute dies right after start with error "NameError:
> > > global
> > > name '_' is not defined".
> > > I'm not sure if it is due to misconfiguration of nova.conf or
> > > bug.
> > > Any help will be appreciated.
> > >
> > > Here is the info:
> > >
> > > /etc/nova/nova.conf:
> > > pci_alias={"name":"test", "product_id":"7190", "vendor_id":"8086", "device_type":"ACCEL"}
> > > pci_passthrough_whitelist=[{"vendor_id":"8086","product_id":"7190"}]
> > >
> > >   With that configuration, nova-compute fails with the following log:
> > >
> > >   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 461, in _process_data
> > >     **args)
> > >   File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
> > >     result = getattr(proxyobj, method)(ctxt, **kwargs)
> > >   File "/usr/lib/python2.6/site-packages/nova/conductor/manager.py", line 567, in object_action
> > >     result = getattr(objinst, objmethod)(context, *args, **kwargs)
> > >   File "/usr/lib/python2.6/site-packages/nova/objects/base.py", line 141, in wrapper
> > >     return fn(self, ctxt, *args, **kwargs)
> > >   File "/usr/lib/python2.6/site-packages/nova/objects/pci_device.py", line 242, in save
> > >     self._from_db_object(context, self, db_pci)
> > > NameError: global name '_' is not defined
> > > 2013-09-10 12:52:23.774 14749 TRACE nova.openstack.common.threadgroup Traceback (most recent call last):
> > > 2013-09-10 12:52:23.774 14749 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.6/site-packages/nova/openstack/common/threadgroup.py", line 117, in wait
> > > 2013-09-10 12:52:23.774 14749 TRACE nova.openstack.common.threadgroup     x.wait()
> > > 2013-09-10 12:52:23.774 14749 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.6/site-packages/nova/openstack/common/threadgroup.py", line 49, in wait
> > > 2013-09-10 12:52:23.774 14749 TRACE nova.openstack.common.threadgroup     return self.thread.wait()
> > > 2013-09-10 12:52:23.774 14749 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.6/site-packages/eventlet/greenthread.py", line 166, in wait
> > > 2013-09-10 12:52:23.774 14749 TRACE nova.openstack.common.threadgroup     return self._exit_event.wait()
> > > 2013-09-10 12:52:23.774 14749 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.6/site-packages/eventlet/event.py", line 116, in wait
> > > 2013-09-10 12:52:23.774 14749 TRACE nova.openstack.common.threadgroup     return hubs.get_hub().switch()
> > > 2013-09-10 12:52:23.774 14749 TRACE nova.openstack.common.threadgroup   File "/usr/lib/python2.6/site-packages/eventlet/hubs/hub.py", line 177, in switch
> > > 2013-09-10 12:52:23.774

Re: [openstack-dev] [Nova] Scheduler support for PCI passthrough

2013-08-28 Thread Jiang, Yunhong
Gary
 Firstly, thank you very much for your review.
 The pci_stats are calculated by the resource tracker on the compute 
node and are also saved in compute_node. I think the scheduler currently depends 
on the information provided by the compute_node table, so this method should fit 
into the current framework.
 Please note that the scheduler only decides which host can meet 
the requirement; it is the resource tracker on the compute node that does the 
real device allocation. So if the scheduler does not have the latest information, 
it may either fail to find a host, or find a host based on stale information, in 
which case the retry mechanism should work. Anyway, this is the same as for other 
compute node information like free_ram or free_vcpus, right?

 But you do remind me of one thing: a hot plug could happen after 
the resource tracker selects the device and before the instance is actually 
created. Possibly the virt driver needs to check the requirement again before 
creating the domain. But this race condition chain will never end; after all, 
there is still a window between the virt driver's check and the instance 
creation, and I have no idea how we can guarantee this.
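
To make the split concrete, here is a rough sketch (not the actual filter code) 
of how pool-based pci_stats let the scheduler pick candidate hosts, while the 
real allocation is still left to the resource tracker on the compute node:

def host_can_satisfy(pci_pools, requests):
    # pci_pools: e.g. [{'vendor_id': '8086', 'product_id': '1520', 'count': 5}]
    # requests:  e.g. [{'vendor_id': '8086', 'product_id': '1520', 'count': 2}]
    pools = [dict(p) for p in pci_pools]   # work on a copy; nothing is allocated here
    for req in requests:
        needed = req.get('count', 1)
        for pool in pools:
            # A pool matches if every non-count key in the request agrees.
            if all(pool.get(k) == v for k, v in req.items() if k != 'count'):
                taken = min(needed, pool['count'])
                pool['count'] -= taken
                needed -= taken
                if needed == 0:
                    break
        if needed > 0:
            return False   # host is filtered out; stale stats are covered by retry
    return True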

Thanks
--jyh

From: Gary Kotton [mailto:gkot...@vmware.com]
Sent: Wednesday, August 28, 2013 2:19 AM
To: OpenStack Development Mailing List (openstack-dev@lists.openstack.org)
Subject: [openstack-dev] [Nova] Scheduler support for PCI passthrough

Hi,
Whilst reviewing the code I think that I have stumbled on an issue (I hope that 
I am mistaken). The change set (https://review.openstack.org/#/c/35749/) 
expects pci stats to be returned from the host. There are a number of issues 
here that concern me, and I would like to know what the process is for 
addressing the fact that the compute node may not provide these statistics. For 
example, this could be because the driver has not been updated to return the 
pci stats, or because the scheduler has been upgraded prior to the compute node 
(what is the process for the upgrade?).
Thanks
Gary
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Frustrations with review wait times

2013-08-27 Thread Jiang, Yunhong


> -Original Message-
> From: Michael Still [mailto:mi...@stillhq.com]
> Sent: Tuesday, August 27, 2013 7:45 PM
> To: OpenStack Development Mailing List
> Subject: Re: [openstack-dev] [Nova] Frustrations with review wait times
> 
> [Concerns over review wait times in the nova project]
> 
> I think that we're also seeing the fact that nova-core's are also
> developers. nova-core members have the same feature freeze deadline,
> and that means that to a certain extent we need to stop reviewing in
> order to get our own code ready by the deadline.
> 

+1. Nova cores are very kind and helpful, but I think they are really 
overloaded because they are both core developers and core reviewers.

-jyh


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] The PCI support blueprint

2013-07-22 Thread Jiang, Yunhong
Hi, Boris
I'm surprised that you want to postpone the PCI support 
(https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base) to the I 
release. You and our team have been working on this for a long time, and the 
patches have been through several rounds of review. And we have been waiting for 
your DB layer patch for two weeks without any update.

Can you give more reasons why it's being pushed to the I release? If you are 
out of bandwidth, we are certainly willing to take it over and push it for the 
H release!

Is it because you want to base your DB layer on your 'A simple way to 
improve nova scheduler' proposal? That really does not make sense to me. Firstly, 
that proposal is still under early discussion and has already drawn several 
different opinions; secondly, PCI support is far more than the DB layer, it 
includes the resource tracker, scheduler filter, libvirt support enhancements, 
etc. Even if we do change the scheduler that way after the I release, we would 
only need to change the DB layer, and I don't think that's a big effort!
Thanks
--jyh

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] A simple way to improve nova scheduler

2013-07-22 Thread Jiang, Yunhong
The laggy part is that currently the resource tracker updates the usage 
information whenever a resource changes, not only in periodic tasks. If you 
really want to get current results with only periodic updates, you have to do 
some in-memory management, and you even need to sync between different scheduler 
controllers, as stated in 
http://lists.openstack.org/pipermail/openstack-dev/2013-June/010490.html .
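
Just to illustrate the point, a rough sketch of the kind of in-memory state each 
scheduler would have to maintain (and keep in sync with other scheduler 
controllers) in the RPC-broadcast model:

import threading
import time

class HostStateCache(object):
    """Per-scheduler cache fed by compute node RPC updates."""

    def __init__(self, stale_after=120):
        self._lock = threading.Lock()
        self._hosts = {}           # hostname -> (timestamp, usage dict)
        self._stale_after = stale_after

    def update(self, hostname, usage):
        # Called by the RPC consumer on every compute node update.
        with self._lock:
            self._hosts[hostname] = (time.time(), usage)

    def snapshot(self):
        # The scheduler reads a copy; entries that have not been refreshed
        # recently are dropped, which is exactly the laggy-data problem above.
        now = time.time()
        with self._lock:
            return dict((host, usage)
                        for host, (ts, usage) in self._hosts.items()
                        if now - ts < self._stale_after)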

-jyh

From: Boris Pavlovic [mailto:bo...@pavlovic.me]
Sent: Monday, July 22, 2013 5:17 AM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] A simple way to improve nova scheduler

Joe,

>> Speaking of Chris Behrens: "Relying on anything but the DB for current 
>> memory free, etc, is just too laggy... so we need to stick with it, IMO." 
>> http://lists.openstack.org/pipermail/openstack-dev/2013-June/010485.html

It doesn't scale, uses tons of resources, is slow, and is hard to extend.
Also, the mechanism of getting free and used memory is implemented by the virt 
layer. And the only thing that could be laggy is RPC (but RPC is also used for 
compute node updates).


>> * How do you bring a new scheduler up in an existing deployment and make it 
>> get the full state of the system?

You should wait for one periodic task interval, and then you will have full 
information about all compute nodes.

>> *  Broadcasting RPC updates from compute nodes to the scheduler means every 
>> scheduler has to process  the same RPC message.  And if a deployment hits 
>> the point where the number of compute updates is consuming 99 percent of the 
>> scheduler's time just adding another scheduler won't fix anything as it will 
>> get bombarded too.


If we are speaking about numbers, you can see our doc, where they are counted.
If we have 10k nodes it will make only 150 RPC calls/sec (which means nothing 
for the CPU). By the way, we will also remove 150 calls/s from the conductor. One 
more thing: currently, in a 10k node deployment, I think we will spend almost all 
of the time waiting for the DB (compute_node_get_all()). And also, every time we 
call this method we have to process all the data gathered over 60 sec. (So in 
numbers, the current approach does 60*requests_per_sec of the work of our 
approach on the scheduler side, which means that if we get more than 1 request 
per sec it will do more CPU load.)


>> Also OpenStack is already deeply invested in using the central DB model for 
>> the state of the 'world' and while I am not against changing that, I think 
>> we should evaluate that switch in a larger context.

Step by step. As a first step we could just remove the compute_node_get_all 
method, which will make OpenStack much more scalable and fast.


By the way, please see once more my answers to your comments in the doc.

Best regards,
Boris Pavlovic

Mirantis Inc.




On Sat, Jul 20, 2013 at 3:14 AM, Joe Gordon 
mailto:joe.gord...@gmail.com>> wrote:


On Fri, Jul 19, 2013 at 3:13 PM, Sandy Walsh 
mailto:sandy.wa...@rackspace.com>> wrote:


On 07/19/2013 05:36 PM, Boris Pavlovic wrote:
> Sandy,
>
> I don't think that we have such problems here,
> because the scheduler doesn't poll compute_nodes.
> The situation is different: compute_nodes notify the scheduler about their
> state (instead of updating their state in the DB).
>
> So, for example, if the scheduler sends a request to a compute_node, the
> compute_node is able to make an rpc call to the schedulers immediately (not
> after 60 sec).
>
> So there are almost no races.
There are races that occur between the eventlet request threads. This is
why the scheduler has been switched to single threaded and we can only
run one scheduler.

This problem may have been eliminated with the work that Chris Behrens
and Brian Elliott were doing, but I'm not sure.


Speaking of Chris Behrens: "Relying on anything but the DB for current memory 
free, etc, is just too laggy... so we need to stick with it, IMO." 
http://lists.openstack.org/pipermail/openstack-dev/2013-June/010485.html

Although there is some elegance to the proposal here I have some concerns.

If just using RPC broadcasts from compute to schedulers to keep track of 
things, we get two issues:

* How do you bring a new scheduler up in an existing deployment and make it get 
the full state of the system?
* Broadcasting RPC updates from compute nodes to the scheduler means every 
scheduler has to process  the same RPC message.  And if a deployment hits the 
point where the number of compute updates is consuming 99 percent of the 
scheduler's time just adding another scheduler won't fix anything as it will 
get bombarded too.

Also OpenStack is already deeply invested in using the central DB model for the 
state of the 'world' and while I am not against changing that, I think we 
should evaluate that switch in a larger context.



But certainly, the old approach of having the compute node broadcast
status every N seconds is not suitable and was eliminated a long time ago.

>
>
> Best regards,
> Boris Pavlovic
>
> Mirantis Inc.
>
>
>
> On Sat, Jul 20, 2013 at 12:23 AM, Sandy Walsh 
> mailto:sandy.wa...@rackspace.com>
> 

Re: [openstack-dev] The PCI support blueprint

2013-07-22 Thread Jiang, Yunhong
As for the scalability issue, Boris, are you talking about the VF number issue, 
i.e. that a physical PCI device can have at most 256 virtual functions? 

I think we have discussed this before. We should let the compute node manage 
the VFs together, so that VFs belonging to the same PF have only one entry in 
the DB, with a field indicating the number of free VFs. Thus there will be no 
scalability issue, because the number of PCI slots is limited.
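
To illustrate the idea, here is a rough sketch (not the actual resource tracker 
code; the field names are made up for the example) of collapsing the VFs of the 
same PF into a single stats entry with a free count:

from collections import defaultdict

def pool_pci_devices(pci_devices):
    # pci_devices: iterable of dicts; this sketch assumes VFs carry a
    # hypothetical 'parent_pf' address identifying the PF they belong to.
    pools = defaultdict(int)
    for dev in pci_devices:
        if dev.get('is_vf'):
            key = (dev['vendor_id'], dev['product_id'], dev['parent_pf'])
        else:
            key = (dev['vendor_id'], dev['product_id'], dev['address'])
        pools[key] += 1   # one entry per PF (or per standalone device)
    return [{'vendor_id': vendor, 'product_id': product, 'parent': parent,
             'count': count}
            for (vendor, product, parent), count in pools.items()]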

We didn't implement this mechanism in the current patch set because we agreed to 
make it an enhancement. If it's really a concern, please raise it and we will 
enhance our resource tracker for this. That's not a complex task.

Thanks
--jyh

> -Original Message-
> From: Russell Bryant [mailto:rbry...@redhat.com]
> Sent: Monday, July 22, 2013 8:22 AM
> To: Jiang, Yunhong
> Cc: bo...@pavlovic.me; openstack-dev@lists.openstack.org
> Subject: Re: The PCI support blueprint
> 
> On 07/22/2013 11:17 AM, Jiang, Yunhong wrote:
> > Hi, Boris
> > I'm surprised that you want to postpone the PCI support
> > (https://blueprints.launchpad.net/nova/+spec/pci-passthrough-base) to the I
> > release. You and our team have been working on this for a long time, and
> > the patches have been through several rounds of review. And we have been
> > waiting for your DB layer patch for two weeks without any update.
> >
> > Can you give more reasons why it's being pushed to the I release? If you are
> > out of bandwidth, we are certainly willing to take it over and push it for
> > the H release!
> >
> > Is it because you want to base your DB layer on your 'A simple way to
> > improve nova scheduler' proposal? That really does not make sense to me.
> > Firstly, that proposal is still under early discussion and has already drawn
> > several different opinions; secondly, PCI support is far more than the DB
> > layer, it includes the resource tracker, scheduler filter, libvirt support
> > enhancements, etc. Even if we do change the scheduler that way after the I
> > release, we would only need to change the DB layer, and I don't think that's
> > a big effort!
> 
> Boris mentioned scalability concerns with the current approach on IRC.
> I'd like more detail.
> 
> In general, if we can see a reasonable path to upgrade what we have now
> to make it better in the future, then we don't need to block it because
> of that.  If the current approach will result in a large upgrade impact
> to users to be able to make it better, that would be a reason to hold
> off.  It also depends on how serious the scalability concerns are.
> 
> --
> Russell Bryant

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Jiang, Yunhong
The point of "lazy load" is that, with lazy loading, the framework doesn't need 
to fetch the PCI information if no PCI filter is specified, for example.
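
A rough sketch of the idea (the names are made up for illustration; this is not 
the actual object code):

class ComputeNodeProxy(object):
    """Only fetch the heavy pci_stats data when something actually asks for it."""

    def __init__(self, node_id, db_api):
        self.node_id = node_id
        self._db = db_api
        self._pci_stats = None     # not loaded yet

    @property
    def pci_stats(self):
        if self._pci_stats is None:
            # Only a PCI-aware filter triggers this query, so scheduling
            # requests with no PCI demand never pay for the extra data.
            self._pci_stats = self._db.compute_node_get_pci_stats(self.node_id)
        return self._pci_stats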

The discussion on 
'http://markmail.org/message/gxoqi6coscd2lhwo#query:+page:1+mid:7ksr6byyrpcgkqjv+state:results'
   gives a lot of information.

--jyh



From: Boris Pavlovic [mailto:bo...@pavlovic.me]
Sent: Friday, July 19, 2013 1:07 PM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] [Nova] New DB column or new DB table?

Jiang,

I would like to reduce "magic"

1) We are already using RPC (because all compute node updates are done in the DB 
via the conductor, which means an RPC call).
So the count of RPC calls and the size of messages will be the same.

2) There is no lazy load when you have to fetch all data about all compute 
nodes on every request to the scheduler.

3) Object models are off topic

Best regards,
Boris Pavlovic

Mirantis Inc.



On Fri, Jul 19, 2013 at 11:23 PM, Jiang, Yunhong 
mailto:yunhong.ji...@intel.com>> wrote:
Boris
   I think you in fact covered two topics. One is whether to use the DB or RPC 
for communication. This has been discussed a lot, but I didn't find a conclusion. 
From the discussion, it seems the key thing is the fan-out messages. I'd suggest 
you bring this to the scheduler sub-meeting.

http://eavesdrop.openstack.org/meetings/scheduler/2013/scheduler.2013-06-11-14.59.log.html
http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg00070.html
http://comments.gmane.org/gmane.comp.cloud.openstack.devel/23

   The second topic is adding extra tables for compute nodes. I think we need 
lazy loading for the compute node, and also I think that with the object model 
we can further improve it if we utilize the compute node object.

Thanks
--jyh


From: Boris Pavlovic [mailto:bo...@pavlovic.me<mailto:bo...@pavlovic.me>]
Sent: Friday, July 19, 2013 10:07 AM

To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] [Nova] New DB column or new DB table?

Hi all,

We have too many different threads about the scheduler (so I have to repeat 
myself here also).

I am against adding extra tables that will be joined to the compute_nodes 
table on each scheduler request (or adding large text columns), 
because it makes our non-scalable scheduler even less scalable.

Also, if we just remove the DB between the scheduler and the compute nodes we 
will get a really good improvement in all aspects (performance, DB load, network 
traffic, scalability).
And it will also be easy to use other resource providers (cinder, 
ceilometer, e.g.) in the Nova scheduler.

And one more thing: all of this could be implemented really simply in current 
Nova, without big changes
 
https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?usp=sharing


Best regards,
Boris Pavlovic

Mirantis Inc.

On Fri, Jul 19, 2013 at 8:44 PM, Dan Smith 
mailto:d...@danplanet.com>> wrote:
> IIUC, Ceilometer is currently a downstream consumer of data from
> Nova, but no functionality in Nova is a consumer of data from
> Ceilometer. This is a good split from a security separation point of
> view, since the security of Nova is self-contained in this
> architecture.
>
> If the Nova scheduler becomes dependent on data from Ceilometer, then
> the security of Nova also depends on the security of Ceilometer, expanding
> the attack surface. This is not good architecture IMHO.
Agreed.

> At the same time, I hear your concerns about the potential for
> duplication of stats collection functionality between Nova &
> Ceilometer. I don't think we necessarily need to remove 100% of
> duplication. IMHO probably the key thing is for the virt drivers to
> expose a standard API for exporting the stats, and make sure that
> both Ceilometer & the Nova scheduler use the same APIs and ideally the
> same data feed, so we're not invoking the same APIs twice to get the
> same data.
I imagine there's quite a bit that could be shared, without dependency
between the two. Interfaces out of the virt drivers may be one, and the
code that boils numbers into useful values, as well as perhaps the
format of the JSON blobs that are getting shoved into the database.
Perhaps a ceilo-core library with some very simple primitives and
definitions could be carved out, which both nova and ceilometer could
import for consistency, without a runtime dependency?

--Dan

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org<mailto:OpenStack-dev@lists.openstack.org>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org<mailto:OpenStack-dev@lists.openstack.org>
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] New DB column or new DB table?

2013-07-19 Thread Jiang, Yunhong
Boris
   I think you in fact covered two topics. One is whether to use the DB or RPC 
for communication. This has been discussed a lot, but I didn't find a conclusion. 
From the discussion, it seems the key thing is the fan-out messages. I'd suggest 
you bring this to the scheduler sub-meeting.

http://eavesdrop.openstack.org/meetings/scheduler/2013/scheduler.2013-06-11-14.59.log.html
http://www.mail-archive.com/openstack-dev@lists.openstack.org/msg00070.html
http://comments.gmane.org/gmane.comp.cloud.openstack.devel/23

   The second topic is adding extra tables for compute nodes. I think we need 
lazy loading for the compute node, and also I think that with the object model 
we can further improve it if we utilize the compute node object.

Thanks
--jyh


From: Boris Pavlovic [mailto:bo...@pavlovic.me]
Sent: Friday, July 19, 2013 10:07 AM
To: OpenStack Development Mailing List
Subject: Re: [openstack-dev] [Nova] New DB column or new DB table?

Hi all,

We have too many different threads about the scheduler (so I have to repeat 
myself here also).

I am against adding extra tables that will be joined to the compute_nodes 
table on each scheduler request (or adding large text columns), 
because it makes our non-scalable scheduler even less scalable.

Also, if we just remove the DB between the scheduler and the compute nodes we 
will get a really good improvement in all aspects (performance, DB load, network 
traffic, scalability).
And it will also be easy to use other resource providers (cinder, 
ceilometer, e.g.) in the Nova scheduler.

And one more thing: all of this could be implemented really simply in current 
Nova, without big changes
 
https://docs.google.com/document/d/1_DRv7it_mwalEZzLy5WO92TJcummpmWL4NWsWf0UWiQ/edit?usp=sharing


Best regards,
Boris Pavlovic

Mirantis Inc.

On Fri, Jul 19, 2013 at 8:44 PM, Dan Smith 
mailto:d...@danplanet.com>> wrote:
> IIUC, Ceilometer is currently a downstream consumer of data from
> Nova, but no functionality in Nova is a consumer of data from
> Ceilometer. This is a good split from a security separation point of
> view, since the security of Nova is self-contained in this
> architecture.
>
> If the Nova scheduler becomes dependent on data from Ceilometer, then
> the security of Nova also depends on the security of Ceilometer, expanding
> the attack surface. This is not good architecture IMHO.
Agreed.

> At the same time, I hear your concerns about the potential for
> duplication of stats collection functionality between Nova &
> Ceilometer. I don't think we necessarily need to remove 100% of
> duplication. IMHO probably the key thing is for the virt drivers to
> expose a standard API for exporting the stats, and make sure that
> both Ceilometer & the Nova scheduler use the same APIs and ideally the
> same data feed, so we're not invoking the same APIs twice to get the
> same data.
I imagine there's quite a bit that could be shared, without dependency
between the two. Interfaces out of the virt drivers may be one, and the
code that boils numbers into useful values, as well as perhaps the
format of the JSON blobs that are getting shoved into the database.
Perhaps a ceilo-core library with some very simple primitives and
definitions could be carved out, which both nova and ceilometer could
import for consistency, without a runtime dependency?

--Dan

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev