Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Tony Breeds
On Wed, Sep 27, 2017 at 10:17:16PM -0500, Ben Nemec wrote:
> 
> 
> On 09/27/2017 08:13 PM, Tony Breeds wrote:
> > On Wed, Sep 27, 2017 at 03:35:43PM -0500, Ben Nemec wrote:
> > > It's a little weird because essentially we want to provide a higher level
> > > of support for stable branches than most of OpenStack.  My understanding is
> > > that a lot of the current stable branch policy came out of the fact that
> > > there was a great deal of apathy toward stable branches in upstream
> > > OpenStack and it just wasn't possible to say we'd do more than critical bug
> > > and security fixes for older releases.  Maybe we need a stable-policy-plus
> > > tag or something for projects that can and want to do more.  And feel free
> > > to correct me if I've misinterpreted the historical discussions on this. :-)
> > 
> > That's mostly accurate but the policy also is an indication that
> > consumers should be moving along to newer releases.  For a whole host of
> > reasons that isn't working and it's a thing that we need to address as a
> > community.
> 
> Ah, I wasn't familiar with that aspect of it.  I guess that's a valid reason
> not to continue full support of stable branches even if you theoretically
> could.
> 
> > 
> > The current policy broadly defines 3 phases[1]:
> > 
> > Phase  Time frame                 Summary             Changes Supported
> > I      First 6 months             Latest release      All bugfixes (that meet the
> >                                                       criteria described below)
> >                                                       are appropriate
> > II     6-12 months after release  Maintained release  Only critical bugfixes and
> >                                                       security patches are
> >                                                       acceptable
> > III    more than 12 months        Legacy release      Only security patches are
> >        after release                                  acceptable
> > 
> > I can see a policy that looks more like:
> > 
> > Phase  Time frame                 Summary             Changes Supported
> > I      0-12 months after release  Maintained release  All bugfixes (that meet the
> >                                                       criteria described below)
> >                                                       are appropriate
> > II     more than 12 months        Legacy release      Only security patches are
> >        after release                                  acceptable
> > 
> > The 12 month mark is really only there to line up with our current EOL
> > plans, if they changed then we'd need to match them.
> 
> Wouldn't that still exclude the Ceph patch we're using as an example? Newton
> is over 12 months old at this point.

Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Ben Nemec



On 09/27/2017 08:13 PM, Tony Breeds wrote:

On Wed, Sep 27, 2017 at 03:35:43PM -0500, Ben Nemec wrote:
  

It's a little weird because essentially we want to provide a higher level of
support for stable branches than most of OpenStack.  My understanding is
that a lot of the current stable branch policy came out of the fact that
there was a great deal of apathy toward stable branches in upstream
OpenStack and it just wasn't possible to say we'd do more than critical bug
and security fixes for older releases.  Maybe we need a stable-policy-plus
tag or something for projects that can and want to do more.  And feel free
to correct me if I've misinterpreted the historical discussions on this. :-)


That's mostly accurate but the policy also is an indication that
consumers should be moving along to newer releases.  For a whole host of
reasons that isn't working and it's a thing that we need to address as a
community.


Ah, I wasn't familiar with that aspect of it.  I guess that's a valid 
reason not to continue full support of stable branches even if you 
theoretically could.




The current policy broadly defines 3 phases[1]:

Phase  Time frame                 Summary             Changes Supported
I      First 6 months             Latest release      All bugfixes (that meet the
                                                      criteria described below)
                                                      are appropriate
II     6-12 months after release  Maintained release  Only critical bugfixes and
                                                      security patches are
                                                      acceptable
III    more than 12 months        Legacy release      Only security patches are
       after release                                  acceptable

I can see a policy that looks more like:

Phase  Time frame                 Summary             Changes Supported
I      0-12 months after release  Maintained release  All bugfixes (that meet the
                                                      criteria described below)
                                                      are appropriate
II     more than 12 months        Legacy release      Only security patches are
       after release                                  acceptable

The 12 month mark is really only there to line up with our current EOL
plans, if they changed then we'd need to match them.


Wouldn't that still exclude the Ceph patch we're using as an example? 
Newton is over 12 months old at this point.





That said, I'm staunchly opposed to feature backports.  While I think it
makes perfect sense to allow backports like Giulio's,


Yup with my limited knowledge I think that review makes perfect sense to
backport.  It just doesn't match the *current* stable policy.


  

It feels a little weird to me to be arguing this side of it because I'm
pretty sure I've argued against splitting repos in the past.  But I think I
would not say we kick all the vendor-integration bits out if we do this,
just that we provide the option for vendors to have their own repos with
their own stable backport policies without having to change the policy for
all of TripleO at the same time.


If splitting the repos has good technical benefits then cool, if it's
mostly about matching policy then I think altering the policy (or
defining a new one) is a better solution.
  

And I'm also open to other approaches like tweaking the cycle-trailing
definition to allow more time for this sort of thing.  Maybe we could
eliminate some of the need for feature backports if we did that.


I'm not sure I follow but sure altering the timeline within reason is a
simple thing to do.


Yeah, that was not a fully formed thought in my head when I wrote it. 
:-)  I guess I was thinking of somehow allowing more time for features 
to be done after the rest of OpenStack cuts its release, but I don't 
actually know if that would help.


One (maybe crazy) thought I had after writing all this was the 
possibility of allowing feature backports for a limited time after 
release in the deployment projects.  Say feature backports are only 
allowed up to M-1 of the next release.  I'm not at all sure I like the 
idea but it has some interesting implications, both good and bad.  Like 
no more FFE's - if you miss release you just have to do the extra work 
of backporting if you still want it in.  So there's motivation to get 
stuff done on time, but less panic around release time.  Of course, 
somebody's got to review all those backports so like I said I'm not 
convinced it's a good idea, but it's an idea. :-)




Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Emilien Macchi
On Wed, Sep 27, 2017 at 5:37 PM, Tony Breeds  wrote:
> On Wed, Sep 27, 2017 at 10:39:13AM -0600, Alex Schultz wrote:
>
>> One idea would be to allow trailing projects additional trailing on
>> the phases as well.  Honestly 2 weeks for trailing for just GA is hard
>> enough. Let alone the fact that the actual end-users are 18+ months
>> behind.  For some deployment project like tripleo, there are sections
>> that should probably follow stable-policy as it exists today but
>> elements where there's 3rd party integration or upgrade implications
>> (in the case of tripleo, THT/puppet-tripleo) and they need to be more
>> flexible to modify things as necessary.  The word 'feature' isn't
>> necessarily the same for these projects than something like
>> nova/neutron/etc.
>
> There are 2 separate aspects here:
> 1) What changes are appropriate on stable/* branches ; and
> 2) How long do stable/* branches stay around for.
>
> Looking at 1.  I totally get that deployment projects have a different
> threshold on the bugfix/feature line.  That's actually the easy part to
> fix.  The point of the stable policy is to give users some assurance
> that moving from version x.y.z -> x.Y.Z will be a smooth process.  We
> just need to capture that intent in a policy that works in the context
> of a deployment project.

It makes total sense to me. BTW we have CI coverage for upgrades from
Newton to Ocata (and Ocata to Pike is ongoing but super close; also
Pike to Queens is targeted to Queens-1 milestone) so you can see our
efforts on that front are pretty heavy.

> Looking at 2.  The stable policy doesn't say you *need* to EOL on
> Oct-11th.  By default any project that asserts that tag is included, but
> you're also free to opt out as long as there is a good story around CI
> and impact on human and machine resources.  We re-evaluate that from
> time to time.  As an example, group-based-policy opted out of the
> kilo(?), liberty and mitaka EOLs, and recently dropped everything before
> mitaka.  I get that GBP has a different footprint in CI than tripleo
> does but it illustrates that there is scope to support your users within
> the current policy.

Again, it makes a lot of sense here. We don't want to burn too many CI
resources and want to keep to the strict minimum - and also make sure we
don't burn out any external team (e.g. stable-maint).

> I'm still advocating for crafting a more appropriate policy for
> deployment projects.

Cool, it's aligned with what Ben and Alex are proposing, iiuc.

>> >> What proposing Giulio probably comes from the real world, the field,
>> >> who actually manage OpenStack at scale and on real environments (not
>> >> in devstack from master). If we can't have this code in-tree, we'll
>> >> probably carry this patch downstream (which is IMHO bad because of
>> >> maintenance and lack of CI). In that case, I'll vote to give up
>> >> stable:follows-policy so we can do what we need.
>> >
>> > Rather than give up on the stable:follows policy tag it is possibly
>> > worth looking at which portions of tripleo make that assertion.
>> >
>> > In this specific case, there isn't anything in the bug that indicates
>> > it comes from a user report which is all the stable team has to go on
>> > when making these types of decisions.
>> >
>>
>> We'll need to re-evaluate what stable-policy means for tripleo.  We
>> don't want to allow the world for backporting but we also want to
>> reduce the patches carried downstream for specific use cases.  I think
>> in the case of 3rd party integrations we need a better definition of
>> what that means and perhaps creating a new repository like THT-extras
>> that doesn't follow stable-policy while the main one does.
>
> Right, I don't pretend to understand the ins-and-outs of tripleo but yes
> I think we're mostly agreeing on that point.
>
> https://review.openstack.org/#/c/507924/ buys everyone the space to make
> that evaluation.
>
> Yours Tony.

Thanks Tony for being open to the ideas; I find our discussion very
productive despite the fact that we want to give up the tag for now.

So as a summary:

1) We discuss on 507924 to figure out whether we give up the tag and
for which repos we do it.
2) Someone proposes an amendment to the existing stable policy or
proposes a new policy.
3) Figure out if we can postpone the TripleO Newton EOL and make sure
we're doing it right (e.g. having CI jobs working, not burning
anything etc).
4) In the long term, figure out how to break down THT (we'll probably
want a blueprint for that and some folks working on it).

Thanks,
-- 
Emilien Macchi



[openstack-dev] [tripleo] plans on testing minor updates?

2017-09-27 Thread Emilien Macchi
I was reviewing https://review.openstack.org/#/c/487496/ and
https://review.openstack.org/#/c/487488/ when I realized that we still
didn't have any test coverage for minor updates.
We never had this coverage AFAICT but that is not a reason not to push
forward with it.

During Ocata and Pike, we saw that having upgrade jobs was extremely
useful to actually test the workflow that our users are supposed to run
in production, and I see zero reason not to do the same for minor
updates.
I don't want to be the bad guy here but I've -2'd the two patches until we
find some consensus here (sorry matbu, it's not against you or your
code specifically, but more generally about implementing
features without CI coverage).

I'm really willing to help and start working on tripleo-quickstart
roles this week, if someone agrees to pair with me - so we could make
progress and have that coverage. Even if the new job fails at first,
that's OK; we'll know whether the process works (or not - TBH, I haven't
tried it, probably shardy and some other folks know more about it). Once
we have the workflow in place, we can then iterate on matbu's patches and
make them work in CI so we can ship the feature and be proud to have it
tested.
That's IMHO how we should write our software.

If there is any feedback on this, please let us know here, otherwise
I'll keep my -2 until we've got this coverage in place. Also please
someone (maybe matbu?) raise your hand if you want to pair up and do
this quickly.

Thanks,
-- 
Emilien Macchi



Re: [openstack-dev] Disk Image Builder for redhat 7.4

2017-09-27 Thread Amrith Kumar
As Tony says, there's a base image that you can use for RHEL 7.4. Yes, you
can install Oracle onto the image using dib.

In saying that I only mean that it is possible. I make no statement about
the supportability of that solution by any vendors involved.

To do it, you would create elements (the basic unit of abstraction) in dib.

You would, for example, have an element with an install.d phase that would
install the Oracle package just the same way you would if you did it by
hand.

Then you'd invoke a command like

disk-image-create rhel7 vm your-oracle-element -o oracle-image.qcow2 ...
maybe a couple of other options for good measure ...
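
For illustration only, a rough sketch of what such an element might look
like - the element and file names below are hypothetical, not an existing
element:

  your-oracle-element/
    element-deps                  # e.g. "package-installs" if you need extra packages
    extra-data.d/50-copy-oracle   # runs outside the chroot; copies the installer into the build
    install.d/75-install-oracle   # runs inside the chroot during the image build

  # install.d/75-install-oracle (hypothetical contents):
  #!/bin/bash
  set -eux
  # same steps you would run by hand on a freshly installed RHEL 7.4 box
  /tmp/oracle/runInstaller -silent -responseFile /tmp/oracle/db_install.rsp

If memory serves, the rhel7 element also expects to be pointed at the Red
Hat supplied guest image, e.g.:

  export DIB_LOCAL_IMAGE=/path/to/rhel-server-7.4-x86_64-kvm.qcow2
  export ELEMENTS_PATH=/path/to/your/elements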




-amrith


On Tue, Sep 26, 2017 at 6:43 PM, Tony Breeds 
wrote:

> On Tue, Sep 26, 2017 at 10:19:45PM +0530, Amit Singla wrote:
> > Hi,
> >
> > Could you tell me how I can create a qcow2 image for RHEL 7.4 with disk image
> > builder? I also want to install Oracle 12.2 on that image with DIB. Is
> > it possible?
>
> For the RHEL 7.4 side of things there is a rhel7 dib target, that starts
> with a guest image supplied by Red Hat and customises it for your needs.
>
> No idea about oracle 12.2.
>
> Yours Tony.
>


Re: [openstack-dev] OpenStack-Ansible testing with OpenVSwitch

2017-09-27 Thread Michael Gale
Hello JP,

Ok, I will do some more testing against the blog post and then hit up
the #openstack-ansible channel.

I need to finish a presentation on SFC first which is why I am looking into
OpenVSwitch.

Thanks
Michael

On Wed, Sep 27, 2017 at 6:25 AM, Jean-Philippe Evrard <
jean-phili...@evrard.me> wrote:

> Hello,
>
> We currently don't have a full scenario for openvswitch for an easy
> "one line" install.
> It still deserves more love. You could come on our channel in
> #openstack-ansible to discuss about it if you want. But the general
> idea should be close to the same explained in the blog post.
>
> Best regards,
> JP
>
> On Wed, Sep 27, 2017 at 12:13 AM, Michael Gale 
> wrote:
> > Hello,
> >
> > I am trying to build a Pike All-in-One instance for OpenStack Ansible
> > testing, currently I have a few OpenStack versions being deployed using
> the
> > default Linux Bridge implementation.
> >
> > However I need a test environment to validate OpenVSwitch
> implementation, is
> > there a simple method to get an AIO installed?
> >
> > I tried following
> > https://medium.com/@travistruman/configuring-openstack-ansible-for-open-
> vswitch-b7e70e26009d
> > however Neutron is blowing up because it can't determine the name for the
> > Neutron Server. I am not sure if that is my issue or not, a reference
> > implementation of OpenStack AIO with OpenVSwitch would help me a lot.
> >
> > Thanks
> > Michael
> >
> > 
>



-- 

“The Man who says he can, and the man who says he can not.. Are both
correct”


Re: [openstack-dev] OpenStack-Ansible and Trove support

2017-09-27 Thread Michael Gale
Hello JP,

At this point in time I am only looking for a PoC environment; if it is
part of the AIO that is perfect for now.

I would also like to communicate within my organization that we could start
using Trove after the X release cycle. We currently have no dependency on its
availability.

Michael

On Wed, Sep 27, 2017 at 6:29 AM, Jean-Philippe Evrard <
jean-phili...@evrard.me> wrote:

> Hello Michael,
>
> On top of that, we intend to have a "role maturity" that will include
> when the role was proposed and its current maturity phase, for more
> clarity, not unlike the openstack project navigator.
>
> Our os_trove role has not received many commits recently, and the
> "maintenance mode" of Trove will probably impact you in the future.
> Do you intend to keep a trove installation in production, or do you
> want to do a PoC?
>
> Best regards,
> JP
>
> On Wed, Sep 27, 2017 at 12:24 AM, Amy Marrich  wrote:
> > Michael,
> >
> > There are release notes for each release that will go over what's new,
> > what's on its way out or even gone, as well as bug fixes and other
> > information. Here's a link to the Ocata release notes for
> OpenStack-Ansible
> > which includes the announcement of the Trove role.
> >
> > https://docs.openstack.org/releasenotes/openstack-ansible/ocata.html
> >
> > Thanks,
> >
> > Amy (spotz)
> >
> > On Tue, Sep 26, 2017 at 6:04 PM, Michael Gale 
> > wrote:
> >>
> >> Hello,
> >>
> >>Based on github and
> >> https://docs.openstack.org/openstack-ansible-os_trove/latest/ it looks
> like
> >> OpenStack-Ansible will support Trove under the Ocata release.
> >>
> >> Is that assumption correct? Is there a better method to determine when a
> >> software component will likely be included in a release?
> >>
> >> Michael
> >>
> >> 
>



-- 

“The Man who says he can, and the man who says he can not.. Are both
correct”


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread LIU Yulong
On Wed, Sep 27, 2017 at 4:23 PM, Michael Still  wrote:

> One thing I'd like to explore is what the functional difference between a
> rebuild and a delete / create cycle is. With a rebuild you get to keep your
> IP I suppose, but that could also be true of floating IPs for a delete /
> create as well.
>
>
The neutron port which was used by the VM does not change, so the floating
IP will not need to be recreated.



> Operationally, why would I want to inject a new keypair? The scenario I
> can think of is that there's data in that instance that I want, and I've
> lost the keypair somehow. Unless that data is on an ephemeral, its gone if
> we do a rebuild.
>
>

"The old VM was using a wrong image, I want to change it. Bad things
happened in the VM, I want reinstall the OS. Oh, I lost my old private key.
I can reset the image, but I can't login it." -- A cloud user's whisper.
Rebuild is try to recreate, a new param added to the existing rebuild API
meets the renew purpose.
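
To make that concrete, a rough sketch of what the proposed call might look
like (the key_name field is the new parameter the spec would add; it is not
part of the current rebuild action, and the variable names are illustrative):

  $ curl -s -X POST "$COMPUTE_ENDPOINT/servers/$SERVER_ID/action" \
      -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
      -d '{"rebuild": {"imageRef": "'$IMAGE_ID'", "key_name": "my-new-keypair"}}'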



Michael
>
>
> On Wed, Sep 27, 2017 at 4:05 PM, LIU Yulong 
> wrote:
>
>> On Wed, Sep 27, 2017 at 10:29 AM, Matt Riedemann 
>> wrote:
>>
>>> On 9/23/2017 8:58 AM, LIU Yulong wrote:
>>>
 Hi nova developers,

 This mail proposes to reconsider key pair resetting for instances.
 The nova Queens PTG discussion is here:
 https://etherpad.openstack.org/p/nova-ptg-queens L498.
 And there are now two proposals.

 1. SPEC 1: https://review.openstack.org/#/c/375221/ started by me (liuyulong)
 since Sep 2016.

 This spec will allow setting a new key_name for the instance
 during the rebuild API. That's a very simple and well-understood approach:

   * It is consistent with other rebuild API properties, such as name,
     imageRef, metadata, adminPass etc.
   * The rebuild API is something like `recreating`; this is the right way
     to do key pair updating. For a keypair-login-only VM, this is the key
     point.
   * It does not involve other APIs like reboot/unshelve etc.

>>>
>>> This was one of the issues I brought up in IRC, is that if we just
>>> implemented this for the rebuild API, then someone could also ask that we
>>> do it for things like reboot, cold migrate/resize, unshelve, etc. Anything
>>> that involves re-creating the guest.
>>>
>> IMHO, rebuild has its own meaning: it means we are going to recreate a VM.
>> So those inputs such as name, key, and password should have a chance to be
>> reset in this `rebuild` interface. Unlike rebuild, actions such as reboot,
>> cold migrate/resize, and unshelve do not have such a potential implication.
>> If anything else is involved, you are expanding those actions (reboot, cold
>> migrate/resize, unshelve).
>>
>>
>>
>>>   * Easy to use, only one API.

>>>
>>> Until someone says we should also do it for the other APIs, as noted
>>> above.
>>>
>> That would not be acceptable. Other APIs do not have such a `recreating`
>> background. For rebuild, you are going to renew an instance, so those
>> params for instance creation should have a chance to be reset.
>>
>>
>>>
 By the way, here is the patch (https://review.openstack.org/#/c/379128/)
 which has implemented this spec. And it has been sitting there for more
 than a year too.

>>>
>>> It's been open because the spec was never approved. Just a procedural
>>> issue.
>>>
>>>
 2. SPEC 2: https://review.openstack.org/#/c/506552/ proposed by Kevin_zheng.

 This spec proposes to add a new update API for an instance's key pair.
 Its one foreseeable advantage is that it could do running-instance key
 injection.

 But it may cause some issues:

   * This approach needs to update the instance key pair first (one step,
     one API call), and then do a reboot/rebuild or any action causing the
     VM to restart (second step, another API call). Firstly, this is
     wasteful: it uses two API calls. Secondly, if the key pair update was
     done but the reboot was not, that may result in an inconsistency
     between the instance DB key pair and the key inside the guest VM. The
     cloud user may be confused about which key should be used to log in.

>>>
>>> 1. I don't think multiple API calls is a problem. Any GUI or
>>> orchestration tool can stitch these APIs together for what appears to be a
>>> single operation for the end user. Furthermore, with multiple options about
>>> what to do after the instance.key_name is updated, something like a GUI
>>> could present the user with the option to picking if they want to reboot or
>>> rebuild after the key is updated.
>>>
>> We provided a discontinuous API, so we should take responsibility for
>> it. This inconsistency between the instance DB key pair and the guest VM 

Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread LIU Yulong
On Wed, Sep 27, 2017 at 5:15 PM, Marcus Furlong  wrote:

> On 27 September 2017 at 09:23, Michael Still  wrote:
> >
> > Operationally, why would I want to inject a new keypair? The scenario I
> can
> > think of is that there's data in that instance that I want, and I've lost
> > the keypair somehow. Unless that data is on an ephemeral, its gone if we
> do
> > a rebuild.
>
> This is quite a common scenario - staff member who started the
> instance leaves, and you want to access data on the instance, or
> maintain/debug the service running on the instance.
>
>

I can think of several ways to solve this problem:
1) reset the password by using the admin_pass API (if available)
2) use libguestfs on the instance disk directly
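
For option 2, something along these lines should work while the instance is
stopped (the instance name, disk path and user below are illustrative only):

  $ virsh destroy instance-000000ab        # or stop it via nova
  $ virt-customize -a /var/lib/nova/instances/$UUID/disk \
      --ssh-inject 'centos:file:/tmp/new_key.pub'
  $ virsh start instance-000000ab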



> Hitherto, I have used direct db calls to update the key, so it would
> be nice if there was an API call to do so.
>
>

These are some tricks for unusual scenarios. The Nova API needs to stay
robust and general.



> Cheers,
> Marcus.
> --
> Marcus Furlong
>


Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Tony Breeds
On Wed, Sep 27, 2017 at 03:35:43PM -0500, Ben Nemec wrote:
 
> It's a little weird because essentially we want to provide a higher level of
> support for stable branches than most of OpenStack.  My understanding is
> that a lot of the current stable branch policy came out of the fact that
> there was a great deal of apathy toward stable branches in upstream
> OpenStack and it just wasn't possible to say we'd do more than critical bug
> and security fixes for older releases.  Maybe we need a stable-policy-plus
> tag or something for projects that can and want to do more.  And feel free
> to correct me if I've misinterpreted the historical discussions on this. :-)

That's mostly accurate but the policy also is an indication that
consumers should be moving along to newer releases.  For a whole host of
reasons that isn't working and it's a thing that we need to address as a
community.

The current policy broadly defines 3 phases[1]:

Phase  Time frame                 Summary             Changes Supported
I      First 6 months             Latest release      All bugfixes (that meet the
                                                      criteria described below)
                                                      are appropriate
II     6-12 months after release  Maintained release  Only critical bugfixes and
                                                      security patches are
                                                      acceptable
III    more than 12 months        Legacy release      Only security patches are
       after release                                  acceptable

I can see a policy that looks more like:

Phase  Time frame                 Summary             Changes Supported
I      0-12 months after release  Maintained release  All bugfixes (that meet the
                                                      criteria described below)
                                                      are appropriate
II     more than 12 months        Legacy release      Only security patches are
       after release                                  acceptable

The 12 month mark is really only there to line up with our current EOL
plans, if they changed then we'd need to match them.

> That said, I'm staunchly opposed to feature backports.  While I think it
> makes perfect sense to allow backports like Giulio's,

Yup with my limited knowledge I think that review makes perfect sense to
backport.  It just doesn't match the *current* stable policy.


 
> It feels a little weird to me to be arguing this side of it because I'm
> pretty sure I've argued against splitting repos in the past.  But I think I
> would not say we kick all the vendor-integration bits out if we do this,
> just that we provide the option for vendors to have their own repos with
> their own stable backport policies without having to change the policy for
> all of TripleO at the same time.

If splitting the repos has good technical benefits then cool, if it's
mostly about matching policy then I think altering the policy (or
defining a new one) is a better solution.
 
> And I'm also open to other approaches like tweaking the cycle-trailing
> definition to allow more time for this sort of thing.  Maybe we could
> eliminate some of the need for feature backports if we did that.

I'm not sure I follow but sure altering the timeline within reason is a
simple thing to do.

Yours Tony.

[1] https://docs.openstack.org/project-team-guide/stable-branches.html




Re: [openstack-dev] [octavia] haproxy fails to receive datagram

2017-09-27 Thread Michael Johnson
Hi Yipei,

I ran this scenario today using octavia and had success.  I'm not sure
what could be different.
I see you are using neutron-lbaas.  I will build a devstack with
neutron-lbaas enabled and try that, but I can't think of what would
impact this test case by going through the neutron-lbaas path.
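
For reference, a minimal local.conf fragment along these lines should pull
in the neutron-lbaas path on a devstack (treat it as a sketch rather than a
tested config; plugin and service names as of this era):

  [[local|localrc]]
  enable_plugin neutron-lbaas https://git.openstack.org/openstack/neutron-lbaas
  enable_plugin octavia https://git.openstack.org/openstack/octavia
  ENABLED_SERVICES+=,q-lbaasv2,octavia,o-api,o-cw,o-hk,o-hm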

Michael


On Tue, Sep 26, 2017 at 7:27 PM, Yipei Niu  wrote:
> Hi, Michael,
>
> The instructions are listed as follows.
>
> First, create a net1.
> $ neutron net-create net1
> $ neutron subnet-create net1 10.0.1.0/24 --name subnet1
>
> Second, boot two vms in net1
> $ nova boot --flavor 1 --image $image_id --nic net-id=$net1_id vm1
> $ nova boot --flavor 1 --image $image_id --nic net-id=$net1_id vm2
>
> Third, logon to the two vms, respectively. Here take vm1 as an example.
> $ MYIP=$(ifconfig eth0|grep 'inet addr'|awk -F: '{print $2}'| awk '{print
> $1}')
> $ while true; do echo -e "HTTP/1.0 200 OK\r\n\r\nWelcome to $MYIP" | sudo nc
> -l -p 80 ; done&
>
> Fourth, exit vms and update the default security group shared by the vms by
> adding a rule of allowing traffic to port 80.
> $ neutron security-group-rule-create --direction ingress --protocol tcp
> --port-range-min 80 --port-range-max 80 --remote-ip-prefix 0.0.0.0/0
> $default_security_group
> Note: make sure "sudo ip netns exec qdhcp-$net1_id curl -v $vm_ip" works. In
> other words, make sure the vms can accept HTTP requests and return its IP,
> respectively.
>
> Fifth, create a lb, a listener, and a pool. Then add the two vms to the pool
> as members.
> $ neutron lbaas-loadbalancer-create --name lb1 subnet1
> $ neutron lbaas-listener-create --loadbalancer lb1 --protocol HTTP
> --protocol-port 80 --name listener1
> $ neutron lbaas-pool-create --lb-algorithm ROUND_ROBIN --listener listener1
> --protocol HTTP --name pool1
> $ neutron lbaas-member-create --subnet subnet1 --address $vm1_ip
> --protocol-port 80 pool1
> $ neutron lbaas-member-create --subnet subnet1 --address $vm2_ip
> --protocol-port 80 pool1
>
> Finally, try "sudo ip netns qdhcp-net1_id curl -v $VIP" to see whether lbaas
> works.
>
> Best regards,
> Yipei
>
> On Wed, Sep 27, 2017 at 1:30 AM, Yipei Niu  wrote:
>>
>> Hi, Michael,
>>
>> I think the octavia is the latest, since I pull the up-to-date repo of
>> octavia manually to my server before installation.
>>
>> Anyway, I run "sudo ip netns exec amphora-haproxy ip route show table 1"
>> in the amphora, and find that the route table exists. The info is listed as
>> follows.
>>
>> default via 10.0.1.1 dev eth1 onlink
>>
>> I think it may not be the source.
>>
>> Best regards,
>> Yipei
>
>
>


Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Tony Breeds
On Wed, Sep 27, 2017 at 10:39:13AM -0600, Alex Schultz wrote:

> One idea would be to allow trailing projects additional trailing on
> the phases as well.  Honestly 2 weeks for trailing for just GA is hard
> enough. Let alone the fact that the actual end-users are 18+ months
> behind.  For some deployment project like tripleo, there are sections
> that should probably follow stable-policy as it exists today but
> elements where there's 3rd party integration or upgrade implications
> (in the case of tripleo, THT/puppet-tripleo) and they need to be more
> flexible to modify things as necessary.  The word 'feature' isn't
> necessarily the same for these projects than something like
> nova/neutron/etc.

There are 2 separate aspects here:
1) What changes are appropriate on stable/* branches ; and 
2) How long do stable/* branches stay around for.

Looking at 1.  I totally get that deployment projects have a different
threshold on the bugfix/feature line.  That's actually the easy part to
fix.  The point of the stable policy is to give users some assurance
that moving from version x.y.z -> x.Y.Z will be a smooth process.  We
just need to capture that intent in a policy that works in the context
of a deployment project.

Looking at 2.  The stable policy doesn't say you *need* to EOL on
Oct-11th.  By default any project that asserts that tag is included, but
you're also free to opt out as long as there is a good story around CI
and impact on human and machine resources.  We re-evaluate that from
time to time.  As an example, group-based-policy opted out of the
kilo(?), liberty and mitaka EOLs, and recently dropped everything before
mitaka.  I get that GBP has a different footprint in CI than tripleo
does but it illustrates that there is scope to support your users within
the current policy.

I'm still advocating for crafting a more appropriate policy for
deployment projects.
 
> >> What proposing Giulio probably comes from the real world, the field,
> >> who actually manage OpenStack at scale and on real environments (not
> >> in devstack from master). If we can't have this code in-tree, we'll
> >> probably carry this patch downstream (which is IMHO bad because of
> >> maintenance and lack of CI). In that case, I'll vote to give up
> >> stable:follows-policy so we can do what we need.
> >
> > Rather than give up on the stable:follows policy tag it is possibly
> > worth looking at which portions of tripleo make that assertion.
> >
> > In this specific case, there isn't anything in the bug that indicates
> > it comes from a user report which is all the stable team has to go on
> > when making these types of decisions.
> >
> 
> We'll need to re-evaluate what stable-policy means for tripleo.  We
> don't want to allow the world for backporting but we also want to
> reduce the patches carried downstream for specific use cases.  I think
> in the case of 3rd party integrations we need a better definition of
> what that means and perhaps creating a new repository like THT-extras
> that doesn't follow stable-policy while the main one does.

Right, I don't pretend to understand the ins-and-outs of tripleo but yes
I think we're mostly agreeing on that point.

https://review.openstack.org/#/c/507924/ buys everyone the space to make
that evaluation.

Yours Tony.




Re: [openstack-dev] [infra][mogan] Need help for replacing the current master

2017-09-27 Thread Davanum Srinivas
Clark,

I'd like to avoid the ACL update which will make it different from
other projects. Since we don't expect to do this again, can you please
help do this?

Thanks,
Dims

On Wed, Sep 27, 2017 at 7:55 PM, Clark Boylan  wrote:
> On Tue, Sep 26, 2017, at 05:57 PM, Zhenguo Niu wrote:
>> Thanks Clark Boylan,
>>
>> We have frozen the Mogan repo since this mail sent out, and there's no
>> need
>> to update the replacement master. So please help out when you got time.
>
> I mentioned this to dims on IRC today, but should write it here as well
> for broader reach. It looks like https://github.com/dims/mogan is a
> fast-forwardable change on top of 7744129c83839ab36801856f283fb165d71af32e.
> Also it's less than ten commits ahead of current mogan master (7744129).
> For this reason I think you can just push those commits up to Gerrit and
> review them normally.
>
> The only gotcha with this is you may need to update the Gerrit ACLs to
> allow merge commit pushes.
>
> Clark
>



-- 
Davanum Srinivas :: https://twitter.com/dims



Re: [openstack-dev] [infra][mogan] Need help for replacing the current master

2017-09-27 Thread Clark Boylan
On Tue, Sep 26, 2017, at 05:57 PM, Zhenguo Niu wrote:
> Thanks Clark Boylan,
> 
> We have frozen the Mogan repo since this mail sent out, and there's no
> need
> to update the replacement master. So please help out when you got time.

I mentioned this to dims on IRC today, but should write it here as well
for broader reach. It looks like https://github.com/dims/mogan is a
fast-forwardable change on top of 7744129c83839ab36801856f283fb165d71af32e.
Also it's less than ten commits ahead of current mogan master (7744129).
For this reason I think you can just push those commits up to Gerrit and
review them normally.

The only gotcha with this is you may need to update the Gerrit ACLs to
allow merge commit pushes.
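
For what it's worth, the push itself would look roughly like this (remote
and branch names are illustrative, and this assumes git-review or an
equivalent Gerrit remote is already configured):

  $ git remote add dims https://github.com/dims/mogan
  $ git fetch dims
  $ git checkout -b master-replacement dims/master
  $ git review master        # or: git push gerrit HEAD:refs/for/master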

Clark



Re: [openstack-dev] [neutron]OVS connection tracking cleanup

2017-09-27 Thread Ajay Kalambur (akalambu)
Also, the weird part with this conntrack deletion: when I perform a conntrack -L
to view the table, I see no entry for any of the entries it's trying to delete.
Those entries are all removed anyway when VMs are cleaned up, from the look of
it. So it looks like all those conntrack deletions were pretty much no-ops.
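
For reference, the kind of check I'm doing (the address is illustrative):

  $ sudo conntrack -L | grep 10.1.0.5   # nothing shows up for the VM's address...
  $ sudo conntrack -D -d 10.1.0.5       # ...so a targeted delete like this has nothing to remove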
Ajay


From: Ajay Kalambur >
Date: Tuesday, September 12, 2017 at 9:30 AM
To: "OpenStack Development Mailing List (not for usage questions)" 
>
Cc: "Ian Wells (iawells)" >
Subject: Re: [openstack-dev] [neutron]OVS connection tracking cleanup

Hi Kevin
Sure, I will log a bug.
Also, does the config change involve having both these lines in the
neutron.conf file?
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
root_helper_daemon = sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

If I have only the second line I see the exception below on neutron openvswitch 
agent bring up:

2017-09-12 09:23:03.633 35 DEBUG neutron.agent.linux.utils 
[req-0f8fe685-66bd-44d7-beac-bb4c24f0ccfa - - - - -] Running command: ['ps', 
'--ppid', '103', '-o', 'pid='] create_process 
/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2017-09-12 09:23:03.762 35 ERROR ryu.lib.hub 
[req-0f8fe685-66bd-44d7-beac-bb4c24f0ccfa - - - - -] hub: uncaught exception: 
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
return func(*args, **kwargs)
  File 
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py",
 line 42, in agent_main_wrapper
ovs_agent.main(bridge_classes)
  File 
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2184, in main
agent.daemon_loop()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 154, in 
wrapper
return f(*args, **kwargs)
  File 
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2100, in daemon_loop
self.ovsdb_monitor_respawn_interval) as pm:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/polling.py", line 
35, in get_polling_manager
pm.start()
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/polling.py", line 
57, in start
while not self.is_active():
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py", 
line 100, in is_active
self.pid, self.cmd_without_namespace)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py", 
line 159, in pid
run_as_root=self.run_as_root)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 
297, in get_root_helper_child_pid
pid = find_child_pids(pid)[0]
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 
179, in find_child_pids
log_fail_as_error=False)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 
128, in execute
_stdout, _stderr = obj.communicate(_process_input)
  File "/usr/lib64/python2.7/subprocess.py", line 800, in communicate
return self._communicate(input)
  File "/usr/lib64/python2.7/subprocess.py", line 1403, in _communicate
stdout, stderr = self._communicate_with_select(input)
  File "/usr/lib64/python2.7/subprocess.py", line 1504, in 
_communicate_with_select
rlist, wlist, xlist = select.select(read_set, write_set, [])
  File "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 86, in 
select
return hub.switch()
  File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in 
switch
return self.greenlet.switch()
Timeout: 5 seconds

2017-09-12 09:23:03.860 35 INFO oslo_rootwrap.client [-] Stopping rootwrap 
daemon process with pid=95


Ajay



From: Kevin Benton >
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
>
Date: Monday, September 11, 2017 at 1:12 PM
To: "OpenStack Development Mailing List (not for usage questions)" 
>
Cc: "Ian Wells (iawells)" >
Subject: Re: [openstack-dev] [neutron]OVS connection tracking cleanup

Can you start a bug on launchpad and upload the conntrack attachment to the bug?

Switching to the rootwrap daemon should also help significantly.

On Mon, Sep 11, 2017 at 12:32 PM, Ajay Kalambur (akalambu) 
> wrote:
Hi Kevin
The information you asked for:
for 1 compute node with 45 VMs, here is the number of connection tracking
entries getting deleted
cat 

Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Chris Friesen

On 09/27/2017 04:55 PM, Blair Bethwaite wrote:

Hi Prema

On 28 September 2017 at 07:10, Premysl Kouril  wrote:

Hi, I work with Jakub (the op of this thread) and here is my two
cents: I think what is critical to realize is that KVM virtual
machines can have substantial memory overhead of up to 25% of memory,
allocated to KVM virtual machine itself. This overhead memory is not


I'm curious what sort of VM configuration causes such high overheads,
is this when using highly tuned virt devices with very large buffers?


For what it's worth, we ran into issues a couple of years back with I/O to
RBD-backed disks in writethrough/writeback.  There was a bug that allowed a very
large number of in-flight operations if the ceph server couldn't keep up with
the aggregate load.  We hacked up a local solution; I'm not sure if it's been
dealt with upstream.


I think virtio networking has also caused issues, though not as bad.  (But 
noticeable when running close to the line.)


Chris



Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Blair Bethwaite
Hi Prema

On 28 September 2017 at 07:10, Premysl Kouril  wrote:
> Hi, I work with Jakub (the op of this thread) and here is my two
> cents: I think what is critical to realize is that KVM virtual
> machines can have substantial memory overhead of up to 25% of memory,
> allocated to KVM virtual machine itself. This overhead memory is not

I'm curious what sort of VM configuration causes such high overheads,
is this when using highly tuned virt devices with very large buffers?

> This KVM virtual machine overhead is what is causing the OOMs in our
> infrastructure and that's what we need to fix.

If you are pinning multiple guests per NUMA node in a multi-NUMA node
system then you might also have issues with uneven distribution of
system overheads across nodes, depending on how close to the sun you
are flying.

-- 
Cheers,
~Blairo



[openstack-dev] [all][infra] Zuul v3 migration update

2017-09-27 Thread Monty Taylor

Hey everybody,

We're there. It's ready.

We've worked through all of the migration script issues and are happy 
with the results. The cutover trigger is primed and ready to go.


But as it's 21:51 UTC / 16:52 US Central it's a short day to be 
available to respond to the questions folks may have... so we're going 
to postpone one more day.


Since it's all ready to go we'll be looking at flipping the switch first 
thing in the morning. (basically as soon as the West Coast wakes up and 
is ready to go)


The project-config repo should still be considered frozen except for 
migration-related changes. Hopefully we'll be able to flip the final 
switch early tomorrow.


If you haven't yet, please see [1] for information about the transition.

[1] https://docs.openstack.org/infra/manual/zuulv3.html

Thanks,

Monty



Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Chris Friesen

On 09/27/2017 03:10 PM, Premysl Kouril wrote:

Lastly, qemu has overhead that varies depending on what you're doing in the
guest.  In particular, there are various IO queues that can consume
significant amounts of memory.  The company that I work for put in a good
bit of effort engineering things so that they work more reliably, and part
of that was determining how much memory to reserve for the host.

Chris


Hi, I work with Jakub (the op of this thread) and here is my two
cents: I think what is critical to realize is that KVM virtual
machines can have substantial memory overhead of up to 25% of memory,
allocated to KVM virtual machine itself. This overhead memory is not
considered in nova code when calculating if the instance being
provisioned actually fits into host's available resources (only the
memory, configured in instance's flavor is considered). And this is
especially being a problem when CPU pinning is used as the memory
allocation is bounded by limits of specific NUMA node (due to the
strict memory allocation mode). This renders the global reservation
parameter reserved_host_memory_mb useless as it doesn't take NUMA into
account.

This KVM virtual machine overhead is what is causing the OOMs in our
infrastructure and that's what we need to fix.


Feel free to report a bug against nova...maybe reserved_host_memory_mb should be 
a list of per-numa-node values.


It's a bit of a hack, but if you use hugepages for all the guests you can 
control the amount of per-numa-node memory reserved for host overhead.


Since the kvm overhead memory is allocated from 4K pages (in my experience) you 
can just choose to leave some memory on each host NUMA node as 4K pages instead 
of allocating them as hugepages.
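
For example, something along these lines (all numbers purely illustrative
for a two-node, 128 GiB host) boots the host with most memory as 2M
hugepages, leaving the remaining 4K pages on each node for host and qemu
overhead, and then makes the pinned guests consume only hugepages:

  # kernel command line on the compute host: 61440 x 2M = 120 GiB of
  # hugepages, so roughly 4 GiB of ordinary 4K pages is left per NUMA node
  GRUB_CMDLINE_LINUX="... hugepagesz=2M hugepages=61440"

  # flavor for the pinned guests, so their RAM comes only from hugepages
  $ openstack flavor set m1.pinned.large \
      --property hw:cpu_policy=dedicated \
      --property hw:mem_page_size=2048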


Chris




Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Premysl Kouril
> Lastly, qemu has overhead that varies depending on what you're doing in the
> guest.  In particular, there are various IO queues that can consume
> significant amounts of memory.  The company that I work for put in a good
> bit of effort engineering things so that they work more reliably, and part
> of that was determining how much memory to reserve for the host.
>
> Chris

Hi, I work with Jakub (the op of this thread) and here is my two
cents: I think what is critical to realize is that KVM virtual
machines can have substantial memory overhead of up to 25% of memory,
allocated to KVM virtual machine itself. This overhead memory is not
considered in nova code when calculating if the instance being
provisioned actually fits into host's available resources (only the
memory, configured in instance's flavor is considered). And this is
especially being a problem when CPU pinning is used as the memory
allocation is bounded by limits of specific NUMA node (due to the
strict memory allocation mode). This renders the global reservation
parameter reserved_host_memory_mb useless as it doesn't take NUMA into
account.

This KVM virtual machine overhead is what is causing the OOMs in our
infrastructure and that's what we need to fix.

Regards,
Prema



Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Ben Nemec



On 09/27/2017 11:39 AM, Alex Schultz wrote:

On Tue, Sep 26, 2017 at 11:57 PM, Tony Breeds  wrote:

On Tue, Sep 26, 2017 at 10:31:59PM -0700, Emilien Macchi wrote:

On Tue, Sep 26, 2017 at 10:17 PM, Tony Breeds  wrote:

With that in mind I'd suggest that your review isn't appropriate for


If we have to give up backports that help customers to get
production-ready environments, I would consider giving up stable
policy tag which probably doesn't fit for projects like installers. In
a real world, users don't deploy master or Pike (even not Ocata) but
are still on Liberty, and most of the time Newton.


I agree the stable policy doesn't map very well to deployment projects
and that's something I'd like to address.  I admit I'm not certain *how*
to address it but it almost certainly starts with a discussion like this
;P

I've proposed a forum session to further this discussion, even if that
doesn't happen there's always the hall-way track :)



One idea would be to allow trailing projects additional trailing on
the phases as well.  Honestly 2 weeks for trailing for just GA is hard
enough. Let alone the fact that the actual end-users are 18+ months
behind.  For some deployment project like tripleo, there are sections
that should probably follow stable-policy as it exists today but
elements where there's 3rd party integration or upgrade implications
(in the case of tripleo, THT/puppet-tripleo) and they need to be more
flexible to modify things as necessary.  The word 'feature' isn't
necessarily the same for these projects than something like
nova/neutron/etc.


What proposing Giulio probably comes from the real world, the field,
who actually manage OpenStack at scale and on real environments (not
in devstack from master). If we can't have this code in-tree, we'll
probably carry this patch downstream (which is IMHO bad because of
maintenance and lack of CI). In that case, I'll vote to give up
stable:follows-policy so we can do what we need.


Rather than give up on the stable:follows policy tag it is possibly
worth looking at which portions of tripleo make that assertion.

In this specific case, there isn't anything in the bug that indicates
it comes from a user report which is all the stable team has to go on
when making these types of decisions.



We'll need to re-evaulate what stable-policy means for tripleo.  We
don't want to allow the world for backporting but we also want to
reduce the patches carried downstream for specific use cases.  I think
in the case of 3rd party integrations we need a better definition of
what that means and perhaps creating a new repository like THT-extras
that doesn't follow stable-policy while the main one does.


It's a little weird because essentially we want to provide a higher 
level of support for stable branches than most of OpenStack.  My 
understanding is that a lot of the current stable branch policy came out 
of the fact that there was a great deal of apathy toward stable branches 
in upstream OpenStack and it just wasn't possible to say we'd do more 
than critical bug and security fixes for older releases.  Maybe we need 
a stable-policy-plus tag or something for projects that can and want to 
do more.  And feel free to correct me if I've misinterpreted the 
historical discussions on this. :-)


That said, I'm staunchly opposed to feature backports.  While I think it 
makes perfect sense to allow backports like Giulio's, I was here when we 
wasted the entire Mitaka cycle backporting things to Liberty and Kilo. 
Sure, you can say we'll just be disciplined and pick and choose what we 
backport, but I'm pretty sure we said the same thing back then.  It's a 
lot harder to say no when a customer/partner/your manager starts pushing 
for something and you have no policy to back you up.


If we need to allow feature-ish backports for third-party, then I think 
the third-party bits need to be split out into their own repo (they 
probably should have been anyway) that has a different support policy. 
I suppose we could try to implement that by convention in current tht, 
but that will likely get messy when someone wants to backport a feature 
that touches both third-party and core tht bits.


I guess maybe this is all going back to what we discussed at the PTG 
retrospective about needing better modularity in TripleO.  Instead of 
having this monolithic all-singing, all-dancing tht repo that includes 
the world, we need a well-defined interface for vendors to plug their 
bits into TripleO so they can live where they want and be managed how 
they want.


It feels a little weird to me to be arguing this side of it because I'm 
pretty sure I've argued against splitting repos in the past.  But I 
think I would not say we kick all the vendor-integration bits out if we 
do this, just that we provide the option for vendors to have their own 
repos with their own stable backport policies without having to change 
the policy for all of TripleO at the same time.

[openstack-dev] September 29 Price Increase & Forum Submission Deadline - OpenStack Summit Sydney

2017-09-27 Thread Allison Price
Hi everyone,

Prices for the OpenStack Summit Sydney will be increasing this Friday,
September 29 at 11:59pm Pacific Time (September 30 at 6:59 UTC).

Register now before the price increases!

Also a reminder that Friday is the deadline for Forum submissions. Submit here.

All discount registration codes must be redeemed by October 27.

If you have any Summit-related questions, please contact sum...@openstack.org.

Cheers,
Allison

Allison Price
OpenStack Foundation
alli...@openstack.org




Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Emilien Macchi
On Wed, Sep 27, 2017 at 9:39 AM, Alex Schultz  wrote:
[...]
> We'll need to re-evaulate what stable-policy means for tripleo.  We
> don't want to allow the world for backporting but we also want to
> reduce the patches carried downstream for specific use cases.  I think
> in the case of 3rd party integrations we need a better definition of
> what that means and perhaps creating a new repository like THT-extras
> that doesn't follow stable-policy while the main one does.

Thanks Alex for the notes. While I agree with you, I proposed
https://review.openstack.org/507924 in the meantime.

I'm not entirely sure about THT-extras and the fact that it would add
another layer of complexity, but I'm happy to discuss it.

Tony, Alex, Steve, (others of course) - if you can look at the
governance change and give feedback on it, that would help.

Thanks,
-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] l2gw

2017-09-27 Thread Ricardo Noriega De Soto
Hey Lajos,

Is this the exception you are encountering?

(neutron) l2-gateway-update --device name=hwvtep,interface_names=eth0,eth1
gw1
L2 Gateway 'b8ef7f98-e901-4ef5-b159-df53364ca996' still has active mappings
with one or more neutron networks.
Neutron server returns request_ids:
['req-f231dc53-cb7d-4221-ab74-fa8715f85869']

I don't see the L2GatewayInUse exception you're talking about, but I guess
it's the same situation.

We should discuss in which cases the l2gw instance could be updated, and in
which it shouldn't.

Please, let me know!



On Wed, Aug 16, 2017 at 11:14 AM, Lajos Katona 
wrote:

> Hi,
>
> We faced an issue with l2-gw-update: if there are connections for a gw,
> the update will throw an exception (L2GatewayInUse), and the update is
> only possible after first deleting the connections, doing the update and
> adding the connections back.
>
> It is not exactly clear why this restriction is there in the code (at
> least I can't find it in docs or comments in the code, or review).
> As I see the check for network connections was introduced in this patch:
> https://review.openstack.org/#/c/144097 (https://review.openstack.org/
> #/c/144097/21..22/networking_l2gw/db/l2gateway/l2gateway_db.py)
>
> Could you please give me a little background why the update operation is
> not allowed on an l2gw with network connections?
>
> Thanks in advance for the help.
>
> Regards
> Lajos
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Ricardo Noriega

Senior Software Engineer - NFV Partner Engineer | Office of Technology  |
Red Hat
irc: rnoriega @freenode
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Alex Schultz
On Tue, Sep 26, 2017 at 11:57 PM, Tony Breeds  wrote:
> On Tue, Sep 26, 2017 at 10:31:59PM -0700, Emilien Macchi wrote:
>> On Tue, Sep 26, 2017 at 10:17 PM, Tony Breeds  
>> wrote:
>> > With that in mind I'd suggest that your review isn't appropriate for
>>
>> If we have to give up backports that help customers to get
>> production-ready environments, I would consider giving up stable
>> policy tag which probably doesn't fit for projects like installers. In
>> a real world, users don't deploy master or Pike (even not Ocata) but
>> are still on Liberty, and most of the time Newton.
>
> I agree the stable policy doesn't map very well to deployment projects
> and that's something I'd like to address.  I admit I'm not certain *how*
> to address it but it almost certainly starts with a discussion like this
> ;P
>
> I've proposed a forum session to further this discussion, even if that
> doesn't happen there's always the hall-way track :)
>

One idea would be to allow trailing projects additional trailing on
the phases as well.  Honestly, 2 weeks of trailing for just the GA is hard
enough, let alone the fact that the actual end-users are 18+ months
behind.  For a deployment project like tripleo, there are sections
that should probably follow stable-policy as it exists today, but the
elements where there's 3rd party integration or upgrade implications
(in the case of tripleo, THT/puppet-tripleo) need to be more
flexible to modify things as necessary.  The word 'feature' doesn't
necessarily mean the same thing for these projects as it does for
something like nova/neutron/etc.

>> What proposing Giulio probably comes from the real world, the field,
>> who actually manage OpenStack at scale and on real environments (not
>> in devstack from master). If we can't have this code in-tree, we'll
>> probably carry this patch downstream (which is IMHO bad because of
>> maintenance and lack of CI). In that case, I'll vote to give up
>> stable:follows-policy so we can do what we need.
>
> Rather than give up on the stable:follows policy tag it is possibly
> worth looking at which portions of tripleo make that assertion.
>
> In this specific case, there isn't anything in the bug that indicates
> it comes from a user report which is all the stable team has to go on
> when making these types of decisions.
>

We'll need to re-evaluate what stable-policy means for tripleo.  We
don't want to allow backporting the world, but we also want to
reduce the patches carried downstream for specific use cases.  I think
in the case of 3rd party integrations we need a better definition of
what that means, and perhaps create a new repository like THT-extras
that doesn't follow stable-policy while the main one does.

Thanks,
-Alex

> Yours Tony.
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][infra] Zuul v3 migration update

2017-09-27 Thread Jay Pipes

On 09/27/2017 03:49 AM, Flavio Percoco wrote:
Just wanted to say thanks to all of you for the hard work. I can only 
imagine

how hard it must be to do this migration without causing downtimes.


+1000

Thank you so much for the hard work the infra team has put into making 
this migration as painless for the community as possible. Your efforts 
have certainly not gone unnoticed.


All the best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][DIB] how create triplo overcloud image with latest kernel?

2017-09-27 Thread Yolanda Robla Mota
If you need a guideline on how to build TripleO images with DIB, I have
this blog post:
http://teknoarticles.blogspot.com.es/2017/07/build-and-use-security-hardened-images.html

This is for security hardened images, but if you replace
"overcloud-hardened-images" with "overcloud-images", it will build the
default one. You can specify the base image you want to use, as well as
enable any repo you have that provides the latest kernel.

Hope it helps!

On Wed, Sep 27, 2017 at 5:21 PM, Brad P. Crochet  wrote:

>
> On Tue, Sep 26, 2017 at 2:58 PM Ben Nemec  wrote:
>
>>
>>
>> On 09/26/2017 05:43 AM, Moshe Levi wrote:
>> > Hi all,
>> >
>> > As part of the OVS Hardware Offload [1] [2],  we need to create new
>> > Centos/Redhat 7 image  with latest kernel/ovs/iproute.
>> >
>> > We tried to use virsh-customize to install the packages and we were able
>> > to update iproute and ovs, but for the kernel there is no space.
>> >
>> > We also tried with virsh-customize to uninstall the old kernel but had
>> > no luck.
>> >
>> > Are there other ways to replace the kernel package in an existing image?
>>
>> Do you have to use an existing image?  The easiest way to do this would
>> be to create a DIB element that installs what you want and just include
>> that in the image build in the first place.  I don't think that would be
>> too difficult to do now that we're keeping the image definitions in
>> simple YAML files.
>>
>>
> If it is just packages, a DIB element wouldn't even be necessary. You
> could define a new yaml that just adds the packages that you want, and add
> that to the CLI when you build the images.
>
>
>> >
>> > [1] - https://review.openstack.org/#/c/504911/
>> >
>> > [2] - https://review.openstack.org/#/c/502313/
>> >
>> >
>> >
>> >
>> > 
>> __
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
>> unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
>> unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> --
> Brad P. Crochet, RHCA, RHCE, RHCVA, RHCDS
> Principal Software Engineer
> (c) 704.236.9385
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


-- 

Yolanda Robla Mota

Principal Software Engineer, RHCE

Red Hat



C/Avellana 213

Urb Portugal

yrobl...@redhat.com    M: +34605641639


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][DIB] how create triplo overcloud image with latest kernel?

2017-09-27 Thread Brad P. Crochet
On Tue, Sep 26, 2017 at 2:58 PM Ben Nemec  wrote:

>
>
> On 09/26/2017 05:43 AM, Moshe Levi wrote:
> > Hi all,
> >
> > As part of the OVS Hardware Offload [1] [2],  we need to create new
> > Centos/Redhat 7 image  with latest kernel/ovs/iproute.
> >
> > We tried to use virsh-customize to install the packages and we were able
> > to update iproute and ovs, but for the kernel there is no space.
> >
> > We also tried with virsh-customize to uninstall the old kernel but had
> > no luck.
> >
> > Are there other ways to replace the kernel package in an existing image?
>
> Do you have to use an existing image?  The easiest way to do this would
> be to create a DIB element that installs what you want and just include
> that in the image build in the first place.  I don't think that would be
> too difficult to do now that we're keeping the image definitions in
> simple YAML files.
>
>
If it is just packages, a DIB element wouldn't even be necessary. You could
define a new yaml that just adds the packages that you want, and add that
to the CLI when you build the images.
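
A rough sketch of what that could look like, assuming the disk_images
schema used by tripleo-common's image-yaml files (the extra-packages.yaml
file name and package list are made up, the shipped YAML paths assume the
RDO packaging, so double-check against your tripleo-common version):

    # extra-packages.yaml (hypothetical)
    disk_images:
      - imagename: overcloud-full
        packages:
          - kernel
          - openvswitch
          - iproute

    openstack overcloud image build \
      --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images.yaml \
      --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos7.yaml \
      --config-file extra-packages.yaml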


> >
> > [1] - https://review.openstack.org/#/c/504911/
> >
> > [2] - https://review.openstack.org/#/c/502313/
> >
> >
> >
> >
> >
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
-- 
Brad P. Crochet, RHCA, RHCE, RHCVA, RHCDS
Principal Software Engineer
(c) 704.236.9385
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Chris Friesen

On 09/27/2017 08:01 AM, Blair Bethwaite wrote:

On 27 September 2017 at 23:19, Jakub Jursa  wrote:

'hw:cpu_policy=dedicated' (while NOT setting 'hw:numa_nodes') results in
libvirt pinning CPU in 'strict' memory mode

(from libvirt xml for given instance)
...
  [numatune XML stripped by the list archive]
...

So yeah, the instance is not able to allocate memory from another NUMA node.


I can't recall what the docs say on this but I wouldn't be surprised
if that was a bug. Though I do think most users would want CPU & NUMA
pinning together (you haven't shared your use case but perhaps you do
too?).


Not a bug.  Once you enable CPU pinning we assume you care about performance, 
and for max performance you need NUMA affinity as well.  (And hugepages are 
beneficial too.)



I'm not quite sure what do you mean by 'memory will be locked for the
guest'. Also, aren't huge pages enabled in kernel by default?


I think that suggestion was probably referring to static hugepages,
which can be reserved (per NUMA node) at boot and then (assuming your
host is configured correctly) QEMU will be able to back guest RAM with
them.


One nice thing about static hugepages is that you pre-allocate them at startup, 
so you can decide on a per-NUMA-node basis how much 4K memory you want to leave 
for incidental host stuff and qemu overhead.  This lets you specify different 
amounts of "host-reserved" memory on different NUMA nodes.


In order to use static hugepages for the guest you need to explicitly ask for a 
page size of 2MB.  (1GB is possible as well but in most cases doesn't buy you 
much compared to 2MB.)
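
(On the flavor side that is roughly, with a made-up flavor name:

    openstack flavor set m1.pinned \
      --property hw:cpu_policy=dedicated \
      --property hw:mem_page_size=2048

where 2048 is the page size in KiB, i.e. 2MB pages.)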


Lastly, qemu has overhead that varies depending on what you're doing in the 
guest.  In particular, there are various IO queues that can consume significant 
amounts of memory.  The company that I work for put in a good bit of effort 
engineering things so that they work more reliably, and part of that was 
determining how much memory to reserve for the host.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [FEMDC] IRC Meeting today 15:00 UTC

2017-09-27 Thread Paul-Andre Raymond
Below is the link to the etherpad for our meeting.



On 9/27/17, 10:01 AM, "Paul-Andre Raymond"  
wrote:

Dear all, 

A gentle reminder for our meeting today (an hour from now). 
I believe today will be a short meeting.
Draft agenda was prepared by our friends from INRIA at   
https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2017 (line 
1237)

Please feel free to add items.

Best, 

 
Paul-André
--
 




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Chris Friesen

On 09/27/2017 03:12 AM, Jakub Jursa wrote:



On 27.09.2017 10:40, Blair Bethwaite wrote:

On 27 September 2017 at 18:14, Stephen Finucane  wrote:

What you're probably looking for is the 'reserved_host_memory_mb' option. This
defaults to 512 (at least in the latest master) so if you up this to 4192 or
similar you should resolve the issue.


I don't see how this would help given the problem description -
reserved_host_memory_mb would only help avoid causing OOM when
launching the last guest that would otherwise fit on a host based on
Nova's simplified notion of memory capacity. It sounds like both CPU
and NUMA pinning are in play here, otherwise the host would have no
problem allocating RAM on a different NUMA node and OOM would be
avoided.


I'm not quite sure if/how OpenStack handles NUMA pinning (why is VM
being killed by OOM rather than having memory allocated on different
NUMA node). Anyway, good point, thank you, I should have a look at exact
parameters passed to QEMU when using CPU pinning.


OpenStack uses strict memory pinning when using CPU pinning and/or memory 
hugepages, so all allocations are supposed to be local.  When it can't allocate 
locally, it triggers OOM.


Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [tc][nova][ironic][mogan] Evaluate Mogan project

2017-09-27 Thread Sean Dague
On 09/27/2017 09:31 AM, Julia Kreger wrote:
> [...]
>>> The short explanation which clicked for me (granted it's probably an
>>> oversimplification, but still) was this: Ironic provides an admin
>>> API for managing bare metal resources, while Mogan gives you a user
>>> API (suitable for public cloud use cases) to your Ironic backend. I
>>> suppose it could have been implemented in Ironic, but implementing
>>> it separately allows Ironic to be agnostic to multiple user
>>> frontends and also frees the Ironic team up from having to take on
>>> yet more work directly.
>>
>>
>> ditto!
>>
>> I had a similar question at the PTG and this was the answer that convinced
>> be
>> may be worth the effort.
>>
>> Flavio
>>
> 
> For Ironic, the question did come at the PTG up of tenant aware
> scheduling of owned hardware, as in Customer A and B are managed by
> the same ironic, only customer A's users should be able to schedule on
> to Customer A's hardware, with API access control restrictions such
> that specific customer can take action on their own hardware.
> 
> If we go down the path of supporting such views/logic, it could become
> a massive undertaking for Ironic, so there is absolutely a plus to
> something doing much of that for Ironic. Personally, I think Mogan is
> a good direction to continue to explore. That being said, we should
> improve our communication of plans/directions/perceptions between the
> teams so we don't adversely impact each other and see where we can
> help each other moving forward.

My biggest concern with Mogan is that it forks Nova, then starts
changing interfaces. Nova's got 2 really big API surfaces.

1) The user facing API, which is reasonably well documented, and under
tight control. Mogan has taken key things at 95% similarity and changed
bits. So servers includes things like a partitions parameter.
https://github.com/openstack/mogan/blob/master/api-ref/source/v1/servers.inc#request-4

This being nearly the same but slightly different ends up being really
weird, especially as Nova evolves its code with microversions for
things like embedded flavor info.

2) The guest facing API of metadata/config drive. This is far less
documented or tested, and while we try to be strict about adding in
information here in a versioned way, it's never seen the same attention
as the user API on either documentation or version rigor.

That's presumably getting changed and going to drift as well, which means
discovering multiple implementations that are nearly, but not exactly,
the same and that keep drifting.


The point of licensing things under an Apache 2 license was to enable
folks to do all kind of experiments like this. And experiments are good.
But part of the point of experiments is to learn lessons to bring back
into the fold. Digging out of the multi year hole of "close but not
exactly the same" API differences between nova-net and neutron really
makes me want to make sure we never intentionally inflict that confusion
on folks again.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [acceleration]Cyborg Weekly team Meeting 2017.09.27

2017-09-27 Thread Zhipeng Huang
Hi Team,

Our regular meeting will start in about 30 minutes. The agenda can be
found at
https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting#Agenda_for_next_meeting

-- 
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co,. Ltd
Email: huangzhip...@huawei.com
Office: Huawei Industrial Base, Longgang, Shenzhen

(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipe...@uci.edu
Office: Calit2 Building Room 2402

OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Blair Bethwaite
On 27 September 2017 at 23:19, Jakub Jursa  wrote:
> 'hw:cpu_policy=dedicated' (while NOT setting 'hw:numa_nodes') results in
> libvirt pinning CPU in 'strict' memory mode
>
> (from libvirt xml for given instance)
> ...
> [numatune XML stripped by the list archive]
> ...
>
> So yeah, the instance is not able to allocate memory from another NUMA node.

I can't recall what the docs say on this but I wouldn't be surprised
if that was a bug. Though I do think most users would want CPU & NUMA
pinning together (you haven't shared your use case but perhaps you do
too?).

> I'm not quite sure what do you mean by 'memory will be locked for the
> guest'. Also, aren't huge pages enabled in kernel by default?

I think that suggestion was probably referring to static hugepages,
which can be reserved (per NUMA node) at boot and then (assuming your
host is configured correctly) QEMU will be able to back guest RAM with
them.

You are probably thinking of THP (transparent huge pages) which are
now on by default in Linux but can be somewhat hit & miss if you have
a long running host where memory has become fragmented or the
pagecache is large - in our experience performance can be severely
degraded by just missing hugepage backing of a small fraction of guest
memory, and we have noticed behaviour from memory management where THP
allocations fail when pagecache is highly utilised despite none of it
being dirty (so should be able to be dropped immediately).

-- 
Cheers,
~Blairo

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [FEMDC] IRC Meeting today 15:00 UTC

2017-09-27 Thread Paul-Andre Raymond
Dear all, 

A gentle reminder for our meeting today (an hour from now). 
I believe today will be a short meeting.
Draft agenda was prepared by our friends from INRIA at   
https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2017 (line 
1237)

Please feel free to add items.

Best, 

 
Paul-André
--
 


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [glance] Queens PTG: Thursday summary

2017-09-27 Thread Belmiro Moreira
Not ideal, because we also have the real use case for community images.
When users start to create/use community images, these different use cases
(old public and real community) will be mixed.

Cheers,
Belmiro


On Wed, 27 Sep 2017 at 15:37, Blair Bethwaite 
wrote:

> On 27 September 2017 at 22:40, Belmiro Moreira
>  wrote:
> > In the past we used the tabs but latest Horizon versions use the
> visibility
> > column/search instead.
> > The issue is that we would like the old images to continue to be
> > discoverable by everyone and have a image list that only shows the latest
> > ones.
>
> Yeah I think we hit that as well and have a patch for category
> listing. It's not something I have worked on but Sam can fill the
> gaps... or it could be that this is actually the last problem we have
> left with upgrading to a current version of the dashboard and so are
> effectively in the same boat.
>
> > We are now using the “community” visibility to hide the old images from
> the
> > default image list. But it’s not ideal.
>
> Not ideal because you don't want them discoverable at all?
>
> > I will move the old spec about image lifecycle to glance.
> > https://review.openstack.org/#/c/327980/
>
> Looks like a useful spec!
>
> --
> Cheers,
> ~Blairo
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [glance] Queens PTG: Thursday summary

2017-09-27 Thread Blair Bethwaite
On 27 September 2017 at 22:40, Belmiro Moreira
 wrote:
> In the past we used the tabs but latest Horizon versions use the visibility
> column/search instead.
> The issue is that we would like the old images to continue to be
> discoverable by everyone and have a image list that only shows the latest
> ones.

Yeah I think we hit that as well and have a patch for category
listing. It's not something I have worked on but Sam can fill the
gaps... or it could be that this is actually the last problem we have
left with upgrading to a current version of the dashboard and so are
effectively in the same boat.

> We are now using the “community” visibility to hide the old images from the
> default image list. But it’s not ideal.

Not ideal because you don't want them discoverable at all?

> I will move the old spec about image lifecycle to glance.
> https://review.openstack.org/#/c/327980/

Looks like a useful spec!

-- 
Cheers,
~Blairo

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread Marcus Furlong
On 27 September 2017 at 10:55, Sean Dague  wrote:
> On 09/27/2017 05:15 AM, Marcus Furlong wrote:
>> On 27 September 2017 at 09:23, Michael Still  wrote:
>>>
>>> Operationally, why would I want to inject a new keypair? The scenario I can
>>> think of is that there's data in that instance that I want, and I've lost
>>> the keypair somehow. Unless that data is on an ephemeral, its gone if we do
>>> a rebuild.
>>
>> This is quite a common scenario - staff member who started the
>> instance leaves, and you want to access data on the instance, or
>> maintain/debug the service running on the instance.
>>
>> Hitherto, I have used direct db calls to update the key, so it would
>> be nice if there was an API call to do so.
>
> But you also triggered a rebuild in the process? Or you tweaked the keys
> and did a reboot? This use case came up in the room, but then we started
> trying to figure out if the folks that mostly had it would also need it
> on reboot.

No rebuild, no.

Update the key name and reboot, or, if someone has access, re-run cloud-init.

# rm -fr /var/lib/cloud/instance/sem/
# cloud-init --single -n ssh

Have also thought about just adding the above to a cronjob in the
images to facilitate this scenario (thus avoiding a reboot if no one
has access).

Cheers,
Marcus.

-- 
Marcus Furlong

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [tc][nova][ironic][mogan] Evaluate Mogan project

2017-09-27 Thread Julia Kreger
[...]
>> The short explanation which clicked for me (granted it's probably an
>> oversimplification, but still) was this: Ironic provides an admin
>> API for managing bare metal resources, while Mogan gives you a user
>> API (suitable for public cloud use cases) to your Ironic backend. I
>> suppose it could have been implemented in Ironic, but implementing
>> it separately allows Ironic to be agnostic to multiple user
>> frontends and also frees the Ironic team up from having to take on
>> yet more work directly.
>
>
> ditto!
>
> I had a similar question at the PTG and this was the answer that convinced
> be
> may be worth the effort.
>
> Flavio
>

For Ironic, the question of tenant aware scheduling of owned hardware did
come up at the PTG, as in: Customer A and B are managed by
the same ironic, only customer A's users should be able to schedule on
to Customer A's hardware, with API access control restrictions such
that a specific customer can take action on their own hardware.

If we go down the path of supporting such views/logic, it could become
a massive undertaking for Ironic, so there is absolutely a plus to
something doing much of that for Ironic. Personally, I think Mogan is
a good direction to continue to explore. That being said, we should
improve our communication of plans/directions/perceptions between the
teams so we don't adversely impact each other and see where we can
help each other moving forward.

-Julia

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Jakub Jursa


On 27.09.2017 14:46, Sahid Orentino Ferdjaoui wrote:
> On Mon, Sep 25, 2017 at 05:36:44PM +0200, Jakub Jursa wrote:
>> Hello everyone,
>>
>> We're experiencing issues with running large instances (~60GB RAM) on
>> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using cpu pinning. The
>> problem is that it seems that in some extreme cases qemu/KVM can have
>> significant memory overhead (10-15%?) which nova-compute service doesn't
>> take in to the account when launching VMs. Using our configuration as an
>> example - imagine running two VMs with 30GB RAM on one NUMA node
>> (because we use cpu pinning) - therefore using 60GB out of 64GB for
>> given NUMA domain. When both VMs would consume their entire memory
>> (given 10% KVM overhead) OOM killer takes an action (despite having
>> plenty of free RAM in other NUMA nodes). (the numbers are just
>> arbitrary, the point is that nova-scheduler schedules the instance to
>> run on the node because the memory seems 'free enough', but specific
>> NUMA node can be lacking the memory reserve).
> 
> In Nova when using NUMA we do pin the memory on the host NUMA nodes
> selected during scheduling. In your case it seems that you have
> specificly requested a guest with 1 NUMA node. It will be not possible
> for the process to grab memory on an other host NUMA node but some
> other processes could be running in that host NUMA node and consume
> memory.

Yes, that is very likely the case - that some other processes consume
the memory on the given NUMA node. It seems that setting the flavor metadata
'hw:cpu_policy=dedicated' (while NOT setting 'hw:numa_nodes') results in
libvirt pinning CPU in 'strict' memory mode

(from libvirt xml for given instance)
...
  [numatune XML stripped by the list archive]
...

So yeah, the instance is not able to allocate memory from another NUMA node.
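
(The stripped section typically looks like the following when nova pins a
single-NUMA-node guest to host node 0; this is a reconstruction for
illustration and the nodeset values are assumptions:)

    <numatune>
      <memory mode='strict' nodeset='0'/>
      <memnode cellid='0' mode='strict' nodeset='0'/>
    </numatune>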

> 
> What you need is to use Huge Pages, in such case the memory will be
> locked for the guest.

I'm not quite sure what you mean by 'memory will be locked for the
guest'. Also, aren't huge pages enabled in the kernel by default?

> 
>> Our initial solution was to use ram_allocation_ratio < 1 to ensure
>> having some reserved memory - this didn't work. Upon studying source of
>> nova, it turns out that ram_allocation_ratio is ignored when using cpu
>> pinning. (see
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
>> and
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
>> ). We're running Mitaka, but this piece of code is implemented in Ocata
>> in a same way.
>> We're considering to create a patch for taking ram_allocation_ratio in
>> to account.
>>
>> My question is - is ram_allocation_ratio ignored on purpose when using
>> cpu pinning? If yes, what is the reasoning behind it? And what would be
>> the right solution to ensure having reserved RAM on the NUMA nodes?
>>
>> Thanks.
>>
>> Regards,
>>
>> Jakub Jursa
>>
> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] how does UEFI booting of VM manage per-instance copies of OVMF_VARS.fd ?

2017-09-27 Thread Waines, Greg
Hey there ... a question about UEFI booting of VMs.
i.e.

glance image-create --file cloud-2730.qcow --disk-format qcow2 
--container-format bare --property "hw-firmware-type=uefi" --name 
clear-linux-image

in order to specify that you want to use UEFI (instead of BIOS) when booting 
VMs with this image, i.e.
  /usr/share/OVMF/OVMF_CODE.fd
  /usr/share/OVMF/OVMF_VARS.fd

and I believe you can boot into the UEFI Shell, i.e. to change UEFI variables 
in NVRAM (OVMF_VARS.fd), by booting the VM with /usr/share/OVMF/UefiShell.iso 
as a CD ... e.g. to change Secure Boot keys or something like that.

My QUESTION ...

- how does NOVA manage a unique instance of OVMF_VARS.fd for each instance?

  - I believe OVMF_VARS.fd is supposed to just be used as a template, and
    is supposed to be copied to make a unique instance for each VM that UEFI boots

  - how does NOVA manage this?

    - e.g. is the unique instance of OVMF_VARS.fd created in
      /etc/nova/instances//  ?

  - ... and does this get migrated to another compute if the VM is migrated?
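
(For context, the libvirt-level mechanism I am asking about looks roughly
like this in the domain XML; the paths are the usual RHEL/Fedora OVMF
locations and the per-domain file name is just an assumption:

    <os>
      <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
      <nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/var/lib/libvirt/qemu/nvram/instance-00000042_VARS.fd</nvram>
    </os>

i.e. libvirt copies the VARS template into a per-domain file on first boot.)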

Greg.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [FEMDC] IRC meeting Today 15:00 UTC

2017-09-27 Thread lebre . adrien
Dear all, 

A gentle reminder of the FEMDC meeting, today at 15:00 UTC. 
The agenda is available at: 
https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2017 (line 
1237)

Please feel free to complete it

Best,
ad_rien_
PS: Inria's members will not be able to attend our IRC meeting (midterm review 
of the Discovery initiative); Paul-Andre will chair the discussion. 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Sahid Orentino Ferdjaoui
On Mon, Sep 25, 2017 at 05:36:44PM +0200, Jakub Jursa wrote:
> Hello everyone,
> 
> We're experiencing issues with running large instances (~60GB RAM) on
> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using cpu pinning. The
> problem is that it seems that in some extreme cases qemu/KVM can have
> significant memory overhead (10-15%?) which nova-compute service doesn't
> take in to the account when launching VMs. Using our configuration as an
> example - imagine running two VMs with 30GB RAM on one NUMA node
> (because we use cpu pinning) - therefore using 60GB out of 64GB for
> given NUMA domain. When both VMs would consume their entire memory
> (given 10% KVM overhead) OOM killer takes an action (despite having
> plenty of free RAM in other NUMA nodes). (the numbers are just
> arbitrary, the point is that nova-scheduler schedules the instance to
> run on the node because the memory seems 'free enough', but specific
> NUMA node can be lacking the memory reserve).

In Nova, when using NUMA we do pin the memory on the host NUMA nodes
selected during scheduling. In your case it seems that you have
specifically requested a guest with 1 NUMA node. It will not be possible
for the process to grab memory on another host NUMA node, but some
other processes could be running in that host NUMA node and consume
memory.

What you need is to use Huge Pages, in such case the memory will be
locked for the guest.

> Our initial solution was to use ram_allocation_ratio < 1 to ensure
> having some reserved memory - this didn't work. Upon studying source of
> nova, it turns out that ram_allocation_ratio is ignored when using cpu
> pinning. (see
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
> and
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
> ). We're running Mitaka, but this piece of code is implemented in Ocata
> in a same way.
> We're considering to create a patch for taking ram_allocation_ratio in
> to account.
> 
> My question is - is ram_allocation_ratio ignored on purpose when using
> cpu pinning? If yes, what is the reasoning behind it? And what would be
> the right solution to ensure having reserved RAM on the NUMA nodes?
> 
> Thanks.
> 
> Regards,
> 
> Jakub Jursa
> 

> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [glance] Queens PTG: Thursday summary

2017-09-27 Thread Belmiro Moreira
Hi Blair,
In the past we used the tabs but latest Horizon versions use the visibility
column/search instead.
The issue is that we would like the old images to continue to be
discoverable by everyone and have an image list that only shows the latest
ones.
If the images continue to be public they will be shown by the CLIs in the
default image-list. In our case the list was very long.

We are now using the “community” visibility to hide the old images from the
default image list. But it’s not ideal.
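
(The switch itself is a one-liner, assuming a python-openstackclient recent
enough to know about community images:

    openstack image set --community <image-uuid>
    openstack image list               # community images no longer show up here
    openstack image list --community   # but they stay discoverable on request

so old images stay usable by name/ID while dropping out of the default list.)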
I will move the old spec about image lifecycle to glance.
https://review.openstack.org/#/c/327980/

Cheers,
Belmiro


On Wed, 27 Sep 2017 at 00:25, Blair Bethwaite 
wrote:

> Hi Belmiro,
>
>
> On 20 Sep. 2017 7:58 pm, "Belmiro Moreira" <
> moreira.belmiro.email.li...@gmail.com> wrote:
> > Discovering the latest image release is hard. So we added an image
> property "recommended"
> > that we update when a new image release is available. Also, we patched
> horizon to show
> > the "recommended" images first.
>
> There is built in support in Horizon that allows displaying multiple image
> category tabs where each takes contents from the list of images owned by a
> specific project/tenant. In the Nectar research cloud this is what we rely
> on to distinguish between "Public", "Project", "Nectar" (the base images we
> maintain), and "Contributed" (images contributed by users who wish them to
> be tested by us and effectively promoted as quality assured). When we
> update a "Nectar" or "Contributed" image the old version stays public but
> is moved into a project for deprecated images of that category, where
> eventually we can clean it up.
>
>
> > This helps our users to identify the latest image release but we
> continue to show for
> > each project the full list of public images + all personal user images.
>
> Could you use the same model as us?
>
> Cheers,
> b1airo
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] OpenStack-Ansible and Trove support

2017-09-27 Thread Jean-Philippe Evrard
Hello Michael,

On top of that, we intend to have a "role maturity" that will include
when the role was proposed and its current maturity phase, for more
clarity, not unlike the openstack project navigator.

Our os_trove role has not received many commits recently, and the
"maintenance mode" of Trove will probably impact you in the future.
Do you intend to keep a trove installation in production, or do you
want to do a PoC?

Best regards,
JP

On Wed, Sep 27, 2017 at 12:24 AM, Amy Marrich  wrote:
> Michael,
>
> There are release notes for each release that will go over what's new,
> what's on it's way out or even gone as well as bug fixes and other
> information. Here's a link to the Ocata release notes for OpenStack-Ansible
> which includes the announcement of the Trove role.
>
> https://docs.openstack.org/releasenotes/openstack-ansible/ocata.html
>
> Thanks,
>
> Amy (spotz)
>
> On Tue, Sep 26, 2017 at 6:04 PM, Michael Gale 
> wrote:
>>
>> Hello,
>>
>>Based on github and
>> https://docs.openstack.org/openstack-ansible-os_trove/latest/ it looks like
>> OpenStack-Ansible will support Trove under the Ocata release.
>>
>> Is that assumption correct? is there a better method to determine when a
>> software component will likely be included in a release?
>>
>> Michael
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] OpenStack-Ansible testing with OpenVSwitch

2017-09-27 Thread Jean-Philippe Evrard
Hello,

We currently don't have a full scenario for openvswitch for an easy
"one line" install.
It still deserves more love. You could come to our channel,
#openstack-ansible, to discuss it if you want. But the general
idea should be close to what is explained in the blog post.
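
From memory, the heart of it is a couple of overrides in
/etc/openstack_deploy/user_variables.yml; treat the variable names as a
sketch and double-check them against the os_neutron role for your branch:

    neutron_plugin_type: ml2.ovs
    neutron_ml2_drivers_type: "flat,vlan,vxlan"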

Best regards,
JP

On Wed, Sep 27, 2017 at 12:13 AM, Michael Gale  wrote:
> Hello,
>
> I am trying to build a Pike All-in-One instance for OpenStack Ansible
> testing, currently I have a few OpenStack versions being deployed using the
> default Linux Bridge implementation.
>
> However I need a test environment to validate OpenVSwitch implementation, is
> there a simple method to get an AIO installed?
>
> I tried following
> https://medium.com/@travistruman/configuring-openstack-ansible-for-open-vswitch-b7e70e26009d
> however Neutron is blowing up because it can't determine the name for the
> Neutron Server. I am not sure if that is my issue or not, a reference
> implementation of OpenStack AIO with OpenVSwitch would help me a lot.
>
> Thanks
> Michael
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [vitrage] Vitrage virtual PTG

2017-09-27 Thread Afek, Ifat (Nokia - IL/Kfar Sava)
Hi,

We will hold the Vitrage virtual PTG on October 17-19. I have created an 
initial schedule draft; you are more than welcome to comment or suggest new 
topics for discussion:
https://etherpad.openstack.org/p/vitrage-ptg-queens

Best Regards,
Ifat.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [keystone] [keystoneauth] Debug data isn't sanitized - bug 1638978

2017-09-27 Thread Bhor, Dinesh
Hi Team,

There are four solutions to fix the below bug:
https://bugs.launchpad.net/keystoneauth/+bug/1638978

1) Carry a copy of the mask_password() method into keystoneauth from oslo_utils [1]:
Pros:
A. keystoneauth will use an already tested and widely used version of mask_password.

Cons:
A. keystoneauth will have to keep its copy of the mask_password() method in sync 
with the oslo_utils version.
 If any new "_SANITIZE_KEYS" are added to the oslo_utils mask_password, 
they will also have to be added to the keystoneauth mask_password.
B. Copying "mask_password" also requires copying its supporting code 
[2], which is substantial.


2) Use the oslo.utils mask_password() method in keystoneauth:
Pros:
A) No syncing issue as described in solution #1; keystoneauth will directly 
use the mask_password() method from oslo.utils (a short usage sketch follows 
the reference links at the end of this mail).

Cons:
A) You will need the oslo.utils library to use keystoneauth.
Objection by community:
- The keystoneauth community doesn't want any dependency on the common 
OpenStack oslo libraries.
Please refer to the comment from Morgan: 
https://bugs.launchpad.net/keystoneauth/+bug/1700751/comments/3


3) Add a custom logging filter in oslo logger
Please refer to the POC sample here: http://paste.openstack.org/show/617093/
OpenStack core services using any individual python-*client (e.g. 
python-cinderclient used in the nova service) will need to pass an oslo.log 
logger object during the client's initialization, which will do the work of 
masking sensitive information.
Note: In nova, the oslo.log logger object is not passed during cinder client 
initialization 
(https://github.com/openstack/nova/blob/master/nova/volume/cinder.py#L135-L141);
in this case, sensitive information will not be masked as it isn't using 
oslo.log.

Pros:
A) No changes required in oslo.logger or any OpenStack services if 
mask_password method is modified in oslo.utils.

Cons:
A) Every log message will be scanned for certain password fields degrading the 
performance.
B) If consumer of keystoneauth doesn't use oslo_logger, then the sensitive 
information will not be masked.
C) Will need to make changes wherever applicable to the OpenStack core services 
to pass oslo.logger object during python-novaclient initialization.


4) Add a mask_password keyword parameter in oslo_log:
Add a "mask_password" keyword argument that sanitizes sensitive data when it 
is passed to the log statement.
Only if mask_password is set will the sensitive information be masked 
at the time of logging.
The log statement will look like below:

logger.debug("'adminPass': 'Now you see me'", mask_password=True)

Please refer to the POC code here: http://paste.openstack.org/show/618019/

Pros:
A) No changes required in oslo.logger or any OpenStack services if 
mask_password method is modified in oslo.utils.

Cons:
A) If consumer of keystoneauth doesn't use oslo_logger, then the sensitive 
information will not be masked.
B) If you forget to pass mask_password=True for logging messages where 
sensitive information is present, then those fields won't be masked with ***.
 But this can be clearly documented as suggested by Morgan and Lance.
C) This solution requires you to add the check below in keystoneauth to avoid 
an exception being raised when the logger is a plain Python Logger, as it
doesn't accept the mask_password keyword argument.

if isinstance(logger, logging.Logger):
    logger.debug(' '.join(string_parts))
else:
    logger.debug(' '.join(string_parts), mask_password=True)

This check assumes that the logger instance is an oslo_log logger only if it is 
not the default Python logging.Logger.
The keystoneauth community is not ready to have a dependency on any oslo-* lib, 
so it seems this solution has low acceptance chances.

Please let me know your opinions about the above four approaches. Which one 
should we adopt?

[1] 
https://github.com/openstack/oslo.utils/blob/master/oslo_utils/strutils.py#L248-L313
[2] 
https://github.com/openstack/oslo.utils/blob/6e04f882c4308ff64fa199d1b127ad225e0a30c4/oslo_utils/strutils.py#L56-L96
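
A quick illustration of what option 2 looks like in practice (sketch only;
the sample payload is made up):

    from oslo_utils import strutils

    body = '{"auth": {"passwordCredentials": {"password": "secret"}}}'
    print(strutils.mask_password(body))
    # -> {"auth": {"passwordCredentials": {"password": "***"}}}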

Thanks and Regards,
Dinesh Bhor | App. Software Dev. Cnslt.
dinesh.b...@nttdata.com | VOIP. 8833.8395I | 
nttdata.com/americas
NTT DATA, Inc.
Consulting | Digital | Managed Services | Industry Solutions


__

Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Balazs Gibizer



On Wed, Sep 27, 2017 at 11:58 AM, Jakub Jursa 
 wrote:



On 27.09.2017 11:12, Jakub Jursa wrote:



 On 27.09.2017 10:40, Blair Bethwaite wrote:
 On 27 September 2017 at 18:14, Stephen Finucane 
 wrote:
 What you're probably looking for is the 'reserved_host_memory_mb' 
option. This
 defaults to 512 (at least in the latest master) so if you up this 
to 4192 or

 similar you should resolve the issue.


 I don't see how this would help given the problem description -
 reserved_host_memory_mb would only help avoid causing OOM when
 launching the last guest that would otherwise fit on a host based 
on
 Nova's simplified notion of memory capacity. It sounds like both 
CPU

 and NUMA pinning are in play here, otherwise the host would have no
 problem allocating RAM on a different NUMA node and OOM would be
 avoided.


 I'm not quite sure if/how OpenStack handles NUMA pinning (why is VM
 being killed by OOM rather than having memory allocated on different
 NUMA node). Anyway, good point, thank you, I should have a look at 
exact

 parameters passed to QEMU when using CPU pinning.



 Jakub, your numbers sound reasonable to me, i.e., use 60 out of 
64GB


 Hm, but the question is, how to prevent having some smaller instance
 (e.g. 2GB RAM) scheduled on such NUMA node?


 when only considering QEMU overhead - however I would expect that
 might  be a problem on NUMA node0 where there will be extra 
reserved
 memory regions for kernel and devices. In such a configuration 
where

 you are wanting to pin multiple guests into each of multiple NUMA
 nodes I think you may end up needing different flavor/instance-type
 configs (using less RAM) for node0 versus other NUMA nodes. Suggest


 What do you mean using different flavor? From what I understand (
 
http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
 https://docs.openstack.org/nova/pike/admin/cpu-topologies.html ) it 
can

 be specified that flavor 'wants' different amount memory from its
 (virtual) NUMA nodes, but mapping vCPU <-> pCPU is more or less
 arbitrary (meaning that there is no way how to specify for NUMA 
node0 on

 physical host that it has less memory available for VM allocation)


Can't be 'reserved_huge_pages' option used to reserve memory on 
certain

NUMA nodes?
https://docs.openstack.org/ocata/config-reference/compute/config-options.html


I think the qemu memory overhead is allocated from the 4k memory pool, 
so the question is whether it is possible to reserve 4k pages with the 
reserved_huge_pages config option. I don't find any restriction in the 
code base about 4k pages (even if a 4k page is not considered a large page 
by definition), so in theory you can do it. However, this also means you 
have to enable NumaTopologyFilter.
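
Something like this in nova.conf on the compute would be the thing to try
(untested sketch; the counts are arbitrary and whether 4k pages are accepted
here is exactly the open question above):

    [DEFAULT]
    reserved_host_memory_mb = 4096
    # keep ~2GB of 4k pages per host NUMA node out of the guests' reach
    reserved_huge_pages = node:0,size:4,count:524288
    reserved_huge_pages = node:1,size:4,count:524288

plus NumaTopologyFilter enabled in the scheduler filters.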


Cheers,
gibi








 freshly booting one of your hypervisors and then with no guests
 running take a look at e.g. /proc/buddyinfo/ and /proc/zoneinfo to 
see

 what memory is used/available and where.



 Thanks, I'll look into it.


 Regards,

 Jakub



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: 
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Jakub Jursa


On 27.09.2017 11:12, Jakub Jursa wrote:
> 
> 
> On 27.09.2017 10:40, Blair Bethwaite wrote:
>> On 27 September 2017 at 18:14, Stephen Finucane  wrote:
>>> What you're probably looking for is the 'reserved_host_memory_mb' option. 
>>> This
>>> defaults to 512 (at least in the latest master) so if you up this to 4192 or
>>> similar you should resolve the issue.
>>
>> I don't see how this would help given the problem description -
>> reserved_host_memory_mb would only help avoid causing OOM when
>> launching the last guest that would otherwise fit on a host based on
>> Nova's simplified notion of memory capacity. It sounds like both CPU
>> and NUMA pinning are in play here, otherwise the host would have no
>> problem allocating RAM on a different NUMA node and OOM would be
>> avoided.
> 
> I'm not quite sure if/how OpenStack handles NUMA pinning (why is VM
> being killed by OOM rather than having memory allocated on different
> NUMA node). Anyway, good point, thank you, I should have a look at exact
> parameters passed to QEMU when using CPU pinning.
> 
>>
>> Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB
> 
> Hm, but the question is, how to prevent having some smaller instance
> (e.g. 2GB RAM) scheduled on such NUMA node?
> 
>> when only considering QEMU overhead - however I would expect that
>> might  be a problem on NUMA node0 where there will be extra reserved
>> memory regions for kernel and devices. In such a configuration where
>> you are wanting to pin multiple guests into each of multiple NUMA
>> nodes I think you may end up needing different flavor/instance-type
>> configs (using less RAM) for node0 versus other NUMA nodes. Suggest
> 
> What do you mean using different flavor? From what I understand (
> http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
> https://docs.openstack.org/nova/pike/admin/cpu-topologies.html ) it can
> be specified that flavor 'wants' different amount memory from its
> (virtual) NUMA nodes, but mapping vCPU <-> pCPU is more or less
> arbitrary (meaning that there is no way how to specify for NUMA node0 on
> physical host that it has less memory available for VM allocation)

Can't the 'reserved_huge_pages' option be used to reserve memory on certain
NUMA nodes?
https://docs.openstack.org/ocata/config-reference/compute/config-options.html


> 
>> freshly booting one of your hypervisors and then with no guests
>> running take a look at e.g. /proc/buddyinfo/ and /proc/zoneinfo to see
>> what memory is used/available and where.
>>
> 
> Thanks, I'll look into it.
> 
> 
> Regards,
> 
> Jakub
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread Sean Dague
On 09/27/2017 05:15 AM, Marcus Furlong wrote:
> On 27 September 2017 at 09:23, Michael Still  wrote:
>>
>> Operationally, why would I want to inject a new keypair? The scenario I can
>> think of is that there's data in that instance that I want, and I've lost
>> the keypair somehow. Unless that data is on an ephemeral, its gone if we do
>> a rebuild.
> 
> This is quite a common scenario - staff member who started the
> instance leaves, and you want to access data on the instance, or
> maintain/debug the service running on the instance.
> 
> Hitherto, I have used direct db calls to update the key, so it would
> be nice if there was an API call to do so.

But you also triggered a rebuild in the process? Or you tweaked the keys
and did a reboot? This use case came up in the room, but then we started
trying to figure out whether the folks who mostly had this use case would
also need it on reboot.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [ocata] [nova-api] Nova api stopped working after a yum update

2017-09-27 Thread Avery Rozar
Hello all,
I ran "yum update" on my OpenStack controller and now any request to the
nova-api service (port 8774) results in an error in
"/var/log/nova/nova-api.log".

A simple get request,

GET /v2.1/os-hypervisors/detail HTTP/1.1
Host: host.domain.com:8774
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:54.0)
Gecko/20100101 Firefox/54.0
X-Auth-Token: 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Content-Type: application/json
Content-Length: 0
DNT: 1
Connection: close
Upgrade-Insecure-Requests: 1


This results in an error logged to "/var/log/nova/nova-api.log":

WARNING keystoneauth.identity.generic.base [-] Discovering versions from
the identity service failed when creating the password plugin. Attempting
to determine version from URL.
ERROR nova.api.openstack [-] Caught error: Could not determine a suitable
URL for the plugin
ERROR nova.api.openstack Traceback (most recent call last):
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/nova/api/openstack/__init__.py", line 88, in __call__
ERROR nova.api.openstack return req.get_response(self.application)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/request.py",
line 1299, in send
ERROR nova.api.openstack application, catch_exc_info=False)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/request.py",
line 1263, in call_application
ERROR nova.api.openstack app_iter = application(self.environ,
start_response)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/dec.py",
line 144, in __call__
ERROR nova.api.openstack return resp(environ, start_response)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/dec.py",
line 130, in __call__
ERROR nova.api.openstack resp = self.call_func(req, *args,
**self.kwargs)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/dec.py",
line 195, in call_func
ERROR nova.api.openstack return self.func(req, *args, **kwargs)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/osprofiler/web.py",
line 108, in __call__
ERROR nova.api.openstack return request.get_response(self.application)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/request.py",
line 1299, in send
ERROR nova.api.openstack application, catch_exc_info=False)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/request.py",
line 1263, in call_application
ERROR nova.api.openstack app_iter = application(self.environ,
start_response)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/dec.py",
line 130, in __call__
ERROR nova.api.openstack resp = self.call_func(req, *args,
**self.kwargs)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/dec.py",
line 195, in call_func
ERROR nova.api.openstack return self.func(req, *args, **kwargs)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/__init__.py", line 332, in __call__
ERROR nova.api.openstack response = self.process_request(req)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/__init__.py", line 623, in
process_request
ERROR nova.api.openstack resp = super(AuthProtocol,
self).process_request(request)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/__init__.py", line 405, in
process_request
ERROR nova.api.openstack allow_expired=allow_expired)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/__init__.py", line 435, in
_do_fetch_token
ERROR nova.api.openstack data = self.fetch_token(token, **kwargs)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/__init__.py", line 762, in
fetch_token
ERROR nova.api.openstack allow_expired=allow_expired)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/_identity.py", line 217, in
verify_token
ERROR nova.api.openstack auth_ref = self._request_strategy.verify_token(
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/_identity.py", line 168, in
_request_strategy
ERROR nova.api.openstack strategy_class = self._get_strategy_class()
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/_identity.py", line 190, in
_get_strategy_class
ERROR nova.api.openstack if self._adapter.get_endpoint(
version=klass.AUTH_VERSION):
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystoneauth1/adapter.py", line 176, in get_endpoint
ERROR nova.api.openstack return self.session.get_endpoint(auth or
self.auth, **kwargs)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystoneauth1/session.py", line 856, in get_endpoint
ERROR 

[openstack-dev] [publiccloud-wg] Reminder meeting PublicCloudWorkingGroup

2017-09-27 Thread Tobias Rydberg

Hi everyone,

Don't forget today's meeting of the PublicCloudWorkingGroup.
1400 UTC in IRC channel #openstack-meeting-3

Etherpad and agenda: https://etherpad.openstack.org/p/publiccloud-wg

Regards,
Tobias Rydberg


smime.p7s
Description: S/MIME Cryptographic Signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ptg] Simplification in OpenStack

2017-09-27 Thread Gyorgy Szombathelyi
Hi,

> The install docs still suggest hand configuring machines in 2017. It’s only 
> after
> people fall down that snake pit that they find projects like
> TripleO/Ansible/Puppet/Chef, and wonder why everyone doesn’t use this
> stuff.

I'm just wondering too, but about a different thing: the install docs describe
nicely how to install and configure OpenStack the way an average Linux admin
would do it. Install packages, modify config files and you're ready. These
steps don't have to be executed by hand - they can easily be automated
(Ansible comes to my mind first, as the most user-friendly config management
tool for me). Then the sysadmin looks at the official deployment tools: they
do their job with extra layers, extra things which are not in the install
docs, like creating containers, installing OpenStack from git, installing an
OpenStack before installing the real OpenStack, etc...
They're just overcomplicated, to be honest.
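
For what it's worth, the kind of automation I mean is nothing more exotic
than tasks along these lines (a rough sketch only - the package and service
names here are the RDO ones and the restart handler is not shown):

  - name: install nova compute from distro packages
    yum:
      name: openstack-nova-compute
      state: present

  - name: render nova.conf from a template
    template:
      src: nova.conf.j2
      dest: /etc/nova/nova.conf
    notify: restart nova-compute

  - name: make sure the service is enabled and running
    service:
      name: openstack-nova-compute
      state: started
      enabled: yes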

As an operator myself, I want a solid OpenStack installation which I can
manage and upgrade, not tens of containers or other stuff which I cannot
touch unless I take the risk of blowing up everything. With the traditional
method (packages/config management) I can sit back and relax, upgrade when I
want (did it from Liberty to Ocata in real OpenStack clusters, that means 3
upgrades, and the clusters are still alive), apply updates when a package is
released, and I simply feel that the infra is under my control, not under
some install tool.

These were the reasons why I wrote my Ansible playbook set, and I still feel
it was a good decision (more than 2 years of OpenStack operation experience
says that). I understand that some want to be on the bleeding edge and like
to run the most recent git revisions, but most operators want a stable
installation in production.

I don't know if this opinion counts, but what I would like to see is stable,
good-quality OpenStack packages (I know this is very distro-specific, but it
is not a problem of OpenStack itself, rather of the Linux ecosystem -
containers are just a workaround and not the right solution), and simple
installers which just install these packages and configure them. No more, no
less.

My 2 cents,
Br,
György
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread Marcus Furlong
On 27 September 2017 at 09:23, Michael Still  wrote:
>
> Operationally, why would I want to inject a new keypair? The scenario I can
> think of is that there's data in that instance that I want, and I've lost
> the keypair somehow. Unless that data is on an ephemeral, its gone if we do
> a rebuild.

This is quite a common scenario - a staff member who started the
instance leaves, and you want to access data on the instance, or
maintain/debug the service running on the instance.

Hitherto, I have used direct db calls to update the key, so it would
be nice if there was an API call to do so.
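
For reference, the direct-DB approach amounts to something like the following
(hypothetical key name/data and uuid; take a DB backup first, and note this
only updates what Nova records and serves via the metadata API - the key
actually inside the guest is untouched until something like cloud-init
reapplies it):

  mysql nova -e "UPDATE instances SET key_name='replacement-key', \
      key_data='ssh-rsa AAAA... ops@example.com' \
      WHERE uuid='<instance uuid>' AND deleted=0;"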

Cheers,
Marcus.
-- 
Marcus Furlong

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Jakub Jursa


On 27.09.2017 10:40, Blair Bethwaite wrote:
> On 27 September 2017 at 18:14, Stephen Finucane  wrote:
>> What you're probably looking for is the 'reserved_host_memory_mb' option. 
>> This
>> defaults to 512 (at least in the latest master) so if you up this to 4192 or
>> similar you should resolve the issue.
> 
> I don't see how this would help given the problem description -
> reserved_host_memory_mb would only help avoid causing OOM when
> launching the last guest that would otherwise fit on a host based on
> Nova's simplified notion of memory capacity. It sounds like both CPU
> and NUMA pinning are in play here, otherwise the host would have no
> problem allocating RAM on a different NUMA node and OOM would be
> avoided.

I'm not quite sure if/how OpenStack handles NUMA pinning (i.e. why the VM is
being killed by the OOM killer rather than having memory allocated on a
different NUMA node). Anyway, good point, thank you, I should have a look at
the exact parameters passed to QEMU when using CPU pinning.

> 
> Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB

Hm, but the question is: how do we prevent some smaller instance
(e.g. 2GB RAM) from being scheduled on such a NUMA node?

> when only considering QEMU overhead - however I would expect that
> might  be a problem on NUMA node0 where there will be extra reserved
> memory regions for kernel and devices. In such a configuration where
> you are wanting to pin multiple guests into each of multiple NUMA
> nodes I think you may end up needing different flavor/instance-type
> configs (using less RAM) for node0 versus other NUMA nodes. Suggest

What do you mean by using a different flavor? From what I understand (
http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
https://docs.openstack.org/nova/pike/admin/cpu-topologies.html ) a flavor can
specify that it 'wants' a different amount of memory from each of its
(virtual) NUMA nodes, but the vCPU <-> pCPU mapping is more or less
arbitrary (meaning there is no way to specify for NUMA node0 on the
physical host that it has less memory available for VM allocation).
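
To make it concrete, I guess the suggestion is a pair of pinned flavors along
these lines (the extra spec names are from the cpu-topologies doc above, the
RAM sizes are made up), with the smaller one intended for guests that end up
on node0 - though, as noted, nothing actually ties a flavor to a specific
host NUMA node:

  # regular pinned flavor for the 'full size' NUMA nodes
  openstack flavor create --ram 30720 --vcpus 8 --disk 40 pinned.30g
  openstack flavor set pinned.30g --property hw:cpu_policy=dedicated \
      --property hw:numa_nodes=1

  # slightly smaller variant intended for node0
  openstack flavor create --ram 28672 --vcpus 8 --disk 40 pinned.28g
  openstack flavor set pinned.28g --property hw:cpu_policy=dedicated \
      --property hw:numa_nodes=1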

> freshly booting one of your hypervisors and then with no guests
> running take a look at e.g. /proc/buddyinfo/ and /proc/zoneinfo to see
> what memory is used/available and where.
> 

Thanks, I'll look into it.


Regards,

Jakub

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Jakub Jursa


On 27.09.2017 10:14, Stephen Finucane wrote:
> On Mon, 2017-09-25 at 17:36 +0200, Jakub Jursa wrote:
>> Hello everyone,
>>
>> We're experiencing issues with running large instances (~60GB RAM) on
>> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using cpu pinning. The
>> problem is that it seems that in some extreme cases qemu/KVM can have
>> significant memory overhead (10-15%?) which nova-compute service doesn't
>> take in to the account when launching VMs. Using our configuration as an
>> example - imagine running two VMs with 30GB RAM on one NUMA node
>> (because we use cpu pinning) - therefore using 60GB out of 64GB for
>> given NUMA domain. When both VMs would consume their entire memory
>> (given 10% KVM overhead) OOM killer takes an action (despite having
>> plenty of free RAM in other NUMA nodes). (the numbers are just
>> arbitrary, the point is that nova-scheduler schedules the instance to
>> run on the node because the memory seems 'free enough', but specific
>> NUMA node can be lacking the memory reserve).
>>
>> Our initial solution was to use ram_allocation_ratio < 1 to ensure
>> having some reserved memory - this didn't work. Upon studying source of
>> nova, it turns out that ram_allocation_ratio is ignored when using cpu
>> pinning. (see
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
>> and
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
>> ). We're running Mitaka, but this piece of code is implemented in Ocata
>> in a same way.
>> We're considering to create a patch for taking ram_allocation_ratio in
>> to account.
>>
>> My question is - is ram_allocation_ratio ignored on purpose when using
>> cpu pinning? If yes, what is the reasoning behind it? And what would be
>> the right solution to ensure having reserved RAM on the NUMA nodes?
> 
> Both 'ram_allocation_ratio' and 'cpu_allocation_ratio' are ignored when using
> pinned CPUs because they don't make much sense: you want a high performance VM
> and have assigned dedicated cores to the instance for this purpose, yet you're
> telling nova to over-schedule and schedule multiple instances to some of these
> same cores.

I wanted to use 'ram_allocation_ratio' with a value of e.g. 0.8 to force
'under-scheduling' of the host, i.e. to create a reserve on the host.

> 
> What you're probably looking for is the 'reserved_host_memory_mb' option. This
> defaults to 512 (at least in the latest master) so if you up this to 4192 or
> similar you should resolve the issue.

I'm afraid that this won't help, as this option doesn't take NUMA nodes into
account (e.g. there would be 'reserved_host_memory_mb' worth of free memory
on the physical host, but not necessarily in each of its NUMA nodes).

> 
> Hope this helps,
> Stephen
> 

Regards,

Jakub

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Blair Bethwaite
Also CC-ing os-ops as someone else may have encountered this before
and have further/better advice...

On 27 September 2017 at 18:40, Blair Bethwaite
 wrote:
> On 27 September 2017 at 18:14, Stephen Finucane  wrote:
>> What you're probably looking for is the 'reserved_host_memory_mb' option. 
>> This
>> defaults to 512 (at least in the latest master) so if you up this to 4192 or
>> similar you should resolve the issue.
>
> I don't see how this would help given the problem description -
> reserved_host_memory_mb would only help avoid causing OOM when
> launching the last guest that would otherwise fit on a host based on
> Nova's simplified notion of memory capacity. It sounds like both CPU
> and NUMA pinning are in play here, otherwise the host would have no
> problem allocating RAM on a different NUMA node and OOM would be
> avoided.
>
> Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB
> when only considering QEMU overhead - however I would expect that
> might  be a problem on NUMA node0 where there will be extra reserved
> memory regions for kernel and devices. In such a configuration where
> you are wanting to pin multiple guests into each of multiple NUMA
> nodes I think you may end up needing different flavor/instance-type
> configs (using less RAM) for node0 versus other NUMA nodes. Suggest
> freshly booting one of your hypervisors and then with no guests
> running take a look at e.g. /proc/buddyinfo/ and /proc/zoneinfo to see
> what memory is used/available and where.
>
> --
> Cheers,
> ~Blairo



-- 
Cheers,
~Blairo

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Blair Bethwaite
On 27 September 2017 at 18:14, Stephen Finucane  wrote:
> What you're probably looking for is the 'reserved_host_memory_mb' option. This
> defaults to 512 (at least in the latest master) so if you up this to 4192 or
> similar you should resolve the issue.

I don't see how this would help given the problem description -
reserved_host_memory_mb would only help avoid causing OOM when
launching the last guest that would otherwise fit on a host based on
Nova's simplified notion of memory capacity. It sounds like both CPU
and NUMA pinning are in play here, otherwise the host would have no
problem allocating RAM on a different NUMA node and OOM would be
avoided.

Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB
when only considering QEMU overhead - however I would expect that
might be a problem on NUMA node0, where there will be extra reserved
memory regions for the kernel and devices. In such a configuration, where
you are wanting to pin multiple guests into each of multiple NUMA
nodes, I think you may end up needing different flavor/instance-type
configs (using less RAM) for node0 versus the other NUMA nodes. I suggest
freshly booting one of your hypervisors and then, with no guests
running, taking a look at e.g. /proc/buddyinfo and /proc/zoneinfo to see
what memory is used/available and where.
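
Roughly like this (the exact zoneinfo field names vary a little between
kernel versions):

  # per-NUMA-node totals and free memory (from the numactl package)
  numactl --hardware

  # free pages per order and zone - fragmentation at a glance
  cat /proc/buddyinfo

  # detailed per-zone accounting (present/managed/free pages, reserves)
  grep -E 'Node|present|managed|nr_free_pages|protection' /proc/zoneinfo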

-- 
Cheers,
~Blairo

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread Michael Still
One thing I'd like to explore is what the functional difference between a
rebuild and a delete / create cycle is. With a rebuild you get to keep your
IP I suppose, but that could also be true of floating IPs for a delete /
create as well.

Operationally, why would I want to inject a new keypair? The scenario I can
think of is that there's data in that instance that I want, and I've lost
the keypair somehow. Unless that data is on an ephemeral, it's gone if we do
a rebuild.

Michael


On Wed, Sep 27, 2017 at 4:05 PM, LIU Yulong  wrote:

> On Wed, Sep 27, 2017 at 10:29 AM, Matt Riedemann 
> wrote:
>
>> On 9/23/2017 8:58 AM, LIU Yulong wrote:
>>
>>> Hi nova developers,
>>>
>>> This mail is proposed to reconsider the key pair resetting of instance.
>>> The nova queens PTG discuss is here: https://etherpad.openstack.org
>>> /p/nova-ptg-queens  L498.
>>> And there are now two proposals.
>>>
>>> 1. SPEC 1: https://review.openstack.org/#/c/375221/ <
>>> https://review.openstack.org/#/c/375221/> started by me (liuyulong)
>>> since sep 2016.
>>>
>>> This spec will allow setting the new key_name for the instance
>>> during rebuild API. That’s a very simple and well-understood approach:
>>>
>>>   * Make consistent with rebuild API properties, such as name, imageRef,
>>> metadata, adminPass etc.
>>>   * Rebuild API is something like `recreating`, this is the right way to
>>> do key pair updating. For keypair-login-only VM, this is the key
>>> point.
>>>   * This does not involve to other APIs like reboot/unshelve etc.
>>>
>>
>> This was one of the issues I brought up in IRC, is that if we just
>> implemented this for the rebuild API, then someone could also ask that we
>> do it for things like reboot, cold migrate/resize, unshelve, etc. Anything
>> that involves re-creating the guest.
>>
>> IMHO, rebuild has its own meaning is that we are going to recreate a VM.
> So those inputs such as name, key, password should have a chance to be
> reset in this `rebuild` interface. Unlike reboot, cold migrate/resize,
> unshelve, those actions does not have such potential implication. If
> anything else involved, you are expanding those actions (reboot, cold
> migrate/resize, unshelve).
>
>
>
>>   * Easy to use, only one API.
>>>
>>
>> Until someone says we should also do it for the other APIs, as noted
>> above.
>>
>> This could not be acceptable. Other APIs does not have such `recreating`
> background. For rebuild, you are going to renew an instance, so those
> params for instance creation should have chance to be reset.
>
>
>>
>>> By the way, here is the patch (https://review.openstack.org/#/c/379128/
>>> ) which has implemented this
>>> spec. And it stays there more than one year too.
>>>
>>
>> It's been open because the spec was never approved. Just a procedural
>> issue.
>>
>>
>>> 2. SPEC 2 : https://review.openstack.org/#/c/506552/ <
>>> https://review.openstack.org/#/c/506552/> propose by Kevin_zheng.
>>>
>>> This spec supposed to add a new updating API for one instance’s key
>>> pair. This one has one foreseeable advantage for this is to do instance
>>> running key injection.
>>>
>>> But it may cause some issues:
>>>
>>>   * This approach needs to update the instance key pair first (one step,
>>> API call). And then do a reboot/rebuild or any actions causing the
>>> vm restart (second step, another API call). Firstly, this is waste,
>>> it use two API calls. Secondly, if updating key pair was done, and
>>> the reboot was not. That may result an inconsistent between instance
>>> DB key pair and guest VM inside key. Cloud user may confused to
>>> choose which key should be used to login.
>>>
>>
>> 1. I don't think multiple API calls is a problem. Any GUI or
>> orchestration tool can stitch these APIs together for what appears to be a
>> single operation for the end user. Furthermore, with multiple options about
>> what to do after the instance.key_name is updated, something like a GUI
>> could present the user with the option to picking if they want to reboot or
>> rebuild after the key is updated.
>>
>> We provided a discontinuous API, so we should take responsibilities for
> it. This inconsistent between instance DB key pair and guest VM inside can
> stay there. So GUI or orchestration tool can not be a reasonable support.
> More API calls may cause more problems. What if the GUI or orchestration
> tool user/developer forget the second API? What if the first API failed,
> should the retry all the APIs? Which key should be used to login if first
> API succeed and the second not succeed/not response? What if the second API
> failed? They confused again and again.
>
>
>
>> 2. An orchestrator or GUI would make sure that both APIs are called. For
>> a user that is updating the key_name, they should realize they need to make
>> another API call to enable it. 

Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Stephen Finucane
On Mon, 2017-09-25 at 17:36 +0200, Jakub Jursa wrote:
> Hello everyone,
> 
> We're experiencing issues with running large instances (~60GB RAM) on
> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using cpu pinning. The
> problem is that it seems that in some extreme cases qemu/KVM can have
> significant memory overhead (10-15%?) which nova-compute service doesn't
> take in to the account when launching VMs. Using our configuration as an
> example - imagine running two VMs with 30GB RAM on one NUMA node
> (because we use cpu pinning) - therefore using 60GB out of 64GB for
> given NUMA domain. When both VMs would consume their entire memory
> (given 10% KVM overhead) OOM killer takes an action (despite having
> plenty of free RAM in other NUMA nodes). (the numbers are just
> arbitrary, the point is that nova-scheduler schedules the instance to
> run on the node because the memory seems 'free enough', but specific
> NUMA node can be lacking the memory reserve).
> 
> Our initial solution was to use ram_allocation_ratio < 1 to ensure
> having some reserved memory - this didn't work. Upon studying source of
> nova, it turns out that ram_allocation_ratio is ignored when using cpu
> pinning. (see
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
> and
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
> ). We're running Mitaka, but this piece of code is implemented in Ocata
> in a same way.
> We're considering to create a patch for taking ram_allocation_ratio in
> to account.
> 
> My question is - is ram_allocation_ratio ignored on purpose when using
> cpu pinning? If yes, what is the reasoning behind it? And what would be
> the right solution to ensure having reserved RAM on the NUMA nodes?

Both 'ram_allocation_ratio' and 'cpu_allocation_ratio' are ignored when using
pinned CPUs because they don't make much sense: you want a high performance VM
and have assigned dedicated cores to the instance for this purpose, yet you're
telling nova to over-schedule and schedule multiple instances to some of these
same cores.

What you're probably looking for is the 'reserved_host_memory_mb' option. This
defaults to 512 (at least in the latest master) so if you up this to 4192 or
similar you should resolve the issue.
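
i.e. on the compute nodes something like the following (note the value is per
host, not per NUMA node, and 4096 here is just an example):

  [DEFAULT]
  # keep ~4GB back for the host OS and qemu overhead
  reserved_host_memory_mb = 4096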

Hope this helps,
Stephen

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [freezer]Looking for previous release notes

2017-09-27 Thread hanc...@iscas.ac.cn
Hello,

To understand the capabilities and the issues of the freezer project, I would
like to look into several previous release notes, e.g. from Mitaka to Pike.
However, the Pike release notes are the only ones I could find. For the other
previous releases, the page is empty and I could find nothing.

Could anyone give me a hint as to where I can get them?

Thanks,
Chao
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [tc][nova][ironic][mogan] Evaluate Mogan project

2017-09-27 Thread Flavio Percoco

On 27/09/17 01:59 +, Jeremy Stanley wrote:

On 2017-09-27 09:15:21 +0800 (+0800), Zhenguo Niu wrote:
[...]

I don't mean there are deficiencies in Ironic. Ironic itself is cool; it
works well with TripleO, Nova, Kolla, etc. Mogan just wants to be another
client to schedule workloads on Ironic and provide bare-metal-specific
APIs for users who seek a way to provide virtual machines and bare metal
separately, or just a bare metal cloud without interoperating with other
compute resources under Nova.

[...]

The short explanation which clicked for me (granted it's probably an
oversimplification, but still) was this: Ironic provides an admin
API for managing bare metal resources, while Mogan gives you a user
API (suitable for public cloud use cases) to your Ironic backend. I
suppose it could have been implemented in Ironic, but implementing
it separately allows Ironic to be agnostic to multiple user
frontends and also frees the Ironic team up from having to take on
yet more work directly.


ditto!

I had a similar question at the PTG and this was the answer that convinced me
it may be worth the effort.

Flavio

--
@flaper87
Flavio Percoco


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][infra] Zuul v3 migration update

2017-09-27 Thread Flavio Percoco

Just wanted to say thanks to all of you for the hard work. I can only imagine
how hard it must be to do this migration without causing downtimes.

Flavio

On 26/09/17 18:04 -0500, Monty Taylor wrote:

Hey everybody,

We got significantly further along with our Zuul v3 rollout today. We
uncovered some fun bugs in the migration but were able to fix most of
them rather quickly.

We've pretty much run out of daylight though for the majority of the
team and there is a tricky zuul-cloner related issue to deal with, so
we're not going to push things further tonight. We're leaving most of
today's work in place, having gotten far enough that we feel
comfortable not rolling back.

The project-config repo should still be considered frozen except for
migration-related changes. Hopefully we'll be able to flip the final
switch early tomorrow.

If you haven't yet, please see [1] for information about the transition.

[1] https://docs.openstack.org/infra/manual/zuulv3.html

Thanks,

Monty

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


--
@flaper87
Flavio Percoco


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] 答复: [Senlin] Senlin Queens Meetup

2017-09-27 Thread Lee Yi
21st Oct will be OK.

On Tue, Sep 26, 2017 at 9:28 PM,  wrote:

>
> I will join.
>
> If time was changed on Oct 14th, 21th or 22th, It's also ok from me:)
>
>
>
> Original message
> *From:* ;
> *To:* ;
> *Date:* 2017-09-19 22:06
> *Subject:* *[openstack-dev] [Senlin] Senlin Queens Meetup*
>
>
> Hi all,
> We are going to have a meetup to discuss the features and some other
> details about Senlin in Oct.
> Tentatively schedule:
> Date: 15th Oct.
> Location: Beijing, CHN
>
>
> Please leave your comments if you have any suggestion or the have conflict
> with the date.
>
> Sincerely,
> ruijie
>
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread LIU Yulong
On Wed, Sep 27, 2017 at 10:29 AM, Matt Riedemann 
wrote:

> On 9/23/2017 8:58 AM, LIU Yulong wrote:
>
>> Hi nova developers,
>>
>> This mail is proposed to reconsider the key pair resetting of instance.
>> The nova queens PTG discuss is here: https://etherpad.openstack.org
>> /p/nova-ptg-queens  L498.
>> And there are now two proposals.
>>
>> 1. SPEC 1: https://review.openstack.org/#/c/375221/ <
>> https://review.openstack.org/#/c/375221/> started by me (liuyulong)
>> since sep 2016.
>>
>> This spec will allow setting the new key_name for the instance during
>> rebuild API. That’s a very simple and well-understood approach:
>>
>>   * Make consistent with rebuild API properties, such as name, imageRef,
>> metadata, adminPass etc.
>>   * Rebuild API is something like `recreating`, this is the right way to
>> do key pair updating. For keypair-login-only VM, this is the key
>> point.
>>   * This does not involve to other APIs like reboot/unshelve etc.
>>
>
> This was one of the issues I brought up in IRC, is that if we just
> implemented this for the rebuild API, then someone could also ask that we
> do it for things like reboot, cold migrate/resize, unshelve, etc. Anything
> that involves re-creating the guest.
>
IMHO, rebuild has its own meaning: we are going to recreate a VM. So inputs
such as name, key and password should have a chance to be reset in this
`rebuild` interface. Unlike rebuild, actions such as reboot, cold
migrate/resize and unshelve do not carry such an implication. If anything
else were involved, you would be expanding the meaning of those actions
(reboot, cold migrate/resize, unshelve).
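
To make the user-visible change concrete, SPEC 1 boils down to one more
accepted field in the existing rebuild action - roughly like this
(illustrative request body; imageRef/name/adminPass are already accepted
today, key_name is the proposed addition, and the exact validation and
microversion are what the spec defines):

  POST /v2.1/servers/{server_id}/action
  {
      "rebuild": {
          "imageRef": "70a599e0-31e7-49b7-b260-868f441e343e",
          "name": "renewed-vm",
          "adminPass": "newpass",
          "key_name": "new-keypair"
      }
  }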



>   * Easy to use, only one API.
>>
>
> Until someone says we should also do it for the other APIs, as noted above.
>
That would not be acceptable. Other APIs do not have such a `recreating`
background. For rebuild, you are going to renew an instance, so the params
used for instance creation should have a chance to be reset.


>
>> By the way, here is the patch (https://review.openstack.org/#/c/379128/ <
>> https://review.openstack.org/#/c/379128/>) which has implemented this
>> spec. And it stays there more than one year too.
>>
>
> It's been open because the spec was never approved. Just a procedural
> issue.
>
>
>> 2. SPEC 2 : https://review.openstack.org/#/c/506552/ <
>> https://review.openstack.org/#/c/506552/> propose by Kevin_zheng.
>>
>> This spec supposed to add a new updating API for one instance’s key pair.
>> This one has one foreseeable advantage for this is to do instance running
>> key injection.
>>
>> But it may cause some issues:
>>
>>   * This approach needs to update the instance key pair first (one step,
>> API call). And then do a reboot/rebuild or any actions causing the
>> vm restart (second step, another API call). Firstly, this is waste,
>> it use two API calls. Secondly, if updating key pair was done, and
>> the reboot was not. That may result an inconsistent between instance
>> DB key pair and guest VM inside key. Cloud user may confused to
>> choose which key should be used to login.
>>
>
> 1. I don't think multiple API calls is a problem. Any GUI or orchestration
> tool can stitch these APIs together for what appears to be a single
> operation for the end user. Furthermore, with multiple options about what
> to do after the instance.key_name is updated, something like a GUI could
> present the user with the option to picking if they want to reboot or
> rebuild after the key is updated.
>
If we provide a discontinuous API, we have to take responsibility for it. The
inconsistency between the instance's key pair in the DB and the key inside
the guest VM can persist, so pointing at a GUI or orchestration tool is not a
reasonable answer. More API calls may cause more problems. What if the GUI or
orchestration tool user/developer forgets the second API call? What if the
first call fails - should they retry all the calls? Which key should be used
to log in if the first call succeeds and the second does not succeed or does
not respond? What if the second call fails? Users get confused again and
again.



> 2. An orchestrator or GUI would make sure that both APIs are called. For a
> user that is updating the key_name, they should realize they need to make
> another API call to enable it. This would all be in the API reference
> documentation, CLI help, etc, that anyone doing this should read and
> understand.
>
>   * For the second step (reboot), there is a strong constraint is that
>> cloud-init config needs to be set to running-per-booting. But if a
>> cloud platform all images are set cloud-init to per-deployment. In
>> order to achieve this new API goal, the entire cloud platform images
>> need updating. This will cause a huge upgrading work for entire
>> cloud platform images. They need to change all the images cloud-init
>> config from running-per-deployment to running-every-boot. But that
>> still can not solve the