Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Tony Breeds
On Wed, Sep 27, 2017 at 10:17:16PM -0500, Ben Nemec wrote:
> 
> 
> On 09/27/2017 08:13 PM, Tony Breeds wrote:
> > On Wed, Sep 27, 2017 at 03:35:43PM -0500, Ben Nemec wrote:
> > > It's a little weird because essentially we want to provide a higher level
> > > of support for stable branches than most of OpenStack.  My understanding is
> > > that a lot of the current stable branch policy came out of the fact that
> > > there was a great deal of apathy toward stable branches in upstream
> > > OpenStack and it just wasn't possible to say we'd do more than critical bug
> > > and security fixes for older releases.  Maybe we need a stable-policy-plus
> > > tag or something for projects that can and want to do more.  And feel free
> > > to correct me if I've misinterpreted the historical discussions on this. :-)
> > 
> > That's mostly accurate but the policy also is an indication that
> > consumers should be moving along to newer releases.  For a whole host of
> > reasons that isn't working and it's a thing that we need to address as a
> > community.
> 
> Ah, I wasn't familiar with that aspect of it.  I guess that's a valid reason
> not to continue full support of stable branches even if you theoretically
> could.
> 
> > 
> > The current policy broadly defines 3 phases[1]:
> > 
> > Phase  Time frame                 Summary             Changes Supported
> > I      First 6 months             Latest release      All bugfixes (that meet the
> >                                                       criteria described below)
> >                                                       are appropriate
> > II     6-12 months after release  Maintained release  Only critical bugfixes and
> >                                                       security patches are
> >                                                       acceptable
> > III    more than 12 months        Legacy release      Only security patches are
> >        after release                                  acceptable
> > 
> > I can see a policy that looks more like:
> > 
> > Phase  Time frame                 Summary             Changes Supported
> > I      0-12 months after release  Maintained release  All bugfixes (that meet the
> >                                                       criteria described below)
> >                                                       are appropriate
> > II     more than 12 months        Legacy release      Only security patches are
> >        after release                                  acceptable
> > 
> > The 12 month mark is really only there to line up with our current EOL
> > plans, if they changed then we'd need to match them.
> 
> Wouldn't that still exclude the Ceph patch we're using as an example? Newton
> is over 12 months old at this point.

Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Ben Nemec



On 09/27/2017 08:13 PM, Tony Breeds wrote:

On Wed, Sep 27, 2017 at 03:35:43PM -0500, Ben Nemec wrote:
  

It's a little weird because essentially we want to provide a higher level of
support for stable branches than most of OpenStack.  My understanding is
that a lot of the current stable branch policy came out of the fact that
there was a great deal of apathy toward stable branches in upstream
OpenStack and it just wasn't possible to say we'd do more than critical bug
and security fixes for older releases.  Maybe we need a stable-policy-plus
tag or something for projects that can and want to do more.  And feel free
to correct me if I've misinterpreted the historical discussions on this. :-)


That's mostly accurate but the policy also is an indication that
consumers should be moving along to newer releases.  For a whole host of
reasons that isn't working and it's a thing that we need to address as a
community.


Ah, I wasn't familiar with that aspect of it.  I guess that's a valid 
reason not to continue full support of stable branches even if you 
theoretically could.




The current policy broadly defines 3 phases[1]:

Phase  Time frame                 Summary             Changes Supported
I      First 6 months             Latest release      All bugfixes (that meet the
                                                      criteria described below)
                                                      are appropriate
II     6-12 months after release  Maintained release  Only critical bugfixes and
                                                      security patches are
                                                      acceptable
III    more than 12 months        Legacy release      Only security patches are
       after release                                  acceptable

I can see a policy that looks more like:

Phase  Time frame                 Summary             Changes Supported
I      0-12 months after release  Maintained release  All bugfixes (that meet the
                                                      criteria described below)
                                                      are appropriate
II     more than 12 months        Legacy release      Only security patches are
       after release                                  acceptable

The 12 month mark is really only there to line up with our current EOL
plans, if they changed then we'd need to match them.


Wouldn't that still exclude the Ceph patch we're using as an example? 
Newton is over 12 months old at this point.





That said, I'm staunchly opposed to feature backports.  While I think it
makes perfect sense to allow backports like Giulio's,


Yup with my limited knowledge I think that review makes perfect sense to
backport.  It just doesn't match the *current* stable policy.


  

It feels a little weird to me to be arguing this side of it because I'm
pretty sure I've argued against splitting repos in the past.  But I think I
would not say we kick all the vendor-integration bits out if we do this,
just that we provide the option for vendors to have their own repos with
their own stable backport policies without having to change the policy for
all of TripleO at the same time.


If splitting the repos has good technical benefits then cool, if it's
mostly about matching policy then I think altering the policy (or
defining a new one) is a better solution.
  

And I'm also open to other approaches like tweaking the cycle-trailing
definition to allow more time for this sort of thing.  Maybe we could
eliminate some of the need for feature backports if we did that.


I'm not sure I follow but sure altering the timeline within reason is a
simple thing to do.


Yeah, that was not a fully formed thought in my head when I wrote it. 
:-)  I guess I was thinking of somehow allowing more time for features 
to be done after the rest of OpenStack cuts its release, but I don't 
actually know if that would help.


One (maybe crazy) thought I had after writing all this was the 
possibility of allowing feature backports for a limited time after 
release in the deployment projects.  Say feature backports are only 
allowed up to M-1 of the next release.  I'm not at all sure I like the 
idea but it has some interesting implications, both good and bad.  Like 
no more FFE's - if you miss release you just have to do the extra work 
of backporting if you still want it in.  So there's motivation to get 
stuff done on time, but less panic around release time.  Of course, 
somebody's got to review all those backports so like I said I'm not 
convinced it's a good idea, but it's an idea. :-)




Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Emilien Macchi
On Wed, Sep 27, 2017 at 5:37 PM, Tony Breeds  wrote:
> On Wed, Sep 27, 2017 at 10:39:13AM -0600, Alex Schultz wrote:
>
>> One idea would be to allow trailing projects additional trailing on
>> the phases as well.  Honestly 2 weeks for trailing for just GA is hard
>> enough. Let alone the fact that the actual end-users are 18+ months
>> behind.  For some deployment project like tripleo, there are sections
>> that should probably follow stable-policy as it exists today but
>> elements where there's 3rd party integration or upgrade implications
>> (in the case of tripleo, THT/puppet-tripleo) and they need to be more
>> flexible to modify things as necessary.  The word 'feature' isn't
>> necessarily the same for these projects than something like
>> nova/neutron/etc.
>
> There are 2 separate aspects here:
> 1) What changes are appropriate on stable/* branches ; and
> 2) How long do stable/* branches stay around for.
>
> Looking at 1.  I totally get that deployment projects have a different
> threshold on the bugfix/feature line.  That's actually the easy part to
> fix.  The point of the stable policy is to give users some assurance
> that moving from version x.y.z -> x.Y.Z will be a smooth process.  We
> just need to capture that intent in a policy that works in the context
> of a deployment project.

It makes total sense to me. BTW we have CI coverage for upgrades from
Newton to Ocata (and Ocata to Pike is ongoing but super close; also
Pike to Queens is targeted to Queens-1 milestone) so you can see our
efforts on that front are pretty heavy.

> Looking at 2.  The stable policy doesn't say you *need* to EOL on
> Oct-11th.  By default any project that asserts that tag is included, but
> you're also free to opt out as long as there is a good story around CI
> and impact on human and machine resources.  We re-evaluate that from
> time to time.  As an example, group-based-policy opted out of the
> kilo(?), liberty and mitaka EOLs, and recently dropped everything before
> mitaka.  I get that GBP has a different footprint in CI than tripleo
> does but it illustrates that there is scope to support your users within
> the current policy.

Again, it makes a lot of sense here. We don't want to burn too many CI
resources and want to keep to the strict minimum - and also make sure we
don't burn out any external team (e.g. stable-maint).

> I'm still advocating for crafting a more appropriate policy for
> deployment projects.

Cool, it's aligned with what Ben and Alex are proposing, iiuc.

>> >> What proposing Giulio probably comes from the real world, the field,
>> >> who actually manage OpenStack at scale and on real environments (not
>> >> in devstack from master). If we can't have this code in-tree, we'll
>> >> probably carry this patch downstream (which is IMHO bad because of
>> >> maintenance and lack of CI). In that case, I'll vote to give up
>> >> stable:follows-policy so we can do what we need.
>> >
>> > Rather than give up on the stable:follows policy tag it is possibly
>> > worth looking at which portions of tripleo make that assertion.
>> >
>> > In this specific case, there isn't anything in the bug that indicates
>> > it comes from a user report which is all the stable team has to go on
>> > when making these types of decisions.
>> >
>>
>> We'll need to re-evaluate what stable-policy means for tripleo.  We
>> don't want to allow the world for backporting but we also want to
>> reduce the patches carried downstream for specific use cases.  I think
>> in the case of 3rd party integrations we need a better definition of
>> what that means and perhaps creating a new repository like THT-extras
>> that doesn't follow stable-policy while the main one does.
>
> Right, I don't pretend to understand the ins-and-outs of tripleo but yes
> I think we're mostly agreeing on that point.
>
> https://review.openstack.org/#/c/507924/ buys everyone the space to make
> that evaluation.
>
> Yours Tony.

Thanks Tony for being open to the ideas; I find our discussion very
productive despite the fact that we want to give up the tag for now.

So as a summary:

1) We discuss on 507924 to figure out whether we give up the tag and
for which repos we do it.
2) Someone proposes an amendment to the existing stable policy or
proposes a new policy.
3) Figure out if we can postpone the TripleO Newton EOL and make sure
we're doing it right (e.g. having CI jobs working, not burning
anything etc).
4) In the long term, figure out how to break down THT (we'll probably
want a blueprint for that and some folks working on it).

Thanks,
-- 
Emilien Macchi



[openstack-dev] [tripleo] plans on testing minor updates?

2017-09-27 Thread Emilien Macchi
I was reviewing https://review.openstack.org/#/c/487496/ and
https://review.openstack.org/#/c/487488/ when I realized that we still
didn't have any test coverage for minor updates.
We never had this coverage AFAICT but that is not a reason not to push
forward with it.

During Ocata and Pike, we saw that having upgrade jobs was extremely
useful to actually test the workflow that our users are supposed to run
in production, and I see zero reason not to do the same for minor
updates.
I don't want to be the bad guy here but I've -2'd the two patches until we
find some consensus here (sorry matbu, it's not against you or your
code specifically, but more generally about implementing
features without CI coverage).

I'm really willing to help and start working on tripleo-quickstart
roles this week, if someone agrees to pair with me - so we could make
progress and have that coverage. Even if the new job fails at first,
that's OK; we'll know whether the process works (or not - TBH, I haven't
tried it, probably shardy and some other folks know more about it). Once
we have the workflow in place, we can then iterate on matbu's patches and
make them work in CI so we can ship the feature and be proud to have it
tested.
That's IMHO how we should write our software.

If there is any feedback on this, please let us know here, otherwise
I'll keep my -2 until we've got this coverage in place. Also please
someone (maybe matbu?) raise your hand if you want to pair up and do
this quickly.

Thanks,
-- 
Emilien Macchi



Re: [openstack-dev] Disk Image Builder for redhat 7.4

2017-09-27 Thread Amrith Kumar
As Tony says, there's a base image that you can use for RHEL 7.4. Yes, you
can install Oracle onto the image using dib.

In saying that I only mean that it is possible. I make no statement about
the supportability of that solution by any vendors involved.

To do it, you would create elements (the basic unit of abstraction) in dib.

You would, for example, have an element with an install.d phase that would
install the Oracle package just the same way you would if you did it by
hand.

Then you'd invoke a command like

disk-image-create rhel7 vm your-oracle-element -o oracle-image.qcow2 ...
maybe a couple of other options for good measure ...
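
For illustration only, a rough sketch of what such an element might look
like - the element and file names below are hypothetical, not an existing
element:

  your-oracle-element/
    element-deps                  # e.g. "package-installs" if you need extra packages
    extra-data.d/50-copy-oracle   # runs outside the chroot; copies the installer into the build
    install.d/75-install-oracle   # runs inside the chroot during the image build

  # install.d/75-install-oracle (hypothetical contents):
  #!/bin/bash
  set -eux
  # same steps you would run by hand on a freshly installed RHEL 7.4 box
  /tmp/oracle/runInstaller -silent -responseFile /tmp/oracle/db_install.rsp

If memory serves, the rhel7 element also expects to be pointed at the Red
Hat supplied guest image, e.g.:

  export DIB_LOCAL_IMAGE=/path/to/rhel-server-7.4-x86_64-kvm.qcow2
  export ELEMENTS_PATH=/path/to/your/elements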




-amrith


On Tue, Sep 26, 2017 at 6:43 PM, Tony Breeds 
wrote:

> On Tue, Sep 26, 2017 at 10:19:45PM +0530, Amit Singla wrote:
> > Hi,
> >
> > Could you tell me how I can create a qcow2 image for RHEL 7.4 with disk image
> > builder? I also want to install Oracle 12.2 on that image with DIB. Is
> > it possible?
>
> For the RHEL 7.4 side of things there is a rhel7 dib target, that starts
> with a guest image supplied by Red Hat and customises it for your needs.
>
> No idea about oracle 12.2.
>
> Yours Tony.
>


Re: [openstack-dev] OpenStack-Ansible testing with OpenVSwitch

2017-09-27 Thread Michael Gale
Hello JP,

Ok, I will do some more testing against the blog post and then hit up
the #openstack-ansible channel.

I need to finish a presentation on SFC first which is why I am looking into
OpenVSwitch.

Thanks
Michael

On Wed, Sep 27, 2017 at 6:25 AM, Jean-Philippe Evrard <
jean-phili...@evrard.me> wrote:

> Hello,
>
> We currently don't have a full scenario for openvswitch for an easy
> "one line" install.
> It still deserves more love. You could come on our channel in
> #openstack-ansible to discuss about it if you want. But the general
> idea should be close to the same explained in the blog post.
>
> Best regards,
> JP
>
> On Wed, Sep 27, 2017 at 12:13 AM, Michael Gale 
> wrote:
> > Hello,
> >
> > I am trying to build a Pike All-in-One instance for OpenStack Ansible
> > testing, currently I have a few OpenStack versions being deployed using
> the
> > default Linux Bridge implementation.
> >
> > However I need a test environment to validate OpenVSwitch
> implementation, is
> > there a simple method to get an AIO installed?
> >
> > I tried following
> > https://medium.com/@travistruman/configuring-openstack-ansible-for-open-
> vswitch-b7e70e26009d
> > however Neutron is blowing up because it can't determine the name for the
> > Neutron Server. I am not sure if that is my issue or not, a reference
> > implementation of OpenStack AIO with OpenVSwitch would help me a lot.
> >
> > Thanks
> > Michael
> >
> > 
>



-- 

“The Man who says he can, and the man who says he can not.. Are both
correct”


Re: [openstack-dev] OpenStack-Ansible and Trove support

2017-09-27 Thread Michael Gale
Hello JP,

At this point in time I am only looking for a PoC environment; if it is
part of the AIO that is perfect for now.

I would also like to communicate within my organization that we could start
using Trove after the X release cycle. We currently have no dependency on its
availability.

Michael

On Wed, Sep 27, 2017 at 6:29 AM, Jean-Philippe Evrard <
jean-phili...@evrard.me> wrote:

> Hello Michael,
>
> On top of that, we intend to have a "role maturity" that will include
> when the role was proposed and its current maturity phase, for more
> clarity, not unlike the openstack project navigator.
>
> Our os_trove role has not received many commits recently, and the
> "maintenance mode" of Trove will probably impact you in the future.
> Do you intend to keep a trove installation in production, or do you
> want to do a PoC?
>
> Best regards,
> JP
>
> On Wed, Sep 27, 2017 at 12:24 AM, Amy Marrich  wrote:
> > Michael,
> >
> > There are release notes for each release that will go over what's new,
> > what's on its way out or even gone, as well as bug fixes and other
> > information. Here's a link to the Ocata release notes for
> OpenStack-Ansible
> > which includes the announcement of the Trove role.
> >
> > https://docs.openstack.org/releasenotes/openstack-ansible/ocata.html
> >
> > Thanks,
> >
> > Amy (spotz)
> >
> > On Tue, Sep 26, 2017 at 6:04 PM, Michael Gale 
> > wrote:
> >>
> >> Hello,
> >>
> >>Based on github and
> >> https://docs.openstack.org/openstack-ansible-os_trove/latest/ it looks
> like
> >> OpenStack-Ansible will support Trove under the Ocata release.
> >>
> >> Is that assumption correct? Is there a better method to determine when a
> >> software component will likely be included in a release?
> >>
> >> Michael
> >>
> >> 
>



-- 

“The Man who says he can, and the man who says he can not.. Are both
correct”


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread LIU Yulong
On Wed, Sep 27, 2017 at 4:23 PM, Michael Still  wrote:

> One thing I'd like to explore is what the functional difference between a
> rebuild and a delete / create cycle is. With a rebuild you get to keep your
> IP I suppose, but that could also be true of floating IPs for a delete /
> create as well.
>
>
The neutron port which was used by the VM does not change, so the floating
IP will not need to be recreated.



> Operationally, why would I want to inject a new keypair? The scenario I
> can think of is that there's data in that instance that I want, and I've
> lost the keypair somehow. Unless that data is on an ephemeral, its gone if
> we do a rebuild.
>
>

"The old VM was using a wrong image, I want to change it. Bad things
happened in the VM, I want reinstall the OS. Oh, I lost my old private key.
I can reset the image, but I can't login it." -- A cloud user's whisper.
Rebuild is try to recreate, a new param added to the existing rebuild API
meets the renew purpose.
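
To make that concrete, a rough sketch of what the proposed call might look
like (the key_name field is the new parameter the spec would add; it is not
part of the current rebuild action, and the variable names are illustrative):

  $ curl -s -X POST "$COMPUTE_ENDPOINT/servers/$SERVER_ID/action" \
      -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" \
      -d '{"rebuild": {"imageRef": "'$IMAGE_ID'", "key_name": "my-new-keypair"}}'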



Michael
>
>
> On Wed, Sep 27, 2017 at 4:05 PM, LIU Yulong 
> wrote:
>
>> On Wed, Sep 27, 2017 at 10:29 AM, Matt Riedemann 
>> wrote:
>>
>>> On 9/23/2017 8:58 AM, LIU Yulong wrote:
>>>
 Hi nova developers,

 This mail proposes to reconsider key pair resetting for instances.
 The nova Queens PTG discussion is here:
 https://etherpad.openstack.org/p/nova-ptg-queens L498.
 And there are now two proposals.

 1. SPEC 1: https://review.openstack.org/#/c/375221/ started by me (liuyulong)
 since Sep 2016.

 This spec will allow setting a new key_name for the instance
 during the rebuild API. That's a very simple and well-understood approach:

   * It is consistent with other rebuild API properties, such as name,
     imageRef, metadata, adminPass etc.
   * The rebuild API is something like `recreating`; this is the right way
     to do key pair updating. For a keypair-login-only VM, this is the key
     point.
   * It does not involve other APIs like reboot/unshelve etc.

>>>
>>> This was one of the issues I brought up in IRC, is that if we just
>>> implemented this for the rebuild API, then someone could also ask that we
>>> do it for things like reboot, cold migrate/resize, unshelve, etc. Anything
>>> that involves re-creating the guest.
>>>
>> IMHO, rebuild has its own meaning: it means we are going to recreate a VM.
>> So those inputs such as name, key, and password should have a chance to be
>> reset in this `rebuild` interface. Unlike rebuild, actions such as reboot,
>> cold migrate/resize, and unshelve do not have such a potential implication.
>> If anything else is involved, you are expanding those actions (reboot, cold
>> migrate/resize, unshelve).
>>
>>
>>
>>>   * Easy to use, only one API.

>>>
>>> Until someone says we should also do it for the other APIs, as noted
>>> above.
>>>
>> That would not be acceptable. Other APIs do not have such a `recreating`
>> background. For rebuild, you are going to renew an instance, so those
>> params for instance creation should have a chance to be reset.
>>
>>
>>>
 By the way, here is the patch (https://review.openstack.org/#/c/379128/)
 which has implemented this spec. And it has been sitting there for more
 than a year too.

>>>
>>> It's been open because the spec was never approved. Just a procedural
>>> issue.
>>>
>>>
 2. SPEC 2: https://review.openstack.org/#/c/506552/ proposed by Kevin_zheng.

 This spec proposes to add a new update API for an instance's key pair.
 Its one foreseeable advantage is that it could do running-instance key
 injection.

 But it may cause some issues:

   * This approach needs to update the instance key pair first (one step,
     one API call), and then do a reboot/rebuild or any action causing the
     VM to restart (second step, another API call). Firstly, this is
     wasteful: it uses two API calls. Secondly, if the key pair update was
     done but the reboot was not, that may result in an inconsistency
     between the instance DB key pair and the key inside the guest VM. The
     cloud user may be confused about which key should be used to log in.

>>>
>>> 1. I don't think multiple API calls is a problem. Any GUI or
>>> orchestration tool can stitch these APIs together for what appears to be a
>>> single operation for the end user. Furthermore, with multiple options about
>>> what to do after the instance.key_name is updated, something like a GUI
>>> could present the user with the option to picking if they want to reboot or
>>> rebuild after the key is updated.
>>>
>> We provided a discontinuous API, so we should take responsibility for
>> it. This inconsistency between the instance DB key pair and the guest VM 

Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread LIU Yulong
On Wed, Sep 27, 2017 at 5:15 PM, Marcus Furlong  wrote:

> On 27 September 2017 at 09:23, Michael Still  wrote:
> >
> > Operationally, why would I want to inject a new keypair? The scenario I
> can
> > think of is that there's data in that instance that I want, and I've lost
> > the keypair somehow. Unless that data is on an ephemeral, its gone if we
> do
> > a rebuild.
>
> This is quite a common scenario - staff member who started the
> instance leaves, and you want to access data on the instance, or
> maintain/debug the service running on the instance.
>
>

I can think of several ways to solve this problem:
1) reset the password by using the admin_pass API (if available)
2) use libguestfs on the instance disk directly
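
For option 2, something along these lines should work while the instance is
stopped (the instance name, disk path and user below are illustrative only):

  $ virsh destroy instance-000000ab        # or stop it via nova
  $ virt-customize -a /var/lib/nova/instances/$UUID/disk \
      --ssh-inject 'centos:file:/tmp/new_key.pub'
  $ virsh start instance-000000ab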



> Hitherto, I have used direct db calls to update the key, so it would
> be nice if there was an API call to do so.
>
>

These are some tricks for unusual scenarios. The Nova API needs to stay
robust and general.



> Cheers,
> Marcus.
> --
> Marcus Furlong
>


Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Tony Breeds
On Wed, Sep 27, 2017 at 03:35:43PM -0500, Ben Nemec wrote:
 
> It's a little weird because essentially we want to provide a higher level of
> support for stable branches than most of OpenStack.  My understanding is
> that a lot of the current stable branch policy came out of the fact that
> there was a great deal of apathy toward stable branches in upstream
> OpenStack and it just wasn't possible to say we'd do more than critical bug
> and security fixes for older releases.  Maybe we need a stable-policy-plus
> tag or something for projects that can and want to do more.  And feel free
> to correct me if I've misinterpreted the historical discussions on this. :-)

That's mostly accurate but the policy also is an indication that
consumers should be moving along to newer releases.  For a whole host of
reasons that isn't working and it's a thing that we need to address as a
community.

The current policy broadly defines 3 phases[1]:

Phase  Time frame                 Summary             Changes Supported
I      First 6 months             Latest release      All bugfixes (that meet the
                                                      criteria described below)
                                                      are appropriate
II     6-12 months after release  Maintained release  Only critical bugfixes and
                                                      security patches are
                                                      acceptable
III    more than 12 months        Legacy release      Only security patches are
       after release                                  acceptable

I can see a policy that looks more like:

Phase  Time frame                 Summary             Changes Supported
I      0-12 months after release  Maintained release  All bugfixes (that meet the
                                                      criteria described below)
                                                      are appropriate
II     more than 12 months        Legacy release      Only security patches are
       after release                                  acceptable

The 12 month mark is really only there to line up with our current EOL
plans, if they changed then we'd need to match them.

> That said, I'm staunchly opposed to feature backports.  While I think it
> makes perfect sense to allow backports like Giulio's,

Yup with my limited knowledge I think that review makes perfect sense to
backport.  It just doesn't match the *current* stable policy.


 
> It feels a little weird to me to be arguing this side of it because I'm
> pretty sure I've argued against splitting repos in the past.  But I think I
> would not say we kick all the vendor-integration bits out if we do this,
> just that we provide the option for vendors to have their own repos with
> their own stable backport policies without having to change the policy for
> all of TripleO at the same time.

If splitting the repos has good technical benefits then cool, if it's
mostly about matching policy then I think altering the policy (or
defining a new one) is a better solution.
 
> And I'm also open to other approaches like tweaking the cycle-trailing
> definition to allow more time for this sort of thing.  Maybe we could
> eliminate some of the need for feature backports if we did that.

I'm not sure I follow but sure altering the timeline within reason is a
simple thing to do.

Yours Tony.

[1] https://docs.openstack.org/project-team-guide/stable-branches.html




Re: [openstack-dev] [octavia] haproxy fails to receive datagram

2017-09-27 Thread Michael Johnson
Hi Yipei,

I ran this scenario today using octavia and had success.  I'm not sure
what could be different.
I see you are using neutron-lbaas.  I will build a devstack with
neutron-lbaas enabled and try that, but I can't think of what would
impact this test case by going through the neutron-lbaas path.
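
For reference, a minimal local.conf fragment along these lines should pull
in the neutron-lbaas path on a devstack (treat it as a sketch rather than a
tested config; plugin and service names as of this era):

  [[local|localrc]]
  enable_plugin neutron-lbaas https://git.openstack.org/openstack/neutron-lbaas
  enable_plugin octavia https://git.openstack.org/openstack/octavia
  ENABLED_SERVICES+=,q-lbaasv2,octavia,o-api,o-cw,o-hk,o-hm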

Michael


On Tue, Sep 26, 2017 at 7:27 PM, Yipei Niu  wrote:
> Hi, Michael,
>
> The instructions are listed as follows.
>
> First, create a net1.
> $ neutron net-create net1
> $ neutron subnet-create net1 10.0.1.0/24 --name subnet1
>
> Second, boot two vms in net1
> $ nova boot --flavor 1 --image $image_id --nic net-id=$net1_id vm1
> $ nova boot --flavor 1 --image $image_id --nic net-id=$net1_id vm2
>
> Third, logon to the two vms, respectively. Here take vm1 as an example.
> $ MYIP=$(ifconfig eth0|grep 'inet addr'|awk -F: '{print $2}'| awk '{print
> $1}')
> $ while true; do echo -e "HTTP/1.0 200 OK\r\n\r\nWelcome to $MYIP" | sudo nc
> -l -p 80 ; done&
>
> Fourth, exit vms and update the default security group shared by the vms by
> adding a rule of allowing traffic to port 80.
> $ neutron security-group-rule-create --direction ingress --protocol tcp
> --port-range-min 80 --port-range-max 80 --remote-ip-prefix 0.0.0.0/0
> $default_security_group
> Note: make sure "sudo ip netns exec qdhcp-$net1_id curl -v $vm_ip" works. In
> other words, make sure the vms can accept HTTP requests and return its IP,
> respectively.
>
> Fifth, create a lb, a listener, and a pool. Then add the two vms to the pool
> as members.
> $ neutron lbaas-loadbalancer-create --name lb1 subnet1
> $ neutron lbaas-listener-create --loadbalancer lb1 --protocol HTTP
> --protocol-port 80 --name listener1
> $ neutron lbaas-pool-create --lb-algorithm ROUND_ROBIN --listener listener1
> --protocol HTTP --name pool1
> $ neutron lbaas-member-create --subnet subnet1 --address $vm1_ip
> --protocol-port 80 pool1
> $ neutron lbaas-member-create --subnet subnet1 --address $vm2_ip
> --protocol-port 80 pool1
>
> Finally, try "sudo ip netns qdhcp-net1_id curl -v $VIP" to see whether lbaas
> works.
>
> Best regards,
> Yipei
>
> On Wed, Sep 27, 2017 at 1:30 AM, Yipei Niu  wrote:
>>
>> Hi, Michael,
>>
>> I think the octavia is the latest, since I pull the up-to-date repo of
>> octavia manually to my server before installation.
>>
>> Anyway, I run "sudo ip netns exec amphora-haproxy ip route show table 1"
>> in the amphora, and find that the route table exists. The info is listed as
>> follows.
>>
>> default via 10.0.1.1 dev eth1 onlink
>>
>> I think it may not be the source.
>>
>> Best regards,
>> Yipei
>
>
>


Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Tony Breeds
On Wed, Sep 27, 2017 at 10:39:13AM -0600, Alex Schultz wrote:

> One idea would be to allow trailing projects additional trailing on
> the phases as well.  Honestly 2 weeks for trailing for just GA is hard
> enough. Let alone the fact that the actual end-users are 18+ months
> behind.  For some deployment project like tripleo, there are sections
> that should probably follow stable-policy as it exists today but
> elements where there's 3rd party integration or upgrade implications
> (in the case of tripleo, THT/puppet-tripleo) and they need to be more
> flexible to modify things as necessary.  The word 'feature' isn't
> necessarily the same for these projects than something like
> nova/neutron/etc.

There are 2 separate aspects here:
1) What changes are appropriate on stable/* branches ; and 
2) How long do stable/* branches stay around for.

Looking at 1.  I totally get that deployment projects have a different
threshold on the bugfix/feature line.  That's actually the easy part to
fix.  The point of the stable policy is to give users some assurance
that moving from version x.y.z -> x.Y.Z will be a smooth process.  We
just need to capture that intent in a policy that works in the context
of a deployment project.

Looking at 2.  The stable policy doesn't say you *need* to EOL on
Oct-11th.  By default any project that asserts that tag is included, but
you're also free to opt out as long as there is a good story around CI
and impact on human and machine resources.  We re-evaluate that from
time to time.  As an example, group-based-policy opted out of the
kilo(?), liberty and mitaka EOLs, and recently dropped everything before
mitaka.  I get that GBP has a different footprint in CI than tripleo
does but it illustrates that there is scope to support your users within
the current policy.

I'm still advocating for crafting a more appropriate policy for
deployment projects.
 
> >> What proposing Giulio probably comes from the real world, the field,
> >> who actually manage OpenStack at scale and on real environments (not
> >> in devstack from master). If we can't have this code in-tree, we'll
> >> probably carry this patch downstream (which is IMHO bad because of
> >> maintenance and lack of CI). In that case, I'll vote to give up
> >> stable:follows-policy so we can do what we need.
> >
> > Rather than give up on the stable:follows policy tag it is possibly
> > worth looking at which portions of tripleo make that assertion.
> >
> > In this specific case, there isn't anything in the bug that indicates
> > it comes from a user report which is all the stable team has to go on
> > when making these types of decisions.
> >
> 
> We'll need to re-evaluate what stable-policy means for tripleo.  We
> don't want to allow the world for backporting but we also want to
> reduce the patches carried downstream for specific use cases.  I think
> in the case of 3rd party integrations we need a better definition of
> what that means and perhaps creating a new repository like THT-extras
> that doesn't follow stable-policy while the main one does.

Right, I don't pretend to understand the ins-and-outs of tripleo but yes
I think we're mostly agreeing on that point.

https://review.openstack.org/#/c/507924/ buys everyone the space to make
that evaluation.

Yours Tony.




Re: [openstack-dev] [infra][mogan] Need help for replacing the current master

2017-09-27 Thread Davanum Srinivas
Clark,

I'd like to avoid the ACL update which will make it different from
other projects. Since we don't expect to do this again, can you please
help do this?

Thanks,
Dims

On Wed, Sep 27, 2017 at 7:55 PM, Clark Boylan  wrote:
> On Tue, Sep 26, 2017, at 05:57 PM, Zhenguo Niu wrote:
>> Thanks Clark Boylan,
>>
>> We have frozen the Mogan repo since this mail sent out, and there's no
>> need
>> to update the replacement master. So please help out when you got time.
>
> I mentioned this to dims on IRC today, but should write it here as well
> for broader reach. It looks like https://github.com/dims/mogan is a
> fast-forwardable change on top of 7744129c83839ab36801856f283fb165d71af32e.
> Also it's less than ten commits ahead of current mogan master (7744129).
> For this reason I think you can just push those commits up to Gerrit and
> review them normally.
>
> The only gotcha with this is you may need to update the Gerrit ACLs to
> allow merge commit pushes.
>
> Clark
>



-- 
Davanum Srinivas :: https://twitter.com/dims



Re: [openstack-dev] [infra][mogan] Need help for replacing the current master

2017-09-27 Thread Clark Boylan
On Tue, Sep 26, 2017, at 05:57 PM, Zhenguo Niu wrote:
> Thanks Clark Boylan,
> 
> We have frozen the Mogan repo since this mail sent out, and there's no
> need
> to update the replacement master. So please help out when you got time.

I mentioned this to dims on IRC today, but should write it here as well
for broader reach. It looks like https://github.com/dims/mogan is a
fast-forwardable change on top of 7744129c83839ab36801856f283fb165d71af32e.
Also it's less than ten commits ahead of current mogan master (7744129).
For this reason I think you can just push those commits up to Gerrit and
review them normally.

The only gotcha with this is you may need to update the Gerrit ACLs to
allow merge commit pushes.
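
For what it's worth, the push itself would look roughly like this (remote
and branch names are illustrative, and this assumes git-review or an
equivalent Gerrit remote is already configured):

  $ git remote add dims https://github.com/dims/mogan
  $ git fetch dims
  $ git checkout -b master-replacement dims/master
  $ git review master        # or: git push gerrit HEAD:refs/for/master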

Clark



Re: [openstack-dev] [neutron]OVS connection tracking cleanup

2017-09-27 Thread Ajay Kalambur (akalambu)
Also, the weird part with this conntrack deletion: when I perform a conntrack -L
to view the table, I see no entry for any of the entries it's trying to delete.
Those entries are all removed anyway when VMs are cleaned up, from the look of
it. So it looks like all those conntrack deletions were pretty much no-ops.
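
For reference, the kind of check I'm doing (the address is illustrative):

  $ sudo conntrack -L | grep 10.1.0.5   # nothing shows up for the VM's address...
  $ sudo conntrack -D -d 10.1.0.5       # ...so a targeted delete like this has nothing to remove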
Ajay


From: Ajay Kalambur >
Date: Tuesday, September 12, 2017 at 9:30 AM
To: "OpenStack Development Mailing List (not for usage questions)" 
>
Cc: "Ian Wells (iawells)" >
Subject: Re: [openstack-dev] [neutron]OVS connection tracking cleanup

Hi Kevin
Sure, I will log a bug.
Also, does the config change involve having both these lines in the
neutron.conf file?
[agent]
root_helper = sudo neutron-rootwrap /etc/neutron/rootwrap.conf
root_helper_daemon = sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf

If I have only the second line I see the exception below on neutron openvswitch 
agent bring up:

2017-09-12 09:23:03.633 35 DEBUG neutron.agent.linux.utils 
[req-0f8fe685-66bd-44d7-beac-bb4c24f0ccfa - - - - -] Running command: ['ps', 
'--ppid', '103', '-o', 'pid='] create_process 
/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:89
2017-09-12 09:23:03.762 35 ERROR ryu.lib.hub 
[req-0f8fe685-66bd-44d7-beac-bb4c24f0ccfa - - - - -] hub: uncaught exception: 
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ryu/lib/hub.py", line 54, in _launch
return func(*args, **kwargs)
  File 
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ovs_ryuapp.py",
 line 42, in agent_main_wrapper
ovs_agent.main(bridge_classes)
  File 
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2184, in main
agent.daemon_loop()
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 154, in 
wrapper
return f(*args, **kwargs)
  File 
"/usr/lib/python2.7/site-packages/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_neutron_agent.py",
 line 2100, in daemon_loop
self.ovsdb_monitor_respawn_interval) as pm:
  File "/usr/lib64/python2.7/contextlib.py", line 17, in __enter__
return self.gen.next()
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/polling.py", line 
35, in get_polling_manager
pm.start()
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/polling.py", line 
57, in start
while not self.is_active():
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py", 
line 100, in is_active
self.pid, self.cmd_without_namespace)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/async_process.py", 
line 159, in pid
run_as_root=self.run_as_root)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 
297, in get_root_helper_child_pid
pid = find_child_pids(pid)[0]
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 
179, in find_child_pids
log_fail_as_error=False)
  File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 
128, in execute
_stdout, _stderr = obj.communicate(_process_input)
  File "/usr/lib64/python2.7/subprocess.py", line 800, in communicate
return self._communicate(input)
  File "/usr/lib64/python2.7/subprocess.py", line 1403, in _communicate
stdout, stderr = self._communicate_with_select(input)
  File "/usr/lib64/python2.7/subprocess.py", line 1504, in 
_communicate_with_select
rlist, wlist, xlist = select.select(read_set, write_set, [])
  File "/usr/lib/python2.7/site-packages/eventlet/green/select.py", line 86, in 
select
return hub.switch()
  File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 294, in 
switch
return self.greenlet.switch()
Timeout: 5 seconds

2017-09-12 09:23:03.860 35 INFO oslo_rootwrap.client [-] Stopping rootwrap 
daemon process with pid=95


Ajay



From: Kevin Benton >
Reply-To: "OpenStack Development Mailing List (not for usage questions)" 
>
Date: Monday, September 11, 2017 at 1:12 PM
To: "OpenStack Development Mailing List (not for usage questions)" 
>
Cc: "Ian Wells (iawells)" >
Subject: Re: [openstack-dev] [neutron]OVS connection tracking cleanup

Can you start a bug on launchpad and upload the conntrack attachment to the bug?

Switching to the rootwrap daemon should also help significantly.

On Mon, Sep 11, 2017 at 12:32 PM, Ajay Kalambur (akalambu) 
> wrote:
Hi Kevin
The information you asked for:
for 1 compute node with 45 VMs, here is the number of connection tracking
entries getting deleted
cat 

Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Chris Friesen

On 09/27/2017 04:55 PM, Blair Bethwaite wrote:

Hi Prema

On 28 September 2017 at 07:10, Premysl Kouril  wrote:

Hi, I work with Jakub (the op of this thread) and here is my two
cents: I think what is critical to realize is that KVM virtual
machines can have substantial memory overhead of up to 25% of memory,
allocated to KVM virtual machine itself. This overhead memory is not


I'm curious what sort of VM configuration causes such high overheads,
is this when using highly tuned virt devices with very large buffers?


For what it's worth, we ran into issues a couple of years back with I/O to
RBD-backed disks in writethrough/writeback.  There was a bug that allowed a very
large number of in-flight operations if the ceph server couldn't keep up with
the aggregate load.  We hacked up a local solution; I'm not sure if it's been
dealt with upstream.


I think virtio networking has also caused issues, though not as bad.  (But 
noticeable when running close to the line.)


Chris



Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Blair Bethwaite
Hi Prema

On 28 September 2017 at 07:10, Premysl Kouril  wrote:
> Hi, I work with Jakub (the op of this thread) and here is my two
> cents: I think what is critical to realize is that KVM virtual
> machines can have substantial memory overhead of up to 25% of memory,
> allocated to KVM virtual machine itself. This overhead memory is not

I'm curious what sort of VM configuration causes such high overheads,
is this when using highly tuned virt devices with very large buffers?

> This KVM virtual machine overhead is what is causing the OOMs in our
> infrastructure and that's what we need to fix.

If you are pinning multiple guests per NUMA node in a multi-NUMA node
system then you might also have issues with uneven distribution of
system overheads across nodes, depending on how close to the sun you
are flying.

-- 
Cheers,
~Blairo



[openstack-dev] [all][infra] Zuul v3 migration update

2017-09-27 Thread Monty Taylor

Hey everybody,

We're there. It's ready.

We've worked through all of the migration script issues and are happy 
with the results. The cutover trigger is primed and ready to go.


But as it's 21:51 UTC / 16:52 US Central it's a short day to be 
available to respond to the questions folks may have... so we're going 
to postpone one more day.


Since it's all ready to go we'll be looking at flipping the switch first 
thing in the morning. (basically as soon as the West Coast wakes up and 
is ready to go)


The project-config repo should still be considered frozen except for 
migration-related changes. Hopefully we'll be able to flip the final 
switch early tomorrow.


If you haven't yet, please see [1] for information about the transition.

[1] https://docs.openstack.org/infra/manual/zuulv3.html

Thanks,

Monty



Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Chris Friesen

On 09/27/2017 03:10 PM, Premysl Kouril wrote:

Lastly, qemu has overhead that varies depending on what you're doing in the
guest.  In particular, there are various IO queues that can consume
significant amounts of memory.  The company that I work for put in a good
bit of effort engineering things so that they work more reliably, and part
of that was determining how much memory to reserve for the host.

Chris


Hi, I work with Jakub (the op of this thread) and here is my two
cents: I think what is critical to realize is that KVM virtual
machines can have substantial memory overhead of up to 25% of memory,
allocated to KVM virtual machine itself. This overhead memory is not
considered in nova code when calculating if the instance being
provisioned actually fits into host's available resources (only the
memory, configured in instance's flavor is considered). And this is
especially being a problem when CPU pinning is used as the memory
allocation is bounded by limits of specific NUMA node (due to the
strict memory allocation mode). This renders the global reservation
parameter reserved_host_memory_mb useless as it doesn't take NUMA into
account.

This KVM virtual machine overhead is what is causing the OOMs in our
infrastructure and that's what we need to fix.


Feel free to report a bug against nova...maybe reserved_host_memory_mb should be 
a list of per-numa-node values.


It's a bit of a hack, but if you use hugepages for all the guests you can 
control the amount of per-numa-node memory reserved for host overhead.


Since the kvm overhead memory is allocated from 4K pages (in my experience) you 
can just choose to leave some memory on each host NUMA node as 4K pages instead 
of allocating them as hugepages.
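
For example, something along these lines (all numbers purely illustrative
for a two-node, 128 GiB host) boots the host with most memory as 2M
hugepages, leaving the remaining 4K pages on each node for host and qemu
overhead, and then makes the pinned guests consume only hugepages:

  # kernel command line on the compute host: 61440 x 2M = 120 GiB of
  # hugepages, so roughly 4 GiB of ordinary 4K pages is left per NUMA node
  GRUB_CMDLINE_LINUX="... hugepagesz=2M hugepages=61440"

  # flavor for the pinned guests, so their RAM comes only from hugepages
  $ openstack flavor set m1.pinned.large \
      --property hw:cpu_policy=dedicated \
      --property hw:mem_page_size=2048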


Chris




Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Premysl Kouril
> Lastly, qemu has overhead that varies depending on what you're doing in the
> guest.  In particular, there are various IO queues that can consume
> significant amounts of memory.  The company that I work for put in a good
> bit of effort engineering things so that they work more reliably, and part
> of that was determining how much memory to reserve for the host.
>
> Chris

Hi, I work with Jakub (the op of this thread) and here is my two
cents: I think what is critical to realize is that KVM virtual
machines can have substantial memory overhead of up to 25% of memory,
allocated to KVM virtual machine itself. This overhead memory is not
considered in nova code when calculating if the instance being
provisioned actually fits into host's available resources (only the
memory, configured in instance's flavor is considered). And this is
especially being a problem when CPU pinning is used as the memory
allocation is bounded by limits of specific NUMA node (due to the
strict memory allocation mode). This renders the global reservation
parameter reserved_host_memory_mb useless as it doesn't take NUMA into
account.

This KVM virtual machine overhead is what is causing the OOMs in our
infrastructure and that's what we need to fix.

Regards,
Prema



Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Ben Nemec



On 09/27/2017 11:39 AM, Alex Schultz wrote:

On Tue, Sep 26, 2017 at 11:57 PM, Tony Breeds  wrote:

On Tue, Sep 26, 2017 at 10:31:59PM -0700, Emilien Macchi wrote:

On Tue, Sep 26, 2017 at 10:17 PM, Tony Breeds  wrote:

With that in mind I'd suggest that your review isn't appropriate for


If we have to give up backports that help customers to get
production-ready environments, I would consider giving up stable
policy tag which probably doesn't fit for projects like installers. In
a real world, users don't deploy master or Pike (even not Ocata) but
are still on Liberty, and most of the time Newton.


I agree the stable policy doesn't map very well to deployment projects
and that's something I'd like to address.  I admit I'm not certain *how*
to address it but it almost certainly starts with a discussion like this
;P

I've proposed a forum session to further this discussion, even if that
doesn't happen there's always the hall-way track :)



One idea would be to allow trailing projects additional trailing on
the phases as well.  Honestly 2 weeks for trailing for just GA is hard
enough. Let alone the fact that the actual end-users are 18+ months
behind.  For some deployment project like tripleo, there are sections
that should probably follow stable-policy as it exists today but
elements where there's 3rd party integration or upgrade implications
(in the case of tripleo, THT/puppet-tripleo) and they need to be more
flexible to modify things as necessary.  The word 'feature' isn't
necessarily the same for these projects than something like
nova/neutron/etc.


What proposing Giulio probably comes from the real world, the field,
who actually manage OpenStack at scale and on real environments (not
in devstack from master). If we can't have this code in-tree, we'll
probably carry this patch downstream (which is IMHO bad because of
maintenance and lack of CI). In that case, I'll vote to give up
stable:follows-policy so we can do what we need.


Rather than give up on the stable:follows policy tag it is possibly
worth looking at which portions of tripleo make that assertion.

In this specific case, there isn't anything in the bug that indicates
it comes from a user report which is all the stable team has to go on
when making these types of decisions.



We'll need to re-evaulate what stable-policy means for tripleo.  We
don't want to allow the world for backporting but we also want to
reduce the patches carried downstream for specific use cases.  I think
in the case of 3rd party integrations we need a better definition of
what that means and perhaps creating a new repository like THT-extras
that doesn't follow stable-policy while the main one does.


It's a little weird because essentially we want to provide a higher 
level of support for stable branches than most of OpenStack.  My 
understanding is that a lot of the current stable branch policy came out 
of the fact that there was a great deal of apathy toward stable branches 
in upstream OpenStack and it just wasn't possible to say we'd do more 
than critical bug and security fixes for older releases.  Maybe we need 
a stable-policy-plus tag or something for projects that can and want to 
do more.  And feel free to correct me if I've misinterpreted the 
historical discussions on this. :-)


That said, I'm staunchly opposed to feature backports.  While I think it 
makes perfect sense to allow backports like Giulio's, I was here when we 
wasted the entire Mitaka cycle backporting things to Liberty and Kilo. 
Sure, you can say we'll just be disciplined and pick and choose what we 
backport, but I'm pretty sure we said the same thing back then.  It's a 
lot harder to say no when a customer/partner/your manager starts pushing 
for something and you have no policy to back you up.


If we need to allow feature-ish backports for third-party, then I think 
the third-party bits need to be split out into their own repo (they 
probably should have been anyway) that has a different support policy. 
I suppose we could try to implement that by convention in current tht, 
but that will likely get messy when someone wants to backport a feature 
that touches both third-party and core tht bits.


I guess maybe this is all going back to what we discussed at the PTG 
retrospective about needing better modularity in TripleO.  Instead of 
having this monolithic all-singing, all-dancing tht repo that includes 
the world, we need a well-defined interface for vendors to plug their 
bits into TripleO so they can live where they want and be managed how 
they want.


It feels a little weird to me to be arguing this side of it because I'm 
pretty sure I've argued against splitting repos in the past.  But I 
think I would not say we kick all the vendor-integration bits out if we 
do this, just that we provide the option for vendors to have their own 
repos with their own stable backport policies without having to change 
the policy for all of TripleO at the same time.

[openstack-dev] September 29 Price Increase & Forum Submission Deadline - OpenStack Summit Sydney

2017-09-27 Thread Allison Price
Hi everyone,

Prices for the OpenStack Summit Sydney will be increasing this Friday,
September 29 at 11:59pm Pacific Time (September 30 at 6:59 UTC).

Register now before the price increases!

Also a reminder that Friday is the deadline for Forum submissions. Submit here.

All discount registration codes must be redeemed by October 27.

If you have any Summit-related questions, please contact sum...@openstack.org.

Cheers,
Allison

Allison Price
OpenStack Foundation
alli...@openstack.org




Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Emilien Macchi
On Wed, Sep 27, 2017 at 9:39 AM, Alex Schultz  wrote:
[...]
> We'll need to re-evaulate what stable-policy means for tripleo.  We
> don't want to allow the world for backporting but we also want to
> reduce the patches carried downstream for specific use cases.  I think
> in the case of 3rd party integrations we need a better definition of
> what that means and perhaps creating a new repository like THT-extras
> that doesn't follow stable-policy while the main one does.

Thanks Alex for the notes. While I agree with you, I proposed
https://review.openstack.org/507924 in the meantime.

I'm not entirely sure about THT-extras and the fact that it would add
another layer of complexity, but I'm happy to discuss it.

Tony, Alex, Steve, (others of course) - if you can look at the
governance change and give feedback on it, that would help.

Thanks,
-- 
Emilien Macchi

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] l2gw

2017-09-27 Thread Ricardo Noriega De Soto
Hey Lajos,

Is this the exception you are encountering?

(neutron) l2-gateway-update --device name=hwvtep,interface_names=eth0,eth1
gw1
L2 Gateway 'b8ef7f98-e901-4ef5-b159-df53364ca996' still has active mappings
with one or more neutron networks.
Neutron server returns request_ids:
['req-f231dc53-cb7d-4221-ab74-fa8715f85869']

I don't see the L2GatewayInUse exception you're talking about, but I guess
it's the same situation.

We should discuss in which cases the l2gw instance could be updated, and in
which it shouldn't.

Please, let me know!



On Wed, Aug 16, 2017 at 11:14 AM, Lajos Katona 
wrote:

> Hi,
>
> We faced an issue with l2-gw-update: if there are connections for a gw,
> the update will throw an exception (L2GatewayInUse), and the update is
> only possible after first deleting the connections, doing the update and
> adding the connections back.
>
> It is not exactly clear why this restriction is there in the code (at
> least I can't find it in docs or comments in the code, or review).
> As I see the check for network connections was introduced in this patch:
> https://review.openstack.org/#/c/144097 (https://review.openstack.org/
> #/c/144097/21..22/networking_l2gw/db/l2gateway/l2gateway_db.py)
>
> Could you please give me a little background why the update operation is
> not allowed on an l2gw with network connections?
>
> Thanks in advance for the help.
>
> Regards
> Lajos
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Ricardo Noriega

Senior Software Engineer - NFV Partner Engineer | Office of Technology  |
Red Hat
irc: rnoriega @freenode
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [tripleo] Newton End-Of-Life (EOL) next month (reminder #1)

2017-09-27 Thread Alex Schultz
On Tue, Sep 26, 2017 at 11:57 PM, Tony Breeds  wrote:
> On Tue, Sep 26, 2017 at 10:31:59PM -0700, Emilien Macchi wrote:
>> On Tue, Sep 26, 2017 at 10:17 PM, Tony Breeds  
>> wrote:
>> > With that in mind I'd suggest that your review isn't appropriate for
>>
>> If we have to give up backports that help customers to get
>> production-ready environments, I would consider giving up stable
>> policy tag which probably doesn't fit for projects like installers. In
>> a real world, users don't deploy master or Pike (even not Ocata) but
>> are still on Liberty, and most of the time Newton.
>
> I agree the stable policy doesn't map very well to deployment projects
> and that's something I'd like to address.  I admit I'm not certain *how*
> to address it but it almost certainly starts with a discussion like this
> ;P
>
> I've proposed a forum session to further this discussion, even if that
> doesn't happen there's always the hall-way track :)
>

One idea would be to allow trailing projects additional trailing on
the phases as well.  Honestly, 2 weeks of trailing for just the GA is hard
enough, let alone the fact that the actual end-users are 18+ months
behind.  For a deployment project like tripleo, there are sections
that should probably follow stable-policy as it exists today, but the
elements where there's 3rd party integration or upgrade implications
(in the case of tripleo, THT/puppet-tripleo) need to be more
flexible to modify things as necessary.  The word 'feature' doesn't
necessarily mean the same thing for these projects as it does for
something like nova/neutron/etc.

>> What proposing Giulio probably comes from the real world, the field,
>> who actually manage OpenStack at scale and on real environments (not
>> in devstack from master). If we can't have this code in-tree, we'll
>> probably carry this patch downstream (which is IMHO bad because of
>> maintenance and lack of CI). In that case, I'll vote to give up
>> stable:follows-policy so we can do what we need.
>
> Rather than give up on the stable:follows policy tag it is possibly
> worth looking at which portions of tripleo make that assertion.
>
> In this specific case, there isn't anything in the bug that indicates
> it comes from a user report which is all the stable team has to go on
> when making these types of decisions.
>

We'll need to re-evaluate what stable-policy means for tripleo.  We
don't want to allow backporting the world, but we also want to
reduce the patches carried downstream for specific use cases.  I think
in the case of 3rd party integrations we need a better definition of
what that means, and perhaps create a new repository like THT-extras
that doesn't follow stable-policy while the main one does.

Thanks,
-Alex

> Yours Tony.
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][infra] Zuul v3 migration update

2017-09-27 Thread Jay Pipes

On 09/27/2017 03:49 AM, Flavio Percoco wrote:
Just wanted to say thanks to all of you for the hard work. I can only 
imagine

how hard it must be to do this migration without causing downtimes.


+1000

Thank you so much for the hard work the infra team has put into making 
this migration as painless for the community as possible. Your efforts 
have certainly not gone unnoticed.


All the best,
-jay

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][DIB] how create triplo overcloud image with latest kernel?

2017-09-27 Thread Yolanda Robla Mota
If you need a guideline on how to build TripleO images with DIB, I have
this blog post:
http://teknoarticles.blogspot.com.es/2017/07/build-and-use-security-hardened-images.html

This is for security hardened images, but if you replace
"overcloud-hardened-images" with "overcloud-images", it will build the
default one. You can specify the base image you want to use, as well as
enable any repo you have that provides the latest kernel.

Hope it helps!

On Wed, Sep 27, 2017 at 5:21 PM, Brad P. Crochet  wrote:

>
> On Tue, Sep 26, 2017 at 2:58 PM Ben Nemec  wrote:
>
>>
>>
>> On 09/26/2017 05:43 AM, Moshe Levi wrote:
>> > Hi all,
>> >
>> > As part of the OVS Hardware Offload [1] [2],  we need to create new
>> > Centos/Redhat 7 image  with latest kernel/ovs/iproute.
>> >
>> > We tried to use virsh-customize to install the packages and we were able
>> > to update iproute and ovs, but for the kernel there is no space.
>> >
>> > We also tried with virsh-customize to uninstall the old kernel but had
>> > no luck.
>> >
>> > Are there other ways to replace the kernel package in an existing image?
>>
>> Do you have to use an existing image?  The easiest way to do this would
>> be to create a DIB element that installs what you want and just include
>> that in the image build in the first place.  I don't think that would be
>> too difficult to do now that we're keeping the image definitions in
>> simple YAML files.
>>
>>
> If it is just packages, a DIB element wouldn't even be necessary. You
> could define a new yaml that just adds the packages that you want, and add
> that to the CLI when you build the images.
>
>
>> >
>> > [1] - https://review.openstack.org/#/c/504911/
>> >
>> > [2] - https://review.openstack.org/#/c/502313/
>> >
>> >
>> >
>> >
>> > 
>> __
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
>> unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >
>>
>> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:
>> unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
> --
> Brad P. Crochet, RHCA, RHCE, RHCVA, RHCDS
> Principal Software Engineer
> (c) 704.236.9385
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>


-- 

Yolanda Robla Mota

Principal Software Engineer, RHCE

Red Hat



C/Avellana 213

Urb Portugal

yrobl...@redhat.com    M: +34605641639


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [TripleO][DIB] how create triplo overcloud image with latest kernel?

2017-09-27 Thread Brad P. Crochet
On Tue, Sep 26, 2017 at 2:58 PM Ben Nemec  wrote:

>
>
> On 09/26/2017 05:43 AM, Moshe Levi wrote:
> > Hi all,
> >
> > As part of the OVS Hardware Offload [1] [2],  we need to create new
> > Centos/Redhat 7 image  with latest kernel/ovs/iproute.
> >
> > We tried to use virsh-customize to install the packages and we were able
> > to update iproute and ovs, but for the kernel there is no space.
> >
> > We also tried with virsh-customize to uninstall the old kernel but had
> > no luck.
> >
> > Are there other ways to replace the kernel package in an existing image?
>
> Do you have to use an existing image?  The easiest way to do this would
> be to create a DIB element that installs what you want and just include
> that in the image build in the first place.  I don't think that would be
> too difficult to do now that we're keeping the image definitions in
> simple YAML files.
>
>
If it is just packages, a DIB element wouldn't even be necessary. You could
define a new yaml that just adds the packages that you want, and add that
to the CLI when you build the images.
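
A rough sketch of what that could look like, assuming the disk_images
schema used by tripleo-common's image-yaml files (the extra-packages.yaml
file name and package list are made up, the shipped YAML paths assume the
RDO packaging, so double-check against your tripleo-common version):

    # extra-packages.yaml (hypothetical)
    disk_images:
      - imagename: overcloud-full
        packages:
          - kernel
          - openvswitch
          - iproute

    openstack overcloud image build \
      --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images.yaml \
      --config-file /usr/share/openstack-tripleo-common/image-yaml/overcloud-images-centos7.yaml \
      --config-file extra-packages.yaml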


> >
> > [1] - https://review.openstack.org/#/c/504911/
> >
> > [2] - https://review.openstack.org/#/c/502313/
> >
> >
> >
> >
> >
> >
> __
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
-- 
Brad P. Crochet, RHCA, RHCE, RHCVA, RHCDS
Principal Software Engineer
(c) 704.236.9385
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Chris Friesen

On 09/27/2017 08:01 AM, Blair Bethwaite wrote:

On 27 September 2017 at 23:19, Jakub Jursa  wrote:

'hw:cpu_policy=dedicated' (while NOT setting 'hw:numa_nodes') results in
libvirt pinning CPU in 'strict' memory mode

(from libvirt xml for given instance)
...
  [numatune XML stripped by the list archive]
...

So yeah, the instance is not able to allocate memory from another NUMA node.


I can't recall what the docs say on this but I wouldn't be surprised
if that was a bug. Though I do think most users would want CPU & NUMA
pinning together (you haven't shared your use case but perhaps you do
too?).


Not a bug.  Once you enable CPU pinning we assume you care about performance, 
and for max performance you need NUMA affinity as well.  (And hugepages are 
beneficial too.)



I'm not quite sure what do you mean by 'memory will be locked for the
guest'. Also, aren't huge pages enabled in kernel by default?


I think that suggestion was probably referring to static hugepages,
which can be reserved (per NUMA node) at boot and then (assuming your
host is configured correctly) QEMU will be able to back guest RAM with
them.


One nice thing about static hugepages is that you pre-allocate them at startup, 
so you can decide on a per-NUMA-node basis how much 4K memory you want to leave 
for incidental host stuff and qemu overhead.  This lets you specify different 
amounts of "host-reserved" memory on different NUMA nodes.


In order to use static hugepages for the guest you need to explicitly ask for a 
page size of 2MB.  (1GB is possible as well but in most cases doesn't buy you 
much compared to 2MB.)
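
(On the flavor side that is roughly, with a made-up flavor name:

    openstack flavor set m1.pinned \
      --property hw:cpu_policy=dedicated \
      --property hw:mem_page_size=2048

where 2048 is the page size in KiB, i.e. 2MB pages.)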


Lastly, qemu has overhead that varies depending on what you're doing in the 
guest.  In particular, there are various IO queues that can consume significant 
amounts of memory.  The company that I work for put in a good bit of effort 
engineering things so that they work more reliably, and part of that was 
determining how much memory to reserve for the host.


Chris

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [FEMDC] IRC Meeting today 15:00 UTC

2017-09-27 Thread Paul-Andre Raymond
Below is the link to the etherpad for our meeting.



On 9/27/17, 10:01 AM, "Paul-Andre Raymond"  
wrote:

Dear all, 

A gentle reminder for our meeting today (an hour from now). 
I believe today will be a short meeting.
Draft agenda was prepared by our friends from INRIA at   
https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2017 (line 
1237)

Please feel free to add items.

Best, 

 
Paul-André
--
 




__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Chris Friesen

On 09/27/2017 03:12 AM, Jakub Jursa wrote:



On 27.09.2017 10:40, Blair Bethwaite wrote:

On 27 September 2017 at 18:14, Stephen Finucane  wrote:

What you're probably looking for is the 'reserved_host_memory_mb' option. This
defaults to 512 (at least in the latest master) so if you up this to 4192 or
similar you should resolve the issue.


I don't see how this would help given the problem description -
reserved_host_memory_mb would only help avoid causing OOM when
launching the last guest that would otherwise fit on a host based on
Nova's simplified notion of memory capacity. It sounds like both CPU
and NUMA pinning are in play here, otherwise the host would have no
problem allocating RAM on a different NUMA node and OOM would be
avoided.


I'm not quite sure if/how OpenStack handles NUMA pinning (why is VM
being killed by OOM rather than having memory allocated on different
NUMA node). Anyway, good point, thank you, I should have a look at exact
parameters passed to QEMU when using CPU pinning.


OpenStack uses strict memory pinning when using CPU pinning and/or memory 
hugepages, so all allocations are supposed to be local.  When it can't allocate 
locally, it triggers OOM.


Chris


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [tc][nova][ironic][mogan] Evaluate Mogan project

2017-09-27 Thread Sean Dague
On 09/27/2017 09:31 AM, Julia Kreger wrote:
> [...]
>>> The short explanation which clicked for me (granted it's probably an
>>> oversimplification, but still) was this: Ironic provides an admin
>>> API for managing bare metal resources, while Mogan gives you a user
>>> API (suitable for public cloud use cases) to your Ironic backend. I
>>> suppose it could have been implemented in Ironic, but implementing
>>> it separately allows Ironic to be agnostic to multiple user
>>> frontends and also frees the Ironic team up from having to take on
>>> yet more work directly.
>>
>>
>> ditto!
>>
>> I had a similar question at the PTG and this was the answer that convinced
>> be
>> may be worth the effort.
>>
>> Flavio
>>
> 
> For Ironic, the question did come at the PTG up of tenant aware
> scheduling of owned hardware, as in Customer A and B are managed by
> the same ironic, only customer A's users should be able to schedule on
> to Customer A's hardware, with API access control restrictions such
> that specific customer can take action on their own hardware.
> 
> If we go down the path of supporting such views/logic, it could become
> a massive undertaking for Ironic, so there is absolutely a plus to
> something doing much of that for Ironic. Personally, I think Mogan is
> a good direction to continue to explore. That being said, we should
> improve our communication of plans/directions/perceptions between the
> teams so we don't adversely impact each other and see where we can
> help each other moving forward.

My biggest concern with Mogan is that it forks Nova, then starts
changing interfaces. Nova's got 2 really big API surfaces.

1) The user facing API, which is reasonably well documented, and under
tight control. Mogan has taken key things at 95% similarity and changed
bits. So servers includes things like a partitions parameter.
https://github.com/openstack/mogan/blob/master/api-ref/source/v1/servers.inc#request-4

This being nearly the same but slightly different ends up being really
weird, especially as Nova evolves its code with microversions for
things like embedded flavor info.

2) The guest facing API of metadata/config drive. This is far less
documented or tested, and while we try to be strict about adding in
information here in a versioned way, it's never seen the same attention
as the user API on either documentation or version rigor.

That's presumably getting changed and going to drift as well, which means
discovering multiple implementations that are nearly, but not exactly,
the same and that keep drifting.


The point of licensing things under an Apache 2 license was to enable
folks to do all kind of experiments like this. And experiments are good.
But part of the point of experiments is to learn lessons to bring back
into the fold. Digging out of the multi year hole of "close but not
exactly the same" API differences between nova-net and neutron really
makes me want to make sure we never intentionally inflict that confusion
on folks again.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [acceleration]Cyborg Weekly team Meeting 2017.09.27

2017-09-27 Thread Zhipeng Huang
Hi Team,

Our regular meeting will start in about 30 minutes. The agenda can be
found at
https://wiki.openstack.org/wiki/Meetings/CyborgTeamMeeting#Agenda_for_next_meeting

-- 
Zhipeng (Howard) Huang

Standard Engineer
IT Standard & Patent/IT Product Line
Huawei Technologies Co,. Ltd
Email: huangzhip...@huawei.com
Office: Huawei Industrial Base, Longgang, Shenzhen

(Previous)
Research Assistant
Mobile Ad-Hoc Network Lab, Calit2
University of California, Irvine
Email: zhipe...@uci.edu
Office: Calit2 Building Room 2402

OpenStack, OPNFV, OpenDaylight, OpenCompute Aficionado
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Blair Bethwaite
On 27 September 2017 at 23:19, Jakub Jursa  wrote:
> 'hw:cpu_policy=dedicated' (while NOT setting 'hw:numa_nodes') results in
> libvirt pinning CPU in 'strict' memory mode
>
> (from libvirt xml for given instance)
> ...
> [numatune XML stripped by the list archive]
> ...
>
> So yeah, the instance is not able to allocate memory from another NUMA node.

I can't recall what the docs say on this but I wouldn't be surprised
if that was a bug. Though I do think most users would want CPU & NUMA
pinning together (you haven't shared your use case but perhaps you do
too?).

> I'm not quite sure what do you mean by 'memory will be locked for the
> guest'. Also, aren't huge pages enabled in kernel by default?

I think that suggestion was probably referring to static hugepages,
which can be reserved (per NUMA node) at boot and then (assuming your
host is configured correctly) QEMU will be able to back guest RAM with
them.

You are probably thinking of THP (transparent huge pages) which are
now on by default in Linux but can be somewhat hit & miss if you have
a long running host where memory has become fragmented or the
pagecache is large - in our experience performance can be severely
degraded by just missing hugepage backing of a small fraction of guest
memory, and we have noticed behaviour from memory management where THP
allocations fail when pagecache is highly utilised despite none of it
being dirty (so should be able to be dropped immediately).

-- 
Cheers,
~Blairo

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [FEMDC] IRC Meeting today 15:00 UTC

2017-09-27 Thread Paul-Andre Raymond
Dear all, 

A gentle reminder for our meeting today (an hour from now). 
I believe today will be a short meeting.
Draft agenda was prepared by our friends from INRIA at   
https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2017 (line 
1237)

Please feel free to add items.

Best, 

 
Paul-André
--
 


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [glance] Queens PTG: Thursday summary

2017-09-27 Thread Belmiro Moreira
Not ideal, because we also have the real use case for community images.
When users start to create/use community images, these different use cases
(old public and real community) will be mixed.

Cheers,
Belmiro


On Wed, 27 Sep 2017 at 15:37, Blair Bethwaite 
wrote:

> On 27 September 2017 at 22:40, Belmiro Moreira
>  wrote:
> > In the past we used the tabs but latest Horizon versions use the
> visibility
> > column/search instead.
> > The issue is that we would like the old images to continue to be
> > discoverable by everyone and have a image list that only shows the latest
> > ones.
>
> Yeah I think we hit that as well and have a patch for category
> listing. It's not something I have worked on but Sam can fill the
> gaps... or it could be that this is actually the last problem we have
> left with upgrading to a current version of the dashboard and so are
> effectively in the same boat.
>
> > We are now using the “community” visibility to hide the old images from
> the
> > default image list. But it’s not ideal.
>
> Not ideal because you don't want them discoverable at all?
>
> > I will move the old spec about image lifecycle to glance.
> > https://review.openstack.org/#/c/327980/
>
> Looks like a useful spec!
>
> --
> Cheers,
> ~Blairo
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [glance] Queens PTG: Thursday summary

2017-09-27 Thread Blair Bethwaite
On 27 September 2017 at 22:40, Belmiro Moreira
 wrote:
> In the past we used the tabs but latest Horizon versions use the visibility
> column/search instead.
> The issue is that we would like the old images to continue to be
> discoverable by everyone and have a image list that only shows the latest
> ones.

Yeah I think we hit that as well and have a patch for category
listing. It's not something I have worked on but Sam can fill the
gaps... or it could be that this is actually the last problem we have
left with upgrading to a current version of the dashboard and so are
effectively in the same boat.

> We are now using the “community” visibility to hide the old images from the
> default image list. But it’s not ideal.

Not ideal because you don't want them discoverable at all?

> I will move the old spec about image lifecycle to glance.
> https://review.openstack.org/#/c/327980/

Looks like a useful spec!

-- 
Cheers,
~Blairo

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread Marcus Furlong
On 27 September 2017 at 10:55, Sean Dague  wrote:
> On 09/27/2017 05:15 AM, Marcus Furlong wrote:
>> On 27 September 2017 at 09:23, Michael Still  wrote:
>>>
>>> Operationally, why would I want to inject a new keypair? The scenario I can
>>> think of is that there's data in that instance that I want, and I've lost
>>> the keypair somehow. Unless that data is on an ephemeral, its gone if we do
>>> a rebuild.
>>
>> This is quite a common scenario - staff member who started the
>> instance leaves, and you want to access data on the instance, or
>> maintain/debug the service running on the instance.
>>
>> Hitherto, I have used direct db calls to update the key, so it would
>> be nice if there was an API call to do so.
>
> But you also triggered a rebuild in the process? Or you tweaked the keys
> and did a reboot? This use case came up in the room, but then we started
> trying to figure out if the folks that mostly had it would also need it
> on reboot.

No rebuild, no.

Update the key name and reboot, or, if someone has access, re-run cloud-init.

# rm -fr /var/lib/cloud/instance/sem/
# cloud-init --single -n ssh

Have also thought about just adding the above to a cronjob in the
images to facilitate this scenario (thus avoiding a reboot if no one
has access).

Cheers,
Marcus.

-- 
Marcus Furlong

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [tc][nova][ironic][mogan] Evaluate Mogan project

2017-09-27 Thread Julia Kreger
[...]
>> The short explanation which clicked for me (granted it's probably an
>> oversimplification, but still) was this: Ironic provides an admin
>> API for managing bare metal resources, while Mogan gives you a user
>> API (suitable for public cloud use cases) to your Ironic backend. I
>> suppose it could have been implemented in Ironic, but implementing
>> it separately allows Ironic to be agnostic to multiple user
>> frontends and also frees the Ironic team up from having to take on
>> yet more work directly.
>
>
> ditto!
>
> I had a similar question at the PTG and this was the answer that convinced
> be
> may be worth the effort.
>
> Flavio
>

For Ironic, the question of tenant aware scheduling of owned hardware did
come up at the PTG, as in: Customer A and B are managed by
the same ironic, only customer A's users should be able to schedule on
to Customer A's hardware, with API access control restrictions such
that a specific customer can take action on their own hardware.

If we go down the path of supporting such views/logic, it could become
a massive undertaking for Ironic, so there is absolutely a plus to
something doing much of that for Ironic. Personally, I think Mogan is
a good direction to continue to explore. That being said, we should
improve our communication of plans/directions/perceptions between the
teams so we don't adversely impact each other and see where we can
help each other moving forward.

-Julia

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Jakub Jursa


On 27.09.2017 14:46, Sahid Orentino Ferdjaoui wrote:
> On Mon, Sep 25, 2017 at 05:36:44PM +0200, Jakub Jursa wrote:
>> Hello everyone,
>>
>> We're experiencing issues with running large instances (~60GB RAM) on
>> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using cpu pinning. The
>> problem is that it seems that in some extreme cases qemu/KVM can have
>> significant memory overhead (10-15%?) which nova-compute service doesn't
>> take in to the account when launching VMs. Using our configuration as an
>> example - imagine running two VMs with 30GB RAM on one NUMA node
>> (because we use cpu pinning) - therefore using 60GB out of 64GB for
>> given NUMA domain. When both VMs would consume their entire memory
>> (given 10% KVM overhead) OOM killer takes an action (despite having
>> plenty of free RAM in other NUMA nodes). (the numbers are just
>> arbitrary, the point is that nova-scheduler schedules the instance to
>> run on the node because the memory seems 'free enough', but specific
>> NUMA node can be lacking the memory reserve).
> 
> In Nova when using NUMA we do pin the memory on the host NUMA nodes
> selected during scheduling. In your case it seems that you have
> specificly requested a guest with 1 NUMA node. It will be not possible
> for the process to grab memory on an other host NUMA node but some
> other processes could be running in that host NUMA node and consume
> memory.

Yes, that is very likely the case - that some other processes consume
the memory on the given NUMA node. It seems that setting the flavor metadata
'hw:cpu_policy=dedicated' (while NOT setting 'hw:numa_nodes') results in
libvirt pinning CPU in 'strict' memory mode

(from libvirt xml for given instance)
...
  [numatune XML stripped by the list archive]
...

So yeah, the instance is not able to allocate memory from another NUMA node.
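
(The stripped section typically looks like the following when nova pins a
single-NUMA-node guest to host node 0; this is a reconstruction for
illustration and the nodeset values are assumptions:)

    <numatune>
      <memory mode='strict' nodeset='0'/>
      <memnode cellid='0' mode='strict' nodeset='0'/>
    </numatune>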

> 
> What you need is to use Huge Pages, in such case the memory will be
> locked for the guest.

I'm not quite sure what you mean by 'memory will be locked for the
guest'. Also, aren't huge pages enabled in the kernel by default?

> 
>> Our initial solution was to use ram_allocation_ratio < 1 to ensure
>> having some reserved memory - this didn't work. Upon studying source of
>> nova, it turns out that ram_allocation_ratio is ignored when using cpu
>> pinning. (see
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
>> and
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
>> ). We're running Mitaka, but this piece of code is implemented in Ocata
>> in a same way.
>> We're considering to create a patch for taking ram_allocation_ratio in
>> to account.
>>
>> My question is - is ram_allocation_ratio ignored on purpose when using
>> cpu pinning? If yes, what is the reasoning behind it? And what would be
>> the right solution to ensure having reserved RAM on the NUMA nodes?
>>
>> Thanks.
>>
>> Regards,
>>
>> Jakub Jursa
>>
> 
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] how does UEFI booting of VM manage per-instance copies of OVMF_VARS.fd ?

2017-09-27 Thread Waines, Greg
Hey there ... a question about UEFI booting of VMs.
i.e.

glance image-create --file cloud-2730.qcow --disk-format qcow2 
--container-format bare --property "hw-firmware-type=uefi" --name 
clear-linux-image

in order to specify that you want to use UEFI (instead of BIOS) when booting 
VMs with this image, i.e.
  /usr/share/OVMF/OVMF_CODE.fd
  /usr/share/OVMF/OVMF_VARS.fd

and I believe you can boot into the UEFI Shell, i.e. to change UEFI variables 
in NVRAM (OVMF_VARS.fd), by booting the VM with /usr/share/OVMF/UefiShell.iso 
as a CD ... e.g. to change Secure Boot keys or something like that.

My QUESTION ...

- how does NOVA manage a unique instance of OVMF_VARS.fd for each instance?

  - I believe OVMF_VARS.fd is supposed to just be used as a template, and
    is supposed to be copied to make a unique instance for each VM that UEFI boots

  - how does NOVA manage this?

    - e.g. is the unique instance of OVMF_VARS.fd created in
      /etc/nova/instances//  ?

  - ... and does this get migrated to another compute if the VM is migrated?
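
(For context, the libvirt-level mechanism I am asking about looks roughly
like this in the domain XML; the paths are the usual RHEL/Fedora OVMF
locations and the per-domain file name is just an assumption:

    <os>
      <loader readonly='yes' type='pflash'>/usr/share/OVMF/OVMF_CODE.fd</loader>
      <nvram template='/usr/share/OVMF/OVMF_VARS.fd'>/var/lib/libvirt/qemu/nvram/instance-00000042_VARS.fd</nvram>
    </os>

i.e. libvirt copies the VARS template into a per-domain file on first boot.)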

Greg.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [FEMDC] IRC meeting Today 15:00 UTC

2017-09-27 Thread lebre . adrien
Dear all, 

A gentle reminder of the FEMDC meeting, today at 15:00 UTC. 
The agenda is available at: 
https://etherpad.openstack.org/p/massively_distributed_ircmeetings_2017 (line 
1237)

Please feel free to complete it

Best,
ad_rien_
PS: Inria's members will not be able to attend our IRC meeting (midterm review 
of the Discovery initiative); Paul-Andre will chair the discussion. 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Sahid Orentino Ferdjaoui
On Mon, Sep 25, 2017 at 05:36:44PM +0200, Jakub Jursa wrote:
> Hello everyone,
> 
> We're experiencing issues with running large instances (~60GB RAM) on
> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using cpu pinning. The
> problem is that it seems that in some extreme cases qemu/KVM can have
> significant memory overhead (10-15%?) which nova-compute service doesn't
> take in to the account when launching VMs. Using our configuration as an
> example - imagine running two VMs with 30GB RAM on one NUMA node
> (because we use cpu pinning) - therefore using 60GB out of 64GB for
> given NUMA domain. When both VMs would consume their entire memory
> (given 10% KVM overhead) OOM killer takes an action (despite having
> plenty of free RAM in other NUMA nodes). (the numbers are just
> arbitrary, the point is that nova-scheduler schedules the instance to
> run on the node because the memory seems 'free enough', but specific
> NUMA node can be lacking the memory reserve).

In Nova, when using NUMA we do pin the memory on the host NUMA nodes
selected during scheduling. In your case it seems that you have
specifically requested a guest with 1 NUMA node. It will not be possible
for the process to grab memory on another host NUMA node, but some
other processes could be running in that host NUMA node and consume
memory.

What you need is to use Huge Pages, in such case the memory will be
locked for the guest.

> Our initial solution was to use ram_allocation_ratio < 1 to ensure
> having some reserved memory - this didn't work. Upon studying source of
> nova, it turns out that ram_allocation_ratio is ignored when using cpu
> pinning. (see
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
> and
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
> ). We're running Mitaka, but this piece of code is implemented in Ocata
> in a same way.
> We're considering to create a patch for taking ram_allocation_ratio in
> to account.
> 
> My question is - is ram_allocation_ratio ignored on purpose when using
> cpu pinning? If yes, what is the reasoning behind it? And what would be
> the right solution to ensure having reserved RAM on the NUMA nodes?
> 
> Thanks.
> 
> Regards,
> 
> Jakub Jursa
> 

> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [glance] Queens PTG: Thursday summary

2017-09-27 Thread Belmiro Moreira
Hi Blair,
In the past we used the tabs but latest Horizon versions use the visibility
column/search instead.
The issue is that we would like the old images to continue to be
discoverable by everyone and have an image list that only shows the latest
ones.
If the images continue to be public they will be shown by the CLIs in the
default image-list. In our case the list was very long.

We are now using the “community” visibility to hide the old images from the
default image list. But it’s not ideal.
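
(The switch itself is a one-liner, assuming a python-openstackclient recent
enough to know about community images:

    openstack image set --community <image-uuid>
    openstack image list               # community images no longer show up here
    openstack image list --community   # but they stay discoverable on request

so old images stay usable by name/ID while dropping out of the default list.)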
I will move the old spec about image lifecycle to glance.
https://review.openstack.org/#/c/327980/

Cheers,
Belmiro


On Wed, 27 Sep 2017 at 00:25, Blair Bethwaite 
wrote:

> Hi Belmiro,
>
>
> On 20 Sep. 2017 7:58 pm, "Belmiro Moreira" <
> moreira.belmiro.email.li...@gmail.com> wrote:
> > Discovering the latest image release is hard. So we added an image
> property "recommended"
> > that we update when a new image release is available. Also, we patched
> horizon to show
> > the "recommended" images first.
>
> There is built in support in Horizon that allows displaying multiple image
> category tabs where each takes contents from the list of images owned by a
> specific project/tenant. In the Nectar research cloud this is what we rely
> on to distinguish between "Public", "Project", "Nectar" (the base images we
> maintain), and "Contributed" (images contributed by users who wish them to
> be tested by us and effectively promoted as quality assured). When we
> update a "Nectar" or "Contributed" image the old version stays public but
> is moved into a project for deprecated images of that category, where
> eventually we can clean it up.
>
>
> > This helps our users to identify the latest image release but we
> continue to show for
> > each project the full list of public images + all personal user images.
>
> Could you use the same model as us?
>
> Cheers,
> b1airo
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] OpenStack-Ansible and Trove support

2017-09-27 Thread Jean-Philippe Evrard
Hello Michael,

On top of that, we intend to have a "role maturity" that will include
when the role was proposed and its current maturity phase, for more
clarity, not unlike the openstack project navigator.

Our os_trove role has not received many commits recently, and the
"maintenance mode" of Trove will probably impact you in the future.
Do you intend to keep a trove installation in production, or do you
want to do a PoC?

Best regards,
JP

On Wed, Sep 27, 2017 at 12:24 AM, Amy Marrich  wrote:
> Michael,
>
> There are release notes for each release that will go over what's new,
> what's on it's way out or even gone as well as bug fixes and other
> information. Here's a link to the Ocata release notes for OpenStack-Ansible
> which includes the announcement of the Trove role.
>
> https://docs.openstack.org/releasenotes/openstack-ansible/ocata.html
>
> Thanks,
>
> Amy (spotz)
>
> On Tue, Sep 26, 2017 at 6:04 PM, Michael Gale 
> wrote:
>>
>> Hello,
>>
>>Based on github and
>> https://docs.openstack.org/openstack-ansible-os_trove/latest/ it looks like
>> OpenStack-Ansible will support Trove under the Ocata release.
>>
>> Is that assumption correct? is there a better method to determine when a
>> software component will likely be included in a release?
>>
>> Michael
>>
>> __
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] OpenStack-Ansible testing with OpenVSwitch

2017-09-27 Thread Jean-Philippe Evrard
Hello,

We currently don't have a full scenario for openvswitch for an easy
"one line" install.
It still deserves more love. You could come to our channel,
#openstack-ansible, to discuss it if you want. But the general
idea should be close to what is explained in the blog post.
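
From memory, the heart of it is a couple of overrides in
/etc/openstack_deploy/user_variables.yml; treat the variable names as a
sketch and double-check them against the os_neutron role for your branch:

    neutron_plugin_type: ml2.ovs
    neutron_ml2_drivers_type: "flat,vlan,vxlan"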

Best regards,
JP

On Wed, Sep 27, 2017 at 12:13 AM, Michael Gale  wrote:
> Hello,
>
> I am trying to build a Pike All-in-One instance for OpenStack Ansible
> testing, currently I have a few OpenStack versions being deployed using the
> default Linux Bridge implementation.
>
> However I need a test environment to validate OpenVSwitch implementation, is
> there a simple method to get an AIO installed?
>
> I tried following
> https://medium.com/@travistruman/configuring-openstack-ansible-for-open-vswitch-b7e70e26009d
> however Neutron is blowing up because it can't determine the name for the
> Neutron Server. I am not sure if that is my issue or not, a reference
> implementation of OpenStack AIO with OpenVSwitch would help me a lot.
>
> Thanks
> Michael
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [vitrage] Vitrage virtual PTG

2017-09-27 Thread Afek, Ifat (Nokia - IL/Kfar Sava)
Hi,

We will hold the Vitrage virtual PTG on October 17-19. I have created an 
initial schedule draft; you are more than welcome to comment or suggest new 
topics for discussion:
https://etherpad.openstack.org/p/vitrage-ptg-queens

Best Regards,
Ifat.

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [keystone] [keystoneauth] Debug data isn't sanitized - bug 1638978

2017-09-27 Thread Bhor, Dinesh
Hi Team,

There are four solutions to fix the below bug:
https://bugs.launchpad.net/keystoneauth/+bug/1638978

1) Carry a copy of the mask_password() method into keystoneauth from oslo_utils [1]:
Pros:
A. keystoneauth will use an already tested and widely used version of mask_password.

Cons:
A. keystoneauth will have to keep its copy of the mask_password() method in sync 
with the oslo_utils version.
 If any new "_SANITIZE_KEYS" are added to the oslo_utils mask_password, 
they will also have to be added to the keystoneauth mask_password.
B. Copying "mask_password" also requires copying its supporting code 
[2], which is substantial.


2) Use the oslo.utils mask_password() method in keystoneauth:
Pros:
A) No syncing issue as described in solution #1; keystoneauth will directly 
use the mask_password() method from oslo.utils (a short usage sketch follows 
the reference links at the end of this mail).

Cons:
A) You will need the oslo.utils library to use keystoneauth.
Objection by community:
- The keystoneauth community doesn't want any dependency on the common 
OpenStack oslo libraries.
Please refer to the comment from Morgan: 
https://bugs.launchpad.net/keystoneauth/+bug/1700751/comments/3


3) Add a custom logging filter in oslo logger
Please refer to the POC sample here: http://paste.openstack.org/show/617093/
OpenStack core services using any individual python-*client (e.g. 
python-cinderclient used in the nova service) will need to pass an oslo.log 
logger object during the client's initialization, which will do the work of 
masking sensitive information.
Note: In nova, the oslo.log logger object is not passed during cinder client 
initialization 
(https://github.com/openstack/nova/blob/master/nova/volume/cinder.py#L135-L141);
in this case, sensitive information will not be masked as it isn't using 
oslo.log.

Pros:
A) No changes required in oslo.logger or any OpenStack services if 
mask_password method is modified in oslo.utils.

Cons:
A) Every log message will be scanned for certain password fields degrading the 
performance.
B) If consumer of keystoneauth doesn't use oslo_logger, then the sensitive 
information will not be masked.
C) Will need to make changes wherever applicable to the OpenStack core services 
to pass oslo.logger object during python-novaclient initialization.


4) Add a mask_password keyword parameter in oslo_log:
Add a "mask_password" keyword argument that sanitizes sensitive data when it 
is passed to the log statement.
Only if mask_password is set will the sensitive information be masked 
at the time of logging.
The log statement will look like below:

logger.debug("'adminPass': 'Now you see me'", mask_password=True)

Please refer to the POC code here: http://paste.openstack.org/show/618019/

Pros:
A) No changes required in oslo.logger or any OpenStack services if 
mask_password method is modified in oslo.utils.

Cons:
A) If consumer of keystoneauth doesn't use oslo_logger, then the sensitive 
information will not be masked.
B) If you forget to pass mask_password=True for logging messages where 
sensitive information is present, then those fields won't be masked with ***.
 But this can be clearly documented as suggested by Morgan and Lance.
C) This solution requires you to add the check below in keystoneauth to avoid 
an exception being raised when the logger is a plain Python Logger, as it
doesn't accept the mask_password keyword argument.

if isinstance(logger, logging.Logger):
    logger.debug(' '.join(string_parts))
else:
    logger.debug(' '.join(string_parts), mask_password=True)

This check assumes that the logger instance is an oslo_log logger only if it is 
not the default Python logging.Logger.
The keystoneauth community is not ready to have a dependency on any oslo-* lib, 
so it seems this solution has low acceptance chances.

Please let me know your opinions about the above four approaches. Which one 
should we adopt?

[1] 
https://github.com/openstack/oslo.utils/blob/master/oslo_utils/strutils.py#L248-L313
[2] 
https://github.com/openstack/oslo.utils/blob/6e04f882c4308ff64fa199d1b127ad225e0a30c4/oslo_utils/strutils.py#L56-L96
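
A quick illustration of what option 2 looks like in practice (sketch only;
the sample payload is made up):

    from oslo_utils import strutils

    body = '{"auth": {"passwordCredentials": {"password": "secret"}}}'
    print(strutils.mask_password(body))
    # -> {"auth": {"passwordCredentials": {"password": "***"}}}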

Thanks and Regards,
Dinesh Bhor | App. Software Dev. Cnslt.
dinesh.b...@nttdata.com | VOIP. 8833.8395I | 
nttdata.com/americas
NTT DATA, Inc.
Consulting | Digital | Managed Services | Industry Solutions


__

Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Balazs Gibizer



On Wed, Sep 27, 2017 at 11:58 AM, Jakub Jursa 
 wrote:



On 27.09.2017 11:12, Jakub Jursa wrote:



 On 27.09.2017 10:40, Blair Bethwaite wrote:
 On 27 September 2017 at 18:14, Stephen Finucane 
 wrote:
 What you're probably looking for is the 'reserved_host_memory_mb' 
option. This
 defaults to 512 (at least in the latest master) so if you up this 
to 4192 or

 similar you should resolve the issue.


 I don't see how this would help given the problem description -
 reserved_host_memory_mb would only help avoid causing OOM when
 launching the last guest that would otherwise fit on a host based 
on
 Nova's simplified notion of memory capacity. It sounds like both 
CPU

 and NUMA pinning are in play here, otherwise the host would have no
 problem allocating RAM on a different NUMA node and OOM would be
 avoided.


 I'm not quite sure if/how OpenStack handles NUMA pinning (why is VM
 being killed by OOM rather than having memory allocated on different
 NUMA node). Anyway, good point, thank you, I should have a look at 
exact

 parameters passed to QEMU when using CPU pinning.



 Jakub, your numbers sound reasonable to me, i.e., use 60 out of 
64GB


 Hm, but the question is, how to prevent having some smaller instance
 (e.g. 2GB RAM) scheduled on such NUMA node?


 when only considering QEMU overhead - however I would expect that
 might  be a problem on NUMA node0 where there will be extra 
reserved
 memory regions for kernel and devices. In such a configuration 
where

 you are wanting to pin multiple guests into each of multiple NUMA
 nodes I think you may end up needing different flavor/instance-type
 configs (using less RAM) for node0 versus other NUMA nodes. Suggest


 What do you mean using different flavor? From what I understand (
 
http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
 https://docs.openstack.org/nova/pike/admin/cpu-topologies.html ) it 
can

 be specified that flavor 'wants' different amount memory from its
 (virtual) NUMA nodes, but mapping vCPU <-> pCPU is more or less
 arbitrary (meaning that there is no way how to specify for NUMA 
node0 on

 physical host that it has less memory available for VM allocation)


Can't be 'reserved_huge_pages' option used to reserve memory on 
certain

NUMA nodes?
https://docs.openstack.org/ocata/config-reference/compute/config-options.html


I think the qemu memory overhead is allocated from the 4k memory pool, 
so the question is whether it is possible to reserve 4k pages with the 
reserved_huge_pages config option. I don't find any restriction in the 
code base about 4k pages (even if a 4k page is not considered a large page 
by definition), so in theory you can do it. However, this also means you 
have to enable NumaTopologyFilter.
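
Something like this in nova.conf on the compute would be the thing to try
(untested sketch; the counts are arbitrary and whether 4k pages are accepted
here is exactly the open question above):

    [DEFAULT]
    reserved_host_memory_mb = 4096
    # keep ~2GB of 4k pages per host NUMA node out of the guests' reach
    reserved_huge_pages = node:0,size:4,count:524288
    reserved_huge_pages = node:1,size:4,count:524288

plus NumaTopologyFilter enabled in the scheduler filters.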


Cheers,
gibi








 freshly booting one of your hypervisors and then with no guests
 running take a look at e.g. /proc/buddyinfo/ and /proc/zoneinfo to 
see

 what memory is used/available and where.



 Thanks, I'll look into it.


 Regards,

 Jakub



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: 
openstack-dev-requ...@lists.openstack.org?subject:unsubscribe

http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Jakub Jursa


On 27.09.2017 11:12, Jakub Jursa wrote:
> 
> 
> On 27.09.2017 10:40, Blair Bethwaite wrote:
>> On 27 September 2017 at 18:14, Stephen Finucane  wrote:
>>> What you're probably looking for is the 'reserved_host_memory_mb' option. 
>>> This
>>> defaults to 512 (at least in the latest master) so if you up this to 4192 or
>>> similar you should resolve the issue.
>>
>> I don't see how this would help given the problem description -
>> reserved_host_memory_mb would only help avoid causing OOM when
>> launching the last guest that would otherwise fit on a host based on
>> Nova's simplified notion of memory capacity. It sounds like both CPU
>> and NUMA pinning are in play here, otherwise the host would have no
>> problem allocating RAM on a different NUMA node and OOM would be
>> avoided.
> 
> I'm not quite sure if/how OpenStack handles NUMA pinning (why is VM
> being killed by OOM rather than having memory allocated on different
> NUMA node). Anyway, good point, thank you, I should have a look at exact
> parameters passed to QEMU when using CPU pinning.
> 
>>
>> Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB
> 
> Hm, but the question is, how to prevent having some smaller instance
> (e.g. 2GB RAM) scheduled on such NUMA node?
> 
>> when only considering QEMU overhead - however I would expect that
>> might  be a problem on NUMA node0 where there will be extra reserved
>> memory regions for kernel and devices. In such a configuration where
>> you are wanting to pin multiple guests into each of multiple NUMA
>> nodes I think you may end up needing different flavor/instance-type
>> configs (using less RAM) for node0 versus other NUMA nodes. Suggest
> 
> What do you mean using different flavor? From what I understand (
> http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
> https://docs.openstack.org/nova/pike/admin/cpu-topologies.html ) it can
> be specified that flavor 'wants' different amount memory from its
> (virtual) NUMA nodes, but mapping vCPU <-> pCPU is more or less
> arbitrary (meaning that there is no way how to specify for NUMA node0 on
> physical host that it has less memory available for VM allocation)

Can't the 'reserved_huge_pages' option be used to reserve memory on certain
NUMA nodes?
https://docs.openstack.org/ocata/config-reference/compute/config-options.html


> 
>> freshly booting one of your hypervisors and then with no guests
>> running take a look at e.g. /proc/buddyinfo/ and /proc/zoneinfo to see
>> what memory is used/available and where.
>>
> 
> Thanks, I'll look into it.
> 
> 
> Regards,
> 
> Jakub
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread Sean Dague
On 09/27/2017 05:15 AM, Marcus Furlong wrote:
> On 27 September 2017 at 09:23, Michael Still  wrote:
>>
>> Operationally, why would I want to inject a new keypair? The scenario I can
>> think of is that there's data in that instance that I want, and I've lost
>> the keypair somehow. Unless that data is on an ephemeral, its gone if we do
>> a rebuild.
> 
> This is quite a common scenario - staff member who started the
> instance leaves, and you want to access data on the instance, or
> maintain/debug the service running on the instance.
> 
> Hitherto, I have used direct db calls to update the key, so it would
> be nice if there was an API call to do so.

But you also triggered a rebuild in the process? Or you tweaked the keys
and did a reboot? This use case came up in the room, but then we started
trying to figure out whether the folks who mostly had this use case would
also need it on reboot.

-Sean

-- 
Sean Dague
http://dague.net

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [ocata] [nova-api] Nova api stopped working after a yum update

2017-09-27 Thread Avery Rozar
Hello all,
I ran "yum update" on my OpenStack controller and now any request to the
nova-api service (port 8774) results in an error in
"/var/log/nova/nova-api.log".

A simple get request,

GET /v2.1/os-hypervisors/detail HTTP/1.1
Host: host.domain.com:8774
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:54.0)
Gecko/20100101 Firefox/54.0
X-Auth-Token: 
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Content-Type: application/json
Content-Length: 0
DNT: 1
Connection: close
Upgrade-Insecure-Requests: 1


This results in an error logged to "/var/log/nova/nova-api.log":

WARNING keystoneauth.identity.generic.base [-] Discovering versions from
the identity service failed when creating the password plugin. Attempting
to determine version from URL.
ERROR nova.api.openstack [-] Caught error: Could not determine a suitable
URL for the plugin
ERROR nova.api.openstack Traceback (most recent call last):
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/nova/api/openstack/__init__.py", line 88, in __call__
ERROR nova.api.openstack return req.get_response(self.application)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/request.py",
line 1299, in send
ERROR nova.api.openstack application, catch_exc_info=False)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/request.py",
line 1263, in call_application
ERROR nova.api.openstack app_iter = application(self.environ,
start_response)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/dec.py",
line 144, in __call__
ERROR nova.api.openstack return resp(environ, start_response)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/dec.py",
line 130, in __call__
ERROR nova.api.openstack resp = self.call_func(req, *args,
**self.kwargs)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/dec.py",
line 195, in call_func
ERROR nova.api.openstack return self.func(req, *args, **kwargs)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/osprofiler/web.py",
line 108, in __call__
ERROR nova.api.openstack return request.get_response(self.application)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/request.py",
line 1299, in send
ERROR nova.api.openstack application, catch_exc_info=False)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/request.py",
line 1263, in call_application
ERROR nova.api.openstack app_iter = application(self.environ,
start_response)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/dec.py",
line 130, in __call__
ERROR nova.api.openstack resp = self.call_func(req, *args,
**self.kwargs)
ERROR nova.api.openstack   File
"/usr/lib/python2.7/site-packages/webob/dec.py",
line 195, in call_func
ERROR nova.api.openstack return self.func(req, *args, **kwargs)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/__init__.py", line 332, in __call__
ERROR nova.api.openstack response = self.process_request(req)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/__init__.py", line 623, in
process_request
ERROR nova.api.openstack resp = super(AuthProtocol,
self).process_request(request)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/__init__.py", line 405, in
process_request
ERROR nova.api.openstack allow_expired=allow_expired)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/__init__.py", line 435, in
_do_fetch_token
ERROR nova.api.openstack data = self.fetch_token(token, **kwargs)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/__init__.py", line 762, in
fetch_token
ERROR nova.api.openstack allow_expired=allow_expired)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/_identity.py", line 217, in
verify_token
ERROR nova.api.openstack auth_ref = self._request_strategy.verify_token(
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/_identity.py", line 168, in
_request_strategy
ERROR nova.api.openstack strategy_class = self._get_strategy_class()
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystonemiddleware/auth_token/_identity.py", line 190, in
_get_strategy_class
ERROR nova.api.openstack if self._adapter.get_endpoint(
version=klass.AUTH_VERSION):
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystoneauth1/adapter.py", line 176, in get_endpoint
ERROR nova.api.openstack return self.session.get_endpoint(auth or
self.auth, **kwargs)
ERROR nova.api.openstack   File "/usr/lib/python2.7/site-
packages/keystoneauth1/session.py", line 856, in get_endpoint
ERROR 

[openstack-dev] [publiccloud-wg] Reminder meeting PublicCloudWorkingGroup

2017-09-27 Thread Tobias Rydberg

Hi everyone,

Don't forget today's meeting of the PublicCloudWorkingGroup.
1400 UTC in IRC channel #openstack-meeting-3

Etherpad and agenda: https://etherpad.openstack.org/p/publiccloud-wg

Regards,
Tobias Rydberg


smime.p7s
Description: S/MIME Cryptographic Signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [ptg] Simplification in OpenStack

2017-09-27 Thread Gyorgy Szombathelyi
Hi,

> The install docs still suggest hand configuring machines in 2017. It’s only 
> after
> people fall down that snake pit that they find projects like
> TripleO/Ansible/Puppet/Chef, and wonder why everyone doesn’t use this
> stuff.

I'm just wondering too, but about a different thing: the install docs describe
nicely how to install and configure OpenStack the way an average Linux admin
would do it. Install packages, modify config files and you're ready. These
steps don't have to be executed by hand - they can easily be automated
(Ansible comes to my mind first, as the most user-friendly config management
tool for me). Then the sysadmin looks at the official deployment tools: they
do their job with extra layers, extra things which are not in the install
docs, like creating containers, installing OpenStack from git, installing an
OpenStack before installing the real OpenStack, etc...
They're just overcomplicated, to be honest.
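
For what it's worth, the kind of automation I mean is nothing more exotic
than tasks along these lines (a rough sketch only - the package and service
names here are the RDO ones and the restart handler is not shown):

  - name: install nova compute from distro packages
    yum:
      name: openstack-nova-compute
      state: present

  - name: render nova.conf from a template
    template:
      src: nova.conf.j2
      dest: /etc/nova/nova.conf
    notify: restart nova-compute

  - name: make sure the service is enabled and running
    service:
      name: openstack-nova-compute
      state: started
      enabled: yes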

As an operator myself, I want a solid OpenStack installation which I can
manage and upgrade, not tens of containers or other stuff which I cannot
touch unless I take the risk of blowing up everything. With the traditional
method (packages/config management) I can sit back and relax, upgrade when I
want (did it from Liberty to Ocata in real OpenStack clusters, that means 3
upgrades, and the clusters are still alive), apply updates when a package is
released, and I simply feel that the infra is under my control, not under
some install tool.

These were the reasons why I wrote my Ansible playbook set, and I still feel
it was a good decision (more than 2 years of OpenStack operation experience
says that). I understand that some want to be on the bleeding edge and like
to run the most recent git revisions, but most operators want a stable
installation in production.

I don't know if this opinion counts, but what I would like to see is stable,
good-quality OpenStack packages (I know this is very distro-specific, but it
is not a problem of OpenStack itself, rather of the Linux ecosystem -
containers are just a workaround and not the right solution), and simple
installers which just install these packages and configure them. No more, no
less.

My 2 cents,
Br,
György
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread Marcus Furlong
On 27 September 2017 at 09:23, Michael Still  wrote:
>
> Operationally, why would I want to inject a new keypair? The scenario I can
> think of is that there's data in that instance that I want, and I've lost
> the keypair somehow. Unless that data is on an ephemeral, its gone if we do
> a rebuild.

This is quite a common scenario - a staff member who started the
instance leaves, and you want to access data on the instance, or
maintain/debug the service running on the instance.

Hitherto, I have used direct db calls to update the key, so it would
be nice if there was an API call to do so.
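
For reference, the direct-DB approach amounts to something like the following
(hypothetical key name/data and uuid; take a DB backup first, and note this
only updates what Nova records and serves via the metadata API - the key
actually inside the guest is untouched until something like cloud-init
reapplies it):

  mysql nova -e "UPDATE instances SET key_name='replacement-key', \
      key_data='ssh-rsa AAAA... ops@example.com' \
      WHERE uuid='<instance uuid>' AND deleted=0;"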

Cheers,
Marcus.
-- 
Marcus Furlong

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Jakub Jursa


On 27.09.2017 10:40, Blair Bethwaite wrote:
> On 27 September 2017 at 18:14, Stephen Finucane  wrote:
>> What you're probably looking for is the 'reserved_host_memory_mb' option. 
>> This
>> defaults to 512 (at least in the latest master) so if you up this to 4192 or
>> similar you should resolve the issue.
> 
> I don't see how this would help given the problem description -
> reserved_host_memory_mb would only help avoid causing OOM when
> launching the last guest that would otherwise fit on a host based on
> Nova's simplified notion of memory capacity. It sounds like both CPU
> and NUMA pinning are in play here, otherwise the host would have no
> problem allocating RAM on a different NUMA node and OOM would be
> avoided.

I'm not quite sure if/how OpenStack handles NUMA pinning (i.e. why the VM is
being killed by the OOM killer rather than having memory allocated on a
different NUMA node). Anyway, good point, thank you, I should have a look at
the exact parameters passed to QEMU when using CPU pinning.

> 
> Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB

Hm, but the question is: how do we prevent some smaller instance
(e.g. 2GB RAM) from being scheduled on such a NUMA node?

> when only considering QEMU overhead - however I would expect that
> might  be a problem on NUMA node0 where there will be extra reserved
> memory regions for kernel and devices. In such a configuration where
> you are wanting to pin multiple guests into each of multiple NUMA
> nodes I think you may end up needing different flavor/instance-type
> configs (using less RAM) for node0 versus other NUMA nodes. Suggest

What do you mean by using a different flavor? From what I understand (
http://specs.openstack.org/openstack/nova-specs/specs/juno/implemented/virt-driver-numa-placement.html
https://docs.openstack.org/nova/pike/admin/cpu-topologies.html ) a flavor can
specify that it 'wants' a different amount of memory from each of its
(virtual) NUMA nodes, but the vCPU <-> pCPU mapping is more or less
arbitrary (meaning there is no way to specify for NUMA node0 on the
physical host that it has less memory available for VM allocation).
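
To make it concrete, I guess the suggestion is a pair of pinned flavors along
these lines (the extra spec names are from the cpu-topologies doc above, the
RAM sizes are made up), with the smaller one intended for guests that end up
on node0 - though, as noted, nothing actually ties a flavor to a specific
host NUMA node:

  # regular pinned flavor for the 'full size' NUMA nodes
  openstack flavor create --ram 30720 --vcpus 8 --disk 40 pinned.30g
  openstack flavor set pinned.30g --property hw:cpu_policy=dedicated \
      --property hw:numa_nodes=1

  # slightly smaller variant intended for node0
  openstack flavor create --ram 28672 --vcpus 8 --disk 40 pinned.28g
  openstack flavor set pinned.28g --property hw:cpu_policy=dedicated \
      --property hw:numa_nodes=1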

> freshly booting one of your hypervisors and then with no guests
> running take a look at e.g. /proc/buddyinfo/ and /proc/zoneinfo to see
> what memory is used/available and where.
> 

Thanks, I'll look into it.


Regards,

Jakub

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Jakub Jursa


On 27.09.2017 10:14, Stephen Finucane wrote:
> On Mon, 2017-09-25 at 17:36 +0200, Jakub Jursa wrote:
>> Hello everyone,
>>
>> We're experiencing issues with running large instances (~60GB RAM) on
>> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using cpu pinning. The
>> problem is that it seems that in some extreme cases qemu/KVM can have
>> significant memory overhead (10-15%?) which nova-compute service doesn't
>> take in to the account when launching VMs. Using our configuration as an
>> example - imagine running two VMs with 30GB RAM on one NUMA node
>> (because we use cpu pinning) - therefore using 60GB out of 64GB for
>> given NUMA domain. When both VMs would consume their entire memory
>> (given 10% KVM overhead) OOM killer takes an action (despite having
>> plenty of free RAM in other NUMA nodes). (the numbers are just
>> arbitrary, the point is that nova-scheduler schedules the instance to
>> run on the node because the memory seems 'free enough', but specific
>> NUMA node can be lacking the memory reserve).
>>
>> Our initial solution was to use ram_allocation_ratio < 1 to ensure
>> having some reserved memory - this didn't work. Upon studying source of
>> nova, it turns out that ram_allocation_ratio is ignored when using cpu
>> pinning. (see
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
>> and
>> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
>> ). We're running Mitaka, but this piece of code is implemented in Ocata
>> in a same way.
>> We're considering to create a patch for taking ram_allocation_ratio in
>> to account.
>>
>> My question is - is ram_allocation_ratio ignored on purpose when using
>> cpu pinning? If yes, what is the reasoning behind it? And what would be
>> the right solution to ensure having reserved RAM on the NUMA nodes?
> 
> Both 'ram_allocation_ratio' and 'cpu_allocation_ratio' are ignored when using
> pinned CPUs because they don't make much sense: you want a high performance VM
> and have assigned dedicated cores to the instance for this purpose, yet you're
> telling nova to over-schedule and schedule multiple instances to some of these
> same cores.

I wanted to use 'ram_allocation_ratio' with a value of e.g. 0.8 to force
'under-scheduling' of the host, i.e. to create a reserve on the host.

> 
> What you're probably looking for is the 'reserved_host_memory_mb' option. This
> defaults to 512 (at least in the latest master) so if you up this to 4192 or
> similar you should resolve the issue.

I'm afraid that this won't help, as this option doesn't take NUMA nodes into
account (e.g. there would be 'reserved_host_memory_mb' worth of free memory
on the physical host, but not necessarily in each of its NUMA nodes).

> 
> Hope this helps,
> Stephen
> 

Regards,

Jakub

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Blair Bethwaite
Also CC-ing os-ops as someone else may have encountered this before
and have further/better advice...

On 27 September 2017 at 18:40, Blair Bethwaite
 wrote:
> On 27 September 2017 at 18:14, Stephen Finucane  wrote:
>> What you're probably looking for is the 'reserved_host_memory_mb' option. 
>> This
>> defaults to 512 (at least in the latest master) so if you up this to 4192 or
>> similar you should resolve the issue.
>
> I don't see how this would help given the problem description -
> reserved_host_memory_mb would only help avoid causing OOM when
> launching the last guest that would otherwise fit on a host based on
> Nova's simplified notion of memory capacity. It sounds like both CPU
> and NUMA pinning are in play here, otherwise the host would have no
> problem allocating RAM on a different NUMA node and OOM would be
> avoided.
>
> Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB
> when only considering QEMU overhead - however I would expect that
> might  be a problem on NUMA node0 where there will be extra reserved
> memory regions for kernel and devices. In such a configuration where
> you are wanting to pin multiple guests into each of multiple NUMA
> nodes I think you may end up needing different flavor/instance-type
> configs (using less RAM) for node0 versus other NUMA nodes. Suggest
> freshly booting one of your hypervisors and then with no guests
> running take a look at e.g. /proc/buddyinfo/ and /proc/zoneinfo to see
> what memory is used/available and where.
>
> --
> Cheers,
> ~Blairo



-- 
Cheers,
~Blairo

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Blair Bethwaite
On 27 September 2017 at 18:14, Stephen Finucane  wrote:
> What you're probably looking for is the 'reserved_host_memory_mb' option. This
> defaults to 512 (at least in the latest master) so if you up this to 4192 or
> similar you should resolve the issue.

I don't see how this would help given the problem description -
reserved_host_memory_mb would only help avoid causing OOM when
launching the last guest that would otherwise fit on a host based on
Nova's simplified notion of memory capacity. It sounds like both CPU
and NUMA pinning are in play here, otherwise the host would have no
problem allocating RAM on a different NUMA node and OOM would be
avoided.

Jakub, your numbers sound reasonable to me, i.e., use 60 out of 64GB
when only considering QEMU overhead - however I would expect that
might be a problem on NUMA node0, where there will be extra reserved
memory regions for the kernel and devices. In such a configuration, where
you are wanting to pin multiple guests into each of multiple NUMA
nodes, I think you may end up needing different flavor/instance-type
configs (using less RAM) for node0 versus the other NUMA nodes. I suggest
freshly booting one of your hypervisors and then, with no guests
running, taking a look at e.g. /proc/buddyinfo and /proc/zoneinfo to see
what memory is used/available and where.
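
Roughly like this (the exact zoneinfo field names vary a little between
kernel versions):

  # per-NUMA-node totals and free memory (from the numactl package)
  numactl --hardware

  # free pages per order and zone - fragmentation at a glance
  cat /proc/buddyinfo

  # detailed per-zone accounting (present/managed/free pages, reserves)
  grep -E 'Node|present|managed|nr_free_pages|protection' /proc/zoneinfo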

-- 
Cheers,
~Blairo

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread Michael Still
One thing I'd like to explore is what the functional difference between a
rebuild and a delete / create cycle is. With a rebuild you get to keep your
IP I suppose, but that could also be true of floating IPs for a delete /
create as well.

Operationally, why would I want to inject a new keypair? The scenario I can
think of is that there's data in that instance that I want, and I've lost
the keypair somehow. Unless that data is on an ephemeral, it's gone if we do
a rebuild.

Michael


On Wed, Sep 27, 2017 at 4:05 PM, LIU Yulong  wrote:

> On Wed, Sep 27, 2017 at 10:29 AM, Matt Riedemann 
> wrote:
>
>> On 9/23/2017 8:58 AM, LIU Yulong wrote:
>>
>>> Hi nova developers,
>>>
>>> This mail is proposed to reconsider the key pair resetting of instance.
>>> The nova queens PTG discuss is here: https://etherpad.openstack.org
>>> /p/nova-ptg-queens  L498.
>>> And there are now two proposals.
>>>
>>> 1. SPEC 1: https://review.openstack.org/#/c/375221/ <
>>> https://review.openstack.org/#/c/375221/> started by me (liuyulong)
>>> since sep 2016.
>>>
>>> This spec will allow setting the new key_name for the instance
>>> during rebuild API. That’s a very simple and well-understood approach:
>>>
>>>   * Make consistent with rebuild API properties, such as name, imageRef,
>>> metadata, adminPass etc.
>>>   * Rebuild API is something like `recreating`, this is the right way to
>>> do key pair updating. For keypair-login-only VM, this is the key
>>> point.
>>>   * This does not involve to other APIs like reboot/unshelve etc.
>>>
>>
>> This was one of the issues I brought up in IRC, is that if we just
>> implemented this for the rebuild API, then someone could also ask that we
>> do it for things like reboot, cold migrate/resize, unshelve, etc. Anything
>> that involves re-creating the guest.
>>
>> IMHO, rebuild has its own meaning is that we are going to recreate a VM.
> So those inputs such as name, key, password should have a chance to be
> reset in this `rebuild` interface. Unlike reboot, cold migrate/resize,
> unshelve, those actions does not have such potential implication. If
> anything else involved, you are expanding those actions (reboot, cold
> migrate/resize, unshelve).
>
>
>
>>   * Easy to use, only one API.
>>>
>>
>> Until someone says we should also do it for the other APIs, as noted
>> above.
>>
>> This could not be acceptable. Other APIs does not have such `recreating`
> background. For rebuild, you are going to renew an instance, so those
> params for instance creation should have chance to be reset.
>
>
>>
>>> By the way, here is the patch (https://review.openstack.org/#/c/379128/
>>> ) which has implemented this
>>> spec. And it stays there more than one year too.
>>>
>>
>> It's been open because the spec was never approved. Just a procedural
>> issue.
>>
>>
>>> 2. SPEC 2 : https://review.openstack.org/#/c/506552/ <
>>> https://review.openstack.org/#/c/506552/> propose by Kevin_zheng.
>>>
>>> This spec supposed to add a new updating API for one instance’s key
>>> pair. This one has one foreseeable advantage for this is to do instance
>>> running key injection.
>>>
>>> But it may cause some issues:
>>>
>>>   * This approach needs to update the instance key pair first (one step,
>>> API call). And then do a reboot/rebuild or any actions causing the
>>> vm restart (second step, another API call). Firstly, this is waste,
>>> it use two API calls. Secondly, if updating key pair was done, and
>>> the reboot was not. That may result an inconsistent between instance
>>> DB key pair and guest VM inside key. Cloud user may confused to
>>> choose which key should be used to login.
>>>
>>
>> 1. I don't think multiple API calls is a problem. Any GUI or
>> orchestration tool can stitch these APIs together for what appears to be a
>> single operation for the end user. Furthermore, with multiple options about
>> what to do after the instance.key_name is updated, something like a GUI
>> could present the user with the option to picking if they want to reboot or
>> rebuild after the key is updated.
>>
>> We provided a discontinuous API, so we should take responsibilities for
> it. This inconsistent between instance DB key pair and guest VM inside can
> stay there. So GUI or orchestration tool can not be a reasonable support.
> More API calls may cause more problems. What if the GUI or orchestration
> tool user/developer forget the second API? What if the first API failed,
> should the retry all the APIs? Which key should be used to login if first
> API succeed and the second not succeed/not response? What if the second API
> failed? They confused again and again.
>
>
>
>> 2. An orchestrator or GUI would make sure that both APIs are called. For
>> a user that is updating the key_name, they should realize they need to make
>> another API call to enable it. 

Re: [openstack-dev] [nova] Running large instances with CPU pinning and OOM

2017-09-27 Thread Stephen Finucane
On Mon, 2017-09-25 at 17:36 +0200, Jakub Jursa wrote:
> Hello everyone,
> 
> We're experiencing issues with running large instances (~60GB RAM) on
> fairly large NUMA nodes (4 CPUs, 256GB RAM) while using cpu pinning. The
> problem is that it seems that in some extreme cases qemu/KVM can have
> significant memory overhead (10-15%?) which nova-compute service doesn't
> take in to the account when launching VMs. Using our configuration as an
> example - imagine running two VMs with 30GB RAM on one NUMA node
> (because we use cpu pinning) - therefore using 60GB out of 64GB for
> given NUMA domain. When both VMs would consume their entire memory
> (given 10% KVM overhead) OOM killer takes an action (despite having
> plenty of free RAM in other NUMA nodes). (the numbers are just
> arbitrary, the point is that nova-scheduler schedules the instance to
> run on the node because the memory seems 'free enough', but specific
> NUMA node can be lacking the memory reserve).
> 
> Our initial solution was to use ram_allocation_ratio < 1 to ensure
> having some reserved memory - this didn't work. Upon studying source of
> nova, it turns out that ram_allocation_ratio is ignored when using cpu
> pinning. (see
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L859
> and
> https://github.com/openstack/nova/blob/mitaka-eol/nova/virt/hardware.py#L821
> ). We're running Mitaka, but this piece of code is implemented in Ocata
> in a same way.
> We're considering to create a patch for taking ram_allocation_ratio in
> to account.
> 
> My question is - is ram_allocation_ratio ignored on purpose when using
> cpu pinning? If yes, what is the reasoning behind it? And what would be
> the right solution to ensure having reserved RAM on the NUMA nodes?

Both 'ram_allocation_ratio' and 'cpu_allocation_ratio' are ignored when using
pinned CPUs because they don't make much sense: you want a high performance VM
and have assigned dedicated cores to the instance for this purpose, yet you're
telling nova to over-schedule and schedule multiple instances to some of these
same cores.

What you're probably looking for is the 'reserved_host_memory_mb' option. This
defaults to 512 (at least in the latest master) so if you up this to 4192 or
similar you should resolve the issue.
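
i.e. on the compute nodes something like the following (note the value is per
host, not per NUMA node, and 4096 here is just an example):

  [DEFAULT]
  # keep ~4GB back for the host OS and qemu overhead
  reserved_host_memory_mb = 4096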

Hope this helps,
Stephen

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [freezer]Looking for previous release notes

2017-09-27 Thread hanc...@iscas.ac.cn
Hello,

To understand the capabilities and the issues of the freezer project, I would
like to look into several previous release notes, e.g. from Mitaka to Pike.
However, the Pike release notes are the only ones I could find. For the other
previous releases, the page is empty and I could find nothing.

Could anyone give me a hint as to where I can get them?

Thanks,
Chao
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Openstack-operators] [tc][nova][ironic][mogan] Evaluate Mogan project

2017-09-27 Thread Flavio Percoco

On 27/09/17 01:59 +, Jeremy Stanley wrote:

On 2017-09-27 09:15:21 +0800 (+0800), Zhenguo Niu wrote:
[...]

I don't mean there are deficiencies in Ironic. Ironic itself is cool; it
works well with TripleO, Nova, Kolla, etc. Mogan just wants to be another
client to schedule workloads on Ironic and provide bare-metal-specific
APIs for users who seek a way to provide virtual machines and bare metal
separately, or just a bare metal cloud without interoperating with other
compute resources under Nova.

[...]

The short explanation which clicked for me (granted it's probably an
oversimplification, but still) was this: Ironic provides an admin
API for managing bare metal resources, while Mogan gives you a user
API (suitable for public cloud use cases) to your Ironic backend. I
suppose it could have been implemented in Ironic, but implementing
it separately allows Ironic to be agnostic to multiple user
frontends and also frees the Ironic team up from having to take on
yet more work directly.


ditto!

I had a similar question at the PTG and this was the answer that convinced me
it may be worth the effort.

Flavio

--
@flaper87
Flavio Percoco


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [all][infra] Zuul v3 migration update

2017-09-27 Thread Flavio Percoco

Just wanted to say thanks to all of you for the hard work. I can only imagine
how hard it must be to do this migration without causing downtimes.

Flavio

On 26/09/17 18:04 -0500, Monty Taylor wrote:

Hey everybody,

We got significantly further along with our Zuul v3 rollout today. We
uncovered some fun bugs in the migration but were able to fix most of
them rather quickly.

We've pretty much run out of daylight though for the majority of the
team and there is a tricky zuul-cloner related issue to deal with, so
we're not going to push things further tonight. We're leaving most of
today's work in place, having gotten far enough that we feel
comfortable not rolling back.

The project-config repo should still be considered frozen except for
migration-related changes. Hopefully we'll be able to flip the final
switch early tomorrow.

If you haven't yet, please see [1] for information about the transition.

[1] https://docs.openstack.org/infra/manual/zuulv3.html

Thanks,

Monty

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


--
@flaper87
Flavio Percoco


signature.asc
Description: PGP signature
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] 答复: [Senlin] Senlin Queens Meetup

2017-09-27 Thread Lee Yi
21st Oct will be OK.

On Tue, Sep 26, 2017 at 9:28 PM,  wrote:

>
> I will join.
>
> If time was changed on Oct 14th, 21th or 22th, It's also ok from me:)
>
>
>
> Original message
> *From:* ;
> *To:* ;
> *Date:* 2017-09-19 22:06
> *Subject:* *[openstack-dev] [Senlin] Senlin Queens Meetup*
>
>
> Hi all,
> We are going to have a meetup to discuss the features and some other
> details about Senlin in Oct.
> Tentatively schedule:
> Date: 15th Oct.
> Location: Beijing, CHN
>
>
> Please leave your comments if you have any suggestion or the have conflict
> with the date.
>
> Sincerely,
> ruijie
>
>
>
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] reset key pair during rebuilding

2017-09-27 Thread LIU Yulong
On Wed, Sep 27, 2017 at 10:29 AM, Matt Riedemann 
wrote:

> On 9/23/2017 8:58 AM, LIU Yulong wrote:
>
>> Hi nova developers,
>>
>> This mail is proposed to reconsider the key pair resetting of instance.
>> The nova queens PTG discuss is here: https://etherpad.openstack.org
>> /p/nova-ptg-queens  L498.
>> And there are now two proposals.
>>
>> 1. SPEC 1: https://review.openstack.org/#/c/375221/ <
>> https://review.openstack.org/#/c/375221/> started by me (liuyulong)
>> since sep 2016.
>>
>> This spec will allow setting the new key_name for the instance during
>> rebuild API. That’s a very simple and well-understood approach:
>>
>>   * Make consistent with rebuild API properties, such as name, imageRef,
>> metadata, adminPass etc.
>>   * Rebuild API is something like `recreating`, this is the right way to
>> do key pair updating. For keypair-login-only VM, this is the key
>> point.
>>   * This does not involve to other APIs like reboot/unshelve etc.
>>
>
> This was one of the issues I brought up in IRC, is that if we just
> implemented this for the rebuild API, then someone could also ask that we
> do it for things like reboot, cold migrate/resize, unshelve, etc. Anything
> that involves re-creating the guest.
>
IMHO, rebuild has its own meaning: we are going to recreate a VM. So inputs
such as name, key and password should have a chance to be reset in this
`rebuild` interface. Unlike rebuild, actions such as reboot, cold
migrate/resize and unshelve do not carry such an implication. If anything
else were involved, you would be expanding the meaning of those actions
(reboot, cold migrate/resize, unshelve).
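
To make the user-visible change concrete, SPEC 1 boils down to one more
accepted field in the existing rebuild action - roughly like this
(illustrative request body; imageRef/name/adminPass are already accepted
today, key_name is the proposed addition, and the exact validation and
microversion are what the spec defines):

  POST /v2.1/servers/{server_id}/action
  {
      "rebuild": {
          "imageRef": "70a599e0-31e7-49b7-b260-868f441e343e",
          "name": "renewed-vm",
          "adminPass": "newpass",
          "key_name": "new-keypair"
      }
  }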



>   * Easy to use, only one API.
>>
>
> Until someone says we should also do it for the other APIs, as noted above.
>
That would not be acceptable. Other APIs do not have such a `recreating`
background. For rebuild, you are going to renew an instance, so the params
used for instance creation should have a chance to be reset.


>
>> By the way, here is the patch (https://review.openstack.org/#/c/379128/ <
>> https://review.openstack.org/#/c/379128/>) which has implemented this
>> spec. And it stays there more than one year too.
>>
>
> It's been open because the spec was never approved. Just a procedural
> issue.
>
>
>> 2. SPEC 2 : https://review.openstack.org/#/c/506552/ <
>> https://review.openstack.org/#/c/506552/> propose by Kevin_zheng.
>>
>> This spec supposed to add a new updating API for one instance’s key pair.
>> This one has one foreseeable advantage for this is to do instance running
>> key injection.
>>
>> But it may cause some issues:
>>
>>   * This approach needs to update the instance key pair first (one step,
>> API call). And then do a reboot/rebuild or any actions causing the
>> vm restart (second step, another API call). Firstly, this is waste,
>> it use two API calls. Secondly, if updating key pair was done, and
>> the reboot was not. That may result an inconsistent between instance
>> DB key pair and guest VM inside key. Cloud user may confused to
>> choose which key should be used to login.
>>
>
> 1. I don't think multiple API calls is a problem. Any GUI or orchestration
> tool can stitch these APIs together for what appears to be a single
> operation for the end user. Furthermore, with multiple options about what
> to do after the instance.key_name is updated, something like a GUI could
> present the user with the option to picking if they want to reboot or
> rebuild after the key is updated.
>
If we provide a discontinuous API, we have to take responsibility for it. The
inconsistency between the instance's key pair in the DB and the key inside
the guest VM can persist, so pointing at a GUI or orchestration tool is not a
reasonable answer. More API calls may cause more problems. What if the GUI or
orchestration tool user/developer forgets the second API call? What if the
first call fails - should they retry all the calls? Which key should be used
to log in if the first call succeeds and the second does not succeed or does
not respond? What if the second call fails? Users get confused again and
again.



> 2. An orchestrator or GUI would make sure that both APIs are called. For a
> user that is updating the key_name, they should realize they need to make
> another API call to enable it. This would all be in the API reference
> documentation, CLI help, etc, that anyone doing this should read and
> understand.
>
>   * For the second step (reboot), there is a strong constraint is that
>> cloud-init config needs to be set to running-per-booting. But if a
>> cloud platform all images are set cloud-init to per-deployment. In
>> order to achieve this new API goal, the entire cloud platform images
>> need updating. This will cause a huge upgrading work for entire
>> cloud platform images. They need to change all the images cloud-init
>> config from running-per-deployment to running-every-boot. But that
>> still can not solve the