Re: [Openstack-operators] Problems (simple ones) at scale... Invisible VMs.

2016-05-18 Thread David Medberry
I don't think --marker does at all what I want. The --limit -1 does do
multiple successive queries (with a marker) automagically, returning a
single list as CLI output to nova list. That really IS what I want (and
what some of our automation is written around).

Thanks!

On Wed, May 18, 2016 at 5:26 PM, James Downs  wrote:

> On Wed, May 18, 2016 at 04:37:42PM -0600, David Medberry wrote:
>
> > It seems to bypass it... or I'm running into a LOWER limit
> > (undocumented). So the help for --limit -1 says it is still limited
> > by osapi_max_limit
>
> You're looking for:
>
> --marker  The last server UUID of the previous page;
> displays list of servers after "marker".
>
> This is much faster than increasing the size of results, at least in
> sufficiently large environments.
>
> Cheers,
> -j
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Problems (simple ones) at scale... Invisible VMs.

2016-05-18 Thread James Downs
On Wed, May 18, 2016 at 04:37:42PM -0600, David Medberry wrote:

> It seems to bypass it... or I'm running into a LOWER limit (undocumented).
> So the help for --limit -1 says it is still limited by osapi_max_limit

You're looking for:

--marker  The last server UUID of the previous page;
displays list of servers after "marker".

This is much faster than increasing the size of results, at least in
sufficiently large environments.
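
A minimal sketch of doing that paging by hand with --marker (untested, and
assuming the default nova list table layout, where the UUID is the second
column):

  marker=""
  while :; do
      # fetch one page, starting after the last UUID we saw
      page=$(nova list --limit 100 ${marker:+--marker $marker})
      rows=$(echo "$page" | tail -n +4 | head -n -1)
      [ -z "$rows" ] && break
      echo "$rows"
      # the last UUID on this page becomes the next marker
      marker=$(echo "$rows" | awk 'END {print $2}')
  done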

Cheers,
-j

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Problems (simple ones) at scale... Invisible VMs.

2016-05-18 Thread Chris Morgan
We just hit this and replaced a call to nova with a direct database query - 
literally yesterday! 
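
Something like the following, as a rough sketch (assuming the Liberty-era
nova schema, where the soft-delete convention means non-deleted rows have
deleted = 0):

  mysql nova -e "SELECT uuid, display_name FROM instances WHERE deleted = 0;"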

Chris Morgan

Sent from my iPhone

> On May 18, 2016, at 6:13 PM, David Medberry  wrote:
> 
> So, we just ran into an "at-scale" issue that shouldn't have been an issue.
> 
> Many of the OpenStack CLI tools accept a limit parameter (to limit how much
> data you get back from a single query). However, much less well documented is
> that there is an inherent limit that you will run into at 1000 VMs (not
> counting deleted ones). Many operators have already exceeded that limit and
> likely run into this. With nova cli and openstack client, you can simply pass
> in a limit of -1 to get around this (and though it will still make paged
> queries, you won't have "invisible" VMs, which is what I've begun to call the
> ones that don't make it into the first/default page).
> 
> I can't really call this a bug for Nova (but it is definitely a bug for
> Cinder, which doesn't have a functional "get me all of them" command and is
> also limited to 1000 for a single call, but you can never get the rest, at
> least in our Liberty environment).
> 
> box:~# nova list  |tail -n +4 |head -n -1 |wc
>1000   16326  416000
> box:~# nova list --limit -1  |tail -n +4 |head -n -1 |wc
>1060   17274  440960
> 
> (so I recently went over the limit of 1000)
> 
> YMMV.
> 
> Good luck.
> 
> -d
> ___
> OpenStack-operators mailing list
> OpenStack-operators@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Problems (simple ones) at scale... Invisible VMs.

2016-05-18 Thread David Medberry
It seems to bypass it... or I'm running into a LOWER limit (undocumented).
So the help for --limit -1 says it is still limited by osapi_max_limit.

I'll check my config for that value (likely the default) when I get home.
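
For reference, the knob in question lives in nova.conf; a sketch, with what
I believe is the Liberty-era default:

  [DEFAULT]
  # maximum number of items a single API call will return;
  # clients have to page (or pass --limit -1) to get past it
  osapi_max_limit = 1000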

On Wed, May 18, 2016 at 4:18 PM, Kris G. Lindgren 
wrote:

> Nova has a config setting for the maximum number of results to be returned
> by a single call.  You can bump that up so that you can do a nova list
> --all-tenants and still see everything. However, if I am reading the below
> correctly, I didn't realize that --limit -1 apparently bypasses that
> config option?
>
> ___
> Kris Lindgren
> Senior Linux Systems Engineer
> GoDaddy
>
> From: David Medberry 
> Date: Wednesday, May 18, 2016 at 4:13 PM
> To: "openstack-operators@lists.openstack.org" <
> openstack-operators@lists.openstack.org>
> Subject: [Openstack-operators] Problems (simple ones) at scale...
> Invisible VMs.
>
> So, we just ran into an "at-scale" issue that shouldn't have been an issue.
>
> Many of the OpenStack CLI tools accept a limit parameter (to limit how
> much data you get back from a single query). However, much less well
> documented is that there is an inherent limit that you will run into at
> 1000 VMs (not counting deleted ones). Many operators have already exceeded
> that limit and likely run into this. With nova cli and openstack client,
> you can simply pass in a limit of -1 to get around this (and though it will
> still make paged queries, you won't have "invisible" VMs, which is what I've
> begun to call the ones that don't make it into the first/default page).
>
> I can't really call this a bug for Nova (but it is definitely a bug for
> Cinder, which doesn't have a functional "get me all of them" command and is
> also limited to 1000 for a single call, but you can never get the rest, at
> least in our Liberty environment).
>
> box:~# nova list  |tail -n +4 |head -n -1 |wc
>1000   16326  416000
> box:~# nova list --limit -1  |tail -n +4 |head -n -1 |wc
>1060   17274  440960
>
> (so I recently went over the limit of 1000)
>
> YMMV.
>
> Good luck.
>
> -d
>
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] Problems (simple ones) at scale... Invisible VMs.

2016-05-18 Thread Kris G. Lindgren
Nova has a config setting for the maximum number of results to be returned by a
single call.  You can bump that up so that you can do a nova list --all-tenants
and still see everything. However, if I am reading the below correctly, I
didn't realize that --limit -1 apparently bypasses that config option?

___
Kris Lindgren
Senior Linux Systems Engineer
GoDaddy

From: David Medberry
Date: Wednesday, May 18, 2016 at 4:13 PM
To: "openstack-operators@lists.openstack.org"
Subject: [Openstack-operators] Problems (simple ones) at scale... Invisible VMs.

So, we just ran into an "at-scale" issue that shouldn't have been an issue.

Many of the OpenStack CLI tools accept a limit parameter (to limit how much
data you get back from a single query). However, much less well documented is
that there is an inherent limit that you will run into at 1000 VMs (not
counting deleted ones). Many operators have already exceeded that limit and
likely run into this. With nova cli and openstack client, you can simply pass
in a limit of -1 to get around this (and though it will still make paged
queries, you won't have "invisible" VMs, which is what I've begun to call the
ones that don't make it into the first/default page).

I can't really call this a bug for Nova (but it is definitely a bug for Cinder,
which doesn't have a functional "get me all of them" command and is also
limited to 1000 for a single call, but you can never get the rest, at least in
our Liberty environment).

box:~# nova list  |tail -n +4 |head -n -1 |wc
   1000   16326  416000
box:~# nova list --limit -1  |tail -n +4 |head -n -1 |wc
   1060   17274  440960

(so I recently went over the limit of 1000)

YMMV.

Good luck.

-d
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Problems (simple ones) at scale... Invisible VMs.

2016-05-18 Thread David Medberry
So, we just ran into an "at-scale" issue that shouldn't have been an issue.

Many of the OpenStack CLI tools accept a limit parameter (to limit how much
data you get back from a single query). However, much less well documented
is that there is an inherent limit that you will run into at 1000 VMs
(not counting deleted ones). Many operators have already exceeded that
limit and likely run into this. With nova cli and openstack client, you can
simply pass in a limit of -1 to get around this (and though it will still
make paged queries, you won't have "invisible" VMs, which is what I've begun
to call the ones that don't make it into the first/default page).

I can't really call this a bug for Nova (but it is definitely a bug for
Cinder, which doesn't have a functional "get me all of them" command and is
also limited to 1000 for a single call, but you can never get the rest, at
least in our Liberty environment).

box:~# nova list  |tail -n +4 |head -n -1 |wc
   1000   16326  416000
box:~# nova list --limit -1  |tail -n +4 |head -n -1 |wc
   1060   17274  440960

(so I recently went over the limit of 1000)
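
A quick way to list just the invisible ones, as a sketch (assumes the default
table layout with the UUID in the second column; needs bash for the process
substitution):

  # UUIDs present in the full listing but missing from the first page
  comm -13 <(nova list | awk '$2 ~ /-/ {print $2}' | sort) \
           <(nova list --limit -1 | awk '$2 ~ /-/ {print $2}' | sort)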

YMMV.

Good luck.

-d
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [scientific] Ironic Summit recap - ops experiences

2016-05-18 Thread Jim Rollenhagen
I forgot how to reply-all here :)

// jim

On Wed, May 18, 2016 at 05:35:55PM -0400, Jim Rollenhagen wrote:
> On Tue, May 17, 2016 at 11:32:25PM +0100, Stig Telfer wrote:
> > Is there anywhere that these experiences can be captured in a way that 
> > might help?
> > 
> > For example, I have a few DRAC-managed servers.  About half have fallen 
> > into a state where the pxe_drac driver can’t do anything with them 
> > (python-dracclient claims another transaction is underway).  But 
> > pxe_ipmitool works happily.
> > 
> > I’m pretty sure Ironic is not at fault here so it doesn’t seem fair to 
> > catalogue these things as Ironic bugs.  Perhaps the best action would be 
> > for Ironic to be more informative when it identifies a BMC is playing up.
> > 
> > Jay and Jim - any thoughts?
> 
> Yeah, unfortunately we can't fix the terribleness of all the BMCs in the
> world. We are working on a few different efforts to help operators deal
> with these, generally (which are described in my summit wrapup).
> Nova-style notifications, BMC reset APIs, automatically returning nodes
> to service when a BMC is reachable again, etc.
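> 
> For the specific case above, a rough sketch with the current ironic CLI
> (the vendor method is driver-specific, so take the exact names with a
> grain of salt):
> 
>   # clear operator-set maintenance once the BMC responds again
>   ironic node-set-maintenance $NODE_UUID false
>   # the ipmitool driver's existing vendor-level BMC reset
>   ironic node-vendor-passthru $NODE_UUID bmc_reset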
> 
> I'd totally file a bug with python-dracclient for the specific DRAC
> thing you mentioned.
> 
> In general, feel free to file bugs; if it's something we can deal with,
> we will triage it, and if not, we'll keep it in mind for the more general
> handling of these things.
> 
> Does that help?
> 
> // jim
> 
> > 
> > Best wishes,
> > Stig
> > 
> > 
> > > On 12 May 2016, at 11:37, Peter Love  wrote:
> > > 
> > > Nice talk on this stuff: https://www.youtube.com/watch?v=GZeUntdObCA
> > > 
> > > On 12 May 2016 at 10:54, Matt Jarvis  
> > > wrote:
> > >> Very familiar list Tim, and we end up working around a lot of them with
> > >> horrible hardware-specific code. Our bugbears also include:
> > >> 
> > >> - Required configuration only being available via a web interface, e.g.
> > >>   setting the hostname of the BMC on Supermicro hardware
> > >> - IPMI hanging and requiring complete removal and reload of the kernel
> > >>   modules to enable resetting (see the sketch below)
> > >> - Undocumented functions requiring raw IPMI commands; again, on
> > >>   Supermicro there is some black magic to set dedicated ports, check
> > >>   power supply status, etc.
> > >> - Web interfaces requiring Java, and totally broken on mainstream
> > >>   browsers; HP iLOs in particular, which are almost impossible to use
> > >>   with a Mac
> > >> - Firmware and BIOSes which don't allow command-line updating from
> > >>   inside a running OS
> > >> 
> > >> We're used to being able to flash BIOS images and CMOS settings by 
> > >> writing
> > >> directly to the memory addresses, but more and more modern hardware won't
> > >> let you do this anymore :(
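> > >> 
> > >> A sketch of the IPMI workarounds above. The module reload is standard;
> > >> the raw bytes are Supermicro board-specific folklore that varies by
> > >> generation, so treat them as illustrative only:
> > >> 
> > >>   # recover a hung local IPMI interface by reloading the kernel modules
> > >>   rmmod ipmi_si ipmi_devintf ipmi_msghandler
> > >>   modprobe ipmi_si ipmi_devintf
> > >> 
> > >>   # board-specific raw command: select the dedicated LAN port on some
> > >>   # Supermicro boards
> > >>   ipmitool raw 0x30 0x70 0x0c 1 0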
> > >> 
> > >> We're hoping Redfish will solve some of the configuration related issues,
> > >> although obviously it won't make any difference to flaky BMC 
> > >> implementations
> > >> and proprietary tooling to update firmware.
> > >> 
> > >> On 12 May 2016 at 06:25, Tim Bell  wrote:
> > >>> 
> > >>> 
> > >>> 
> > >>> On 12/05/16 06:22, "Stig Telfer"  wrote:
> > >>> 
> >  Hi All -
> >  
> >  Jim Rollenhagen from the Ironic project has just posted a great summit
> >  report of Ironic team activities on the openstack-devs mailing list[1],
> >  which included this item which will be of interest to the Scientific WG
> >  members who are looking to work on bare metal activities this cycle:
> >  
> > > # Making ops less worse
> > > 
> > > [Etherpad](https://etherpad.openstack.org/p/ironic-newton-summit-ops)
> > > 
> > > We discussed some common failure cases that operators see, and how we
> > > can solve them in code.
> > > 
> > > We discussed flaky BMCs, which end with the node in maintenance mode,
> > > and if Ironic can get them out of that mode automagically. We
> > > identified
> > > the need to distinguish between maintenance set by ironic and set by
> > > operators, and do things like attempt to connect to the BMC on a power
> > > state request, and turn off maintenance mode if successful. JayF is
> > > going to write a spec for this differentiation.
> > > 
> > > Folks also expressed the desire to be able to reset the BMC via APIs.
> > > We
> > > have a BMC reset function in the vendor interface for the ipmitool
> > > driver; dtantsur volunteered to write a spec to promote that method to
> > > an official ManagementInterface method.
> > > 
> > > We also talked for a while about stuck states. This has been mostly
> > > solved in code, but is still a problem for some deployers. We decided
> > > that we should not have a "reset-state" API like nova does, but rather
> > > a command line tool to handle this. lintan has volunteered to write a
> > > proposal for 

Re: [Openstack-operators] [openstack-dev] disabling deprecated APIs by config?

2016-05-18 Thread John Griffith
On Wed, May 18, 2016 at 9:20 AM, Sean Dague  wrote:

> nova-net is now deprecated - https://review.openstack.org/#/c/310539/
>
> And we're in the process in Nova of doing some spring cleaning and
> deprecating the proxies to other services -
> https://review.openstack.org/#/c/312209/
>
> At some point in the future after deprecation the proxy code is going to
> stop working. Either accidentally, because we're not going to test or
> fix this forever (and we aren't going to track upstream API changes to
> the proxy targets), or intentionally when we decide to delete it to make
> it easier to address core features and bugs that everyone wants addressed.
>
> However, the world moves forward slowly. Consider the following scenario.
>
> We delete nova-net & the network proxy entirely in Peru (a not entirely
> unrealistic idea). At that release there are a bunch of people just
> getting around to Newton. Their deployments allow all these things to
> happen which are going to 100% break when they upgrade, and people are
> writing more and more OpenStack software every cycle.
>
> How do we signal to users this kind of deprecation? Can we give sites
> tools to help prevent new software being written to deprecated (and
> scheduled for deletion) APIs?
>
> One idea was a "big red switch" in the format of a config option
> ``disable_deprecated_apis=True`` (defaults to False). Which would set
> all deprecated APIs to 404 routes.
>
> One of the nice ideas here is this would allow some API servers to have
> this set, and others not. So users could point to the "clean" API
> server, figure out that they will break, but the default API server
> would still support these deprecated APIs. Or, conversely, the default
> could be the clean API server, and a legacy API server endpoint could be
> provided for projects that really needed it that included these
> deprecated things for now. Either way it would allow some site assisted
> transition. And be something like the -Werror flag in gcc.
>
> In the Nova case the kinds of things ending up in this bucket are going
> to be interfaces that people *really* shouldn't be using any more. Many
> of them date back to when OpenStack was only 2 projects, and the concept
> of splitting out function wasn't really thought about (note: we're
> getting ahead of this one for the 'placement' REST API, so it won't have
> any of these issues). At some point this house cleaning was going to
> have to happen, and now seems to be the time to get it rolling.
>
> Feedback on this idea would be welcomed. We're going to deprecate the
> proxy APIs regardless; however, disable_deprecated_apis is its own idea
> with its own consequences, and we really want feedback before pushing
> forward on this.
>
> -Sean
>
> --
> Sean Dague
> http://dague.net
>
> __
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
I like the idea of a switch in the config file.  To Dean's point, would it
also be worth considering a "list-deprecated-calls" that could give him a
list without having to do the roundtrip every time?  That might not
actually solve anything for him, but perhaps something along those lines
would help?
___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] disabling deprecated APIs by config?

2016-05-18 Thread Sean Dague
nova-net is now deprecated - https://review.openstack.org/#/c/310539/

And we're in the process in Nova of doing some spring cleaning and
deprecating the proxies to other services -
https://review.openstack.org/#/c/312209/

At some point in the future after deprecation the proxy code is going to
stop working. Either accidentally, because we're not going to test or
fix this forever (and we aren't going to track upstream API changes to
the proxy targets), or intentionally when we decide to delete it to make
it easier to address core features and bugs that everyone wants addressed.

However, the world moves forward slowly. Consider the following scenario.

We delete nova-net & the network proxy entirely in Peru (a not entirely
unrealistic idea). At that release there are a bunch of people just
getting around to Newton. Their deployments allow all these things to
happen which are going to 100% break when they upgrade, and people are
writing more and more OpenStack software every cycle.

How do we signal to users this kind of deprecation? Can we give sites
tools to help prevent new software being written to deprecated (and
scheduled for deletion) APIs?

One idea was a "big red switch" in the format of a config option
``disable_deprecated_apis=True`` (defaults to False). Which would set
all deprecated APIs to 404 routes.
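
To make that concrete, a sketch only (the option does not exist in Nova
today; it is exactly what is being proposed here):

  [DEFAULT]
  # proposed "big red switch": serve 404 for every deprecated API route
  disable_deprecated_apis = True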

One of the nice ideas here is this would allow some API servers to have
this set, and others not. So users could point to the "clean" API
server, figure out that they will break, but the default API server
would still support these deprecated APIs. Or, conversely, the default
could be the clean API server, and a legacy API server endpoint could be
provided for projects that really needed it that included these
deprecated things for now. Either way it would allow some site assisted
transition. And be something like the -Werror flag in gcc.

In the Nova case the kinds of things ending up in this bucket are going
to be interfaces that people *really* shouldn't be using any more. Many
of them date back to when OpenStack was only 2 projects, and the concept
of splitting out function wasn't really thought about (note: we're
getting ahead of this one for the 'placement' REST API, so it won't have
any of these issues). At some point this house cleaning was going to
have to happen, and now seems to be the time to get it rolling.

Feedback on this idea would be welcomed. We're going to deprecate the
proxy APIs regardless; however, disable_deprecated_apis is its own idea
with its own consequences, and we really want feedback before pushing
forward on this.

-Sean

-- 
Sean Dague
http://dague.net

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Ops Meetups Team - Next Meeting Coordinates and News

2016-05-18 Thread Tom Fifield

Hello all,

Thank you very much to the 25(!) of you who participated in our first 
meeting. We had some fantastic discussion, which has been helpfully 
summarised by Chris Morgan at [1].


==Proposal: Regular Meeting Time==
Those at the first meeting concluded that generally, the Ops Meetups 
Team should meet


* every two weeks on Tuesday at 1400 UTC.

In the period prior to an upcoming meetup, this may increase to weekly. 
After the team gets more established, there may be times where this 
drops down to monthly.


Unless there is further discussion, this means the next meeting is at:

==> Tuesday, 31 of May at 1400 UTC[2]

[3] will be kept up to date with information about the meeting time and 
agenda




==Proposal: Continue using the #openstack-operators channel==
There was a question about whether the use of #openstack-operators IRC 
channel was appropriate, or whether the meeting should move to a defined 
meeting channel.


Given that all meeting channels are currently occupied at that timeslot, 
unless there is discussion the meeting will remain in the 
#openstack-operators channel.




==New Content: "How Ops Meetups are planned"==
Based on the decisions of last meeting, I added some new content at:

https://wiki.openstack.org/wiki/Operations/Meetups#How_Ops_Meetups_are_planned

including notes on approximate timelines, dates, region selection and
venue selection.


Please take a look and edit judiciously!



==New Content: Approach of the Ops Meetups Team==
Based on the scoping discussion section, I attempted to list all the 
tasks and areas we will be making decisions on in:


https://wiki.openstack.org/wiki/Ops_Meetups_Team#Approach

Please look on in horror :)


Regards,


Tom




[1] 
http://lists.openstack.org/pipermail/openstack-operators/2016-May/010461.html


[2] 
http://www.timeanddate.com/worldclock/fixedtime.html?msg=Ops+Meetups+Team=20160531T22=241


[3] https://wiki.openstack.org/wiki/Ops_Meetups_Team#Meeting_Information

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


Re: [Openstack-operators] [kolla] Moving from distro packages to containers (or virtualenvs...)

2016-05-18 Thread Saverio Proto
About docker:
Testing this Docker setup has been on my TODO list for a long time:

https://github.com/dguerri/dockerstack

It looks very well done, but I think it is not very well known.

Saverio

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators


[Openstack-operators] Moving from distro packages to containers (or virtualenvs...)

2016-05-18 Thread Mark Goddard
> Hi there all-ye-operators,

> I am investigating how to help move godaddy from rpms to a
> container-like solution (virtualenvs, lxc, or docker...) and a set of
> questions that comes up is the following (and I would think that some
> folks on this mailing list may have some useful insight into the answers):

> * Have you done the transition?

We are currently in the middle of the transition to a containerized deployment 
solution for Cray's OpenStack-based System Management platform.

> * How did the transition go?

So far, so good.

> * Was/is kolla used or looked into? or something custom?

We evaluated both openstack-ansible and Kolla. Each has its advantages, but
Kolla won out thanks to its simplicity, speed of deployment and use of
immutable containers. We're also using Kolla's ansible deployment tool.

> * How long did it take to do the transition from a package based
> solution (with say puppet/chef being used to deploy these packages)?

For the first iteration we're still using a package-based solution. This is 
largely because we have considerable infrastructure built up around this. We're 
using OpenStack's Anvil tool, which I know you are familiar with, Josh :).

Kolla supports installing from both packages and source, so we have the option 
of moving to source in future if it fits.
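
As a sketch, the package-based ("binary") path through Kolla looks roughly
like this; the exact flags vary by release, so treat it as illustrative:

  kolla-build --base centos --type binary   # build images from distro packages
  kolla-ansible -i multinode deploy         # deploy them with Kolla's Ansible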

>* Follow-up being how big was the team to do this?

There are 4 of us working on this project. The work involved is considerably 
more than for an operator because we are building a platform based on OpenStack 
with multiple deployment environments. We estimated about 6 man-months.

>
> * What was the roll-out strategy to achieve the final container solution?

This will be difficult. As others have said, we expect to do this one service 
at a time, and will add considerable automation around it. Migrating stateful 
services such as the DB and nova-compute will need care.

>
> Any other feedback (and/or questions that I missed)?
>
> Thanks,

> Josh

Thanks,
Mark

Cray, UK Ltd.

Cray U.K. Limited is a limited company registered in England and Wales.
Company Registration No. 03941275. Registered Office: 5 Fleet Place, London, 
EC4M 7RD, UK

___
OpenStack-operators mailing list
OpenStack-operators@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators