Re: [openstack-dev] [nova] nova cellsv2 and DBs / down cells / quotas

2018-10-25 Thread Dan Smith
> I guess our architecture is pretty unique in a way but I wonder if
> other people are also a little scared about the whole all DB servers
> need to be up to serve API requests?

When we started down this path, we acknowledged that this would create a
different access pattern which would require ops to treat the cell
databases differently. The input we were getting at the time was that
the benefits outweighed the costs here, and that we'd work on caching to
deal with performance issues if/when that became necessary.

> I’ve been thinking of some hybrid cellsv1/v2 thing where we’d still
> have the top level api cell DB but the API would only ever read from
> it. Nova-api would only write to the compute cell DBs.
> Then keep the nova-cells processes just doing instance_update_at_top
> to keep the nova-cell-api db up to date.

I'm definitely not in favor of doing more replication in python to
address this. What was there in cellsv1 was lossy, even for the subset
of things it actually supported (which didn't cover all nova features at
the time and hasn't kept pace with features added since, obviously).

About a year ago, I proposed that we add another "read only mirror"
field to the cell mapping, which nova would use if and only if the
primary cell database wasn't reachable, and only for read
operations. The ops, if they wanted to use this, would configure plain
old one-way mysql replication of the cell databases to a
highly-available server (probably wherever the api_db is) and nova could
use that as a read-only cache for things like listing instances and
calculating quotas. The reaction was (very surprisingly to me) negative
to this option. It seems very low-effort, high-gain, and proper re-use
of existing technologies to me, without us having to replicate a
replication engine (hah) in python. So, I'm curious: does that sound
more palatable to you?
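
To make the proposal concrete, here is a minimal sketch of the fallback rule described above. It is not nova code: `CellMapping.read_only_mirror`, `CellUnreachable`, and the `connect` callable are hypothetical names used only for illustration. The only behavior it tries to capture is "use the mirror if and only if the primary is unreachable, and only for reads".

```python
from dataclasses import dataclass
from typing import Optional


class CellUnreachable(Exception):
    """Raised by the (hypothetical) connect callable when a DB is down."""


@dataclass
class CellMapping:
    name: str
    database_connection: str
    # Hypothetical field: URL of a one-way MySQL replica of the cell DB.
    read_only_mirror: Optional[str] = None


def pick_connection(cell, connect, read_only):
    """Return (connection, from_mirror) for one request against one cell."""
    try:
        return connect(cell.database_connection), False
    except CellUnreachable:
        # Fall back only when the primary is unreachable, and only for
        # read operations such as listing instances or counting quota.
        if read_only and cell.read_only_mirror:
            return connect(cell.read_only_mirror), True
        raise
```

Writes never reach the mirror in this sketch, so replication stays strictly one-way and MySQL, not python, does the replicating.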

--Dan

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-23 Thread Dan Smith
> I tested a code change that essentially reverts
> https://review.openstack.org/#/c/276861/1/nova/api/metadata/base.py
>
> In other words, with this change metadata tables are not fetched by
> default in API requests. If I understand correctly, metadata is
> fetched in separate queries as the instance object is
> created. Everything seems to work just fine, and I've considerably
> reduced the amount of data fetched from the database, as well as
> reduced the average response time of API requests.
>
> Given how simple it is and the results I'm getting, I don't see any
> reason not to patch my clusters with this change.
>
> Do you guys see any other impact this change could have? Anything that
> it could potentially break?

This is probably fine as a bandage fix, but it's not the right one for
upstream, IMHO. By doing what you did, you cause two RPC round-trips to
fetch the instance and then the metadata every single time the metadata
API is hit (not including the cache). By converting the DB load to do
the two-step, we still hit the DB twice, but make only one RPC
round-trip, which will be much more efficient, especially at load/scale.

--Dan



Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Dan Smith
> Do you guys see an easy fix here?
>
> Should I open a bug report?

Definitely open a bug. IMHO, we should just make the single-instance
load work like the multi ones, where we load the metadata separately if
requested. We might be able to get away without sysmeta these days, but
we needed it for the flavor details back when the join was added. But,
user metadata is controllable by the user and definitely of interest in
that code, so just dropping sysmeta from the explicit required_attrs
isn't enough, IMHO.

--Dan



Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Dan Smith
> We haven't been doing this (intentionally) for quite some time, as we
> query and fill metadata linearly:
>
> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L2244
>
> and have since 2013 (Havana):
>
> https://review.openstack.org/#/c/26136/
>
> So unless there has been a regression that is leaking those columns back
> into the join list, I'm not sure why the query you show would be
> generated.

Ah, Matt Riedemann just pointed out on IRC that we're not doing it on
single-instance fetch, which is what you'd be hitting in this path. We
use that approach in a lot of places where the rows would also be
multiplied by the number of instances, but not in the single case. So,
that makes sense now.

--Dan



Re: [openstack-dev] [nova] Metadata API cross joining "instance_metadata" and "instance_system_metadata"

2018-10-22 Thread Dan Smith
> Of course this is only a problem when instances have a lot of metadata
> records. An instance with 50 records in "instance_metadata" and 50
> records in "instance_system_metadata" will fetch 50 x 50 = 2,500 rows
> from the database. It's not difficult to see how this can escalate
> quickly. This can be a particularly significant problem in an HA
> scenario with multiple API nodes pulling data from multiple database
> nodes.

We haven't been doing this (intentionally) for quite some time, as we
query and fill metadata linearly:

https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L2244

and have since 2013 (Havana):

https://review.openstack.org/#/c/26136/

So unless there has been a regression that is leaking those columns back
into the join list, I'm not sure why the query you show would be
generated.
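
For anyone following along, the difference between the joined query and the linear fill can be reproduced with a toy schema. This is only a sketch using an in-memory SQLite database; the three tables below are heavily simplified stand-ins (an assumption, not nova's real schema), and the SQL is hand-written rather than what SQLAlchemy would emit.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE instances (uuid TEXT PRIMARY KEY);
    CREATE TABLE instance_metadata (instance_uuid TEXT, key TEXT, value TEXT);
    CREATE TABLE instance_system_metadata (instance_uuid TEXT, key TEXT, value TEXT);
    INSERT INTO instances VALUES ('inst-1');
""")
rows = [("inst-1", f"k{i}", f"v{i}") for i in range(50)]
db.executemany("INSERT INTO instance_metadata VALUES (?, ?, ?)", rows)
db.executemany("INSERT INTO instance_system_metadata VALUES (?, ?, ?)", rows)

# Joining both metadata tables in one query multiplies the row count.
joined = db.execute("""
    SELECT i.uuid, m.key, s.key
    FROM instances i
    LEFT JOIN instance_metadata m ON m.instance_uuid = i.uuid
    LEFT JOIN instance_system_metadata s ON s.instance_uuid = i.uuid
    WHERE i.uuid = ?
""", ("inst-1",)).fetchall()

# Filling each kind of metadata with its own query grows linearly.
meta = db.execute(
    "SELECT key, value FROM instance_metadata WHERE instance_uuid = ?",
    ("inst-1",)).fetchall()
sysmeta = db.execute(
    "SELECT key, value FROM instance_system_metadata WHERE instance_uuid = ?",
    ("inst-1",)).fetchall()

print(len(joined), len(meta) + len(sysmeta))  # 2500 100
```

The joined form returns 50 x 50 rows for a single instance, while the separate queries return 50 + 50.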

Just to be clear, you don't have any modifications to the code anywhere,
do you?

--Dan



Re: [openstack-dev] [Openstack-operators] [nova] Supporting force live-migrate and force evacuate with nested allocations

2018-10-10 Thread Dan Smith
>> I disagree on this. I'd rather just do a simple check for >1
>> provider in the allocations on the source and if True, fail hard.
>>
>> The reverse (going from a non-nested source to a nested destination)
>> will hard fail anyway on the destination because the POST
>> /allocations won't work due to capacity exceeded (or failure to have
>> any inventory at all for certain resource classes on the
>> destination's root compute node).
>
> I agree with Jay here. If we know the source has allocations on >1
> provider, just fail fast, why even walk the tree and try to claim
> those against the destination - the nested providers aren't going to
> be the same UUIDs on the destination, *and* trying to squash all of
> the source nested allocations into the single destination root
> provider and hope it works is super hacky and I don't think we should
> attempt that. Just fail if being forced and nested allocations exist
> on the source.

Same, yeah.
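
A sketch of the fail-fast check being agreed to here, under the assumption that the source allocations are available as a dict keyed by resource provider UUID (the shape of the "allocations" member that placement returns for a consumer). The function and exception names are hypothetical, not nova's.

```python
class NestedAllocationsNotSupported(Exception):
    """Hypothetical error for forced moves with nested source allocations."""


def check_forced_move_allowed(source_allocations):
    """Fail hard if the source consumer has allocations on >1 provider.

    source_allocations maps resource provider UUID -> {"resources": {...}}.
    """
    providers = sorted(source_allocations)
    if len(providers) > 1:
        raise NestedAllocationsNotSupported(
            "forced move refused: allocations span %d providers: %s"
            % (len(providers), ", ".join(providers)))
    return True
```

No tree walk and no attempt to squash nested allocations into the destination root provider: the check only counts providers.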

--Dan



Re: [openstack-dev] [placement] The "intended purpose" of traits

2018-10-01 Thread Dan Smith
>> I still want to use something like "Is capable of RAID5" and/or "Has
>> RAID5 already configured" as part of a scheduling and placement
>> decision. Being able to have the GET /a_c response filtered down to
>> providers with those, ahem, traits is the exact purpose of that operation.
>
> And yep, I have zero problem with this either, as I've noted. This is
> precisely what placement and traits were designed for.

Same.

>> While we're in the neighborhood, we agreed in Denver to use a trait to
>> indicate which service "owns" a provider [1], so we can eventually
>> coordinate a smooth handoff of e.g. a device provider from nova to
>> cyborg. This is certainly not a capability (but it is a trait), and it
>> can certainly be construed as a key/value (owning_service=cyborg). Are
>> we rescinding that decision?
>
> Unfortunately I have zero recollection of a conversation about using
> traits for indicating who "owns" a provider. :(

I definitely do.

> I don't think I would support such a thing -- rather, I would support
> adding an attribute to the provider model itself for an owning service
> or such thing.
>
> That's a great example of where the attribute has specific conceptual
> meaning to placement (the concept of ownership) and should definitely
> not be tucked away, encoded into a trait string.

No, as I recall it means nothing to placement - it means something to
the consumers. A gentleperson's agreement for identifying who owns what
if we're going to, say, remove things that might be stale from placement
when updating the provider tree.

--Dan



Re: [openstack-dev] [placement] The "intended purpose" of traits

2018-10-01 Thread Dan Smith
> It sounds like you might be saying, "I would rather not see encoded
> trait names OR a new key/value primitive; but if the alternative is
> ending up with 'a much larger mess', I would accept..." ...which?
>
> Or is it, "We should not implement a key/value primitive, nor should we
> implement restrictions on trait names; but we should continue to
> discourage (ab)use of trait names by steering placement consumers to..."
> ...do what?

The second one.

> The restriction is real, not perceived. Without key/value (either
> encoded or explicit), how should we steer placement consumers to satisfy
> e.g., "Give me disk from a provider with RAID5"?

Sure, I'm not doubting the need to find providers with certain
abilities. What I'm saying (and I assume Jay is as well), is that
finding things with more domain-specific attributes is the job of the
domain controller (i.e. nova). Placement's strength, IMHO, is the
unified and extremely simple data model and consistency guarantees that
it provides. It takes a lot of the work of searching and atomic
accounting of enumerable and qualitative things out of the scheduler of
the consumer. IMHO, it doesn't (i.e. won't ever) and shouldn't replace
all the things that nova's scheduler needs to do. I think it's useful to
draw the line in front of a full-blown key=value store and DSL grammar
for querying everything with all the operations anyone could ever need.

Unifying the simpler and more common bits into placement and keeping the
domain-specific consideration and advanced filtering of the results in
nova/ironic/etc is the right separation of responsibilities, IMHO. RAID
level is, of course, an overly simplistic example to use, which makes
the problem seem small, but we know more complicated examples exist.
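
The split of responsibilities described above can be sketched as two stages. Everything here is illustrative: the provider data, the `CUSTOM_RAID5` trait name, and the `raid_stripe_kb` detail are made up, and the first function only mimics the "trait present or not" contract rather than calling placement.

```python
# Stage 1 data: presence-only traits, the kind of thing placement tracks.
PROVIDERS = [
    {"name": "node-a", "traits": {"STORAGE_DISK_SSD", "CUSTOM_RAID5"}},
    {"name": "node-b", "traits": {"STORAGE_DISK_SSD"}},
    {"name": "node-c", "traits": {"CUSTOM_RAID5"}},
]

# Stage 2 data: domain-specific detail kept by the consumer (nova, ironic).
CONSUMER_DETAILS = {
    "node-a": {"raid_stripe_kb": 64},
    "node-c": {"raid_stripe_kb": 256},
}


def placement_filter(providers, required_traits):
    """Keep providers that have every required trait; no value comparison."""
    required = set(required_traits)
    return [p["name"] for p in providers if required <= p["traits"]]


def consumer_filter(names, min_stripe_kb):
    """Domain-specific narrowing done by the consumer, not by placement."""
    return [n for n in names
            if CONSUMER_DETAILS.get(n, {}).get("raid_stripe_kb", 0) >= min_stripe_kb]


candidates = placement_filter(PROVIDERS, ["CUSTOM_RAID5"])
chosen = consumer_filter(candidates, min_stripe_kb=128)
print(candidates, chosen)  # ['node-a', 'node-c'] ['node-c']
```

The first stage stays a simple set-membership test; anything that needs comparison operators or values lives in the second stage, on the consumer side.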

--Dan



Re: [openstack-dev] [placement] The "intended purpose" of traits

2018-10-01 Thread Dan Smith
I was out when much of this conversation happened, so I'm going to
summarize my opinion here.

> So from a code perspective _placement_ is completely agnostic to
> whether a trait is "PCI_ADDRESS_01_AB_23_CD", "STORAGE_DISK_SSD", or
> "JAY_LIKES_CRUNCHIE_BARS".
>
> However, things which are using traits (e.g., nova, ironic) need to
> make their own decisions about how the values of traits are
> interpreted. I don't have a strong position on that except to say
> that _if_ we end up in a position of there being lots of traits
> willy nilly, people who have chosen to do that need to know that the
> contract presented by traits right now (present or not present, no
> value comprehension) is fixed.

I agree with what Chris holds sacred here, which is that placement
shouldn't ever care about what the trait names are or what they mean to
someone else. That also extends to me hoping we never implement a
generic key=value store on resource providers in placement.

>> I *do* see a problem with it, based on my experience in Nova where
>> this kind of thing leads to ugly, unmaintainable, and
>> incomprehensible code as I have pointed to in previous responses.

I definitely agree with what Jay holds sacred here, which is that
abusing the data model to encode key=value information into single trait
strings is bad (which is what you're doing with something like
PCI_ADDRESS_01_AB_23_CD).

I don't want placement (the code) to try to put any technical
restrictions on the meaning of trait names, in an attempt to try to
prevent the above abuse. I agree that means people _can_ abuse it if
they wish, which I think is Chris' point. However, I think it _is_
important for the placement team (the people) to care about how
consumers (nova, etc) use traits, and thus providing guidance on that is
necessary. Not everyone will follow that guidance, but we should provide
it. Projects with history-revering developers on both sides of the fence
can help this effort if they lead by example.

If everyone goes off and implements their way around the perceived
restriction of not being able to ask placement for RAID_LEVEL>=5, we're
going to have a much larger mess than the steaming pile of extra specs
in nova that we're trying to avoid.

--Dan



Re: [openstack-dev] Open letter/request to TC candidates (and existing elected officials)

2018-09-12 Thread Dan Smith
> I'm just a bit worried to limit that role to the elected TC members. If
> we say "it's the role of the TC to do cross-project PM in OpenStack"
> then we artificially limit the number of people who would sign up to do
> that kind of work. You mention Ildiko and Lance: they did that line of
> work without being elected.

Why would saying that we _expect_ the TC members to do that work limit
such activities only to those that are on the TC? I would expect the TC
to take on the less-fun or often-neglected efforts that we all know are
needed but don't have an obvious champion or sponsor.

I think we expect some amount of widely-focused technical or project
leadership from TC members, and certainly that expectation doesn't
prevent others from leading efforts (even in the areas of proposing TC
resolutions, etc) right?

--Dan



Re: [openstack-dev] [upgrade] request for pre-upgrade check for db purge

2018-09-11 Thread Dan Smith
> How do people feel about this? It seems pretty straight-forward to
> me. If people are generally in favor of this, then the question is
> what would be sane defaults - or should we not assume a default and
> force operators to opt into this?

I dunno, adding something to nova.conf that is only used for nova-status
like that seems kinda weird to me. It's just a warning/informational
sort of thing so it just doesn't seem worth the complication to me.

Moving it to an age thing set at one year seems okay, and better than
making the absolute limit more configurable.

Any reason why this wouldn't just be a command line flag to status if
people want it to behave in a specific way from a specific tool?

--Dan



Re: [openstack-dev] [nova][placement][upgrade][qa] Some upgrade-specific news on extraction

2018-09-07 Thread Dan Smith
> The other obvious thing is the database. The placement repo code as-is
> today still has the check for whether or not it should use the
> placement database but falls back to using the nova_api database
> [5]. So technically you could point the extracted placement at the
> same nova_api database and it should work. However, at some point
> deployers will clearly need to copy the placement-related tables out
> of the nova_api DB to a new placement DB and make sure the
> 'migrate_version' table is dropped so that placement DB schema
> versions can reset to 1.

I think it's wrong to act like placement and nova-api schemas are the
same. One is a clone of the other at a point in time, and technically it
will work today. However the placement db sync tool won't do the right
thing, and I think we run the major risk of operators not fully grokking
what is going on here, seeing that pointing placement at nova-api
"works", and moving on. Later, when we add the next placement db migration
(which could technically happen in stein), they will either screw their
nova-api schema, or mess up their versioning, or be unable to apply the
placement change.

> With respect to grenade and making this work in our own upgrade CI
> testing, we have I think two options (which might not be mutually
> exclusive):
>
> 1. Make placement support using nova.conf if placement.conf isn't
> found for Stein with lots of big warnings that it's going away in
> T. Then Rocky nova.conf with the nova_api database configuration just
> continues to work for placement in Stein. I don't think we then have
> any grenade changes to make, at least in Stein for upgrading *from*
> Rocky. Assuming fresh devstack installs in Stein use placement.conf
> and a placement-specific database, then upgrades from Stein to T
> should also be OK with respect to grenade, but likely punts the
> cut-over issue for all other deployment projects (because we don't CI
> with grenade doing Rocky->Stein->T, or FFU in other words).

As I have said above and in the review, I really think this is the wrong
approach. At the current point in time, the placement schema is a clone
of the nova-api schema, and technically they will work. At the first point
that placement evolves its schema, that will no longer be a workable
solution, unless we also evolve nova-api's database in lockstep.

> 2. If placement doesn't support nova.conf in Stein, then grenade will
> require an (exceptional) [6] from-rocky upgrade script which will (a)
> write out placement.conf fresh and (b) run a DB migration script,
> likely housed in the placement repo, to create the placement database
> and copy the placement-specific tables out of the nova_api
> database. Any script like this is likely needed regardless of what we
> do in grenade because deployers will need to eventually do this once
> placement would drop support for using nova.conf (if we went with
> option 1).

Yep, and I'm asserting that we should write that script, make grenade do
that step, and confirm that it works. I think operators should do that
step during the stein upgrade because that's where the fork/split of
history and schema is happening. I'll volunteer to do the grenade side
at least.

Maybe it would help to call out specifically that, IMHO, this can not
and should not follow the typical config deprecation process. It's not a
simple case of just making sure we "find" the nova-api database in the
various configs. The problem is that _after_ the split, they are _not_
the same thing and should not be considered as the same. Thus, I think
to avoid major disaster and major time sink for operators later, we need
to impose the minor effort now to make sure that they don't take the
process of deploying a new service lightly.

Jay's original relatively small concern was that deploying a new
placement service and failing to properly configure it would result in a
placement running with the default, empty, sqlite database. That's a
valid concern, and I think all we need to do is make sure we fail in
that case, explaining the situation.

We just had a hangout on the topic and I think we've come around to the
consensus that just removing the default-to-empty-sqlite behavior is the
right thing to do. Placement won't magically find nova.conf if it exists
and jump into its database, and it also won't do the silly thing of
starting up with an empty database if the very important config step is
missed in the process of deploying placement itself. Operators will have
to deploy the new package and do the database surgery (which we will
provide instructions and a script for) as part of that process, but
there's really no other sane alternative without changing the current
agreed-to plan regarding the split.
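
In code terms, the behavior agreed to above amounts to something like the following sketch. It uses a plain dict in place of real config handling, and the `[placement_database] connection` key and the exception name are assumptions made for the example; the point is only that a missing setting is a startup error, with no fallback to nova.conf and no default sqlite database.

```python
class MissingDatabaseConfig(Exception):
    """Hypothetical startup error for an unconfigured placement database."""


def resolve_placement_db(conf):
    """Return the configured connection URL or refuse to start.

    conf is a dict of config sections, e.g.
    {"placement_database": {"connection": "mysql+pymysql://..."}}.
    """
    connection = conf.get("placement_database", {}).get("connection")
    if not connection:
        # No silent default: an empty sqlite database would "work" while
        # hiding every existing inventory and allocation.
        raise MissingDatabaseConfig(
            "[placement_database]/connection is not set; configure the "
            "placement database and migrate the placement tables before "
            "starting the service")
    return connection
```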

Is everyone okay with the above summary of the outcome?

--Dan


Re: [openstack-dev] [nova] [placement] extraction (technical) update

2018-09-05 Thread Dan Smith
> I think there was a period in time where the nova_api database was created
> where entries would try to get pulled out from the original nova database and
> then checking nova_api if it doesn't exist afterwards (or vice versa).  One
> of the cases that this was done to deal with was for things like instance
> types or flavours.
>
> I don't know the exact details but I know that older instance types exist in
> the nova db and the newer ones are sitting in nova_api.  Something along
> those lines?

Yep, we've moved entire databases before in nova with minimal disruption
to the users. Not just flavors, but several pieces of data came out of
the "main" database and into the api database transparently. It's
doable, but with placement being split to a separate
project/repo/whatever, there's not really any option for being graceful
about it in this case.

> At this point, I'm thinking turn off placement, set up the new one, do
> the migration of the placement-specific tables (this can be a
> straightforward documented task OR it would be awesome if it was a
> placement command (something along the lines of
> `placement-manage db import_from_nova`) which would import all the
> right things).
>
> The idea of having a command would be *extremely* useful for deployment tools
> in automating the process and it also allows the placement team to selectively
> decide what they want to onboard?

Well, it's pretty cut-and-dried as all the tables in nova-api are either
for nova or placement, so there's not much confusion about what belongs.

I'm not sure that doing this import in python is really the most
efficient way. I agree a placement-manage command would be ideal from an
"easy button" point of view, but I think a couple lines of bash that
call mysqldump are likely to vastly outperform us doing it natively in
python. We could script exec()s of those commands from python, but.. I
think I'd rather just see that as a shell script that people can easily
alter/test on their own.

Just curious, but in your case would the service catalog entry change at
all? If you stand up the new placement in the exact same spot, it
shouldn't, but I imagine some people will have the catalog entry change
slightly (even if just because of a VIP or port change). Am I
remembering correctly that the catalog can get cached in various places
such that much of nova would need a restart to notice?

--Dan



Re: [openstack-dev] [nova] [placement] extraction (technical) update

2018-09-05 Thread Dan Smith
>> Yes, we should definitely trim the placement DB migrations to only
>> things relevant to placement. And we can use this opportunity to get
>> rid of cruft too and squash all of the placement migrations together
>> to start at migration 1 for the placement repo. If anyone can think
>> of a problem with doing that, please shout it out.

I agree, FWIW.

> Umm, nova-manage db sync creates entries in a sqlalchemy-migrate
> versions table, something like that, to track per database what the
> latest migration sync version has been.
>
> Based on that, and the fact I thought our DB extraction policy was to
> mostly tell operators to copy the nova_api database and throw it
> elsewhere in a placement database, then the migrate versions table is
> going to be saying you're at 061 and you can't start new migrations
> from 1 at that point, unless you wipe out that versions table after
> you copy the API DB.

They can do this, sure. However, either we'll need migrations to delete
all the nova-api-related tables, or they will need to trim them
manually. If we do the former, then everyone who ever installs placement
from scratch will go through the early history of nova-api only to have
that removed. Or we trim those off the front, but we have to keep the
collapsing migrations until we compact again, etc.

The thing I'm more worried about is operators being surprised by this
change (since it's happening suddenly in the middle of a release),
noticing some split, and then realizing that if they just point the
placement db connection at nova_api everything seems to work. That's
going to go really badly when things start to diverge.

> I could be wrong, but just copying the database, squashing/trimming
> the migration scripts and resetting the version to 1, and assuming
> things are going to be hunky dory doesn't sound like it will work to
> me.

Why not?

I think the safest/cleanest thing to do here is renumber placement-related
migrations from 1, and provide a script or procedure to dump just the
placement-related tables from the nova_api database to the new one (not
including the sqlalchemy-migrate versions table).
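
As a rough illustration of that procedure, the sketch below only builds the `mysqldump` command line; it does not run it. The table list is an assumption about which nova_api tables belong to placement and should be checked against the actual schema, and the function name is hypothetical. `migrate_version` is refused explicitly so the new database can start its own migration history.

```python
import shlex

# Assumed list of placement-owned tables in the nova_api database.
PLACEMENT_TABLES = (
    "allocations",
    "consumers",
    "inventories",
    "placement_aggregates",
    "projects",
    "resource_classes",
    "resource_provider_aggregates",
    "resource_provider_traits",
    "resource_providers",
    "traits",
    "users",
)


def build_dump_command(source_db="nova_api", tables=PLACEMENT_TABLES):
    """Return the argv for dumping only the placement tables."""
    if "migrate_version" in tables:
        raise ValueError("do not copy the sqlalchemy-migrate versions table")
    # mysqldump [options] db_name [tbl_name ...]
    return ["mysqldump", "--single-transaction", source_db, *tables]


print(shlex.join(build_dump_command()))
```

The printed command can be reviewed, edited, and piped into `mysql` against the new placement database by whoever runs the upgrade.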

--Dan



Re: [openstack-dev] [nova][placement] Freezing placement for extraction

2018-08-31 Thread Dan Smith
> If we're going to do the extraction in Stein, which we said we'd do in
> Dublin, we need to start that as early as possible to iron out any
> deployment bugs in the switch. We can't wait until the 2nd or 3rd
> milestone, it would be too risky.

I agree that the current extraction plan is highly risky and that if
it's going to happen, we need plenty of time to clean up the mess. I
imagine what Sylvain is getting at here is that if we followed the
process of other splits like nova-volume, we'd be doing this
differently.

In that case, we'd freeze late in the cycle when freezing is appropriate
anyway. We'd split out placement such that the nova-integrated one and
the separate one are equivalent, and do the work to get it working on
its own. In the next cycle new changes go to the split placement
only. Operators are able to upgrade to stein without deploying a new
stein service first, and can switch to the split placement at their
leisure, separate from the release upgrade process.

To be honest, I'm not sure how we got to the point of considering it
acceptable to be splitting out a piece of nova in a single cycle such
that operators have to deploy a new thing in order to upgrade. But alas,
as has been said, this is politically more important than ... everything
else.

--Dan



Re: [openstack-dev] [nova] [placement] extraction (technical) update

2018-08-28 Thread Dan Smith
> Grenade already has its own "resources db" right? So we can shove
> things in there before we upgrade and then verify they are still there
> after the upgrade?

Yep, I'm working on something right now. We create an instance that
survives the upgrade and validate it on the other side. I'll just do
some basic inventory and allocation validation that we'll trip over if
we somehow don't migrate that data from nova to placement.

--Dan



Re: [openstack-dev] [nova] [placement] extraction (technical) update

2018-08-28 Thread Dan Smith
> Grenade uses devstack so once we have devstack on master installing
> (and configuring) placement from the new repo and disable installing
> and configuring it from the nova repo, that's the majority of the
> change I'd think.
>
> Grenade will likely need a from-rocky script to move any config that
> is necessary, but as you already noted below, if the new repo can live
> with an existing nova.conf, then we might not need to do anything in
> grenade since placement from the new repo (in stein) could then run
> with nova.conf created for placement from the nova repo (in rocky).

The from-rocky will also need to extract data from the nova-api database
for the placement tables and put it into the new placement database (as
real operators will have to do). It'll need to do this after the split
code has been installed and the schema has been sync'd. Without this,
the pre-upgrade resources won't have allocations known by the split
placement service. I do not think we should cheat by just pointing the
split placement at nova's database.

Also, ISTR you added some allocation/inventory checking to devstack via
hook, maybe after the tempest job ran? We might want to add some stuff
to grenade to verify the pre/post resource allocations before we start
this move so we can make sure they're still good after we roll. I'll see
if I can hack something up to that effect.

--Dan



Re: [openstack-dev] [nova] [placement] extraction (technical) update

2018-08-28 Thread Dan Smith
>> 2. We have a stack of changes to zuul jobs that show nova working but 
>> deploying placement in devstack from the new repo instead of nova's 
>> repo. This includes the grenade job, ensuring that upgrade works.
>
> I'm guessing there would need to be changes to Devstack itself, outside
> of the zuul jobs?

I'd assume we'll need changes to devstack itself, as well as to grenade
and the zuul jobs.

Otherwise, this sequence of steps is what I've been anticipating.

--Dan



Re: [openstack-dev] [oslo] UUID sentinel needs a home

2018-08-23 Thread Dan Smith
> The compromise, using the patch as currently written [1], would entail
> adding one line at the top of each test file:
>
>  uuids = uuidsentinel.UUIDSentinels()
>
> ...as seen (more or less) at [2]. The subtle difference being that this
> `uuids` wouldn't share a namespace across the whole process, only within
> that file. Given current usage, that shouldn't cause a problem, but it's
> a change.

...and it doesn't work like mock.sentinel does, which is part of the
value. I really think we should put this wherever it needs to be so that
it can continue to be as useful as it is today. Even if that means just
copying it into another project -- it's not that complicated of a thing.
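
For context, a minimal sketch of the idea. This is not the patch under
review; the class body is an assumption modeled on how mock.sentinel
behaves, with one module-level object handing out a stable value per
attribute name:

```python
import uuid


class UUIDSentinels:
    """Hand out one stable, random UUID string per attribute name."""

    def __init__(self):
        self._sentinels = {}

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so private
        # names are rejected here rather than turned into sentinels.
        if name.startswith("_"):
            raise AttributeError(name)
        if name not in self._sentinels:
            self._sentinels[name] = str(uuid.uuid4())
        return self._sentinels[name]


# Module-level instance: every file that imports it shares one namespace.
uuids = UUIDSentinels()

print(uuids.instance1 == uuids.instance1)  # same name, same UUID
print(uuids.instance1 == uuids.instance2)  # different names differ
```

Because the instance lives at module level, importing it is all a test
file has to do, which is the convenience in question.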

--Dan



Re: [openstack-dev] [oslo] UUID sentinel needs a home

2018-08-23 Thread Dan Smith
> Do you mean an actual fixture, that would be used like:
>
>  class MyTestCase(testtools.TestCase):
>      def setUp(self):
>          self.uuids = self.useFixture(oslofx.UUIDSentinelFixture()).uuids
>
>      def test_foo(self):
>          do_a_thing_with(self.uuids.foo)
>
> ?
>
> That's... okay I guess, but the refactoring necessary to cut over to it
> will now entail adding 'self.' to every reference. Is there any way
> around that?

I don't think it's okay. It makes it a lot more work to use it, where
merely importing it (exactly like mock.sentinel) is a large factor in
how incredibly convenient it is.

--Dan



Re: [openstack-dev] [nova][cinder][neutron] Cross-cell cold migration

2018-08-23 Thread Dan Smith
> I think Nova should never have to rely on Cinder's hosts/backends
> information to do migrations or any other operation.
>
> In this case even if Nova had that info, it wouldn't be the solution.
> Cinder would reject migrations if there's an incompatibility on the
> Volume Type (AZ, Referenced backend, capabilities...)

I think I'm missing a bunch of cinder knowledge required to fully grok
this situation and probably need to do some reading. Is there some
reason that a volume type can't exist in multiple backends or something?
I guess I think of volume type as flavor, and the same definition in two
places would be interchangeable -- is that not the case?

> I don't know anything about Nova cells, so I don't know the specifics of
> how we could do the mapping between them and Cinder backends, but
> considering the limited range of possibilities in Cinder I would say we
> only have Volume Types and AZs to work a solution.

I think the only mapping we need is affinity or distance. The point of
needing to migrate the volume would purely be because moving cells
likely means you moved physically farther away from where you were,
potentially with different storage connections and networking. It
doesn't *have* to mean that, but I think in reality it would. So the
question I think Matt is looking to answer here is "how do we move an
instance from a DC in building A to building C and make sure the
volume gets moved to some storage local in the new building so we're
not just transiting back to the original home for no reason?"

Does that explanation help or are you saying that's fundamentally hard
to do/orchestrate?

Fundamentally, the cells thing doesn't even need to be part of the
discussion, as the same rules would apply if we're just doing a normal
migration but need to make sure that storage remains affined to compute.

> I don't know how the Nova Placement works, but it could hold an
> equivalency mapping of volume types to cells as in:
>
> Cell#1        Cell#2
>
> VolTypeA <--> VolTypeD
> VolTypeB <--> VolTypeE
> VolTypeC <--> VolTypeF
>
> Then it could do volume retypes (allowing migration) and that would
> properly move the volumes from one backend to another.

The only way I can think that we could do this in placement would be if
volume types were resource providers and we assigned them traits that
had special meaning to nova indicating equivalence. Several of the words
in that sentence are likely to freak out placement people, myself
included :)

So is the concern just that we need to know what volume types in one
backend map to those in another so that when we do the migration we know
what to ask for? Is "they are the same name" not enough? Going back to
the flavor analogy, you could kinda compare two flavor definitions and
have a good idea if they're equivalent or not...

--Dan



Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-20 Thread Dan Smith
>> So my hope is that (in no particular order) Jay Pipes, Eric Fried,
>> Takashi Natsume, Tetsuro Nakamura, Matt Riedemann, Andrey Volkov,
>> Alex Xu, Balazs Gibizer, Ed Leafe, and any other contributor to
>> placement whom I'm forgetting [1] would express their preference on
>> what they'd like to see happen.

I apparently don't qualify for a vote, so I'll just reply to Jay's
comments here.

> I am not opposed to extracting the placement service into its own
> repo. I also do not view it as a priority that should take precedence
> over the completion of other items, including the reshaper effort and
> the integration of placement calls into Nova (nested providers,
> sharing providers, etc).
>
> The remaining items are Nova-centric. We need Nova-focused
> contributors to make placement more useful to Nova, and I fail to see
> how extracting the placement service will meet that goal. In fact, one
> might argue, as Melanie implies, that extracting placement outside of
> the Compute project would increase the velocity of the placement
> project *at the expense of* getting things done in the Nova project.

Yep, this. I know it's a Nova-centric view, but unlike any other
project, we have taken the risk of putting placement in our critical
path. That has yielded several fire drills right before releases, as
well as complicated backports to fix things that we have broken in the
process, etc. We've got a list of things that are half-finished or
promised-but-not-started, and those are my priority over most everything
else.

> We've shown we can get many things done in placement. We've shown we
> can evolve the API fairly quickly. The velocity of the placement
> project isn't the problem. The problem is the lag between features
> being written into placement (sometimes too hastily IMHO) and actually
> *using* those features in Nova.

Right, and the reshaper effort is a really good example of what I'm
concerned about. Nova has been getting ready for NRPs for several cycles
now, and just before crunch time in Rocky, we realize there's a huge
missing piece of the puzzle on the placement side. That's not the first
time that has happened and I'm sure it won't be the last.

> As for the argument about other projects being able (or being more
> willing to) use placement, I think that's not actually true. The
> projects that might want to ditch their own custom resource tracking
> and management code (Cyborg, Neutron, Cinder, Ironic) have either
> already done so or would require minimal changes to do that. There are
> no projects other than Ironic that I'm aware of that are interested in
> using the allocation candidates functionality (and the allocation
> claim process that entails) for the rough scheduling functionality
> that provides. I'm not sure placement being extracted would change
> that.

My point about this is that "reporting" and "consuming" placement are
different things. Neutron reports, we'd like Cinder to report. Ironic
reports, but indirectly. Cyborg would report. Those reporting activities
are to help projects that "consume" placement make better decisions, but
I think it's entirely likely that Nova will be the only one that ever
does that.

--Dan



Re: [openstack-dev] [all] [nova] [placement] placement below or beside compute after extraction?

2018-08-17 Thread Dan Smith
> The subject of using placement in Cinder has come up, and since then I've
> had a few conversations with people in and outside of that team. I really
> think until placement is its own project outside of the nova team, there
> will be resistance from some to adopt it.

I know politics will be involved in this, but this is a really terrible
reason to do a thing, IMHO. After the most recent meeting we had with
the Cinder people on placement adoption, I'm about as convinced as ever
that Cinder won't (and won't need to) _consume_ placement any time
soon. I hope it will _report_ to placement so Nova can make better
decisions, just like Neutron does now, but I think that's the extent
we're likely to see if we're honest.

What other projects are _likely_ to _consume_ placement even if they
don't know they'd want to? What projects already want to use it but
refuse to because it has Nova smeared all over it? We talked about this
a lot in the early justification for placement, but the demand for that
hasn't really materialized, IMHO; maybe it's just me.

> This reluctance on having it part of Nova may be real or just perceived,
> but with it within Nova it will likely be an uphill battle for some time
> convincing other projects that it is a nicely separated common service
> that they can use.

Splitting it out to another repository within the compute umbrella (what
do we call it these days?) satisfies the _technical_ concern of not
being able to use placement without installing the rest of the nova code
and dependency tree. Artificially creating more "perceived" distance
sounds really political to me, so let's be sure we're upfront about the
reasoning for doing that if so :)

--Dan



Re: [openstack-dev] [Nova] A multi-cell instance-list performance test

2018-08-17 Thread Dan Smith
> We have tried out the patch:
> https://review.openstack.org/#/c/592698/
> we also applied https://review.openstack.org/#/c/592285/
>
> it turns out that we are able to halve the overall time consumption. We
> did try with different sort keys and dirs, and the results are similar. We
> didn't try out paging yet:

Excellent! Let's continue discussion of the batching approach in that
review. There are some other things to try.

Thanks!

--Dan



Re: [openstack-dev] [Nova] A multi-cell instance-list performance test

2018-08-16 Thread Dan Smith
>  yes, the DB query was in serial, after some investigation, it seems that we
>  are unable to perform eventlet.monkey_patch in uWSGI mode, so
>  Yikun made this fix:
>
>  https://review.openstack.org/#/c/592285/

Cool, good catch :)

>
>  After making this change, we test again, and we got this kind of data:
>
>                        total    collect  sort    view
>  before monkey_patch   13.5745  11.7012  1.1511  0.5966
>  after monkey_patch    12.8367  10.5471  1.5642  0.6041
>
>  The performance improved a little, and from the log we can see:

Since these all took ~1s when done in series, but now take ~10s in
parallel, I think you must be hitting some performance bottleneck in
either case, which is why the overall time barely changes. Some ideas:

1. In the real world, I think you really need to have 10x database
   servers or at least a DB server with plenty of cores loading from a
   very fast (or separate) disk in order to really ensure you're getting
   full parallelism of the DB work. However, because these queries all
   took ~1s in your serialized case, I expect this is not your problem.

2. What does the network look like between the api machine and the DB?

3. What do the memory and CPU usage of the api process look like while
   this is happening?

Related to #3, even though we issue the requests to the DB in parallel,
we still process the result of those calls in series in a single python
thread on the API. That means all the work of reading the data from the
socket, constructing the SQLA objects, turning those into nova objects,
etc, all happens serially. It could be that the DB query is really a
small part of the overall time and our serialized python handling of the
result is the slow part. If you see the api process pegging a single
core at 100% for ten seconds, I think that's likely what is happening.

>  so, now the queries are in parallel, but the whole thing still seems
>  serial.

In your table, you show the time for "1 cell, 1000 instances" as ~3s and
"10 cells, 1000 instances" as 10s. The problem with comparing those
directly is that in the latter, you're actually pulling 10,000 records
over the network, into memory, processing them, and then just returning
the first 1000 from the sort. A closer comparison would be the "10
cells, 100 instances" with "1 cell, 1000 instances". In both of those
cases, you pull 1000 instances total from the db, into memory, and
return 1000 from the sort. In that case, the multi-cell situation is
faster (~2.3s vs. ~3.1s). You could also compare the "10 cells, 1000
instances" case to "1 cell, 10,000 instances" just to confirm at the
larger scale that it's better or at least the same.

We _have_ to pull $limit instances from each cell, in case (according to
the sort key) the first $limit instances are all in one cell. We _could_
try to batch the results from each cell to avoid loading so many that we
don't need, but we punted this as an optimization to be done later. I'm
not sure it's really worth the complexity at this point, but it's
something we could investigate.
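
To make the cost concrete, here is a toy sketch of the gather-and-merge
pattern described above. It is not nova's actual instance-list code:
in-memory lists stand in for the cell databases, each already sorted by
the sort key, and `list_across_cells` is a hypothetical name.

```python
import heapq
from itertools import islice


def list_across_cells(cells, limit, key):
    """Merge per-cell results that are each already sorted by ``key``.

    Every cell contributes up to ``limit`` records, because the first
    ``limit`` records overall could all live in a single cell.
    """
    per_cell = [cell[:limit] for cell in cells]
    fetched = sum(len(chunk) for chunk in per_cell)
    merged = heapq.merge(*per_cell, key=key)
    return list(islice(merged, limit)), fetched


# Ten cells of 1000 records each, sorted within each cell.
cells = [[(i * 10 + c, "cell%d" % c) for i in range(1000)] for c in range(10)]
page, fetched = list_across_cells(cells, limit=1000, key=lambda rec: rec[0])
print(len(page), fetched)  # 1000 10000
```

Only 1000 records are returned, but 10,000 were loaded and handled to
produce them, which is why the "10 cells, 1000 instances" row is not
directly comparable to "1 cell, 1000 instances".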

--Dan



Re: [openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26

2018-06-15 Thread Dan Smith
> I thought we were leaning toward the option where nova itself doesn't
> impose a limit, but lets the virt driver decide.
>
> I would really like NOT to see logic like this in any nova code:
>
>> if kvm|qemu:
>>     return 256
>> elif POWER:
>>     return 4000
>> elif:
>>     ...

It's insanity to try to find a limit that will work for
everyone. PowerVM supports a billion, libvirt/kvm has some practical and
theoretical limits, both of which are higher than what is actually
sane. It depends on your virt driver, and how you're attaching your
volumes, maybe how tightly you pack your instances, probably how many
threads you give to an instance, how big your compute nodes are, and
definitely what your workload is.

That's a really big matrix, and even if we decide on something, IBM will
come out of the woodwork with some other hypervisor that has been around
since the Nixon era that uses BCD-encoded volume numbers and thus can
only support 10. It's going to depend, and a user isn't going to be able
to reasonably probe it using any of our existing APIs.

If it's going to depend on all the above factors, I see no reason not to
put a conf value in so that operators can pick a reasonably sane
limit. Otherwise, the limit we pick will be wrong for everyone.
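
A minimal sketch of what such a check could look like, assuming the
limit comes from a per-compute-node conf value. The names here
(`check_volume_attach`, `max_volumes`) and the "negative means
unlimited" convention are hypothetical, not an existing nova option:

```python
class TooManyVolumes(Exception):
    """Raised when an attach would exceed the operator-chosen limit."""


def check_volume_attach(attached_count, max_volumes):
    """Refuse an attach that would push an instance past ``max_volumes``.

    ``max_volumes`` stands in for a conf value read on the compute node;
    a negative value is treated as "no limit" in this sketch.
    """
    if max_volumes >= 0 and attached_count + 1 > max_volumes:
        raise TooManyVolumes(
            "instance already has %d volumes attached (limit %d)"
            % (attached_count, max_volumes))


check_volume_attach(attached_count=25, max_volumes=64)  # allowed
try:
    check_volume_attach(attached_count=64, max_volumes=64)
except TooManyVolumes as exc:
    print("rejected: %s" % exc)
```

The point is only that the number lives in configuration, so no single
hard-coded value has to be right for every driver and deployment.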

Plus... if we do a conf option we can put this to rest and stop talking
about it, which I for one am *really* looking forward to :)

--Dan



Re: [openstack-dev] [nova] review runways check-in and feedback

2018-06-14 Thread Dan Smith
> While I have tried to review a few of the runway-slotted efforts, I
> have gotten burned out on a number of them. Other runway-slotted
> efforts, I simply don't care enough about or once I've seen some of
> the code, simply can't bring myself to review it (sorry, just being
> honest).

I have the same feeling, although I have reviewed a lot of things I
wouldn't have otherwise as a result of them being in the runway. I spent
a bunch of time early on with the image signing stuff, which I think was
worthwhile, although at this point I'm a bit worn out on it. That's not
the fault of runways though.

> Is your concern that placement stuff is getting unfair attention since
> many of the patch series aren't in the runways? Or is your concern
> that you'd like to see *more* core reviews on placement stuff outside
> of the usual placement-y core reviewers (you, me, Alex, Eric, Gibi and
> Dan)?

I think placement has been getting a bit of a free ride, with constant
review and insulation from the runway process. However, I don't think
that we can stop progress on that effort while we circle around, and the
subteam/group of people that focus on placement already has a lot of
supporting cores. So, it's cheating a little bit, but we always
said that we're not going to tell cores *not* to review something unless
it is in a runway, and pragmatically I think it's probably the right thing
to do for placement.

>> Having said that, it's clear from the list of things in the runways
>> etherpad that there are some lower priority efforts that have been
>> completed probably because they leveraged runways (there are a few
>> xenapi blueprints for example, and the powervm driver changes).
>
> Wasn't that kind of the point of the runways, though? To enable "lower
> priority" efforts to have a chance at getting reviews? Or are you just
> stating here the apparent success of that effort?

It was, and I think it has worked well for that for several things. The
image signing stuff got more review in its first runway slot than it has
in years I think.

Overall, I don't think we're worse off with runways than we were before
it. I think that some things that will get attention regardless are
still progressing. I think that some things that are far off on the
fringe are still getting ignored. I think that for the huge bulk of
things in the middle of those two, runways has helped focus review on
specific efforts and thus increased the throughput there. For a first
attempt, I'd call that a success.

I think maybe a little more monitoring of the review rate of things in
the runways and some gentle prodding of people to look at ones that are
burning time and not seeing much review would maybe improve things a
bit.

--Dan



Re: [openstack-dev] [nova] increasing the number of allowed volumes attached per instance > 26

2018-06-08 Thread Dan Smith
> Some ideas that have been discussed so far include:

FYI, these are already in my order of preference.

> A) Selecting a new, higher maximum that still yields reasonable
> performance on a single compute host (64 or 128, for example). Pros:
> helps prevent the potential for poor performance on a compute host
> from attaching too many volumes. Cons: doesn't let anyone opt-in to a
> higher maximum if their environment can handle it.

I prefer this because I think it can be done per virt driver, for
whatever actually makes sense there. If powervm can handle 500 volumes
in a meaningful way on one instance, then that's cool. I think libvirt's
limit should likely be 64ish.

> B) Creating a config option to let operators choose how many volumes are
> allowed to attach to a single instance. Pros: lets operators opt-in to
> a maximum that works in their environment. Cons: it's not discoverable
> for those calling the API.

This is a fine compromise, IMHO, as it lets operators tune it per
compute node based on the virt driver and the hardware. If one compute
is using nothing but iSCSI over a single 10g link, then they may need to
clamp that down to something more sane.

Like the per virt driver restriction above, it's not discoverable via
the API, but if it varies based on compute node and other factors in a
single deployment, then making it discoverable isn't going to be very
easy anyway.

> C) Create a configurable API limit for maximum number of volumes to
> attach to a single instance that is either a quota or similar to a
> quota. Pros: lets operators opt-in to a maximum that works in their
> environment. Cons: it's yet another quota?

Do we have any other quota limits that are per-instance like this would
be? If not, then this would likely be weird, but if so, then this would
also be an option, IMHO. However, it's too much work for what is really
not a hugely important problem, IMHO, and both of the above are
lighter-weight ways to solve this and move on.

--Dan



Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-06-01 Thread Dan Smith
> FWIW, I don't have a problem with the virt driver "knowing about
> allocations". What I have a problem with is the virt driver *claiming
> resources for an instance*.

+1000.

> That's what the whole placement claims resources things was all about,
> and I'm not interested in stepping back to the days of long racy claim
> operations by having the compute nodes be responsible for claiming
> resources.
>
> That said, once the consumer generation microversion lands [1], it
> should be possible to *safely* modify an allocation set for a consumer
> (instance) and move allocation records for an instance from one
> provider to another.

Agreed. I'm hesitant to have the compute nodes arguing with the
scheduler even to patch things up, given the mess we just cleaned
up. The thing that I think makes this okay is that one compute node
cleaning/pivoting allocations for instances isn't going to be fighting
anything else whilst doing it. Migrations and new instance builds, where
it isn't clear whether the source or destination (or the scheduler or the
compute) owns the allocation, are the problem.

That said, we need to make sure we can handle the case where an instance
is in resize_confirm state across a boundary where we go from non-NRP to
NRP. It *should* be okay for the compute to handle this by updating the
instance's allocation held by the migration instead of the instance
itself, if the compute determines that it is the source.

--Dan



Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-06-01 Thread Dan Smith
> Dan, you are leaving out the parts of my response where I am agreeing
> with you and saying that your "Option #2" is probably the thing we
> should go with.

No, what you said was:

>> I would vote for Option #2 if it comes down to it.

Implying (to me at least) that you still weren't in favor of either, but
would choose that as the least offensive option :)

I didn't quote it because I didn't have any response. I just wanted to
address the other assertions about what is and isn't a common upgrade
scenario, which I think is the important data we need to consider when
making a decision here.

I didn't mean to imply or hide anything with my message trimming, so
sorry if it came across as such.

--Dan



Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-06-01 Thread Dan Smith
> So, you're saying the normal process is to try upgrading the Linux
> kernel and associated low-level libs, wait the requisite amount of
> time that takes (can be a long time) and just hope that everything
> comes back OK? That doesn't sound like any upgrade I've ever seen.

I'm saying I think it's a process practiced by some to install the new
kernel and libs and then reboot to activate, yeah.

> No, sorry if I wasn't clear. They can live-migrate the instances off
> of the to-be-upgraded compute host. They would only need to
> cold-migrate instances that use the aforementioned non-movable
> resources.

I don't think it's reasonable to force people to have to move every
instance in their cloud (live or otherwise) in order to upgrade. That
means that people who currently do their upgrades in-place in one step,
now have to do their upgrade in N steps, for N compute nodes. That
doesn't seem reasonable to me.

> If we are going to go through the hassle of writing a bunch of
> transformation code in order to keep operator action as low as
> possible, I would prefer to consolidate all of this code into the
> nova-manage (or nova-status) tool and put some sort of
> attribute/marker on each compute node record to indicate whether a
> "heal" operation has occurred for that compute node.

We need to know details of each compute node in order to do that. We
could make the tool external and something they run per-compute node,
but that still makes it N steps, even if the N steps are lighter
weight.

> Someone (maybe Gibi?) on this thread had mentioned having the virt
> driver (in update_provider_tree) do the whole set reserved = total
> thing when first attempting to create the child providers. That would
> work to prevent the scheduler from attempting to place workloads on
> those child providers, but we would still need some marker on the
> compute node to indicate to the nova-manage heal_nested_providers (or
> whatever) command that the compute node has had its provider tree
> validated/healed, right?

So that means you restart your cloud and it's basically locked up until
you perform the N steps to unlock N nodes? That also seems like it's not
going to make us very popular on the playground :)

I need to go read Eric's tome on how to handle the communication of
things from virt to compute so that this translation can be done. I'm
not saying I have the answer, I'm just saying that making this the
problem of the operators doesn't seem like a solution to me, and that we
should figure out how we're going to do this before we go down the
rabbit hole.

--Dan



Re: [openstack-dev] [nova] [placement] Upgrade concerns with nested Resource Providers

2018-05-31 Thread Dan Smith
> My feeling is that we should not attempt to "migrate" any allocations
> or inventories between root or child providers within a compute node,
> period.

While I agree this is the simplest approach, it does put a lot of
responsibility on the operators to do work to sidestep this issue, which
might not even apply to them (and knowing if it does might be
difficult).

> The virt drivers should simply error out of update_provider_tree() if
> there are ANY existing VMs on the host AND the virt driver wishes to
> begin tracking resources with nested providers.
>
> The upgrade operation should look like this:
>
> 1) Upgrade placement
> 2) Upgrade nova-scheduler
> 3) start loop on compute nodes. for each compute node:
>  3a) disable nova-compute service on node (to take it out of scheduling)
>  3b) evacuate all existing VMs off of node

You mean s/evacuate/cold migrate/ of course... :)

>  3c) upgrade compute node (on restart, the compute node will see no
>  VMs running on the node and will construct the provider tree inside
>  update_provider_tree() with an appropriate set of child providers
>  and inventories on those child providers)
>  3d) enable nova-compute service on node
>
> Which is virtually identical to the "normal" upgrade process whenever
> there are significant changes to the compute node -- such as upgrading
> libvirt or the kernel.

Not necessarily. It's totally legit (and I expect quite common) to just
reboot the host to take kernel changes, bringing back all the instances
that were there when it resumes. The "normal" case of moving things
around slide-puzzle-style applies to live migration (which isn't an
option here). I think people that can take downtime for the instances
would rather not have to move things around for no reason if the
instance has to get shut off anyway.

> Nested resource tracking is another such significant change and should
> be dealt with in a similar way, IMHO.

This basically says that for anyone to move to rocky, they will have to
cold migrate every single instance in order to do that upgrade, right? I
mean, anyone with two socket machines or SRIOV NICs would end up with at
least one level of nesting, correct? Forcing everyone to move everything
to do an upgrade seems like a non-starter to me.

We also need to consider the case where people would be FFU'ing past
rocky (i.e. never running rocky computes). We've previously said that
we'd provide a way to push any needed transitions with everything
offline to facilitate that case, so I think we need to implement that
method anyway.

I kinda think we need to either:

1. Make everything perform the pivot on compute node start (which can be
   re-used by a CLI tool for the offline case)
2. Make everything default to non-nested inventory at first, and provide
   a way to migrate a compute node and its instances one at a time (in
   place) to roll through.

We can also document "or do the cold-migration slide puzzle thing" as an
alternative for people that feel that's more reasonable.

I just think that forcing people to take down their data plane to work
around our own data model is kinda evil and something we should be
avoiding at this level of project maturity. What we're really saying is
"we know how to translate A into B, but we require you to move many GBs
of data over the network and take some downtime because it's easier for
*us* than making it seamless."
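
As a toy illustration of translating A into B in place, here is a sketch
that moves selected resource classes from a flat allocation on the root
provider to child providers. The function name, the data shapes, and the
provider names are all hypothetical; this is not how the virt drivers or
placement actually represent the pivot:

```python
def pivot_allocation(allocation, root_rp, child_for_class):
    """Move resource classes from the root provider to child providers.

    ``allocation`` maps provider UUID -> {resource class -> amount}.
    ``child_for_class`` maps a resource class to the child provider that
    now owns its inventory. Running this twice gives the same result.
    """
    result = {rp: dict(res) for rp, res in allocation.items()}
    root = result.get(root_rp, {})
    for rc in list(root):
        child = child_for_class.get(rc)
        if child is None:
            continue  # this class stays on the root provider
        result.setdefault(child, {})
        result[child][rc] = result[child].get(rc, 0) + root.pop(rc)
    if root_rp in result and not result[root_rp]:
        del result[root_rp]
    return result


flat = {"cn1": {"VCPU": 4, "MEMORY_MB": 4096, "SRIOV_NET_VF": 1}}
mapping = {"SRIOV_NET_VF": "cn1_pf0"}
nested = pivot_allocation(flat, "cn1", mapping)
print(nested)
```

Being safe to re-run matters here: the same routine could then be called
on compute start or from an offline CLI tool without caring which ran
first.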

--Dan



Re: [openstack-dev] [Nova] z/VM introducing a new config driveformat

2018-05-30 Thread Dan Smith
> can I know a use case for this 'live copy of metadata' or the 'only way
> to access device tags when hot-attaching'? my thought is this is a one-time
> thing on the cloud-init side, either through the metadata service or config
> drive, and won't be used later? then why do I need a live copy?

If I do something like this:

  nova interface-attach --tag=data-network --port-id=foo myserver

Then we update the device metadata live, which is visible immediately
via the metadata service. However, in config drive, that only gets
updated the next time the drive is generated (which may be a long time
away). For more information on device metadata, see:

  https://specs.openstack.org/openstack/nova-specs/specs/mitaka/approved/virt-device-role-tagging.html
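
From inside the guest, picking up a freshly tagged device could look
roughly like the sketch below. It assumes the standard metadata endpoint
and that tagged devices show up in a `devices` list with a `tags` field,
as I read the spec; the helper names and the sample data are made up for
illustration:

```python
import json
from urllib.request import urlopen

# Standard OpenStack metadata endpoint as seen from inside a guest.
METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"


def fetch_metadata(url=METADATA_URL, timeout=5):
    """Fetch the current metadata document (only works inside an instance)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def devices_with_tag(metadata, tag):
    """Return the device entries that carry ``tag``."""
    return [dev for dev in metadata.get("devices", [])
            if tag in dev.get("tags", [])]


# Illustrative sample; a real guest would call fetch_metadata() instead.
sample = {
    "devices": [
        {"type": "nic", "mac": "fa:16:3e:00:00:01", "tags": ["data-network"]},
        {"type": "disk", "serial": "disk-0", "tags": ["database"]},
    ]
}
print(devices_with_tag(sample, "data-network"))
```

Re-fetching after a hot-attach is what a config drive cannot offer,
since its contents are fixed until the drive is regenerated.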

Further, some of the drivers support setting the admin password securely
via metadata, which similarly requires the instance pulling updated
information out, which wouldn't be available in the config drive. For
reference:

  https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L1985-L1993

--Dan



Re: [openstack-dev] [StarlingX] StarlingX code followup discussions

2018-05-24 Thread Dan Smith
> For example, I look at your nova fork and it has a "don't allow this
> call during an upgrade" decorator on many API calls. Why wasn't that
> done upstream? It doesn't seem overly controversial, so it would be
> useful to understand the reasoning for that change.

Interesting. We have internal accounting for service versions and can
determine whether we're in an upgrade scenario (and do block
operations until the upgrade is over). Unless this decorator you're
looking at checks some non-upstream is-during-upgrade flag, this would
be an easy thing to close the gap on.
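
To make that concrete, here is a minimal sketch of such a check, assuming
each running service reports an integer version. Every name in it
(CURRENT_VERSION, block_during_upgrade, UpgradeInProgress, the version
lookup callable) is hypothetical and not nova's actual API:

```python
import functools

CURRENT_VERSION = 30  # hypothetical version of the code serving the API


class UpgradeInProgress(Exception):
    """Raised when a blocked call is attempted mid-upgrade."""


def block_during_upgrade(get_service_versions):
    """Reject the call while any service reports an older version."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            oldest = min(get_service_versions(), default=CURRENT_VERSION)
            if oldest < CURRENT_VERSION:
                raise UpgradeInProgress(func.__name__)
            return func(*args, **kwargs)
        return wrapper
    return decorator
```

The only input is the set of reported service versions, so no separate
is-during-upgrade flag is needed.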

--Dan



Re: [openstack-dev] [nova] nova-manage cell_v2 map_instances uses invalid UUID as marker in the db

2018-05-10 Thread Dan Smith
> Takashi Natsume  writes:
>
>> In some compute REST APIs, it returns the 'marker' parameter
>> in their pagination.
>> Then users can specify the 'marker' parameter in the next request.

I read this as you saying there was some way that the in-band marker
mapping could be leaked to the user via the REST API. However, if you
meant to just offer up the REST API's pagination as an example that we
could follow in the nova-manage CLI, requiring users to provide the
marker each time, then ignore this part:

> How is this possible? The only way we would get the marker is if we
> either (a) listed the mappings by project_id, using
> INSTANCE_MAPPING_MARKER as the query value, or (b) listed all the
> mappings and somehow returned those to the user.
>
> I don't think (a) is a thing, and I'm not seeing how (b) could be
> either. If you know of a place, please write a functional test for it
> and we can get it resolved. In my proposed patch, I added a filter to
> ensure that this doesn't show up in the get_by_cell_id() query, but
> again, I'm not sure how this would ever be exposed to a user.
>
> https://review.openstack.org/#/c/567669/1/nova/objects/instance_mapping.py@173

As I said in my reply to gibi, I don't think making the user keep track
of the marker is a very nice UX for a management CLI, nor is it as
convenient for something like puppet to run as it has to parse the
(grossly verbose) output each time to extract that marker.
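
For contrast, here is a toy sketch of the in-band approach, where the tool
stores its own marker next to the real mappings and the caller just
re-runs the same command. The list of dicts stands in for the database
table and the function name is made up; only the marker convention
(a special project_id, dashes swapped for spaces) comes from this thread.
It covers the resume path only and does not guard against re-running
after everything is mapped:

```python
MARKER_PROJECT_ID = "INSTANCE_MIGRATION_MARKER"


def map_batch(instance_uuids, mappings, max_count, project_id="demo"):
    """Map up to max_count instances, resuming from the stored marker."""
    marker = next((m for m in mappings
                   if m["project_id"] == MARKER_PROJECT_ID), None)
    start = 0
    if marker is not None:
        # Undo the dash-to-space mangling to find where we stopped.
        last_done = marker["instance_uuid"].replace(" ", "-")
        start = instance_uuids.index(last_done) + 1
        mappings.remove(marker)
    batch = instance_uuids[start:start + max_count]
    for uuid_str in batch:
        mappings.append({"project_id": project_id,
                         "instance_uuid": uuid_str})
    if batch and start + len(batch) < len(instance_uuids):
        # Mangle the UUID so the marker row cannot collide with the real
        # mapping under a string uniqueness constraint.
        mappings.append({"project_id": MARKER_PROJECT_ID,
                         "instance_uuid": batch[-1].replace("-", " ")})
    return len(batch)
```

The caller never sees or passes the marker, which is what keeps the
command easy to loop from something like puppet.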

--Dan



Re: [openstack-dev] [nova] nova-manage cell_v2 map_instances uses invalid UUID as marker in the db

2018-05-10 Thread Dan Smith
Takashi Natsume  writes:

> In some compute REST APIs, it returns the 'marker' parameter
> in their pagination.
> Then users can specify the 'marker' parameter in the next request.

How is this possible? The only way we would get the marker is if we
either (a) listed the mappings by project_id, using
INSTANCE_MAPPING_MARKER as the query value, or (b) listed all the
mappings and somehow returned those to the user.

I don't think (a) is a thing, and I'm not seeing how (b) could be
either. If you know of a place, please write a functional test for it
and we can get it resolved. In my proposed patch, I added a filter to
ensure that this doesn't show up in the get_by_cell_id() query, but
again, I'm not sure how this would ever be exposed to a user.

https://review.openstack.org/#/c/567669/1/nova/objects/instance_mapping.py@173

--Dan



Re: [openstack-dev] [nova] nova-manage cell_v2 map_instances uses invalid UUID as marker in the db

2018-05-10 Thread Dan Smith
> The oslo UUIDField emits a warning if the string used as a field value
> does not pass the validation of the uuid.UUID(str(value)) call
> [3]. All the offending places are fixed in nova except the nova-manage
> cell_v2 map_instances call [1][2]. That call uses markers in the DB
> that are not valid UUIDs.

No, that call uses markers in the DB that don't fit the canonical string
representation of a UUID that the oslo library is looking for. There are
many ways to serialize a UUID:

https://en.wikipedia.org/wiki/Universally_unique_identifier#Format

The 8-4-4-4-12 format is one of them (and the most popular). Changing
the dashes to spaces does not make it not a UUID, it makes it not the
same _string_ and it's done (for better or worse) in the aforementioned
code to skirt the database's UUID-ignorant _string_ uniqueness
constraint.
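
A small sketch of the distinction, using only Python's standard uuid
module. The sample value is arbitrary and parses_as_uuid is a made-up
helper that imitates the uuid.UUID(str(value)) check quoted above:

```python
import uuid

CANONICAL = "12345678-1234-5678-1234-567812345678"

# Python's uuid module accepts all of these as the same UUID.
FORMS = [
    CANONICAL,
    "12345678123456781234567812345678",
    "{12345678-1234-5678-1234-567812345678}",
    "urn:uuid:12345678-1234-5678-1234-567812345678",
]
PARSED = {uuid.UUID(form) for form in FORMS}

# The marker spelling: same digits, dashes swapped for spaces.
MARKER = CANONICAL.replace("-", " ")


def parses_as_uuid(value):
    """Imitate a uuid.UUID(str(value)) style validity check."""
    try:
        uuid.UUID(str(value))
    except ValueError:
        return False
    return True
```

Here PARSED ends up with a single element, while parses_as_uuid(MARKER)
is False: Python's parser rejects the space-separated spelling, which is
what trips the warning, yet MARKER.replace(" ", "-") gives back the
original value unchanged.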

> If we could fix this last offender then we could merge the patch [4]
> that changes this warning to an exception in the nova tests to
> avoid such future rule violations.
>
> However I'm not sure it is easy to fix. Replacing
> 'INSTANCE_MIGRATION_MARKER' at [1] to
> '----' might work

The project_id field on the object is not a UUIDField, nor is it 36
characters in the database schema. It can't be because project ids are
not guaranteed to be UUIDs.

> but I don't know what to do with instance_uuid.replace(' ', '-') [2]
> to make it a valid uuid. Also I think that if there is an unfinished
> mapping in the deployment and then the marker is changed in the code
> that leads to inconsistencies.

IMHO, it would be bad to do anything that breaks people in the middle of
a mapping procedure. While I understand the desire to have fewer
spurious warnings in the test runs, I feel like doing anything to impact
the UX or performance of runtime code to make the unit test output
cleaner is a bad idea.

> I'm open to any suggestions.

We already store values in this field that are not 8-4-4-4-12, and the
oslo field warning is just a warning. If people feel like we need to do
something, I propose we just do this:

https://review.openstack.org/#/c/567669/

It is one of those "we normally wouldn't do this with object schemas,
but we know this is okay" sort of situations.

Personally, I'd just make the offending tests shut up about the warning
and move on, but I'm also okay with the above solution if people prefer.

--Dan



Re: [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild

2018-05-03 Thread Dan Smith
> I'm late to this thread but I finally went through the replies and my
> thought is, we should do a pre-flight check to verify with placement
> whether the image traits requested are 1) supported by the compute
> host the instance is residing on and 2) coincide with the
> already-existing allocations. Instead of making an assumption based on
> "last image" vs "new image" and artificially limiting a rebuild that
> should be valid to go ahead. I can imagine scenarios where a user is
> trying to do a rebuild that their cloud admin says should be perfectly
> valid on their hypervisor, but it's getting rejected because old image
> traits != new image traits. It seems like unnecessary user and admin
> pain.

Yeah, I think we have to do this.

> It doesn't seem correct to reject the request if the current compute
> host can fulfill it, and if I understood correctly, we have placement
> APIs we can call from the conductor to verify the image traits
> requested for the rebuild can be fulfilled. Is there a reason not to
> do that?

Well, it's a little icky in that it makes a random part of conductor a
bit like the scheduler in its understanding of and interaction with
placement. I don't love it, but I think it's what we have to do. Trying
to do the trait math with what was used before, or conservatively
rejecting the request and being potentially wrong about that is not
reasonable, IMHO.
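
The check itself is simple once conductor has the current host's traits
in hand. This is a minimal sketch, assuming the traits of the host's
resource provider have already been fetched from placement (that call is
not shown); the function name is hypothetical:

```python
def rebuild_traits_ok(required_image_traits, host_provider_traits):
    """Return (ok, missing) for a rebuild on the instance's current host."""
    missing = set(required_image_traits) - set(host_provider_traits)
    return (not missing, sorted(missing))
```

This compares the new image against what the host offers now, rather
than against whatever the previous image asked for.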

--Dan



Re: [openstack-dev] [Nova] z/VM introducing a new config driveformat

2018-04-30 Thread Dan Smith
> According to the requirements and comments, we have now switched the CI
> runs to run_validation = True. According to [1] below, [2], for example,
> needs the ssh validation to pass the test.
>
> There were also a couple of comments asking for enhancements to the CI
> logs, such as the format and stale incorrect log links. A sample of the
> newest logs can be found at [3] (take n-cpu as an example; those logs
> are the ones with _white.html).
>
> Also, the blueprint [4] requested in the previous discussion is posted
> here again for reference.
>
> Please let us know whether the procedural -2 can be removed in order to
> proceed. Thanks for your help.

The CI log format issues look fixed to me and validation is turned on
for the stuff supported, which is what was keeping it out of the
runway.

I still plan to leave the -2 on there until the next few patches have
agreement, just so we don't land an empty shell driver before we are
sure we're going to land spawn/destroy, etc. That's pretty normal
procedure and I'll be around to remove it when appropriate.

--Dan



Re: [openstack-dev] [Nova] z/VM introducing a new config driveformat

2018-04-18 Thread Dan Smith
> Having briefly read the cloud-init snippet which was linked earlier in
> this thread, the requirement seems to be that the guest exposes the
> device as /dev/srX or /dev/cdX. So I guess in order to make this work:
>
> * You need to tell z/VM to expose the virtual disk as an optical disk
> * The z/VM kernel needs to call optical disks /dev/srX or /dev/cdX

According to the docs, it doesn't need to be. You can indicate the
configdrive via filesystem label, which makes sense given we support vfat
for it as well.

http://cloudinit.readthedocs.io/en/latest/topics/datasources/configdrive.html#version-2
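
In other words, detection can key off the label and filesystem type
rather than the device name. Here is a rough sketch of that selection
logic, assuming the "config-2" label from the cloud-init docs linked
above; the function name and the input format (a dict of device path to
label and fstype) are made up for illustration:

```python
def find_config_drives(block_devices):
    """Pick devices that look like a version 2 config drive.

    block_devices maps a device path to a dict with 'label' and 'fstype'.
    """
    return sorted(
        path for path, info in block_devices.items()
        if (info.get("label") or "").lower() == "config-2"
        and info.get("fstype") in ("vfat", "iso9660")
    )
```

Nothing in that check requires the device to show up as /dev/srX or
/dev/cdX.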

--Dan



Re: [openstack-dev] [nova] Concern about trusted certificates API change

2018-04-18 Thread Dan Smith
> Maybe it wasn't clear but I'm not advocating that we block the change
> until volume-backed instances are supported with trusted certs. I'm
> suggesting we add a policy rule which allows deployers to at least
> disable it via policy if it's not supported for their cloud.

That's fine with me, and provides an out for another issue I pointed out
on the code review. Basically, the operator has no way to disable this
feature. If they haven't set this up properly and have no desire to, a
user reading the API spec and passing trusted certs will not be able to
boot an instance and not really understand why.

> I agree. I'm the one that noticed the issue and pointed out in the
> code review that we should explicitly fail the request if we can't
> honor it.

I agree for the moment for sure, but it would obviously be nice not to
open another gap we're not going to close. There's no reason this can't
be supported for volume-backed instances, it just requires some help
from cinder.

I would think that it'd be nice if we could declare the "can't do this
for reasons" response as a valid one regardless of the cause so we don't
need another microversion for the future where volume-backed instances
can do this.

> Again, I'm not advocating that we block until boot from volume is
> supported. However, we have a lot of technical debt for "good
> functionality" added over the years that failed to consider
> volume-backed instances, like rebuild, rescue, backup, etc and it's
> painful to deal with that after the fact, as can be seen from the
> various specs proposed for adding that support to those APIs.

Totes agree.

--Dan



Re: [openstack-dev] [Nova] z/VM introducing a new config driveformat

2018-04-18 Thread Dan Smith
> Thanks for the concern, and I fully understand it. The major reason is
> that cloud-init doesn't have a hook or plugin before it starts to read
> the config drive (ISO disk). z/VM is an old hypervisor and has no way to
> do something like libvirt's defining an ISO format disk in the xml
> definition; instead, it can define disks in the definition of the
> virtual machine and let the VM decide the format.
>
> so we need a way to tell cloud-init where to find ISO file before
> cloud-init start but without AE, we can't handle that...some update on
> the spec here for further information
> https://review.openstack.org/#/c/562154/

The ISO format does not come from telling libvirt something about
it. The host creates and formats the image, adds the data, and then
attaches it to the instance. The latter part is the only step that
involves configuring libvirt to attach the image to the instance. The
rest is just stuff done by nova-compute (and the virt driver) on the
linux system it's running on. That's the same arrangement as your
driver, AFAICT.
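
The host-side part can be sketched as nothing more than building an image
file from a directory. This is a minimal illustration, assuming the
genisoimage tool and the "config-2" volume label; the function name is
hypothetical and nova's real invocation may well use different options:

```python
def build_iso_command(source_dir, image_path, label="config-2"):
    """Return the genisoimage argv for packing source_dir into an ISO."""
    return [
        "genisoimage",
        "-o", image_path,   # output image file
        "-J",               # add Joliet directory records
        "-r",               # Rock Ridge with normalized ownership/modes
        "-V", label,        # volume label the guest looks for
        source_dir,
    ]
```

Running the result (for example with subprocess.run(argv, check=True))
happens on the compute host; only attaching the finished image to the
instance is specific to the hypervisor.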

You're asking the hypervisor (or something running on it) to
grab the image from glance, pre-filled with data. This is no different,
except that the configdrive image comes from the system running the
compute service. I don't see how it's any different in actual hypervisor
mechanics, and thus feel like there _has_ to be a way to do this without
the AE magic agent.

I agree with Mikal that needing more agent behavior than cloud-init does
a disservice to the users.

I feel like we get a lot of "but no, my hypervisor is special!"
reasoning when people go to add a driver to nova. So far, I think
they're a lot more similar than people think. Ironic is the weirdest one
we have (IMHO and no offense to the ironic folks) and it can support
configdrive properly.

--Dan



Re: [openstack-dev] [Nova] z/VM introducing a new config driveformat

2018-04-17 Thread Dan Smith
> I propose that we remove the z/VM driver blueprint from the runway at
> this time and place it back into the queue while work on the driver
> continues. At a minimum, we need to see z/VM CI running with
> [validation]run_validation = True in tempest.conf before we add the
> z/VM driver blueprint back into a runway in the future.

Agreed. I also want to see the CI reporting cleaned up so that it's
readable and consistent. Yesterday I pointed out some issues with the
fact that the actual config files being used are not the ones being
uploaded. There are also duplicate (but not actually identical) logs
from all services being uploaded, including things like a full compute
log from starting with the libvirt driver.

I'm also pretty troubled by the total lack of support for the metadata
service. I know it's technically optional on our matrix, but it's a
pretty important feature for a lot of scenarios, and it's also a
dependency for other features that we'd like to have wider support for
(like attached device metadata).

Going back to the spec, I see very little detail on some of the things
raised here, and very (very) little review back when it was first
approved. I'd also like to see more detail be added to the spec about
all of these things, especially around required special changes like
this extra AE agent.

--Dan



Re: [openstack-dev] [Nova][Deployers] Optional, platform specific, dependancies in requirements.txt

2018-04-13 Thread Dan Smith
>>     global ironic
>>     if ironic is None:
>>         ironic = importutils.import_module('ironicclient')

I believe ironic was an early example of a client library we hot-loaded,
and I believe at the time we said this was a pattern we were going to
follow. Personally, I think this makes plenty of sense and I think that
even moving things like the python-libvirt load out to something like
this to avoid hyperv people having to nuke it from requirements makes
sense.
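
A generic version of the pattern, sketched with the standard library's
importlib instead of oslo's importutils so it runs anywhere. The helper
name and the error message are made up for illustration:

```python
import importlib

_modules = {}


def lazy_import(name):
    """Import an optional library on first use and cache the module."""
    if name not in _modules:
        try:
            _modules[name] = importlib.import_module(name)
        except ImportError as exc:
            raise RuntimeError(
                "this driver needs the optional '%s' package" % name
            ) from exc
    return _modules[name]
```

Only deployments that actually load a given driver ever hit the import,
so everyone else can leave that client library uninstalled.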

> I have a pretty strong dislike for this mechanism.  For one thing, I'm
> frustrated when I can't use hotkeys to jump to an ironicclient method
> because my IDE doesn't recognize that dynamic import.  I have to go look
> up the symbol some other way (and hope I'm getting the right one).  To
> me (with my bias as a dev rather than a deployer) that's way worse than
> having the 704KB python-ironicclient installed on my machine even though

This seems like a terrible reason to make everyone install ironicclient
(or the z/vm client) on their systems at runtime.

--Dan



Re: [openstack-dev] [Nova] z/VM introducing a new config driveformat

2018-04-13 Thread Dan Smith
> For the run_validation=False issue, you are right. Because the z/VM
> driver only supports config drive and not the metadata service, we made
> a bad assumption and wrongly disabled the whole ssh check. According
> to [1], we should only disable
> CONF.compute_feature_enabled.metadata_service but keep both
> self.run_ssh and CONF.compute_feature_enabled.config_drive as True in
> order to make config drive test validation take effect. Our CI will
> handle that.

Why don't you support the metadata service? That's a pretty fundamental
mechanism for nova and openstack. It's the only way you can get a live
copy of metadata, and it's the only way you can get access to device
tags when you hot-attach something. Personally, I think that it's
something that needs to work.

> For the tgz/iso9660 question below, this is because we got wrong info
> from the low-layer component folks back in 2012. After discussing with
> some experts again, we can actually create the iso9660 in the driver
> layer and pass it down to the spawned virtual machine; during the
> startup process, the VM itself will mount the iso file and consume it.
> From the linux perspective, tgz versus iso9660 doesn't matter; we only
> need some files in order to transfer the information from the openstack
> compute node to the spawned VM. So our action is to change the format
> from tgz to iso9660 and stay consistent with other drivers.

The "iso file" will not be inside the guest, but rather passed to the
guest as a block device, right?

> For the config drive working mechanism question, according to [2] z/VM
> is a Type 1 hypervisor while Qemu/KVM is most likely a Type 2
> hypervisor. There is no file system in the z/VM hypervisor (I omit a lot
> of detail here), so we can't do what a linux operating system does and
> keep a file as a qcow2 image in the host operating system,

I'm not sure what the type-1-ness has to do with this. The hypervisor
doesn't need to support any specific filesystem for this to work. Many
drivers we have in the tree are type-1 (xen, vmware, hyperv, powervm)
and you can argue that KVM is type-1-ish. They support configdrive.

> what we do is use a special file pool to store the config drive, and
> during the VM init process we read that file from a special device and
> attach it to the VM in iso9660 format. Then cloud-init handles the
> follow-up, and the cloud-init handling is identical to other platforms.

This and the previous mention of this sort of behavior has me
concerned. Are you describing some sort of process that runs when the
instance is starting to initialize its environment, or something that
runs *inside* the instance and thus functionality that has to exist in
the *image* to work?

--Dan



Re: [openstack-dev] [Nova] z/VM introducing a new config drive format

2018-04-11 Thread Dan Smith
> https://review.openstack.org/#/c/527658 is a z/VM patch which
> introduces their support for config drive. They do this by attaching a
> tarball to the instance, having pretended in the nova code that it is
> an iso9660. This worries me.
>
> In the past we've been concerned about adding new filesystem formats
> for config drives, and the long term support implications of that --
> the filesystem formats for config drive that we use today were
> carefully selected as being universally supported by our guest
> operating systems.
>
> The previous example we've had of these issues is the parallels
> driver, which had similar "my hypervisor doesn't support these
> filesystem format" concerns. We worked around those concerns IIRC, and
> certainly virt.configdrive still only supports iso9660 and vfat.

Yeah, IIRC, the difference with the parallels driver was that it ends up
mounted in the container automagically for the guest by the..uh..man
behind the curtain. However, z/VM being much more VM-y I imagine that
the guest is just expected to grab that blob and do something with it to
extract it on local disk at runtime or something. That concerns me too.

In the past I've likened adding filesystem (or format, in this case)
options to configdrive as a guest ABI change. I think the stability of
what we present to guests is second only to our external API in terms of
importance. I know z/VM is "weird" or "different", but I wouldn't want a
more conventional hypervisor exposing the configdrive as a tarball, so I
don't really think it's a precedent we should set. Both vfat and iso9660
are easily supportable by most everything on the planet so I don't think
it's an unreasonable bar.

--Dan



Re: [openstack-dev] [nova] [cyborg] Race condition in the Cyborg/Nova flow

2018-03-29 Thread Dan Smith
> ==> Fully dynamic: You can program one region with one function, and
> then still program a different region with a different function, etc.

Note that this is also the case if you don't have virtualized multi-slot
devices. Like, if you had one that only has one region. Consuming it
consumes the one and only inventory.

> ==> Single program: Once you program the card with a function, *all* its
> virtual slots are *only* capable of that function until the card is
> reprogrammed.  And while any slot is in use, you can't reprogram.  This
> is Sundar's FPGA use case.  It is also Sylvain's VGPU use case.
>
> The "fully dynamic" case is straightforward (in the sense of being what
> placement was architected to handle).
> * Model the PF/region as a resource provider.
> * The RP has inventory of some generic resource class (e.g. "VGPU",
> "SRIOV_NET_VF", "FPGA_FUNCTION").  Allocations consume that inventory,
> plain and simple.
> * As a region gets programmed dynamically, it's acceptable for the thing
> doing the programming to set a trait indicating that that function is in
> play.  (Sundar, this is the thing I originally said would get
> resistance; but we've agreed it's okay.  No blood was shed :)
> * Requests *may* use preferred traits to help them land on a card that
> already has their function flashed on it. (Prerequisite: preferred
> traits, which can be implemented in placement.  Candidates with the most
> preferred traits get sorted highest.)

Yup.
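
The preferred-traits ordering described in the quoted list could look
roughly like this. It is only a sketch of the sorting idea, not
placement's implementation; the candidate format, trait names, and
function name are hypothetical:

```python
def sort_by_preferred(candidates, preferred_traits):
    """Order candidates so those with more preferred traits come first."""
    preferred = set(preferred_traits)
    return sorted(candidates,
                  key=lambda cand: len(preferred & set(cand["traits"])),
                  reverse=True)
```

Because preferred traits only affect ordering, a card without the
function already flashed is still a valid candidate; it just sorts lower.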

> The "single program" case needs to be handled more like what Alex
> describes below.  TL;DR: We do *not* support dynamic programming,
> traiting, or inventorying at instance boot time - it all has to be done
> "up front".
> * The PFs can be initially modeled as "empty" resource providers.  Or
> maybe not at all.  Either way, *they can not be deployed* in this state.
> * An operator or admin (via a CLI, config file, agent like blazar or
> cyborg, etc.) preprograms the PF to have the specific desired
> function/configuration.
>   * This may be cyborg/blazar pre-programming devices to maintain an
> available set of each function
>   * This may be in response to a user requesting some function, which
> causes a new image to be laid down on a device so it will be available
> for scheduling
>   * This may be a human doing it at cloud-build time
> * This results in the resource provider being (created and) set up with
> the inventory and traits appropriate to that function.
> * Now deploys can happen, using required traits representing the desired
> function.

...and it could be in response to something noticing that a recent nova
boot failed to find any candidates with a particular function, which
provisions that thing so it can be retried. This is kind of the "spot
instances" approach -- that same workflow would work here as well,
although I expect most people would fit into the above cases.

--Dan



Re: [openstack-dev] [nova] Proposing Eric Fried for nova-core

2018-03-27 Thread Dan Smith
> To the existing core team members, please respond with your comments,
> +1s, or objections within one week.

+1.

--Dan



Re: [openstack-dev] [nova] Does Cell v2 support for muti-cell deployment in Pike?

2018-03-23 Thread Dan Smith
> Does Cell v2 support for multi-cell deployment in pike? Is there any
> good document about the deployment?

In the release notes of Pike:

  https://docs.openstack.org/releasenotes/nova/pike.html

is this under 16.0.0 Prelude:

  Nova now supports a Cells v2 multi-cell deployment. The default
  deployment is a single cell. There are known limitations with multiple
  cells. Refer to the Cells v2 Layout page for more information about
  deploying multiple cells.

There are some links to documentation in that paragraph which should be
helpful.

--Dan



Re: [openstack-dev] [nova] Rocky spec review day

2018-03-21 Thread Dan Smith
>>  And I, for one, wouldn't be offended if we could "officially start
>>  development" (i.e. focus on patches, start runways, etc.) before the
>>  mystical but arbitrary spec freeze date.

Yeah, I agree. I see runways as an attempt to add pressure to the
earlier part of the cycle, where we're ignoring things that have been
ready but aren't super high priority because "we have plenty of time."
The later part of the cycle is when we start having to make hard
decisions on things to de-focus, and where focus on the important core
changes goes up naturally anyway.

Personally, I think we're already kinda late in the cycle to be going on
this, as I would have hoped to exit PTG with a plan to start operating
in the new process immediately. Maybe I'm in the minority there, but I
think that if we start this process late in the middle of a cycle, we'll
probably need to adjust the prioritization of things in the queue more
strictly, and remember that when retrospecting on the process for next
cycle.

> Sure, but given we have a lot of specs to review, TBH it'll be
> possible for me to look at implementation patches only close to the
> 1st milestone.

I'm not sure I get this. We can't not review code while we review specs
for weeks on end. We've already approved 75% of the blueprints (in
number) that we completed in queens. One of the intended outcomes of
this effort was to complete a higher percentage of what we approved, so
we're not lying to contributors and so we have more focused review of
things so they actually get completed instead of half-landed. To that
end, I would kind of expect that we need to constantly be throttling (or
maybe re-starting) spec review/approval rates to keep the queue full
enough so we don't run dry, but without just ending up with a thousand
approved things that we'll never get to.

Anyway, just MHO. Obviously this will be an experiment and we won't get
it right the first time.

--Dan



Re: [openstack-dev] [nova] New image backend: StorPool

2018-03-16 Thread Dan Smith
> Can you be more specific about what is limiting you when you use
> volume-backed instances?

Presumably it's because you're taking a trip over iscsi instead of using
the native attachment mechanism for the technology that you're using? If
so, that's a valid argument, but it's hard to see the tradeoff working
in favor of adding all these drivers to nova as well.

If cinder doesn't support backend-specific connectors, maybe that's
something we could work on? People keep saying that "cinder is where I
put my storage, that's how I want to back my instances" when it comes to
justifying BFV, and that argument is starting to resonate with me more
and more.

--Dan



Re: [openstack-dev] [nova] about rebuild instance booted from volume

2018-03-15 Thread Dan Smith
> Deleting all snapshots would seem dangerous though...
>
> 1. I want to reset my instance to how it was before
> 2. I'll just do a snapshot in case I need any data in the future
> 3. rebuild
> 4. oops

Yep, for sure. I think if there are snapshots, we have to refuse to do
the thing. My comment was about the "does nova have authority to destroy
the root volume during a rebuild" and I think it does, if
delete_on_termination=True, and if there are no snapshots.
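
Spelled out as a predicate, the rule is tiny. This is a sketch of the
proposed policy only; the function name and arguments are hypothetical,
and how the snapshot count is obtained from cinder is not shown:

```python
def may_reimage_root_volume(delete_on_termination, snapshot_count):
    """Nova may destroy the root volume only if it owns it outright."""
    return bool(delete_on_termination) and snapshot_count == 0
```

Anything else is a refusal rather than a silent deletion of snapshots.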

--Dan

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] about rebuild instance booted from volume

2018-03-15 Thread Dan Smith
> Rather than overload delete_on_termination, could another flag like
> delete_on_rebuild be added?

Isn't delete_on_termination already the field we want? To me, that field
means "nova owns this". If that is true, then we should be able to
re-image the volume (in-place is ideal, IMHO) and if not, we just
fail. Is that reasonable?

--Dan



Re: [openstack-dev] [Openstack-operators] AggregateMultiTenancyIsolation with multiple (many) projects

2018-03-08 Thread Dan Smith
> 2. Dan Smith mentioned another idea such that we could index the
> aggregate metadata keys like filter_tenant_id0, filter_tenant_id1,
> ... filter_tenant_idN and then combine those so you have one host
> aggregate filter_tenant_id* key per tenant.

Yep, and that's what I've done in my request_filter implementation:

https://review.openstack.org/#/c/545002/9/nova/scheduler/request_filter.py

Basically it allows any suffix to 'filter_tenant_id' to be processed as
a potentially-matching key.

Note that I'm hoping we can deprecate/remove the post filter and replace
it with this much more efficient version.

--Dan
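
The suffix-matching idea can be sketched in a few lines of Python. This is not the code from the linked review; `aggregate_matches_tenant` and the plain-dict metadata are assumptions made for illustration only.

```python
PREFIX = "filter_tenant_id"


def aggregate_matches_tenant(metadata: dict, tenant_id: str) -> bool:
    """True if any metadata key starting with 'filter_tenant_id' names the tenant.

    Keys such as filter_tenant_id0, filter_tenant_id1, ... are all treated
    as potentially-matching keys, so one aggregate can list many tenants.
    """
    return any(
        key.startswith(PREFIX) and value == tenant_id
        for key, value in metadata.items()
    )
```

A key without the prefix is ignored even if its value happens to equal the tenant ID.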



Re: [openstack-dev] [all] Switching to longer development cycles

2017-12-14 Thread Dan Smith
Ed Leafe writes:

> I think you're missing the reality that intermediate releases have
> about zero uptake in the real world. We have had milestone releases of
> Nova for years, but I challenge you to find me one non-trivial
> deployment that uses one of them. To my knowledge, based on user
> surveys, it is only the major 6-month named releases that are
> deployed, and even then, some time after their release.
>
> Integrated releases make sense for deployers. What does it mean if
> Nova has some new stuff, but it requires a new release from Cinder in
> order to use it, and Cinder hasn't yet released the necessary updates?
> Talking about releasing projects on a monthly-tagged basis just dumps
> the problem of determining what works with the rest of the codebase
> onto the deployers.

Similarly, right now we have easy and uniform points at which we have to
make upgrade and compatibility guarantees. Presumably in such a new
world order, a project would not be allowed to drop compatibility in an
intermediate release, which means we're all being forced into a longer
support envelope for versioned APIs, config files, etc.

If we did do more of what I assume Doug is suggesting, which is just tag
monthly and let the projects decide what to do with upgrades, then we
end up with a massively more complex problem (for our own CI, as well as
for operators) of mapping out where compatibility begins and ends
per-project, instead of at least all aiming for the same point in the
timeline.

--Dan



Re: [openstack-dev] [all] Switching to longer development cycles

2017-12-14 Thread Dan Smith
> In my experience, the longer a patch (or worse, patch series) sits
> around, the staler it gets. Others are merging changes, so the
> long-lived patch series has to be constantly rebased.

This is definitely true.

> The 20% developer would be spending a greater proportion of her time
> figuring out how to solve the rebase conflicts instead of just
> focusing on her code.

Agreed. The first reaction I had to this proposal was pretty much what
you state here: that now the 20% person has a 365-day window in which
they have to keep their head in the game, instead of a 180-day one.

Assuming doubling the length of the cycle has no impact on the
_importance_ of the thing the 20% person is working on, relative to
project priorities, then the longer cycle just means they have to
continuously rebase for a longer period of time.

--Dan



Re: [openstack-dev] [Nova] Privsep transition state of play

2017-11-05 Thread Dan Smith
> I hope everyone travelling to the Sydney Summit is enjoying jet lag
> just as much as I normally do. Revenge is sweet! My big advice is that
> caffeine is your friend, and to not lick any of the wildlife.

I wasn't planning on licking any of it, but thanks for the warning.

> As of just now, all rootwrap usage has been removed from the libvirt
> driver, if you assume that the outstanding patches from the blueprint
> are merged. I think that's a pretty cool milestone. That said, I feel
> that https://review.openstack.org/#/c/517516/ needs a short talk to
> make sure that people don't think the implementation approach I've
> taken is confusing -- basically not all methods in nova/privsep are
> now escalated, as we only sometimes escalate our privs for a
> call. The review makes it clearer than I can in an email.

I commented, agreeing with gibi. Make the exceptional cases
exceptionally named; assume non-exceptional names are escalated by
default.

> We could stop now for Queens if we wanted -- we originally said we'd
> land things early to let them stabilise. That said, we haven't
> actually caused any stability problems so far -- just a few out of
> tree drivers having to play catchup. So we could also go all in and
> get this thing done fully in Queens.

I agree we should steam ahead. I don't really want to hang the fate of
the privsep transition on the removal of cellsv1 and nova-network, so
personally I'm not opposed to privsepping those bits if you're
willing. I also agree that the lack of breakage thus far should give us
more confidence that we're safe to continue applying these changes later
in the cycle. Just MHO.

--Dan



Re: [openstack-dev] [nova] A way to delete a record in 'host_mappings' table

2017-10-03 Thread Dan Smith
> But the record in 'host_mappings' table of api database is not deleted
> (I tried it with nova master 8ca24bf1ff80f39b14726aca22b5cf52603ea5a0).
> The cell cannot be deleted if the records for the cell remain in the
> 'host_mappings' table.
> (An error occurs with the message "There are existing hosts mapped to
> cell with uuid ...".)
>
> Is there any way (CLI, API) to delete the host record in the
> 'host_mappings' table?
> I couldn't find one.

Hmm, yeah, I bet this is a gap. Can you file a bug for this?

I think making the cell delete check for instances=0 in the cell and then 
deleting the host mapping along with the cell would be a good idea. We could 
also add a command to clean up orphaned host records, although hopefully that’s 
an exceptional situation.

--Dan
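
The proposed behaviour (refuse to delete a cell that still has instances, otherwise delete its host mappings along with the cell) could look roughly like the following Python sketch. It uses in-memory dicts and sets instead of the API database, and `delete_cell` and `CellInUse` are hypothetical names, not nova-manage internals.

```python
class CellInUse(Exception):
    """Raised when a cell still has instances and must not be deleted."""


def delete_cell(cell_uuid, instance_counts, host_mappings, cell_mappings):
    """Delete a cell and its host mappings if the cell has no instances.

    instance_counts: dict of cell uuid -> number of instances in that cell
    host_mappings:   dict of host name -> cell uuid (mutated in place)
    cell_mappings:   set of known cell uuids (mutated in place)
    Returns the list of host names whose mappings were removed.
    """
    if instance_counts.get(cell_uuid, 0) != 0:
        raise CellInUse(f"cell {cell_uuid} still has instances")
    removed = [host for host, cell in host_mappings.items() if cell == cell_uuid]
    for host in removed:
        del host_mappings[host]
    cell_mappings.discard(cell_uuid)
    return removed
```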


Re: [openstack-dev] [all] Update on Zuul v3 Migration - and what to do about issues

2017-10-03 Thread Dan Smith
> Any update on where we stand on issues now? Because every single patch I
> tried to land yesterday was killed by POST_FAILURE in various ways.
> Including some really small stuff - https://review.openstack.org/#/c/324720/

Yeah, Nova has only landed eight patches since Thursday. Most of those are 
test-only patches that run a subset of jobs, and a couple that landed in the 
wee hours when overall system load was low.

> Do we have a defined point on the calendar for getting the false
> negatives back below the noise threshold otherwise a rollback is
> implemented so that some of these issues can be addressed in parallel
> without holding up community development?

On Friday I was supportive of the decision to keep steaming forward instead of 
rolling back. Today, I’m a bit more concerned about light at the end of the 
tunnel. The infra folks have been hitting this hard for a long time, and for 
that I’m very appreciative. I too hope that we’re going to revisit mitigation 
strategies as we approach the weekiversary of being stuck.

--Dan


Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-10-02 Thread Dan Smith
>> I also think there is value in exposing vGPU in a generic way, irrespective 
>> of the underlying implementation (whether it is DEMU, mdev, SR-IOV or 
>> whatever approach Hyper-V/VMWare use).
> 
> That is a big ask. To start with, all GPUs are not created equal, and
> various vGPU functionality as designed by the GPU vendors is not
> consistent, never mind the quirks added between different hypervisor
> implementations. So I feel like trying to expose this in a generic
> manner is, at least, asking for problems, and more likely bound for
> failure.

I feel the opposite. IMHO, Nova’s role in life is not to expose all the quirks 
of the underlying platform, but rather to provide a useful abstraction on top 
of those things. In spite of them.

> Nova already exposes plenty of hypervisor-specific functionality (or
> functionality only implemented for one hypervisor), and that's fine.

And those bits of functionality are some of the most problematic we have. Among 
other reasons, they make it difficult for us to expose Thing 2.0, when we’ve 
encoded Thing 1.0 into our API so rigidly. This happens even within one virt 
driver where Thing 2.0 is significantly different than Thing 1.0.

The vGPU stuff seems well-suited for the generic modeling work that we’ve spent 
the last few years working on, and is a perfect example of an area where we can 
avoid piling on more debt to a not-abstract-enough “model” and move forward 
with the new one. That’s certainly my preference, and I think it’s actually 
less work than the debt-ridden way.

--Dan





[openstack-dev] [nova][stable] attn: No approvals for stable/newton right now

2017-09-29 Thread Dan Smith

Hi all,

Due to a zuulv3 bug, we're running an old nova-network test job on 
master and, as you would expect, failing hard. As a workaround in the 
meantime, we're[0] going to disable that job entirely so that it runs 
nowhere. This makes it not run on master (good) but also not run on 
stable/newton (not so good).


So, please don't approve anything new for stable/newton until we turn 
this job back on. That will happen when this patch lands:


  https://review.openstack.org/#/c/508638

Thanks!

--Dan

[0]: Note that this is all magic and dedication from the infra people, 
all I did was stand around and applaud. I'm including myself in the "we" 
here because I like to feel included by standing next to smart people, 
not because I did any work.




Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-29 Thread Dan Smith

> The concepts of PCI and SR-IOV are, of course, generic

They are, although the PowerVM guys have already pointed out that they
don't even refer to virtual devices by PCI address and thus anything
based on that subsystem isn't going to help them.

> but I think on principle we should avoid a hypervisor-specific
> integration for vGPU (indeed Citrix has been clear from the beginning
> that the vGPU integration we are proposing is intentionally
> hypervisor agnostic). I also think there is value in exposing vGPU in
> a generic way, irrespective of the underlying implementation (whether
> it is DEMU, mdev, SR-IOV or whatever approach Hyper-V/VMWare use).


I very much agree, of course.

--Dan



Re: [openstack-dev] vGPUs support for Nova - Implementation

2017-09-28 Thread Dan Smith

>> In this series of patches we are generalizing the PCI framework to
>> handle MDEV devices. We admit it's a lot of patches, but most of them
>> are small, and the logic behind them is basically to make it
>> understand two new fields, MDEV_PF and MDEV_VF.
>
> That's not really "generalizing the PCI framework to handle MDEV
> devices" :) More like it's just changing the /pci module to understand a
> different device management API, but ok.

Yeah, the series is adding more fields to our PCI structure to allow for
more variations in the kinds of things we lump into those tables. This
is my primary complaint with this approach, and has been since the topic
first came up. I really want to avoid building any more dependency on
the existing pci-passthrough mechanisms and focus any new effort on
using resource providers for this. The existing pci-passthrough code is
almost universally hated, poorly understood and tested, and something we
should not be further building upon.

>> In this series of patches we make the libvirt driver, as usual,
>> return resources and attach devices returned by the pci manager. This
>> part can be reused for Resource Provider.
>
> Perhaps, but the idea behind the resource providers framework is to
> treat devices as generic things. Placement doesn't need to know about
> the particular device attachment status.

I quickly went through the patches and left a few comments. The base
work of pulling some of this out of libvirt is there, but it's all
focused on the act of populating pci structures from the vgpu
information we get from libvirt. That code could be made to instead
populate a resource inventory, but that's about the most of the set that
looks applicable to the placement-based approach.

> As mentioned in IRC and the previous ML discussion, my focus is on the
> nested resource providers work and reviews, along with the other two
> top-priority scheduler items (move operations and alternate hosts).
>
> I'll do my best to look at your patch series, but please note it's lower
> priority than a number of other items.

FWIW, I'm not really planning to spend any time reviewing it
until/unless it is retooled to generate an inventory from the virt driver.

With the two patches that report vgpus and then create guests with them
when asked converted to resource providers, I think that would be enough
to have basic vgpu support immediately. No DB migrations, model changes,
etc required. After that, helping to get the nested-rps and traits work
landed gets us the ability to expose attributes of different types of
those vgpus and opens up a lot of possibilities. IMHO, that's work I'm
interested in reviewing.

> One thing that would be very useful, Sahid, if you could get with Eric
> Fried (efried) on IRC and discuss with him the "generic device
> management" system that was discussed at the PTG. It's likely that the
> /pci module is going to be overhauled in Rocky and it would be good to
> have the mdev device management API requirements included in that
> discussion.

Definitely this.

--Dan
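
To make "generate an inventory from the virt driver" concrete, here is a rough Python sketch of what reporting vGPUs as a resource inventory might look like. The function name `vgpu_inventory` and the idea that the driver already knows a simple vGPU count are assumptions for illustration; the field names are modelled on placement's inventory records (total, reserved, min_unit, max_unit, step_size, allocation_ratio) and should be checked against the placement API reference.

```python
def vgpu_inventory(total_vgpus: int) -> dict:
    """Build a placement-style inventory dict for a host's vGPUs.

    Returns an empty dict when the host exposes no vGPUs, so the caller
    simply reports nothing for that resource class.
    """
    if total_vgpus <= 0:
        return {}
    return {
        "VGPU": {
            "total": total_vgpus,
            "reserved": 0,
            "min_unit": 1,
            "max_unit": total_vgpus,
            "step_size": 1,
            # vGPUs are not overcommitted in this sketch.
            "allocation_ratio": 1.0,
        }
    }
```

The point of the sketch is that nothing PCI-specific appears in it: the device is described only as a countable resource.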



Re: [openstack-dev] [skip-level-upgrades][fast-forward-upgrades] PTG summary

2017-09-20 Thread Dan Smith

>> - Modify the `supports-upgrades`[3] and `supports-accessible-upgrades`[4] tags
>>
>>    I have yet to look into the formal process around making changes to
>>    these tags but I will aim to make a start ASAP.
>
> We've previously tried to avoid changing assert tag definitions because
> we then have to re-review all of the projects that already have the tags
> to ensure they meet the new criteria. It might be easier to add a new
> tag for assert:supports-fast-forward-upgrades with the criteria that are
> unique to this use case.

We already have a confusing array of upgrade tags, so I would really
rather not add more that overlap in complicated ways. Most of the change
here is clarification of things I think most people assume, so I don't
think the validation effort will be a lot of work.


--Dan



Re: [openstack-dev] [nova] Proposing Balazs Gibizer for nova-core

2017-08-29 Thread Dan Smith
> So to the existing core team members, please respond with a yay/nay and
> after about a week or so we should have a decision (knowing a few cores
> are on vacation right now).

+1 on the condition that gibi stops finding so many bugs in the stuff I
worked on. It's embarrassing.


--Dan



Re: [openstack-dev] [oslo.db] [ndb] ndb namespace throughout openstack projects

2017-07-24 Thread Dan Smith
> So, I see your point here, but my concern here is that if we *modify* an
> existing schema migration that has already been tested to properly apply
> a schema change for MySQL/InnoDB and PostgreSQL with code that is
> specific to NDB, we introduce the potential for bugs where users report
> that the same migration works sometimes but fails other times.

This ^.

The same goes for really any sort of conditional in a migration where
you could end up with different schema. I know that is Mike's point (to
not have that happen) but I think the difficulty is proving and
guaranteeing (now and going forward) that they're identical. Modifying a
migration in the past is like a late-breaking conditional.

> I would much prefer to *add* a brand new schema migration that handles
> conversion of the entire InnoDB schema at a certain point to an
> NDB-compatible one *after* that point. That way, we isolate the NDB
> changes to one specific schema migration -- and can point users to that
> one specific migration in case bugs arise. This is the reason that every
> release we add a number of "placeholder" schema migration numbered files
> to handle situations such as these.

Yes.

--Dan



Re: [openstack-dev] [Openstack-operators] [nova] Does anyone rely on PUT /os-services/disable for non-compute services?

2017-06-13 Thread Dan Smith

> Are we allowed to cheat and say auto-disabling non-nova-compute services
> on startup is a bug and just fix it that way for #2? :) Because (1) it
> doesn't make sense, as far as we know, and (2) it forces the operator to
> have to use the API to enable them later just to fix their nova
> service-list output.

Yes, definitely.

--Dan



Re: [openstack-dev] [nova] Does anyone rely on PUT /os-services/disable for non-compute services?

2017-06-13 Thread Dan Smith

> So it seems our options are:
>
> 1. Allow PUT /os-services/{service_uuid} on any type of service, even if
> it doesn't make sense for non-nova-compute services.
>
> 2. Change the behavior of [1] to only disable new "nova-compute" services.

Please, #2. Please.

--Dan



Re: [openstack-dev] [nova][scheduler][placement] Allocating Complex Resources

2017-06-09 Thread Dan Smith
>> b) a compute node could very well have both local disk and shared 
>> disk. how would the placement API know which one to pick? This is a
>> sorting/weighing decision and thus is something the scheduler is 
>> responsible for.

> I remember having this discussion, and we concluded that a
> compute node could either have local or shared resources, but not
> both. There would be a trait to indicate shared disk. Has this 
> changed?

I've always thought we discussed that one of the benefits of this
approach was that it _could_ have both. Maybe we said "initially we
won't implement stuff so it can have both" but I think the plan has been
that we'd be able to support it.

>>> * We already have the information the filter scheduler needs now
>>>  by some other means, right?  What are the reasons we don't want
>>>  to use that anymore?
>> 
>> The filter scheduler has most of the information, yes. What it 
>> doesn't have is the *identifier* (UUID) for things like SRIOV PFs 
>> or NUMA cells that the Placement API will use to distinguish 
>> between things. In other words, the filter scheduler currently does
>> things like unpack a NUMATopology object into memory and determine
>> a NUMA cell to place an instance to. However, it has no concept
>> that that NUMA cell is (or will soon be once 
>> nested-resource-providers is done) a resource provider in the 
>> placement API. Same for SRIOV PFs. Same for VGPUs. Same for FPGAs,
>>  etc. That's why we need to return information to the scheduler 
>> from the placement API that will allow the scheduler to understand 
>> "hey, this NUMA cell on compute node X is resource provider 
>> $UUID".

Why shouldn't scheduler know those relationships? You were the one (well
one of them :P) that specifically wanted to teach the nova scheduler to
be in the business of arranging and making claims (allocations) against
placement before returning. Why should some parts of the scheduler know
about resource providers, but not others? And, how would scheduler be
able to make the proper decisions (which require knowledge of
hierarchical relationships) without that knowledge? I'm sure I'm missing
something obvious, so please correct me.

IMHO, the scheduler should eventually evolve into a thing that mostly
deals in the currency of placement, translating those into nova concepts
where needed to avoid placement having to know anything about them.
In other words, I would expect to be able to explain the purpose of the
scheduler as "applies nova-specific logic to the generic resources that
placement says are _valid_, with the goal of determining which one is
_best_".

--Dan
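
The split described above (placement says which candidates are valid, the scheduler decides which one is best) can be sketched as follows. Everything here is hypothetical: the candidate dicts, the two weigher functions and `pick_best` are illustrative stand-ins, not nova's scheduler or weigher classes.

```python
from typing import Callable, Iterable, Optional

Candidate = dict
Weigher = Callable[[Candidate], float]


def ram_weigher(candidate: Candidate) -> float:
    """Prefer hosts with more free RAM (scored in GiB)."""
    return candidate["free_ram_mb"] / 1024.0


def spread_weigher(candidate: Candidate) -> float:
    """Prefer hosts that currently run fewer instances."""
    return -float(candidate["num_instances"])


def pick_best(valid: Iterable[Candidate],
              weighers: Iterable[Weigher]) -> Optional[Candidate]:
    """Apply nova-specific weighing to candidates already known to be valid."""
    candidates = list(valid)
    weigher_list = list(weighers)
    if not candidates:
        return None
    return max(candidates, key=lambda c: sum(w(c) for w in weigher_list))
```

Note that `pick_best` never checks capacity; in this model that question has already been answered by whatever produced the list of valid candidates.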



Re: [openstack-dev] [nova][scheduler][placement] Allocating Complex Resources

2017-06-09 Thread Dan Smith
>> My current feeling is that we got ourselves into our existing mess
>> of ugly, convoluted code when we tried to add these complex 
>> relationships into the resource tracker and the scheduler. We set
>> out to create the placement engine to bring some sanity back to how
>> we think about things we need to virtualize.
> 
> Sorry, I completely disagree with your assessment of why the
> placement engine exists. We didn't create it to bring some sanity
> back to how we think about things we need to virtualize. We created
> it to add consistency and structure to the representation of
> resources in the system.
> 
> I don't believe that exposing this structured representation of 
> resources is a bad thing or that it is leaking "implementation
> details" out of the placement API. It's not an implementation detail
> that a resource provider is a child of another or that a different
> resource provider is supplying some resource to a group of other
> providers. That's simply an accurate representation of the underlying
> data structures.

This ^.

With the proposal Jay has up, placement is merely exposing some of its
own data structures to a client that has declared what it wants. The
client has made a request for resources, and placement is returning some
allocations that would be valid. None of them are nova-specific at all
-- they're all data structures that you would pass to and/or retrieve
from placement already.

>> I don't know the answer. I'm hoping that we can have a discussion 
>> that might uncover a clear approach, or, at the very least, one
>> that is less murky than the others.
> 
> I really like Dan's idea of returning a list of HTTP request bodies
> for POST /allocations/{consumer_uuid} calls along with a list of
> provider information that the scheduler can use in its
> sorting/weighing algorithms.
> 
> We've put this straw-man proposal here:
> 
> https://review.openstack.org/#/c/471927/
> 
> I'm hoping to keep the conversation going there.

This is the most clear option that we have, in my opinion. It simplifies
what the scheduler has to do, it simplifies what conductor has to do
during a retry, and it minimizes the amount of work that something else
like cinder would have to do to use placement to schedule resources.
Without this, cinder/neutron/whatever has to know about things like
aggregates and hierarchical relationships between providers in order to
make *any* sane decision about selecting resources. If placement returns
valid options with that stuff figured out, then those services can look
at the bits they care about and make a decision.

I'd really like us to use the existing strawman spec as a place to
iterate on what that API would look like, assuming we're going to go
that route, and work on actual code in both placement and the scheduler
to use it. I'm hoping that doing so will help clarify whether this is
the right approach or not, and whether there are other gotchas that we
don't yet have on our radar. We're rapidly running out of runway for
pike here and I feel like we've got to get moving on this or we're going
to have to punt. Since several other things depend on this work, we need
to consider the impact to a lot of our pike commitments if we're not
able to get something merged.

--Dan



Re: [openstack-dev] [upgrades][skip-level][leapfrog] - RFC - Skipping releases when upgrading

2017-05-26 Thread Dan Smith
> I haven't looked at what Keystone is doing, but to the degree they are
> using triggers, those triggers would only impact new data operations as
> they continue to run into the schema that is straddling between two
> versions (e.g. old column/table still exists, data should be synced to
> new column/table).   If they are actually running a stored procedure to
> migrate existing data (which would be surprising to me...) then I'd
> assume that invokes just like any other "ALTER TABLE" instruction in
> their migrations.  If those operations themselves rely on the triggers,
> that's fine.

I haven't looked closely either, but I thought the point _was_ to
transform data. If they are, and you run through a bunch of migrations
where you end at a spot that expects that data was migrated while
running at step 3, triggers dropped at step 7, and then schema compacted
at step 11, then just blowing through them could be a problem. It'd work
for a greenfield install no problem because there was nothing to
migrate, but real people would trip over it.

> But a keystone person to chime in would be much better than me just
> making stuff up.

Yeah, same :)

--Dan



Re: [openstack-dev] [upgrades][skip-level][leapfrog] - RFC - Skipping releases when upgrading

2017-05-26 Thread Dan Smith
> As most of the upgrade issues center around database migrations, we
> discussed some of the potential pitfalls at length. One approach was to
> roll-up all DB migrations into a single repository and run all upgrades
> for a given project in one step. Another was to simply have multiple
> python virtual environments and just run in-line migrations from a
> version specific venv (this is what the OSA tooling does). Does one way
> work better than the other? Any thoughts on how this could be better?

IMHO, and speaking from a Nova perspective, I think that maintaining a
separate repo of migrations is a bad idea. We occasionally have to fix a
migration to handle a case where someone is stuck and can't move past a
certain revision due to some situation that was not originally
understood. If you have a separate copy of our migrations, you wouldn't
get those fixes. Nova hasn't compacted migrations in a while anyway, so
there's not a whole lot of value there I think.

The other thing to consider is that our _schema_ migrations often
require _data_ migrations to complete before moving on. That means you
really have to move to some milestone version of the schema, then
move/transform data, and then move to the next milestone. Since we
manage those according to releases, those are the milestones that are
most likely to be successful if you're stepping through things.

I do think that the idea of being able to generate a small utility
container (using the broad sense of the word) from each release, and
using those to step through N, N+1, N+2 to arrive at N+3 makes the most
sense.

Nova has offline tooling to push our data migrations (even though the
command is intended to be runnable online). The concern I would have
would be over how to push Keystone's migrations mechanically, since I
believe they moved forward with their proposal to do data migrations in
stored procedures with triggers. Presumably there is a need for
something similar to nova's online-data-migrations command which will
trip all the triggers and provide a green light for moving on?

In the end, projects support N->N+1 today, so if you're just stepping
through actual 1-version gaps, you should be able to do as many of those
as you want and still be running "supported" transitions. There's a lot
of value in that, IMHO.

--Dan
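
Stepping through N+1, N+2, N+3 with schema and data migrations at each release can be sketched as a simple loop. This is an illustrative outline only: `fast_forward` and its two callbacks are hypothetical, with `sync_schema` standing in for something like running that release's `nova-manage db sync` and `migrate_data_batch` for repeated runs of `nova-manage db online_data_migrations` from a per-release environment.

```python
def fast_forward(current, target, releases, sync_schema, migrate_data_batch):
    """Walk one release at a time from `current` up to `target`.

    releases:           ordered list of release names, oldest first
    sync_schema:        callable(release), applies that release's schema
    migrate_data_batch: callable(release) -> number of rows migrated in
                        this batch; 0 means the data migrations are done
    Returns the list of releases that were stepped through.
    """
    start = releases.index(current)
    end = releases.index(target)
    if end < start:
        raise ValueError("target release is older than current release")
    steps = []
    for release in releases[start + 1:end + 1]:
        sync_schema(release)
        # Data must be fully migrated before moving to the next schema.
        while migrate_data_batch(release) > 0:
            pass
        steps.append(release)
    return steps
```

Each iteration is an ordinary N->N+1 transition, which is why the whole walk stays within what projects already support.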



Re: [openstack-dev] [Nova] [Cells] Stupid question: Cells v2 & AZs

2017-05-24 Thread Dan Smith
> Thanks for answering the base question. So, if AZs are implemented with
> haggs, then really, they are truly disjoint from cells (ie, not a subset
> of a cell and not a superset of a cell, just unrelated.) Does that
> philosophy agree with what you are stating?

Correct, aggregates are at the top level, and they can span cells if you
so desire (or not if you don't configure any that do). The aggregate
stuff doesn't know anything about cells, it only knows about hosts, so
it's really independent.

--Dan
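
A tiny Python sketch of why aggregates and cells are independent: an aggregate is only a set of hosts, and which cells it touches falls out of a separate host-to-cell mapping. The function name and the data shapes are made up for this example.

```python
def cells_spanned(aggregate_hosts, host_to_cell):
    """Return the set of cells that an aggregate's hosts happen to live in.

    The aggregate itself stores only host names; the cell information
    comes entirely from the separate host -> cell mapping.
    """
    return {host_to_cell[host] for host in aggregate_hosts}
```

An aggregate whose hosts all map to one cell stays within that cell; one whose hosts map to several cells spans them, with no change to the aggregate itself.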



[openstack-dev] [nova] Boston Forum session recap - cellsv2

2017-05-19 Thread Dan Smith
The etherpad for this session is here [1]. The goal of the session was
to get some questions answered that the developers had for operators
around the topic of cellsv2.

The bulk of the time was spent discussing ways to limit instance
scheduling retries in a cellsv2 world where placement eliminates
resource-reservation races. Reschedules would be upcalls from the cell,
which we are trying to avoid.

While placement should eliminate 95% (or more) of reschedules due to
pre-claiming resources before booting, there will still be cases where
we may want to reschedule due to unexpected transient failures. How many
of those remain, and whether or not rescheduling for them is really
useful is in question.

The compromise that seemed popular in the room was to grab more than one
host at the time of scheduling, claim for that one, but pass the rest to
the cell. If the cell needs to reschedule, the cell conductor would try
one of the alternates that came as part of the original boot request,
instead of asking scheduler again.
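
A minimal sketch of the alternates idea, assuming the scheduler hands back an ordered host list with the claimed host first. The function and exception names are hypothetical, and claiming resources on an alternate before using it is not modeled here.

```python
class BuildFailed(Exception):
    """Raised by the stand-in build function when a host fails to boot."""


def build_with_alternates(hosts, build):
    """Try the selected host first, then each alternate, without an upcall.

    hosts: list with the claimed host first and the alternates after it.
    build: callable that boots on a host or raises BuildFailed.
    """
    for host in hosts:
        try:
            build(host)
            return host
        except BuildFailed:
            continue  # fall through to the next alternate
    raise BuildFailed("all hosts from the original request failed")
```

The point of the design is that the retry loop runs entirely inside the cell, using data that arrived with the original boot request.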

During the discussion of this, an operator raised the concern that
without reschedules, a single compute that fails to boot 100% of the
time ends up becoming a magnet for all future builds, looking like an
excellent target for the scheduler, but failing anything that is sent to
it. If we don't reschedule, that situation could be very problematic. An
idea came out that we should really have compute monitor and disable
itself if a certain number of _consecutive_ build failures crosses a
threshold. That would mitigate/eliminate the "fail magnet" behavior and
further reduce the need for retries. A patch has been proposed for this,
and so far enjoys wide support [2].
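
The consecutive-failure idea can be sketched roughly as follows. This is not the proposed patch; the class name and the default threshold are assumptions for illustration only.

```python
class BuildFailureTracker:
    """Count consecutive build failures and flag the host at a threshold."""

    def __init__(self, threshold=10):  # threshold value is an assumption
        self.threshold = threshold
        self.consecutive_failures = 0
        self.disabled = False

    def record(self, success):
        if success:
            # Any successful build resets the streak.
            self.consecutive_failures = 0
            return
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold:
            # A real compute would disable its own service here.
            self.disabled = True
```

Counting only consecutive failures means that a host with occasional transient errors keeps taking builds, while a host that fails every time takes itself out of rotation quickly.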

We also discussed the transition to counting quotas, and what that means
for operators. The room seemed in favor of this, and discussion was brief.

Finally, I made the call for people with reasonably-sized pre-prod
environments to begin testing cellsv2 to help prove it out and find the
gremlins. CERN and NeCTAR specifically volunteered for this effort.

[1] https://etherpad.openstack.org/p/BOS-forum-cellsv2-developer-community-coordination
[2] https://review.openstack.org/#/c/463597/

--Dan



Re: [openstack-dev] [tc] [all] OpenStack moving both too fast and too slow at the same time

2017-05-05 Thread Dan Smith
> +1. ocata's cell v2 stuff added a lot of extra required complexity
> with no perceivable benefit to end users. If there was a long term
> stable version, then putting it in the non lts release would have
> been ok. In absence of lts, I would have recommended the cell v2
> stuff have been done in a branch instead and merged all together when
> it provided something (pike I think)

That's how cellsv1 was developed and that turned out spectacularly well.

--Dan



Re: [openstack-dev] [nova] [placement] experimenting with extracting placement

2017-03-13 Thread Dan Smith
Interestingly, we just had a meeting about cells and the scheduler,
which had quite a bit of overlap on this topic.

> That said, as mentioned in the previous email, the priorities for Pike
> (and likely Queens) will continue to be, in order: traits, ironic,
> shared resource pools, and nested providers.

Given that the CachingScheduler is still a thing until we get claims in
the scheduler, and given that CachingScheduler doesn't use placement
like the FilterScheduler does, I think we need to prioritize the claims
part of the above list.

Based on the discussion several of us just had, the priority list
actually needs to be this:

1. Traits
2. Ironic
3. Claims in the scheduler
4. Shared resources
5. Nested resources

Claims in the scheduler is not likely to be a thing for Pike, but should
be something we do as much prep for as possible, and land early in Queens.

Personally, I think getting to the point of claiming in the scheduler
will be easier if we have placement in tree, and anything we break in
that process will be easier to backport if they're in the same tree.
However, I'd say that after that goal is met, splitting placement should
be good to go.

--Dan



Re: [openstack-dev] [nova] scaling rabbitmq with cells v2 requires manual database update

2017-02-15 Thread Dan Smith
> The problem is there's no way to update an existing cell's transport_url
> via nova-manage.

There is:

https://review.openstack.org/#/c/431582/

> It appears the only way to get around this is manually deleting the old
> cell1 record from the db.

No, don't do that :)

> I'd like to hear more opinions on this but it really seems like this
> should be a priority to fix prior to the Ocata final release.

Already done!

--Dan



[openstack-dev] [nova] Cells meeting on Feb-15 is canceled

2017-02-14 Thread Dan Smith
Hi all,

In an epic collision of cosmic coincidences, four of the primary cells
meeting attendees have a conflict tomorrow. Since there won't really be
anyone around to run (or attend) the meeting, we'll have to cancel again.

Next week we will be at the PTG so any meeting will be done there.

So, expect the next cells meeting to be on March 1.

Thanks!

--Dan



Re: [openstack-dev] [nova] FYI: cells v1 job is blocked

2017-02-14 Thread Dan Smith
> We have a fix here:

Actual link to fix is left as an exercise for the reader?

https://review.openstack.org/#/c/433707

--Dan



[openstack-dev] [nova] Cells meeting canceled

2017-02-08 Thread Dan Smith
Hi all,

Today's cells meeting is canceled. We're still working on getting ocata
out the door, a bunch of normal participants are out today, and not much
has transpired for pike just yet.

--Dan



Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-25 Thread Dan Smith
> Update on that agreement: I made the necessary modification in the
> proposal [1] for not verifying the filters. We now send a request to the
> Placement API by introspecting the flavor and we get a list of potential
> destinations.

Thanks!

> When I began doing that modification, I knew there was a functional test
> about server groups that needed modifications to match our agreement. I
> consequently made that change located in a separate patch [2] as a
> prerequisite for [1].
> 
> I then spotted a problem that we didn't identify when discussing:
> when checking a destination, the legacy filters for CPU, RAM and disk
> don't verify the maximum capacity of the host, they only multiply the
> total size by the allocation ratio, so our proposal works for them.
> Now, when using the placement service, it fails because somewhere in the
> DB call needed for returning the destinations, we also verify a specific
> field named max_unit [3].
> 
> Consequently, the proposal we agreed on does not give feature parity
> between Newton and Ocata. If you follow our instructions, you will still
> get different results from a placement perspective between what was in
> Newton and what will be in Ocata.

To summarize some discussion on IRC:

The max_unit field limits the maximum size of any single allocation and
is not scaled by the allocation_ratio (for good reason). Right now,
computes report a max_unit equal to their total for CPU and RAM
resources. So the different behavior here is that placement will not
choose hosts where the instance would single-handedly overcommit the
entire host. Multiple instances still could, per the rules of the
allocation-ratio.
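
A small worked example of the rule above, as a sketch rather than the placement implementation. The function name is hypothetical, and other inventory fields such as reserved, min_unit and step_size are deliberately left out.

```python
def can_allocate(total, allocation_ratio, max_unit, used, requested):
    """Return True if a single request fits under the rules described above."""
    capacity = total * allocation_ratio    # overcommitted capacity
    fits_capacity = used + requested <= capacity
    fits_max_unit = requested <= max_unit  # not scaled by the ratio
    return fits_capacity and fits_max_unit


# Host with 4 VCPUs, 16x overcommit, max_unit equal to the total.
print(can_allocate(4, 16.0, 4, used=0, requested=8))   # one 8-VCPU instance
print(can_allocate(4, 16.0, 4, used=60, requested=4))  # many 4-VCPU instances
```

The first request is refused even though 8 is far below the overcommitted capacity of 64, because a single allocation may not exceed max_unit. The second is accepted because each instance fits individually and the running total stays within capacity.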

The consensus seems to be that this is entirely sane behavior that the
previous core and ram filters weren't considering. If there's a good
reason to allow computes to report that they're willing to take a
larger-than-100% single allocation, then we can make that change later,
but the justification seems lacking at the moment.

> Technically speaking, the functional test is a canary bird, telling you
> that you get NoValidHosts while it was working previously.

My opinion, which is shared by several other people, is that this test
is broken. It's trying to overcommit the host with a single instance,
and in fact, it's doing it unintentionally for some resources that just
aren't checked before the move to placement. Changing the test to
properly reflect the resources on the host should be the path forward
and Sylvain is working on that now.

The other concern that was raised was that since CoreFilter is not
necessarily enabled on all clouds, cpu_allocation_ratio is not being
honored on those systems today. Moving to placement with ocata will
cause that value to be used, which may be incorrect for certain
overly-committed clouds which had previously ignored it. However, I
think we need not be too concerned as the defaults for these values are
16x overcommit for CPU and 1.5x overcommit for RAM. Those are probably
on the upper limit of sane for most environments, but also large enough
to not cause any sort of immediate panic while people realize (if they
didn't read the release notes) that they may want to tweak them.

--Dan



Re: [openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

2017-01-24 Thread Dan Smith
> No. Have administrators set the allocation ratios for the resources they
> do not care about exceeding capacity to a very high number.
> 
> If someone previously removed a filter, that doesn't mean that the
> resources were not consumed on a host. It merely means the admin was
> willing to accept a high amount of oversubscription. That's what the
> allocation_ratio is for.
> 
> The flavor should continue to have a consumed disk/vcpu/ram amount,
> because the VM *does actually consume those resources*. If the operator
> doesn't care about oversubscribing one or more of those resources, they
> should set the allocation ratios of those inventories to a high value.
> 
> No more adding configuration options for this kind of thing (or in this
> case, looking at an old configuration option and parsing it to see if a
> certain filter is listed in the list of enabled filters).
> 
> We have a proper system of modeling these data-driven decisions now, so
> my opinion is we should use it and ask operators to use the placement
> REST API for what it was intended.

I agree with the above. I think it's extremely counter-intuitive to set
a bunch of over-subscription values only to have them ignored because a
scheduler filter isn't configured.

If we ignore some of the resources on schedule, the compute nodes will
start reporting values that will make the resources appear to be
negative to anything looking at the data. Before a somewhat-recent
change of mine, the oversubscribed computes would have *failed* to
report negative resources at all, which was a problem for a reconfigure
event. I think the scheduler purposefully forcing computes into the red
is a mistake.

Further, new users that don't know our sins of the past will wonder why
the nice system they see in front of them isn't doing the right thing.
Existing users can reconfigure allocation ratio values before they
upgrade. We can also add something to our upgrade status tool to warn them.

--Dan



[openstack-dev] [nova] No cells meeting next week (Jan 18)

2017-01-11 Thread Dan Smith
Hi all,

There will be no cells meeting next week, Jan 18 2017. I'll be in the
wilderness and nobody else was brave enough to run it in my absence.
Yeah, something like that.

--Dan



Re: [openstack-dev] [oslo][nova]Accessing nullable, not set versioned object field

2016-12-16 Thread Dan Smith
>> NotImplementedError: Cannot load 'nullable_string' in the base class
>>
>> Is this the correct behavior?
> 
> Yes, that's the expected behaviour.

Yes.

>> Then what is the expected behavior if the field is also defaulted to
>> None?
>>
>> fields = {
>>     'nullable_string': fields.StringField(nullable=True,
>>                                           default=None),
>> }
>>
>> The actual behavior is still the same exception above. Is it the
>> correct behavior?
> 
> Yes. So, what the default=None does is describe the behaviour of the
> field when obj_set_defaults() is called. It does *not* describe what is
> returned if the field *value* is accessed before being populated.
>
> What you're looking for is the obj_attr_is_set() method:
>
> 
> if MyObject.obj_attr_is_set('nullable_string'):
>     print my_obj.nullable_string

I think you meant s/MyObject/my_obj/ above. However, in modern times,
it's better to use:

 if 'nullable_string' in my_obj:

On a per-object basis, it may also be reasonable to define
obj_load_attr() to provide the default for a field if it's not set and
attempted to be loaded.
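
The behavior can be imitated with a small stand-in class. This is only a sketch of the semantics discussed here, written in plain Python; it is not the oslo.versionedobjects implementation, and TinyObject and DefaultingObject are made-up names.

```python
class TinyObject:
    """Stand-in for a versioned object; not the oslo.versionedobjects code."""

    fields = ("nullable_string",)

    def __init__(self):
        self._values = {}

    def __contains__(self, name):
        # Mirrors the "if 'field' in obj" idiom: True only once the field is set.
        return name in self._values

    def __setattr__(self, name, value):
        if name in self.fields:
            self._values[name] = value
        else:
            super().__setattr__(name, value)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for declared fields.
        if name in type(self).fields:
            if name not in self._values:
                self.obj_load_attr(name)
            return self._values[name]
        raise AttributeError(name)

    def obj_load_attr(self, name):
        # The base class has no way to load an unset field.
        raise NotImplementedError("Cannot load %r in the base class" % name)


class DefaultingObject(TinyObject):
    def obj_load_attr(self, name):
        # Per-object choice: supply None when an unset field is accessed.
        setattr(self, name, None)
```

With the base class, reading an unset field raises NotImplementedError and the membership test is the safe way to check first. With the subclass, the same read quietly populates the field.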

> In addition to the obj_attr_is_set() method, use the obj_set_defaults()
> method to manually set all fields that have a default=XXX value to XXX
> if those fields have not yet been manually set:

There's another wrinkle here. The default=XXX stuff was actually
introduced before we had obj_set_defaults(), and for a very different
reason. That reason was confusing and obscure, and mostly supportive of
the act of converting nova from dicts to objects. If you look in fields,
there is an obscure handling of default, where if you _set_ a field to
None that has a default and is not nullable, it will gain the default value.
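
That None handling can be sketched as a standalone function. It is a simplified imitation of the rule just described, not the library code, and the names here are assumptions for the example.

```python
_UNSET = object()  # sentinel meaning "no default declared"


def coerce(value, nullable=False, default=_UNSET):
    """Stand-in for the None handling described above (not the library code)."""
    if value is not None:
        return value
    if nullable:
        return None     # None is a legal value, so keep it
    if default is not _UNSET:
        return default  # non-nullable with a default: None becomes the default
    raise ValueError("field cannot be None")
```

Note that a field declared with nullable=True and default=None never reaches the default branch, which is one reason the default does not help with reading an unset field.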

It's confusing and I wish we had never done it, but.. it's part of the
contract now and I'd have to do a lot of digging to see if we can remove
it (probably can from Nova, but...).

Your use above is similar to this, so I just wanted to point it out in
case you came across it and it led you to thinking your original example
would work.

--Dan



[openstack-dev] [nova] No cells meetings until 2017

2016-12-14 Thread Dan Smith
Hi all,

Given the upcoming holidays, there will not be nova cells meetings for
the remainder of the year. That puts the next one at January 4, 2017.

--Dan



Re: [openstack-dev] [Nova] Stepping back

2016-11-22 Thread Dan Smith
> It has been a true pleasure working with you all these past few years
> and I'm thankful to have had the opportunity. As I've told people many
> times when they ask me what it's like to work on an open source project
> like this: working on proprietary software exposes you to smart people
> but you're limited to the small set of people within an organization,
> working on a project like this exposed me to smart people from many
> companies and many parts of the world. I have learned a lot working with
> you all. Thanks.

Andrew, thanks so much for all your contributions to the Nova community
over the years. I have so enjoyed working with you, learning from you,
and making nova better alongside you. I'm really sad to see you go, but
purely from a selfish point of view. I know you will go on to make other
software and communities better, and I wish you the best of luck.

I'll leave you with a snippet from my #openstack-nova archives, on a
Friday in late 2013, where I think you had just figured out something
that I broke in those early days of objectification. I stand by my
statement:

Oct 25 11:05:13 bearhands: my official opinion is that
you should hang on to lascii, just FYI

--Dan



[openstack-dev] [nova] 23-Nov Cells meeting cancelled

2016-11-22 Thread Dan Smith
Hi all,

Since this week's cells meeting falls on food-coma-day-eve, we're
canceling it. Anyone that wants to help move things along could review
the following patches:

https://review.openstack.org/#/q/topic:bp/cells-sched-staging+project:openstack/nova+status:open

https://review.openstack.org/#/q/topic:cell-databases-fixture

Thanks!

--Dan



Re: [openstack-dev] [nova] Thanks to those reviewing specs this week

2016-11-17 Thread Dan Smith
> I just wanted to say thanks to everyone reviewing specs this week. I've
> seen a lot of non-core newer people to the specs review process chipping
> in and helping to review a lot of the specs we're trying to get approved
> for Ocata. It can be hard to grind through several specs reviews in a
> day so I appreciate all of the help from everyone here, and it helps
> later when reviewing the code if you were familiar with the spec as it
> was being written and reviewed.

I second this sentiment and was just thinking the same as I was looking
at a few specs this morning. The amount of non-core, non-specs-core
involvement in spec review has been noticeably higher this cycle. It's
definitely nice to start reviewing a spec that is visibly more complete,
and then see that it has been through multiple rounds of review already
to hammer out the details.

--Dan



Re: [openstack-dev] [nova] Follow up on BCN review cadence discussions

2016-11-08 Thread Dan Smith
> I do imagine, however, that most folks who have been working
> on nova for long enough have a list of domain experts in their heads
> already. Would actually putting that on paper really hurt?

You mean like this?

https://wiki.openstack.org/wiki/Nova#Developer_Contacts

Those are pretty much the people I look to have sign off on a thing I'm
not completely familiar with before approving something. I'm sure it
could use some updating, of course.

This is linked from the MAINTAINERS file in our tree, by the way.

--Dan



[openstack-dev] [nova] No cellsv2 meeting today

2016-11-02 Thread Dan Smith
Hi all,

A bunch of the usual participants cannot attend the CellsV2 meeting
today, and the ones that can just discussed it last week face-to-face in
Barcelona. So, I'm going to declare it canceled for today for lack of
critical mass.

--Dan



Re: [openstack-dev] [nova] VIF plugin issue _get_neutron_events

2016-10-08 Thread Dan Smith
> Basically the issue is seen in the following three lines of the nova
> compute log. For that port, even though it received the vif plugging
> event 2 mins before, it waits for it, blocks and times out.
> Is there a race condition between the code that gets the events to
> wait for and the code that registers for this callback?
> Any comments?

This shouldn't be possible because the point at which we call
plug_vifs() is when we should trigger neutron to fire the vif-plugged
event, and that is after we have already set up to receive those
events. In other words, your log lines should never be able to be in the
order you showed. If it's happening in that order (especially two
minutes early) I would be highly suspicious of some modifications or
something else weird going on.
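
The ordering argument can be shown with a toy event registry. This is a sketch of the register-then-trigger pattern only, not the nova code path; EventWaiter and its methods are hypothetical names.

```python
import threading


class EventWaiter:
    """Toy registry of expected events, keyed by (name, tag)."""

    def __init__(self):
        self._events = {}

    def prepare(self, name, tag):
        # Register interest *before* triggering the action.
        self._events[(name, tag)] = threading.Event()

    def deliver(self, name, tag):
        # Called when the external service reports the event.
        event = self._events.get((name, tag))
        if event is None:
            return False  # nobody was waiting: the event is dropped
        event.set()
        return True

    def wait(self, name, tag, timeout):
        return self._events[(name, tag)].wait(timeout)


waiter = EventWaiter()
waiter.prepare("network-vif-plugged", "port-1")
# The plug operation would run here and cause the event to be sent.
waiter.deliver("network-vif-plugged", "port-1")
print(waiter.wait("network-vif-plugged", "port-1", timeout=1))
```

Because prepare() runs before the action that causes the event, a delivery can only be missed if it arrives before the action that is supposed to trigger it, which is the anomaly described in the report.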

It'd be much better to handle this in the context of a bug, especially
providing the versions and components (i.e. neutron driver), etc. Can
you open one and provide all the usual details? I'll be happy to look.

--Dan



Re: [openstack-dev] [nova] Draft Ocata design summit schedule is up

2016-10-06 Thread Dan Smith
> Is there a particular reason we're only retrospecting on placement?

I think that we need to have a concrete topic that applied to newton and
will apply to ocata in order to be productive. I think there will be
specific things we can change in ocata that will have an actual impact
on major work for the cycle.

> I suspect we can map many of the ideas and experiences from a
> retrospective devoted to placement to more general concerns but I'd
> hate for people who had no involvement in it but are concerned about
> Nova to feel excluded.
> 
> [1] https://etherpad.openstack.org/p/nova-newton-retrospective

As has been demonstrated in that etherpad, if we try to retrospect every
aspect of newton, we'd need a week and wouldn't have time to have the
other sessions we need in order to plan for ocata. Picking two major
ongoing topics seems like the best way to frame a useful discussion to me.

--Dan



Re: [openstack-dev] [nova] Future of turbo-hipster CI

2016-10-04 Thread Dan Smith
> Having said that, I think Dan Smith came across a fairly large
> production DB dataset recently which he was using for testing some
> archive changes, maybe Dan will become our new Johannes, but grumpier of
> course. :)

That's quite an insult to Johannes :)

While working on the db archiving thing recently I was thinking about
how it would be great to get t-h to run this process on one of its
large/real datasets. Then I started to wonder when was the last time I
actually saw it comment.

I feel like these days Nova, by policy and for a variety of reasons,
isn't doing any database migrations that can really take a long time
(i.e. expand-only schema migrations, no data migrations). That means the
original thing t-h set out to prevent is not really much of a risk anymore.

I surely think it's valuable, but I understand if the benefit does not
outweigh the cost at this point.

--Dan



Re: [openstack-dev] [nova] ops meetup feedback

2016-09-20 Thread Dan Smith
> The current DB online data upgrade model feels *very opaque* to
> ops. They didn't realize the current model Nova was using, and didn't
> feel like it was documented anywhere.

> ACTION: document the DB data lifecycle better for operators

This is on me, so I'll take it. I've just thrown together something that
I think will help a little bit:

  https://review.openstack.org/373361

Which, instead of a blank screen and a return code, gives you something
like this:

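+---------------------------+--------------+-----------+
| Migration                 | Total Needed | Completed |
+---------------------------+--------------+-----------+
| migrate_aggregates        |      5       |     4     |
| migrate_instance_keypairs |      6       |     6     |
+---------------------------+--------------+-----------+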
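
For anyone wanting to produce a similar summary from their own tooling, a table of this shape can be rendered with a few lines of plain Python. This is a sketch only; render_table is a made-up helper and is not how the patch above builds its output.

```python
def render_table(rows):
    """Render (migration, total, completed) rows as an ASCII table."""
    headers = ("Migration", "Total Needed", "Completed")
    cells = [headers] + [(name, str(total), str(done)) for name, total, done in rows]
    widths = [max(len(row[i]) for row in cells) for i in range(3)]
    rule = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def line(row):
        # Left-align the name column, center the numeric columns.
        parts = [row[0].ljust(widths[0])]
        parts += [row[i].center(widths[i]) for i in (1, 2)]
        return "| " + " | ".join(parts) + " |"

    out = [rule, line(headers), rule]
    out += [line(r) for r in cells[1:]]
    out.append(rule)
    return "\n".join(out)


print(render_table([
    ("migrate_aggregates", 5, 4),
    ("migrate_instance_keypairs", 6, 6),
]))
```

A row where Completed is lower than Total Needed is the signal that the command should be run again.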

I'll also see about writing up some docs about the expected workflow
here. Presumably that needs to go in some fancy docs and not into the
devref, right? Can anyone point me to where that should go?

--Dan



Re: [openstack-dev] [nova] Next steps for resource providers work

2016-09-02 Thread Dan Smith
> We know:
> 
> * It pretty much does what we intend it to do: allocations are added
>   and deleted on server create and delete.
> * On manipulations like a resize the allocations are not updated
>   immediately, there is a delay until the heal periodic job does its
>   thing.

We know one more thing. For some reason we're overrunning the vcpu
capacity in a normal tempest run. It doesn't seem to affect any other
resources, though. We are configured to not use CoreFilter, which means
the scheduler isn't worried about overcommit or honoring the
cpu_allocation_ratio value. I put up a devstack change to hack it up to
4.0, thinking that that would give us enough room to stop getting the
errors (4 VCPUS x 4.0 = 16), but it doesn't:

https://review.openstack.org/#/c/364581/

You can see in anything that runs placement that it's unhappy:

> 2016-09-02 00:34:11.768 18251 WARNING nova.scheduler.client.report 
> [req-cde54d5f-bcef-4670-8864-eaf479cc9bb9 
> tempest-ServersAdminTestJSON-1247365597 
> tempest-ServersAdminTestJSON-1247365597] Unable to submit allocation for 
> instance abc7661d-f258-4eed-8c0c-91b17216d32c (409 409 Conflict
> 
> There was a conflict when trying to complete your request.
> 
>  Unable to allocate inventory: Unable to create allocation for 'VCPU' on 
> resource provider '5295c607-fbb8-472d-8e3d-6067b8814ef8'. The requested 
> amount would exceed the capacity. 

We should try to get this figured out before newton ships if possible. I
don't think I see it locally, but I have a large dev machine, so I'll
have to try to poke it harder.

--Dan



Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-09-01 Thread Dan Smith
> So that is fine.  However, correct me if I'm wrong but you're 
> proposing just that these projects migrate to also use a new service 
> layer with oslo.versionedobjects, because IIUC Nova/Neutron's 
> approach is dependent on that area of indirection being present. 
> Otherwise, if you meant something like, "use an approach that's kind 
> of like what Nova does w/ versionedobjects but without actually 
> having to use versionedobjects", that still sounds like, "come up 
> with a new idea".

If you don't need the RPC bits, versionedobjects is nothing more than an
object facade for you to insulate your upper layers from such change.
Writing your facade using versionedobjects just means inheriting from a
superclass that does a bunch of stuff you don't need. So I would not say
that taking the same general approach without that inheritance is "come
up with a new idea".

Using triggers and magic to solve this instead of an application-level
facade is a substantially different approach to the problem.

> I suppose if you're thinking more at the macro level, where "current
>  approach" means "do whatever you have to on the app side", then your
>  position is consistent, but I think there's still a lot of
> confusion in that area when the indirection of a versioned service
> layer is not present. It gets into the SQL nastiness I was discussing
> w/ Clint and I don't see anyone doing anything like that yet.

The indirection service is really unrelated to this discussion, IMHO. If
you take RPC out of the picture, all you have left is a
direct-to-the-database facade to handle the fact that schema has
expanded underneath you. As Clint (et al) have said -- designing the
application to expect schema expansion (and avoiding unnecessary
contraction) is the key here.
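
A minimal sketch of such a facade, with RPC taken out of the picture. The class, the row layout and the column names are all invented for illustration and do not correspond to a real schema.

```python
import json


class InstanceFacade:
    """Toy object facade over a row dict; column names are made up."""

    def __init__(self, row):
        self._row = row

    @property
    def flavor(self):
        # Prefer the new location; fall back to the legacy one if the
        # row has not been migrated yet.
        if self._row.get("flavor_json") is not None:
            return json.loads(self._row["flavor_json"])
        return {"name": self._row["legacy_flavor_name"]}

    def save(self):
        # Writes always use the new location, so rows migrate as they
        # are touched.
        self._row["flavor_json"] = json.dumps(self.flavor)
        self._row["legacy_flavor_name"] = None
```

Callers only ever see the flavor attribute, so the upper layers stay unchanged whether a given row is in the old or the expanded form.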

--Dan



Re: [openstack-dev] migrate_flavor_data doesn't flavor migrate meta data of VMs spawned during upgrade.

2016-08-31 Thread Dan Smith
> Thanks Dan for your response. While I do run that before I start my
> move to liberty, what I see is that it doesn't seem to migrate the
> flavor metadata for the VMs that are spawned after the controller
> upgrade from juno to kilo and before all computes are upgraded from
> juno to kilo. The current workaround is to delete those VMs that are
> spawned after the controller upgrade and before all computes are
> upgraded, and then initiate the liberty upgrade. Then it works fine.

I can't think of any reason why that would be, or why it would be a
problem. Instances created after the controllers are upgraded should not
have old-style flavor info, so they need not be touched by the migration
code.

Maybe filing a bug is in order describing what you see?

--Dan



Re: [openstack-dev] migrate_flavor_data doesn't flavor migrate meta data of VMs spawned during upgrade.

2016-08-31 Thread Dan Smith
> While migrate_flavor_data seems to migrate the flavor metadata of VMs
> that were spawned before the upgrade procedure, it doesn't seem to
> migrate it for VMs that were spawned during the upgrade procedure, more
> specifically after the openstack controller upgrade and before the
> compute upgrade. Am I missing something here or is it by intention?

You can run the flavor migration as often as you need, and can certainly
run it after your last compute is upgraded before you start to move into
liberty.

--Dan



Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-08-30 Thread Dan Smith
>> I don't think it's all that ambitious to think we can just use
>> tried and tested schema evolution techniques that work for everyone
>> else.
> 
> People have been asking me for over a year how to do this, and I have
> no easy answer, I'm glad that you do.  I would like to see some
> examples of these techniques.

I'm not sure how to point you at the examples we have today because
they're not on a single line (or set of lines) in a single file. Nova
has moved a lot of data around at runtime using this approach in the
last year or so with good success.

> If you can show me the SQL access code that deals with the above
> change, that would help a lot.

We can't show you that, because as you said, there isn't a way to do
it...in SQL. That is in fact the point though: don't do it in SQL.

> If the answer is, "oh well just don't do a schema change like that", 
> then we're basically saying we aren't really changing our schemas 
> anymore except for totally new features that otherwise aren't
> accessed by the older version of the code.

We _are_ saying "don't change schema like that", but it's not a very
limiting requirement. It means you can't move things in a schema
migration, but that's all. Nova changes schema all the time.

In the last year or so, off the top of my head, nova has:

1. Moved instance flavors from row=value metadata storage to a JSON
   blob in another table
2. Moved core flavors, aggregates, keypairs and other structures from
   the cell database to the api database
3. Added uuid to aggregates
4. Added a parent_addr linkage in PCI devices

...all online. Those are just the ones I have in my head that have
required actual data migrations. We've had dozens of schema changes that
enable new features that are all just new data and don't require any of
this.

> That's fine.   It's not what people coming to me are saying, though.

Not sure who is coming to you or what they're saying, but... okay :)

If keystone really wants to use triggers to do this, then that's fine.
But I think the overwhelming response from this thread (which is asking
people's opinions on the matter) seems to be that they're an unnecessary
complication that will impede people debugging and working on that part
of the code base. We have such impediments elsewhere, but I think we
generally try to avoid doing one thing a hundred different ways to keep
the playing field as level as possible.

--Dan



Re: [openstack-dev] [keystone][nova][neutron][all] Rolling upgrades: database triggers and oslo.versionedobjects

2016-08-30 Thread Dan Smith
>> Even in the case of projects using versioned objects, it still
>> means a SQL layer has to include functionality for both versions of
>> a particular schema change which itself is awkward.

That's not true. Nova doesn't have multiple models to straddle a
particular change. We just...

> It's simple, these are the holy SQL schema commandments:
> 
> Don't delete columns, ignore them.
> Don't change columns, create new ones.
> When you create a column, give it a default that makes sense.
> Do not add new foreign key constraints.

...do this ^ :)

We can drop columns once they're long-since-unused, but we still don't
need duplicate models for that.
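
A minimal sketch of the additive rules quoted above, using an in-memory
SQLite database. The table and column names (things, enabled) and the
helper functions are made up for illustration and do not reflect any
real nova schema. It shows that code written against the old schema
keeps working after a column is added with a sensible default, because
the old code names only the columns it knows about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (id INTEGER PRIMARY KEY, name TEXT)")


def old_insert(conn, name):
    # "Old" code: names only the columns it knows about.
    conn.execute("INSERT INTO things (name) VALUES (?)", (name,))


def old_select(conn):
    # "Old" code: ignores any columns added later.
    return conn.execute("SELECT id, name FROM things ORDER BY id").fetchall()


def new_select(conn):
    # "New" code: reads the added column.
    return conn.execute(
        "SELECT name, enabled FROM things ORDER BY id").fetchall()


old_insert(conn, "first")

# Additive migration: a new column with a default; nothing dropped or changed.
conn.execute("ALTER TABLE things ADD COLUMN enabled INTEGER NOT NULL DEFAULT 1")

# Old code still works against the new schema.
old_insert(conn, "second")
```

Both rows, whether written before or after the migration, come back
with the default value from new_select, while old_select is unaffected.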

--Dan


