> Here it is :)
>
> https://wiki.openstack.org/wiki/Special:AncientPages
Great, I see at least one I can nuke on the first page.
Note that I don't seem to have delete powers on the wiki. That's surely
a first step in letting people maintain the relevance of things on the wiki.
--Dan
> Thanks for summing this up, Deva. The planned solution still gets my
> vote; we build that, deprecate the old single compute host model where
> nova handles all scheduling, and in the meantime figure out the gaps
> that operators need filled and the best way to fill them.
Mine as well, speaking
> It was in the queue for 11 days, Dan Smith took a look, he added Jay
> Pipes, and I also added Matt Riedemann, there were also a bunch of
> neutron folks on it since this fix originated from their end.
Yeah, and both of those other people have had serious other commitments
in the last
> While I don't think it's strictly required by the api change guidelines [3]
> I think the API interactions and behavior here feel different enough to
> warrant
> having a microversion. Ideally there should have been some versioning in the
> cinder api around the multiattach support but that ship
> It is however not ideal when a deployment is set up such that
> multiattach will always fail because a hypervisor is in use which
> doesn't support it. An immediate solution would be to add a policy so a
> deployer could disallow it that way which would provide immediate
> feedback to a user tha
> I'm formally proposing that the nova-stable-maint team [1] adds Tony
> Breeds to the core team.
My major complaint with Tony is that he talks funny. If he's willing to
work on fixing that, I'm +1.
:-P
--Dan
> I know we have some projects (heat, I think?) that don't have UUIDs at
> all. Are they using oslo.versionedobjects? I suppose we could move them
> to a string field instead of a UUID field, instead of flipping the
> enforcement flag on. Alternately, if we add a new class we wouldn't have
> to eve
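For illustration, the enforcement choice being debated can be sketched in plain Python. This is an assumption about the shape of the behavior, not oslo.versionedobjects' actual code: a field that warns on non-UUID values by default (for objects that historically carried free-form IDs) and only raises once enforcement is flipped on.

```python
import uuid
import warnings

def coerce_uuid(value, enforce=False):
    """Validate a UUID-ish string (illustrative, not the real ovo field).

    With enforce=False, a non-UUID string is let through with a warning
    so legacy free-form IDs keep working; with enforce=True it raises.
    """
    try:
        uuid.UUID(str(value))
        return str(value)
    except ValueError:
        if enforce:
            raise ValueError("%r is not a valid UUID" % (value,))
        warnings.warn("%r is not UUID-like; passing through as a string"
                      % (value,))
        return str(value)

# A valid UUID passes silently; a bad one only warns unless enforced.
assert coerce_uuid("12345678-1234-5678-1234-567812345678")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    assert coerce_uuid("not-a-uuid") == "not-a-uuid"
    assert len(caught) == 1
```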
> The thing is, UpgradeImpact isn't always appropriate for the change, but
> DocImpact is used too broadly and as far as I can tell, it's not really
> for updating release notes [2]. It's for updating stuff found in
> docs.openstack.org.
>
> So we kicked around the idea of a ReleaseNoteImpact tag
> This is, I believe, sufficient to solve our entire problem.
> Specifically, we have no need for an indirection API that rebroadcasts
> messages that are too new (since that can't happen with pinning) and no
> need for Versioned Objects in the RPC layer. (Versioned objects for the
> DB are still c
> Here are a few --
> instance_get_all_by_filters joins manually with
> instances_fill_metadata --
> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1890
> https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/api.py#L1782
>
> Almost all instance query function
> If OTOH we are referring to the width of the columns and the join is
> such that you're going to get the same A identity over and over again,
> if you join A and B you get a "wide" row with all of A and B with a very
> large amount of redundant data sent over the wire again and again (note
> tha
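The wide-row redundancy described above can be demonstrated with a stdlib-only sketch (the schema here is made up for illustration, not Nova's actual tables): joining a parent row with a large column to its many metadata rows repeats that column once per metadata row, while the two-query "fill metadata" pattern fetches it once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, big_blob TEXT)")
conn.execute("CREATE TABLE instance_metadata "
             "(instance_id INTEGER, key TEXT, value TEXT)")
conn.execute("INSERT INTO instances VALUES (1, ?)", ("x" * 10000,))
for i in range(5):
    conn.execute("INSERT INTO instance_metadata VALUES (1, ?, ?)",
                 ("k%d" % i, "v%d" % i))

# One joined query: the 10 kB blob comes back five times, once per
# metadata row -- the "wide row" redundancy.
joined = conn.execute(
    "SELECT i.id, i.big_blob, m.key, m.value "
    "FROM instances i JOIN instance_metadata m ON m.instance_id = i.id"
).fetchall()
wide_bytes = sum(len(row[1]) for row in joined)

# Two queries (the manual fill-metadata pattern): the blob comes once.
inst = conn.execute("SELECT id, big_blob FROM instances").fetchone()
meta = conn.execute(
    "SELECT key, value FROM instance_metadata WHERE instance_id = ?",
    (inst[0],)).fetchall()
narrow_bytes = len(inst[1])

assert wide_bytes == 5 * narrow_bytes  # 5x redundancy from the join
```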
> In the past I've taken a different approach to problematic one to
> many relationships and have made the metadata a binary JSON blob. Is
> there some reason that won't work?
We have done that for various pieces of data that were previously in
system_metadata. Where this breaks down is if you ne
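The blob trade-off can be sketched minimally, assuming an illustrative `instance_extra`-style table: reading the whole metadata back is one cheap fetch, but filtering *by* a metadata key now means deserializing every row (or relying on a JSON-capable SQL dialect), which is where the approach breaks down for query-heavy metadata.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instance_extra (instance_id INTEGER, metadata TEXT)")
conn.execute("INSERT INTO instance_extra VALUES (1, ?)",
             (json.dumps({"role": "web", "tier": "prod"}),))

# Reading the whole blob back is a single cheap row fetch...
blob = conn.execute(
    "SELECT metadata FROM instance_extra WHERE instance_id = 1"
).fetchone()[0]
assert json.loads(blob)["role"] == "web"

# ...but querying BY a key means deserializing every row in Python.
matches = [
    row[0]
    for row in conn.execute("SELECT instance_id, metadata "
                            "FROM instance_extra")
    if json.loads(row[1]).get("tier") == "prod"
]
assert matches == [1]
```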
> No, that's not valid behaviour. You need to upgrade the controller
> infrastructure (conductor, API nodes, etc) before any compute nodes.
Yep.
--Dan
> Is this documented somewhere?
>
> I did a bit of digging and couldn't find anywhere that explicitly
> required that for the J->K upgrade. Certainly it was documented for the
> I->J upgrade.
It's our model, so I don't think we need to document it for each cycle
since we don't expect it to chang
> there was a little skepticism because it was originally sold as magic,
> but reading the slides from Vancouver[1], it is not magic.
I think I specifically said "they're not magic" in my slides. Not sure
who sold you them as magic, but you should leave them a
less-than-five-stars review.
> Ceilome
> we store everything as primitives: floats, time, integer, etc... since
> we need to query on attributes. it seems like versionedobjects might not
> be useful to our db configuration currently.
I don't think the former determines the latter -- we have lots of things
stored as rows of column primi
> If you want my inexperienced opinion, a young project is the perfect
> time to start this.
^--- This ---^
> I understand that something like [2] will cause a test to fail when you
> make a major change to a versioned object. But you *want* that. It helps
> reviewers more easily catch co
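A toy version of such a test might look like this. The fingerprinting scheme below is an assumption for illustration, not the real oslo.versionedobjects hash test: pin a fingerprint of the schema (field names and types) in the test suite, so any field change fails loudly and forces a deliberate version bump.

```python
import hashlib

def schema_fingerprint(fields):
    """Hash a {field_name: field_type} mapping deterministically."""
    canonical = ",".join("%s:%s" % (name, type_)
                         for name, type_ in sorted(fields.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]

# Hypothetical object schema and its recorded fingerprint.
INSTANCE_FIELDS = {"uuid": "UUIDField", "host": "StringField"}
EXPECTED = schema_fingerprint(INSTANCE_FIELDS)

def test_instance_schema_unchanged():
    assert schema_fingerprint(INSTANCE_FIELDS) == EXPECTED, (
        "Instance schema changed: bump the object version and update "
        "the expected fingerprint")

test_instance_schema_unchanged()

# Adding a field without updating EXPECTED would now fail the test:
changed = dict(INSTANCE_FIELDS, flavor="ObjectField")
assert schema_fingerprint(changed) != EXPECTED
```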
>>> we store everything as primitives: floats, time, integer, etc... since
>>> we need to query on attributes. it seems like versionedobjects might not
>>> be useful to our db configuration currently.
>> I don't think the former determines the latter -- we have lots of things
>> stored as rows of c
> I would like to gather all upgrade activities in Neutron in one place,
> in order to summarize the current status and future activities on
> rolling upgrades in Mitaka.
Glad to see this work really picking up steam in other projects!
> b. TODO: To have the rolling upgrade we have to imple
> Conductors will always need to talk to the database. APIs may not need
> to talk to the database. I think we can just roll conductor
> upgrades through, and then update ironic-api after that. This should
> just work, as long as we're very careful about schema changes (this is
> where the expand/c
> But 4.x was EOL over a year ago:
>
> http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2039567
...and was released in 2010.
We're supporting a minimum version of libvirt from 2014, so I think that
dropping support for five-year-old EOL'd VMware is go
> The proposed patch also drops support for 5.0, which as I understand
> it is not EOL'd? The documentation appears to indicate that some
> functionality will not work with < 5.1, but it's not explicitly clear
> what that is.
Yeah, I guess I assumed that anyone on 5.0 was just late moving to
>=
> Here's the big thing that you're missing: no data migrations are allowed
> any more in DB migration scripts.
Correct, online-schema-migrations aside, we're trying to avoid ever
doing data migrations in schema migrations anymore.
> In this way, data migrations occur *over time* in the nova.objec
Hi,
Mellanox CI has been broken for a while now. Test runs are reported as
"successful" after an impossibly short less-than-a-minute run. Could the
owners of this please take a look and address the problem? At least
disabling commenting while working on the issue would be helpful.
Also, on succes
> The remaining work is to have a way of preventing database contracts
> from running until data migrations that the column interacts with are
> complete, this is not an easy problem to solve as the goal of the
> online migrations is to do pure model based schema migrations and
> there is no way of
> I like the idea of having a named condition, but the issue is how to maintain
> and control multiple of these conditions in a system that will use model
> against current schema to determine changes.
It's not about having a named condition, it's about having a single
condition for a given sche
> Well as long as you want to be able to load data and perform updates on
> Instance.name using normal ORM patterns, you'd still have that column
> mapped, if you want to put extra things into it to signal your migration
> tool, there is an .info dictionary available on the Column that you can
> us
>> 3. Build the controls into our process with a way of storing the
>> current release cycle information to only allow contracts to
>> occur at a set major release and maintain the column in the model
>> till it is ready to be removed major release + 1 since the
>> migration was a
> Every change like this makes it harder for newcomers to participate.
> Frankly, it makes it harder for everyone because it means there are
> more moving parts, but in this specific case many of the people
> involved in these messaging drivers are relatively new, so I point
> that out.
I dunno ab
> Why do cores need approved specs for example - and indeed for many of us
> - it's just a dance we do. I refuse to believe that a core can be
> trusted to approve patches but not to write any code other than a bugfix
> without a written document explaining themselves, and then have a yet
> more ex
> If multi-node isn't reliable more generally yet, do you think the
> simpler implementation of partial-upgrade testing could proceed? I've
> already done all of the patches to do it for Neutron. That way we could
> quickly get something in place to help block regressions and work on the
> longer
Hi Dan,
> I put together a quick etherpad to help outline the remaining patches
> left to support package based upgrades within the TripleO heat
> templates:
>
> https://etherpad.openstack.org/p/tripleo-package-upgrades
>
> The etherpad includes a brief overview of the upgrade approach, a list
>
> Yeah I think it's fair to say this is just the first step of probably
> several iterations towards fully orchestrated upgrades such as you
> describe, atm really it's just a way of pushing out minor updates not
> version-to-version upgrades (yet).
Okay, that's cool, I just wanted to make sure I
> +1 such a blog post would be great and would really help other projects
> align/reuse where appropriate :)
Hot off the presses:
http://www.danplanet.com/blog/2015/06/26/upgrading-nova-to-kilo-with-minimal-downtime/
--Dan
> I have a three node Juno system running on Ubuntu 14.04. On the
> compute node I keep getting the following Endpoint does not support
> RPC version 3.33 to caller error when I launch a Nova instance. The
> Controller is running rabbitMQ v3.4.2. So I do not understand why the
> compute node thinks
> In our continued quest to be more explicit about plug points, it feels
> like we should either document the interface (which means creating
> stability on the hook parameters) or we should deprecate this construct
> as part of a bygone era.
>
> I lean on deprecation because it feels like a thin
> Forgive my ignorance or for playing devil's advocate, but wouldn't the
> main difference between notifications and hooks be that notifications
> are asynchronous and hooks aren't?
The main difference is that notifications are external and intended to
be stable (especially with the versioned noti
Hi all,
Just like we did in Kilo, Newton will have a blocker migration at the
front that ensures things we expect to be migrated online in Mitaka have
completed. That needs to land first, so please do not approve any DB
migrations until this lands:
https://review.openstack.org/#/c/289450/
I'll
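The idea of a blocker migration can be sketched like so (table and column names here are hypothetical, not Nova's actual schema): before any new-cycle schema changes run, refuse to proceed if rows that should have been migrated online during the previous cycle still exist.

```python
import sqlite3

def blocker_migration(conn):
    """Fail the upgrade if online data migrations haven't completed."""
    unmigrated = conn.execute(
        "SELECT COUNT(*) FROM instances WHERE flavor_migrated = 0"
    ).fetchone()[0]
    if unmigrated:
        raise RuntimeError(
            "%d instances still need their data migrated; run the "
            "online data migrations before upgrading" % unmigrated)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER, flavor_migrated INTEGER)")
conn.execute("INSERT INTO instances VALUES (1, 0)")

# With unmigrated data present, the blocker stops the upgrade.
try:
    blocker_migration(conn)
    blocked = False
except RuntimeError:
    blocked = True
assert blocked

# Once the online migration has been run, the blocker passes silently.
conn.execute("UPDATE instances SET flavor_migrated = 1")
blocker_migration(conn)
```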
> The EC2-API project doesn't appear to be very actively worked on.
> There is one very recent commit from an Oslo team member, another
> couple from a few days before, and then the next one is almost a
> month old. Given the lack of activity, if no team member has
> volunteered to be PTL I think w
> Just clarifying:
> For Nova: Yes. Matt's the only candidate, so I think that's a pretty
> safe assumption.
Sorry, I meant for Nova.
I guess since the nomination period is over (right?), I'm not sure how
it's not more than an assumption...but okay :)
--Dan
>> Shouldn't we be trying to remove central bottlenecks by
>> decentralizing communications where we can?
>
> I think that's a good goal to continue having. Some deployers have
> setup firewalls between compute nodes, or between compute nodes and
> the database, so we use the conductor to facilita
> Please respond with +1s or any concerns.
+1
--Dan
__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.or
> I'm with Daniel on that one. We shouldn't "deprecate" until we are 100%
> sure that the replacement is up to the task and that strategy is solid.
My problem with this is: If there wasn't a stackforge project, what
would we do? Nova's in-tree EC2 support has been rotting for years now,
and despit
> I can do another release if needed once we've landed a fix, although
> it sounds like this can be fixed in neutron?
It's just pylint being silly, from the sound of it. I don't think there
is anything we need to do in novaclient, since the transition adapter
works. Either convince neutron pylint
> considerations before redoing the release cycle.
Well, I'll refer to my original words:
> On 02/23/2015 03:45 PM, Dan Smith wrote:
>> I've been wondering this myself for quite a while now. I'm really
>> interested to hear what things would look like in a no-r
Did we really need another top-level thread for this?
> 1. _destroy_evacuated_instances() should do a better job of sanity
> checking before performing such a drastic action.
I agree, and no amount of hostname checking will actually address this
problem. If we don't have a record of an evacuate
> The hostname is a unique identifier, however, it isn't a /stable/
> unique identifier because it is determined at the whim of the
> administrator.
Honestly, I think it's as stable as the administrator wants it to be. At
the level of automation that any reasonable deployment will be running,
I
> I am really sorry it got in as I have -1ed it several times for the same
> reason (I _really_ hate using the -2 hammer - we're all adults here
> after all).
I guess that I should take some blame as a reviewer on that patch, but
only after this mail do I read some of your comments as fundamentall
> To quote John from an earlier email in this thread:
>
> Its worth noting, we do have the "experimental" flag:
> "
> The first header specifies the version number of the API which was
> executed. Experimental is only returned if the operator has made a
> modification to the API behaviour that is
> we need to make sure we continue to progress on bugs targeted as
> likely to need backport to Kilo. The current list is here --
> https://bugs.launchpad.net/nova/+bugs?field.tag=kilo-rc-potential .
>
> Bugs on that list which have fixes merged and backports prepared will
> stand a very good chan
Hi all,
We're trying to land a specific DB migration as the first in L:
https://review.openstack.org/#/c/174480/
In order to do that, we need to get some changes into grenade, which are
blocked on other housekeeping. That should happen in a week or so.
In the meantime, please don't approve any
This proposed patch requiring a data migration in Nova master is making
Turbo Hipster face plant - https://review.openstack.org/#/c/174480/
This is because we will require Kilo deployers to fully migrate their
flavor data from system_metadata to instance_extra before they upgrade
to the next relea
> Well, I think there are very few cases where *less* coverage is better.
IMHO, most of the test coverage we have for nova's neutronapi is more
than useless. It's so synthetic that it provides no regression
protection, and often requires significantly more work than the change
that is actually bei
> Let's not mix the bad unit tests in Nova with the fact that code should
> be fully covered by well written unit tests.
I'm not using bad tests in nova to justify not having coverage testing.
I'm saying that the argument that "more coverage is always better" has
some real-life counter examples.
> (We should also note that we can just merge a thing without turbo
> hipster passing if we understand the reason for the test failure.
> Sure, that breaks the rest of the turbo hipster runs, but we're not
> 100% blocked here.)
Indeed, the fact that it's failing actually proves the patch works,
wh
> The migrate_flavor_data command didn't actually work on the CLI (unless
> I'm missing something or did something odd). See
> https://review.openstack.org/#/c/175890/ where I fix the requirement of
> max_number. This likely means that operators have not bothered to do or
> test the migrate_flavor_
> Sure, but for people doing continuous deployment, they clearly haven't
> ran the migrate_flavor_data (or if they have, they haven't filed any
> bugs about it not working[0]).
Hence the usefulness of T-H here, right? The point of the migration
check is to make sure that people _do_ run it before
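A batched online data migration in that spirit might look roughly like this (illustrative schema, not Nova's actual tables or the real `migrate_flavor_data` code): move rows in bounded chunks so the migration can run repeatedly against a live database until nothing is left.

```python
import sqlite3

def migrate_flavor_data(conn, max_number):
    """Migrate up to max_number rows; return how many were migrated."""
    rows = conn.execute(
        "SELECT id FROM instances WHERE flavor_migrated = 0 LIMIT ?",
        (max_number,)).fetchall()
    for (instance_id,) in rows:
        # Real code would copy flavor data from system_metadata into
        # instance_extra here; this sketch just flips the marker.
        conn.execute(
            "UPDATE instances SET flavor_migrated = 1 WHERE id = ?",
            (instance_id,))
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER, flavor_migrated INTEGER)")
conn.executemany("INSERT INTO instances VALUES (?, 0)",
                 [(i,) for i in range(7)])

# Operators run this in chunks until it reports zero rows migrated.
total = 0
while True:
    done = migrate_flavor_data(conn, max_number=3)
    if not done:
        break
    total += done
assert total == 7
```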
> That is correct -- these are icehouse datasets which have been
> upgraded, but never had a juno run against them. It would be hard for
> turbo hipster to do anything else, as it doesn't actually run a cloud.
> We can explore ideas around how to run live upgrade code, but its
> probably a project
> If I selected all the instance_type_id's from the system-metadata table
> and used those uuid's to load the object with something like:
> instance = objects.Instance.get_by_uuid(
> context, instance_uuid,
> expected_attrs=['system_metadata', 'flavor'])
>
> The tes
> That change works on the dataset. However I was referring to the
> db/object api (which I have no real knowledge of) that it should be able
> to get_by_uuid unmigrated instances and in my case I got the traceback
> given in that paste. It's possible I'm just using the API incorrectly.
You should
> In defense of those of us asking questions, I'll just point out
> that as a core reviewer I need to be sure I understand the intent
> and wide-ranging ramifications of patches as I review them. Especially
> in the Oslo code, what appears to be a small local change can have
> unintended consequen
Hi Lenny,
> Is there anything missing for us to start 'non-voting' Nova CI ?
Sorry for the slow response from the team.
The results that you've posted look good to me. A quick scan of the
tempest results don't seem to indicate any new tests that are
specifically testing SRIOV things. I assume th
> I propose we add Melanie to nova-core.
>
> She has been consistently doing great quality code reviews[1],
> alongside a wide array of other really valuable contributions to the
> Nova project.
Extremely +1.
--Dan
> There is an open discussion to replace mysql-python with PyMySQL, but
> PyMySQL has worse performance:
>
> https://wiki.openstack.org/wiki/PyMySQL_evaluation
My major concern with not moving to something different (i.e. not based
on the C library) is the threading problem. Especially as we move
> +1 Agreed nested containers are a thing. Its a great reason to keep
> our LXC driver.
I don't think that's a reason we should keep our LXC driver, because you
can still run containers in containers with other things. If anything,
using a nova vm-like container to run application-like containers
Interestingly, we just had a meeting about cells and the scheduler,
which had quite a bit of overlap on this topic.
> That said, as mentioned in the previous email, the priorities for Pike
> (and likely Queens) will continue to be, in order: traits, ironic,
> shared resource pools, and nested prov
> +1. ocata's cell v2 stuff added a lot of extra required complexity
> with no perceivable benefit to end users. If there was a long term
> stable version, then putting it in the non lts release would have
> been ok. In absence of lts, I would have recommended the cell v2
> stuff have been done in
The etherpad for this session is here [1]. The goal of the session was
to get some questions answered that the developers had for operators
around the topic of cellsv2.
The bulk of the time was spent discussing ways to limit instance
scheduling retries in a cellsv2 world where placement eliminates
> Thanks for answering the base question. So, if AZs are implemented with
> haggs, then really, they are truly disjoint from cells (ie, not a subset
> of a cell and not a superset of a cell, just unrelated.) Does that
> philosophy agree with what you are stating?
Correct, aggregates are at the top
> As most of the upgrade issues center around database migrations, we
> discussed some of the potential pitfalls at length. One approach was to
> roll-up all DB migrations into a single repository and run all upgrades
> for a given project in one step. Another was to simply have multiple
> python v
> I haven't looked at what Keystone is doing, but to the degree they are
> using triggers, those triggers would only impact new data operations as
> they continue to run into the schema that is straddling between two
> versions (e.g. old column/table still exists, data should be synced to
> new col
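The straddle described here can be sketched with a SQLite trigger (column names are hypothetical): while the old and new columns coexist, writes to the old column are mirrored into the new one, so services still running the old release keep working while the schema migrates underneath them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY,
                        password TEXT,        -- old column
                        password_new TEXT);   -- new column
    -- Sync trigger: old-release writes to the old column are mirrored
    -- into the new column until the contract phase drops the old one.
    CREATE TRIGGER sync_password AFTER UPDATE OF password ON users
    BEGIN
        UPDATE users SET password_new = NEW.password WHERE id = NEW.id;
    END;
""")

conn.execute("INSERT INTO users (id, password) VALUES (1, 'old-secret')")
# An old-release service updates only the column it knows about...
conn.execute("UPDATE users SET password = 'new-secret' WHERE id = 1")
# ...and the trigger keeps the new column in step.
row = conn.execute("SELECT password, password_new FROM users").fetchone()
assert row == ("new-secret", "new-secret")
```

Note the trigger fires only on updates of the old column, so its own write to the new column doesn't recurse.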
>> My current feeling is that we got ourselves into our existing mess
>> of ugly, convoluted code when we tried to add these complex
>> relationships into the resource tracker and the scheduler. We set
>> out to create the placement engine to bring some sanity back to how
>> we think about things
>> b) a compute node could very well have both local disk and shared
>> disk. how would the placement API know which one to pick? This is a
>> sorting/weighing decision and thus is something the scheduler is
>> responsible for.
> I remember having this discussion, and we concluded that a
> comp
So it seems our options are:
1. Allow PUT /os-services/{service_uuid} on any type of service, even if
doesn't make sense for non-nova-compute services.
2. Change the behavior of [1] to only disable new "nova-compute" services.
Please, #2. Please.
--Dan
Are we allowed to cheat and say auto-disabling non-nova-compute services
on startup is a bug and just fix it that way for #2? :) Because (1) it
doesn't make sense, as far as we know, and (2) it forces the operator to
have to use the API to enable them later just to fix their nova
service-list outp
> So, I see your point here, but my concern here is that if we *modify* an
> existing schema migration that has already been tested to properly apply
> a schema change for MySQL/InnoDB and PostgreSQL with code that is
> specific to NDB, we introduce the potential for bugs where users report
> that
So to the existing core team members, please respond with a yay/nay and
after about a week or so we should have a decision (knowing a few cores
are on vacation right now).
+1 on the condition that gibi stops finding so many bugs in the stuff I
worked on. It's embarrassing.
--Dan
- Modify the `supports-upgrades`[3] and `supports-accessible-upgrades`[4] tags
I have yet to look into the formal process around making changes to
these tags but I will aim to make a start ASAP.
We've previously tried to avoid changing assert tag definitions because
we then have to re-rev
In this series of patches we are generalizing the PCI framework to
handle MDEV devices. We argue it's a lot of patches, but most of them
are small and the logic behind them is basically to make it understand
two new fields, MDEV_PF and MDEV_VF.
That's not really "generalizing the PCI framework to hand
The concepts of PCI and SR-IOV are, of course, generic
They are, although the PowerVM guys have already pointed out that they
don't even refer to virtual devices by PCI address and thus anything
based on that subsystem isn't going to help them.
but I think out of principle we should avoid a
Hi all,
Due to a zuulv3 bug, we're running an old nova-network test job on
master and, as you would expect, failing hard. As a workaround in the
meantime, we're[0] going to disable that job entirely so that it runs
nowhere. This makes it not run on master (good) but also not run on
stable/new
>> I also think there is value in exposing vGPU in a generic way, irrespective
>> of the underlying implementation (whether it is DEMU, mdev, SR-IOV or
>> whatever approach Hyper-V/VMWare use).
>
> That is a big ask. To start with, all GPUs are not created equal, and
> various vGPU functionality
> Any update on where we stand on issues now? Because every single patch I
> tried to land yesterday was killed by POST_FAILURE in various ways.
> Including some really small stuff - https://review.openstack.org/#/c/324720/
Yeah, Nova has only landed eight patches since Thursday. Most of those are
> But the record in 'host_mappings' table of api database is not deleted
> (I tried it with nova master 8ca24bf1ff80f39b14726aca22b5cf52603ea5a0).
> The cell cannot be deleted if the records for the cell remains in
> 'host_mappings' table.
> (An error occurs with a message "There are existing host
> I hope everyone travelling to the Sydney Summit is enjoying jet lag
> just as much as I normally do. Revenge is sweet! My big advice is that
> caffeine is your friend, and to not lick any of the wildlife.
I wasn't planning on licking any of it, but thanks for the warning.
> As of just now, all
> In my experience, the longer a patch (or worse, patch series) sits
> around, the staler it gets. Others are merging changes, so the
> long-lived patch series has to be constantly rebased.
This is definitely true.
> The 20% developer would be spending a greater proportion of her time
> figuring
Ed Leafe writes:
> I think you're missing the reality that intermediate releases have
> about zero uptake in the real world. We have had milestone releases of
> Nova for years, but I challenge you to find me one non-trivial
> deployment that uses one of them. To my knowledge, based on user
> surv
>> There was some discussion that conflicted with reality a bit and I
>> think we need to resolve before too long, but shouldn't impact the
>> newton-based changes:
>>
>> We bounced around two different HTTP resources for returning one or
>> several resource providers in response to a launch reques
> No. POST /allocations/{consumer_uuid} is the thing that the resource
> tracker calls for the claim on the compute node.
>
> The POST /allocations is something we've been throwing around ideas on
> for an eventual call that the placement engine would expose for "claims
> in the scheduler".
Right
> Multitenant networking
> ==
I haven't reviewed this one much either, but it looks smallish and if
other people are good with it then I think it's probably something we
should do.
> Multi-compute usage via a hash ring
> ===
I'm obviously +2 on
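For readers unfamiliar with the approach, here's a minimal consistent-hash-ring sketch (an assumed shape for illustration, not Ironic's actual implementation): each host gets many virtual points on a ring, a node is owned by the host whose first point falls at or after the node's own hash, and adding or removing a host only remaps a slice of the keyspace.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, hosts, replicas=64):
        # Place `replicas` virtual points per host on the ring.
        self._ring = sorted(
            (self._hash("%s-%d" % (host, i)), host)
            for host in hosts for i in range(replicas))
        self._keys = [point for point, _host in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_host(self, node):
        # First ring point at or after the node's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(node)) % len(self._ring)
        return self._ring[idx][1]

hosts = ["compute-1", "compute-2", "compute-3"]
ring = HashRing(hosts)
owner = ring.get_host("node-abc")
assert owner in hosts
# The mapping is deterministic: the same node always maps the same way.
assert ring.get_host("node-abc") == owner
```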
> The only functional difference in the new code that happens in
> the gate
> is the iptables rule:
>
> local default_dev=""
> default_dev=$(ip route | grep ^default | awk '{print $5}')
> sudo iptables -t nat -A POSTROUTING -o
> I haven't been able to reproduce it either, but it's unclear how packets
> would get into a VM on an island since there is no router interface, and
> the VM can't respond even if it did get it.
>
> I do see outbound pings from the connected VM get to eth0, hit the
> masquerade rule, and continue
> This differs from Nova and Neutron's approaches to solve for rolling
> upgrades (which use oslo.versionedobjects), however Keystone is one of
> the few services that doesn't need to manage communication between
> multiple releases of multiple service components talking over the
> message bus (whi
>> Even in the case of projects using versioned objects, it still
>> means a SQL layer has to include functionality for both versions of
>> a particular schema change which itself is awkward.
That's not true. Nova doesn't have multiple models to straddle a
particular change. We just...
> It's sim
>> I don't think it's all that ambitious to think we can just use
>> tried and tested schema evolution techniques that work for everyone
>> else.
>
> People have been asking me for over a year how to do this, and I have
> no easy answer, I'm glad that you do. I would like to see some
> examples o
> While migrate_flavor_data seem to flavor migrate meta data of the VMs
> that were spawned before upgrade procedure, it doesn't seem to flavor
> migrate for the VMs that were spawned during the upgrade procedure more
> specifically after openstack controller upgrade and before compute
> upgrade. A
> Thanks Dan for your response. While I do run that before I start my
> move to liberty, what I see is that it doesn't seem to flavor migrate
> meta data for the VMs that are spawned after controller upgrade from
> juno to kilo and before all computes upgraded from juno to kilo. The
> current work
> So that is fine. However, correct me if I'm wrong but you're
> proposing just that these projects migrate to also use a new service
> layer with oslo.versionedobjects, because IIUC Nova/Neutron's
> approach is dependent on that area of indirection being present.
> Otherwise, if you meant som
> We know:
>
> * It pretty much does what we intend it to do: allocations are added
> and deleted on server create and delete.
> * On manipulations like a resize the allocations are not updated
> immediately, there is a delay until the heal periodic job does its
> thing.
We know one more th
> 2. Dan Smith mentioned another idea such that we could index the
> aggregate metadata keys like filter_tenant_id0, filter_tenant_id1,
> ... filter_tenant_idN and then combine those so you have one host
> aggregate filter_tenant_id* key per tenant.
Yep, and that's wha
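That numbered-key scheme might be gathered back together like this (a sketch only; Nova's real aggregate filter logic may differ): since a single metadata value is length-limited, tenants are spread across `filter_tenant_id0..N` keys and collected by prefix when filtering.

```python
def gather_tenant_ids(metadata, prefix="filter_tenant_id"):
    """Collect tenant IDs from the bare key plus numbered variants."""
    tenants = set()
    for key, value in metadata.items():
        if key == prefix or (key.startswith(prefix)
                             and key[len(prefix):].isdigit()):
            tenants.add(value)
    return tenants

metadata = {
    "filter_tenant_id": "t0",
    "filter_tenant_id0": "t1",
    "filter_tenant_id1": "t2",
    "availability_zone": "az1",  # unrelated key, ignored
}
assert gather_tenant_ids(metadata) == {"t0", "t1", "t2"}
```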