Re: [openstack-dev] [stable][all] Revisiting the 6 month release cycle

2015-02-25 Thread Johannes Erdfelt
On Tue, Feb 24, 2015, Jeremy Stanley fu...@yuggoth.org wrote:
 On 2015-02-24 10:00:51 -0800 (-0800), Johannes Erdfelt wrote:
 [...]
  Recently, I have spent a lot more time waiting on reviews than I
  have spent writing the actual code.
 
 That's awesome, assuming what you mean here is that you've spent
 more time reviewing submitted code than writing more. That's where
 we're all falling down as a project and should be doing better, so I
 applaud your efforts in this area.

I think I understand what you're trying to do here, but to be clear, are
you saying that I only have myself to blame for how long it takes to
get code merged nowadays?

JE


__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [stable][all] Revisiting the 6 month release cycle

2015-02-24 Thread Johannes Erdfelt
On Mon, Feb 23, 2015, Joe Gordon joe.gord...@gmail.com wrote:
 What this actually means:
 
 - Stop approving blueprints for specific stable releases, instead just
   approve them and target them to milestones.
 - Milestones stop being Kilo-1, Kilo-2, Kilo-3 etc. and just become
   1, 2, 3, 4, 5, 6, 7, 8, 9 etc.
 - If something misses what was previously known as Kilo-3, it has to
   wait a week for milestone 4.
 - Development focuses on milestones only. So a 6 week cycle with, say,
   1 week of stabilization to finish things up before each milestone.

What is the motivation for having milestones at all?

At least in the Nova world, it seems like milestones mean nothing at
all. It's just something John Garbutt spends a lot of his time updating
that doesn't appear to provide any value to anyone.

JE




Re: [openstack-dev] [stable][all] Revisiting the 6 month release cycle

2015-02-24 Thread Johannes Erdfelt
On Tue, Feb 24, 2015, Thierry Carrez thie...@openstack.org wrote:
 Agree on the pain of maintaining milestone plans though, which is why I
 propose we get rid of most of it in Liberty. That will actually be
 discussed at the cross-project meeting today:
 
 https://wiki.openstack.org/wiki/Release_Cycle_Management/Liberty_Tracking

I'm happy to see this.

"Assignees may target their blueprint to a future milestone, as an
indication of when they intend to land it (not mandatory)"

That seems useless to me. I have no control over when things land. I can
only control when my code is put up for review.

Recently, I have spent a lot more time waiting on reviews than I have
spent writing the actual code.

JE




Re: [openstack-dev] [nova] The strange case of osapi_compute_unique_server_name_scope

2015-02-20 Thread Johannes Erdfelt
On Thu, Feb 19, 2015, Matthew Booth mbo...@redhat.com wrote:
 Assuming this configurability is required, is there any way we can
 instead use it to control a unique constraint in the db at service
 startup? This would be something akin to a db migration. How do we
 manage those?

Ignoring whether this particular feature is useful or not, this is possible.

With sqlalchemy-migrate, there could be code to check the config
option at startup and add/remove the unique constraint. This would leave
some schema management out of the existing scripts, which would be
mildly ugly.
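
A minimal sketch of that startup hook, using SQLite and made-up
table/index names (the real option is
osapi_compute_unique_server_name_scope, and real Nova would go through
sqlalchemy-migrate rather than raw DDL):

```python
import sqlite3

# Hypothetical startup hook: toggle a unique constraint (as a unique
# index) based on a config option, adding or removing it so the schema
# matches what the deployer configured.
def sync_unique_name_constraint(conn, unique_names_enabled):
    cur = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='index' "
        "AND name='uniq_instances_name'")
    exists = cur.fetchone() is not None
    if unique_names_enabled and not exists:
        conn.execute(
            "CREATE UNIQUE INDEX uniq_instances_name ON instances (name)")
    elif not unique_names_enabled and exists:
        conn.execute("DROP INDEX uniq_instances_name")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT)")
sync_unique_name_constraint(conn, True)
conn.execute("INSERT INTO instances (name) VALUES ('vm1')")
try:
    conn.execute("INSERT INTO instances (name) VALUES ('vm1')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False while the constraint is enabled
```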

With my online schema changes patch, this is all driven by the model.
Similar code could add/remove the unique constraint to the model. At
startup, the schema could be compared against the model to ensure
everything matches.

Adding or removing a unique constraint at an arbitrary time leaves open
some user experience problems: existing data that violates the
constraint will prevent it from being created.

Presumably a tool could help operators deal with that.

All that said, it's kind of messy and nontrivial work, so I'd avoid
trying to support a feature like this if we really don't need to :)

JE




Re: [openstack-dev] [all][tc] SQL Schema Downgrades and Related Issues

2015-01-29 Thread Johannes Erdfelt
On Thu, Jan 29, 2015, Morgan Fainberg morgan.fainb...@gmail.com wrote:
 The concept that there is a utility that can (and in many cases
 willfully) cause permanent, and in some cases irrevocable, data loss
 from a simple command line interface sounds crazy when I try and
 explain it to someone.
 
 The more I work with the data stored in SQL, the more I think we
 should really recommend the tried-and-true best practice when trying
 to revert from a migration: restore your DB to a known good state.

You mean like restoring from backup?

Unless your code deploy fails before it has any chance of running, you
could have had new instances started or existing instances changed, and
restoring from backups would lose that data.

If you meant another way of restoring your data, then there are some
strategies that downgrades could employ that don't lose data, but
nothing can handle 100% of cases.
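
As a sketch of one such strategy (table name invented, SQLite standing
in for the real database): instead of dropping a table, the upgrade can
rename it out of the way, which makes the matching downgrade lossless:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flavors (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO flavors (name) VALUES ('m1.small')")

def upgrade(conn):
    # Instead of DROP TABLE, park the data under a shadow name.
    conn.execute("ALTER TABLE flavors RENAME TO _shadow_flavors")

def downgrade(conn):
    # Reverting is lossless because nothing was actually deleted.
    conn.execute("ALTER TABLE _shadow_flavors RENAME TO flavors")

upgrade(conn)
downgrade(conn)
print(conn.execute("SELECT name FROM flavors").fetchall())  # [('m1.small',)]
```

This still can't cover 100% of cases: rows written by the new code after
the upgrade may have no sensible representation in the old schema.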

All of that said, for the Rackspace Public Cloud, we have never rolled
back our deploy. We have always rolled forward for any fixes we needed.

From my perspective, I'd be fine with doing away with downgrades, but
I'm not sure how to document that deployers should roll forward if they
have any deploy problems.

JE




Re: [openstack-dev] [oslo.db] PyMySQL review

2015-01-28 Thread Johannes Erdfelt
On Wed, Jan 28, 2015, Clint Byrum cl...@fewbar.com wrote:
 As is often the case with threading, a reason to avoid using it is
 that libraries often aren't able or willing to assert thread safety.
 
 That said, one way to fix that, is to fix those libraries that we do
 want to use, to be thread safe. :)

I floated this idea across some coworkers recently and they brought up a
similar concern, which is concurrency in general, both within our code
and dependencies.

I can't find many places in Nova (at least) that are concurrent in the
sense that one object will be used by multiple threads. nova-scheduler
is likely one place. nova-compute would likely be easy to fix if there
are any problems.

That said, I think the only way to know for sure is to try it out and
see. I'm going to hack up a proof of concept and see how difficult this
will be.

JE




Re: [openstack-dev] [oslo.db] PyMySQL review

2015-01-28 Thread Johannes Erdfelt
On Wed, Jan 28, 2015, Mike Bayer mba...@redhat.com wrote:
 I can envision turning this driver into a total monster, adding
 C-speedups where needed but without getting in the way of async
 patching, adding new APIs for explicit async, and everything else.
 However, I’ve no idea what the developers have an appetite for.

This is great information. I appreciate the work on evaluating it.

Can I bring up the alternative of dropping eventlet and switching to
native threads?

We spend a lot of time working around the various incompatibilities
between eventlet and other libraries we use. It also restricts us by
making it difficult to use an entire class of Python modules (those
that use C extensions for performance, etc).

I personally have spent more time than I wish to admit fixing bugs in
eventlet and troubleshooting problems we've had.

And it's never been clear to me why we *need* to use eventlet or
green threads in general.

Our modern Nova appears to only be weakly tied to eventlet and greenlet.
I think we would spend less time replacing eventlet with native threads
than we'll spend in the future trying to fit our code and dependencies
into the eventlet shaped hole we currently have.
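
To illustrate what the native-thread model looks like, a minimal sketch
using only the standard library (not actual Nova code):

```python
import concurrent.futures
import time

# With native threads, a blocking call (a DB driver, a C extension, a
# libvirt API call) simply blocks its worker thread; no monkey patching
# of the socket/threading modules is needed.
def handle_request(req_id):
    time.sleep(0.01)  # stands in for blocking I/O
    return req_id * 2

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(handle_request, i) for i in range(8)]
    results = sorted(f.result() for f in futures)

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```

The trade-off is the one Clint raised: shared objects touched by
multiple workers need explicit thread-safety.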

I'm not as familiar with the code in other OpenStack projects, but from
what I have seen, they appear to be similar to Nova and are only weakly
tied to eventlet/greenlet.

JE




Re: [openstack-dev] [oslo.db] PyMySQL review

2015-01-28 Thread Johannes Erdfelt
On Wed, Jan 28, 2015, Vishvananda Ishaya vishvana...@gmail.com wrote:
 On Jan 28, 2015, at 4:03 PM, Doug Hellmann d...@doughellmann.com wrote:
  I hope someone who was around at the time will chime in with more detail
  about why green threads were deemed better than regular threads, and I
  look forward to seeing your analysis of a change. There is already a
  thread-based executor in oslo.messaging, which *should* be usable in the
  applications when you remove eventlet.
 
 Threading was never really considered. The initial version tried to get a
 working api server up as quickly as possible and it used tornado. This was
 quickly replaced with twisted since tornado was really new at the time and
 had bugs. We then switched to eventlet when swift joined the party so we
 didn’t have multiple concurrency stacks.
 
 By the time someone came up with the idea of using different concurrency
 models for the api server and the backend services, we were already pretty
 far down the greenthread path.

Not sure if it helps more than this explanation, but there was a
blueprint and an accompanying wiki page that explain the move from
twisted to eventlet:

https://blueprints.launchpad.net/nova/+spec/unified-service-architecture

https://wiki.openstack.org/wiki/UnifiedServiceArchitecture

JE




Re: [openstack-dev] design question : green thread model

2015-01-28 Thread Johannes Erdfelt
On Wed, Jan 28, 2015, murali reddy muralimmre...@gmail.com wrote:
 I am trying to understand how a nova component can be run in parallel
 on a host. From the developer reference documentation, it seems that
 all the OpenStack services use the green thread model. Is it the only
 model of parallelism for all the components, or can multiple processes
 be used for a nova service on a host? Is nova.service, which seems to
 do os.fork, used to fork multiple processes for a nova service?

Multiple processes are used in some places; for instance, nova-api can
fork multiple processes. Each process then uses greenthreads internally.
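
A toy sketch of that pre-fork pattern using only os.fork (nova-api's
real implementation differs; each child here would run its own
greenthread-based request loop):

```python
import os

# Pre-fork sketch: the parent forks N children, then waits for them.
child_pids = []
for _ in range(3):
    pid = os.fork()
    if pid == 0:
        # Child process: a real worker would serve requests here.
        os._exit(0)
    child_pids.append(pid)

for pid in child_pids:
    os.waitpid(pid, 0)

print(len(set(child_pids)))  # 3 distinct worker processes
```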

However, most services don't use multiple processes (at least in Nova).

JE




Re: [openstack-dev] design question : green thread model

2015-01-28 Thread Johannes Erdfelt
On Wed, Jan 28, 2015, murali reddy muralimmre...@gmail.com wrote:
 On hosts with multi-core processors, it does not seem optimal to run a
 single service instance with just green threads. I understand that on
 the controller node we can run one or more nova services, but it still
 does not seem to utilize multi-core processors.
 
 Is it not a nova scaling concern?

It certainly depends on the service.

nova-compute isn't CPU limited for instance, so utilizing multiple cores
isn't necessary.

nova-scheduler generally isn't CPU limited either in our usage (at
Rackspace), but we use cells and as a result, we run multiple independent
nova-scheduler services.

If you've seen some scaling problems, I know the Nova team is open to
reports. In some cases, patches wouldn't be hard to develop to start
multiple processes, but no one has ever reported a need beyond nova-api.

JE




Re: [openstack-dev] [nova] Adding new features to Kilo and future releases - DB upgrades

2015-01-26 Thread Johannes Erdfelt
On Thu, Jan 22, 2015, Kekane, Abhishek abhishek.kek...@nttdata.com wrote:
 With online schema changes/no-downtime DB upgrades, things would be
 much easier for OpenStack deployments.
 Big kudos to Johannes who initiated this feature. But as a service
 provider, I'm curious to understand what the development process of
 adding new features to Kilo and future releases will be once online
 schema changes are in.
 
 
 1. Will the committer be responsible for adding new procedures for
 upgrading the db with minimal or zero downtime? Or will the online
 schema changes framework itself detect whatever db changes are
 required on its own, with the decision to apply db changes online or
 offline left solely with the service provider?

The online schema change code will compare the running schema and the
model and figure out what changes are needed to make the running schema
match the model (it actually leverages alembic for most of this). (This
automates much of the work currently done in sqlalchemy-migrate
scripts)

The scheduling of changes into the three phases is handled completely
internally to the online schema change patch. It understands which
changes are semantically safe (that can be safely applied when nova is
running) and locking safe (so it doesn't block access to the table for a
long time).

Unless they are working on the code that implements the online schema
changes, developers need not know how it operates.

Developers just need to make changes to the model and write
sqlalchemy-migrate scripts as we have always required.

Eventually, developers will no longer need to write sqlalchemy-migrate
scripts. This is likely to be 1 or 2 cycles away (certainly not in
Kilo).

There will be some minor restrictions on what kinds of schema changes
will be allowed. As an example, column renames won't be allowed because
they appear as a delete/add and we'd potentially lose data. However,
this can be done as an explicit add/delete along with a data migration,
if it's needed. Same thing with some column type changes.
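
That explicit add/delete route maps directly onto the three phases. A
toy walkthrough of a "rename" with SQLite stand-ins (column names
invented; real MySQL would simply ALTER TABLE ... DROP COLUMN in the
contract phase):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances (id INTEGER PRIMARY KEY, display_name TEXT)")
conn.execute("INSERT INTO instances (display_name) VALUES ('vm1')")

# Expand phase: add the new column while the service is running.
conn.execute("ALTER TABLE instances ADD COLUMN hostname TEXT")

# Migrate phase: copy the data across while both columns exist.
conn.execute("UPDATE instances SET hostname = display_name")

# Contract phase: drop the old column once no code reads it. The table
# is rebuilt here for portability across SQLite versions.
conn.executescript("""
    CREATE TABLE instances_new (id INTEGER PRIMARY KEY, hostname TEXT);
    INSERT INTO instances_new (id, hostname)
        SELECT id, hostname FROM instances;
    DROP TABLE instances;
    ALTER TABLE instances_new RENAME TO instances;
""")
print(conn.execute("SELECT hostname FROM instances").fetchall())  # [('vm1',)]
```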

I'll clarify these in the patch when it's put up for review.

 2. Is it possible to predict how much time it would take to upgrade db
 (expand/migrate/contract phases) for adding a new column, constraint.
 For example, adding a new nullable column would take less time than
 adding a column with a default value.

This is difficult to estimate. It varies on how fast the database server
is (CPU, disk I/O, etc), the current load of the database, the number of
rows in the table and the size of the data in the columns.

However, I can develop some relative estimates. For instance, adding an
index on MySQL 5.6 only acquires a lock very briefly and does the rest
of the work in the background. It still produces load, but it doesn't
block access to the table. This would be considered online safe and
scheduled to the expand phase. The CREATE INDEX would appear to finish
very quickly.

However, other changes could potentially require a table rewrite which
could be long if it's a large table (eg instance_system_metadata table),
but very short if the table only has a handful of rows (eg
instance_types table).

I've written a test suite which takes a variety of database software
(MySQL, PostgreSQL, etc), versions and storage engines (InnoDB, TokuDB,
etc) and runs tests to figure out which changes are online safe (as far
as locking goes).

I will be using this data to make better decisions on scheduling of
operations to ensure the expand and contract phases don't cause any
problems.

I can also make that data available somewhere, and/or possibly annotate
the output from the --dry-run option to explain why some operations are
scheduled in the migrate phase instead of the expand or contract
phases.

JE




Re: [openstack-dev] [hacking] proposed rules drop for 1.0

2014-12-09 Thread Johannes Erdfelt
On Tue, Dec 09, 2014, Sean Dague s...@dague.net wrote:
 This check should run on any version of python and give the same
 results. It does not, because it queries python to know what's in stdlib
 vs. not.

Just to underscore that it's difficult to get right, I found out recently
that hacking doesn't do a great job of figuring out what is a standard
library.

I've installed some libraries in 'develop' mode and recent hacking
thinks they are standard libraries and complains about the order.

JE




Re: [openstack-dev] [hacking] proposed rules drop for 1.0

2014-12-09 Thread Johannes Erdfelt
On Tue, Dec 09, 2014, Sean Dague s...@dague.net wrote:
 I'd like to propose that for hacking 1.0 we drop 2 groups of rules entirely.
 
 1 - the entire H8* group. This doesn't function on python code, it
 functions on git commit message, which makes it tough to run locally. It
 also would be a reason to prevent us from not rerunning tests on commit
 message changes (something we could do after the next gerrit update).

One of the problems with the H8* tests is that they can reject a commit
message generated by git itself.

I had a 'git revert' rejected because the first line was too long :(

JE




Re: [openstack-dev] [Nova] Spring cleaning nova-core

2014-12-07 Thread Johannes Erdfelt
On Mon, Dec 08, 2014, Michael Still mi...@stillhq.com wrote:
 There are other things happening behind the scenes as well -- we have
 a veto process for current cores when we propose a new core. It has
 been made clear to me that several current core members believe we
 have reached the maximum effective size for core, and that they will
 therefore veto new additions. Therefore, we need to make room in core
 for people who are able to keep up with our review workload.

I've heard this before, but I've never understood this.

Can you (or someone else) elaborate on why they believe that there is an
upper limit on the size of nova-core and why that is the current size?

JE




Re: [openstack-dev] [Nova] sqlalchemy-migrate vs alembic for new database

2014-12-05 Thread Johannes Erdfelt
On Fri, Dec 05, 2014, Andrew Laski andrew.la...@rackspace.com wrote:
 The cells v2 effort is going to be introducing a new database into
 Nova.  This has been an opportunity to rethink and approach a few
 things differently, including how we should handle migrations. There
 have been discussions for a long time now about switching over to
 alembic for migrations so I want to ask, should we start using
 alembic from the start for this new database?
 
 The question was first raised by Dan Smith on
 https://review.openstack.org/#/c/135424/
 
 I do have some concern about having two databases managed in two
 different ways, but if the details are well hidden behind a
 nova-manage command I'm not sure it will actually matter in
 practice.

This would be a good time for people to review my proposed spec:

https://review.openstack.org/#/c/102545/

Not only does it help operators, it also helps developers: all they
would need to do in the future is update the model, and DDL statements
are generated by comparing the running schema with the model.

BTW, it uses Alembic under the hood for most of the heavy lifting.

JE




Re: [openstack-dev] [all][oslo][db][docs] RFC: drop support for libpq 9.1

2014-10-06 Thread Johannes Erdfelt
On Mon, Oct 06, 2014, Ihar Hrachyshka ihrac...@redhat.com wrote:
 Maybe it is indeed wasteful, I don't have numbers; though the fact is
 we don't allow any migrations for databases with any non utf8 tables
 as of [1]. The code was copied in multiple projects (Nova, Glance
 among other things). Also, the same check is present in oslo.db that
 is used by most projects now.
 
 Also we have migration rules in multiple projects that end up with
 converting all tables to utf8. F.e. it's done in Nova [2].
 
 So we already run against utf8 tables. Though we don't ever tell users
 to create their databases with utf8 as default charset, opening a can
 of worms. We also don't ever tell users to at least set use_unicode=0
 when running against MySQLdb (which is the default and the only
 supported MySQL driver as of Juno) to avoid significant performance
 drop [3] (same is true for oursql driver, but that one is at least not
 recommended to users thru official docs).

This is a situation where OpenStack is frustratingly inconsistent.

While we don't provide any guidance about the default charset, Nova now
creates all tables with the utf8 charset and provided a migration[1] to
fix deployments done before this change.

The same cannot be said for Glance. It inherited the utf8 check, but
never provided a migration to fix deployments done before this change.
It still creates tables with no default charset, leading to a situation
where you can deploy Glance with default values but then end up unable
to run any future migrations.

Glance did have a flag to disable that check, but it was recently
removed[2] with no automated migrations to resolve earlier deployments
(like many of ours).

This frustratingly got approved and merged after my -1 and with no
explanation why they were doing this to operators. It felt like I was
getting the gerrit equivalent of a middle finger.

All of that said, I'm 100% for making all of the projects more
consistent and use utf8 (where it makes sense).

[1]: https://review.openstack.org/#/c/3946/
[2]: https://review.openstack.org/#/c/120002/

JE




Re: [openstack-dev] [nova] formally distinguish server desired state from actual state?

2014-10-01 Thread Johannes Erdfelt
On Wed, Oct 01, 2014, Chris Friesen chris.frie...@windriver.com wrote:
 Currently in nova we have the vm_state, which according to the
 code comments is supposed to represent a VM's current stable (not
 transition) state, or what the customer expects the VM to be.
 
 However, we then added in an ERROR state.  How does this possibly
 make sense given the above definition?  Which customer would ever
 expect the VM to be in an error state?
 
 Given this, I wonder whether it might make sense to formally
 distinguish between the expected/desired state (i.e. the state that
 the customer wants the VM to be in), and the actual state (i.e. the
 state that nova thinks the VM is in).
 
 This would more easily allow for recovery actions, since if the
 actual state changes to ERROR (or similar) we would still have the
 expected/desired state available for reference when trying to take
 recovery actions.
 
 Thoughts?

I'm happy you brought this up because I've had a similar proposal
bouncing around in the back of my head lately.

ERROR is a pet peeve of mine because it doesn't tell you the operational
state of the instance. It may be running or it may not be running. It
also ends up complicating logic quite a bit (we have a very ugly patch
to allow users to revert resizes in ERROR).

Also, in a few places we have to store vm_state off into instance
metadata (key 'old_vm_state') so it can be restored to the correct state
(for things like RESCUED). This is fairly ugly.

I've wanted to sit down and work through all of the different vm_state
transitions and figure out how to make it all less confusing. I just
haven't had the time to do it yet :(
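
A hypothetical sketch of what splitting desired from actual state could
look like (all names invented for illustration, not Nova's API):

```python
import dataclasses
import enum

class VMState(enum.Enum):
    ACTIVE = "active"
    STOPPED = "stopped"
    RESCUED = "rescued"
    ERROR = "error"

# Keeping desired and actual state separate means ERROR never clobbers
# what the user asked for, so there is no need to stash 'old_vm_state'
# in instance metadata before entering states like RESCUED or ERROR.
@dataclasses.dataclass
class InstanceState:
    desired: VMState
    actual: VMState

    def recovery_target(self):
        # On failure, the recovery target is simply the desired state.
        return self.desired

inst = InstanceState(desired=VMState.ACTIVE, actual=VMState.ACTIVE)
inst.actual = VMState.ERROR  # a failure is recorded without losing intent
print(inst.recovery_target().value)  # active
```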

JE




Re: [openstack-dev] [neutron][all] switch from mysqldb to another eventlet aware mysql client

2014-09-12 Thread Johannes Erdfelt
On Fri, Sep 12, 2014, Doug Hellmann d...@doughellmann.com wrote:
 I don’t think we will want to retroactively change the migration scripts
 (that’s not something we generally like to do),

We don't allow semantic changes to migration scripts since people who
have already run it won't get those changes. However, we haven't been
shy about fixing bugs that prevent the migration script from running
(which this change would probably fall into).

 so we should look at changes needed to make sqlalchemy-migrate deal with
 them (by ignoring them, or working around the errors, or whatever).

That said, I agree that sqlalchemy-migrate shouldn't be changing in a
non-backwards compatible way.

JE




[openstack-dev] [Nova] [Spec freeze exception] Online Schema Changes

2014-07-18 Thread Johannes Erdfelt
I'm requesting a spec freeze exception for online schema changes.

https://review.openstack.org/102545

This work is being done to try to minimize the downtime as part of
upgrades. Database migrations have historically been a source of long
periods of downtime. The spec is an attempt to start optimizing this
part by allowing deployers to perform most schema changes online, while
Nova is running.

JE




Re: [openstack-dev] [nova] fair standards for all hypervisor drivers

2014-07-17 Thread Johannes Erdfelt
On Thu, Jul 17, 2014, Daniel P. Berrange berra...@redhat.com wrote:
 On Wed, Jul 16, 2014 at 09:44:55AM -0700, Johannes Erdfelt wrote:
  So that means the libvirt driver will be a mix of tested and untested
  features, but only the tested code paths will be enabled by default?
  
  The gate not only tests code as it gets merged, it tests to make sure it
  doesn't get broken in the future by other changes.
  
  What happens when it comes time to bump the default version_cap in the
  future? It looks like there could potentially be a scramble to fix code
  that has been merged but doesn't work now that it's being tested. Which
  potentially further slows down development since now unrelated code
  needs to be fixed.
  
  This sounds like we're actively weakening the gate we currently have.
 
 If the gate has libvirt 1.2.2 and a feature is added to Nova that
 depends on libvirt 1.2.5, then the gate is already not testing that
 codepath since it lacks the libvirt version necessary to test it.
 The version cap should not be changing that, it is just making it
 more explicit that it hasn't been tested

It kind of helps. It's still implicit in that you need to look at what
features are enabled at what version and determine if it is being
tested.

But the behavior is still broken since code is still getting merged that
isn't tested. Saying that is by design doesn't help the fact that
potentially broken code exists.

Also, this explanation doesn't answer my question about what happens
when the gate finally gets around to actually testing those potentially
broken code paths.

JE




Re: [openstack-dev] [nova] fair standards for all hypervisor drivers

2014-07-17 Thread Johannes Erdfelt
On Thu, Jul 17, 2014, Russell Bryant rbry...@redhat.com wrote:
 On 07/17/2014 02:31 PM, Johannes Erdfelt wrote:
  It kind of helps. It's still implicit in that you need to look at what
  features are enabled at what version and determine if it is being
  tested.
  
  But the behavior is still broken since code is still getting merged that
  isn't tested. Saying that is by design doesn't help the fact that
  potentially broken code exists.
 
 Well, it may not be tested in our CI yet, but that doesn't mean it's not
 tested some other way, at least.

I'm skeptical. Unless it's tested continuously, it'll likely break at
some time.

We seem to be selectively choosing the continuous part of CI. I'd
understand if it were done reluctantly because of immediate problems,
but this reads like it's acceptable long-term too.

 I think there are some good ideas in other parts of this thread to look
 at how we can more reguarly rev libvirt in the gate to mitigate this.
 
 There's also been work going on to get Fedora enabled in the gate, which
 is a distro that regularly carries a much more recent version of libvirt
 (among other things), so that's another angle that may help.

That's an improvement, but I'm still not sure I understand what the
workflow will be for developers.

Do they now need to wait for Fedora to ship a new version of libvirt?
Fedora is likely to help the problem because of how quickly it
generally ships new packages and because of its release schedule, but
it would still hold back some features.

  Also, this explanation doesn't answer my question about what happens
  when the gate finally gets around to actually testing those potentially
  broken code paths.
 
 I think we would just test out the bump and make sure it's working fine
 before it's enabled for every job.  That would keep potential breakage
 localized to people working on debugging/fixing it until it's ready to go.

The downside is that new features for libvirt could be held back by
needing to fix other unrelated features. This is certainly not a bigger
problem than users potentially running untested code simply because they
are on a newer version of libvirt.

I understand we have an immediate problem and I see the short-term value
in the libvirt version cap.

I try to look at the long-term and unless it's clear to me that a
solution is proposed to be short-term and there are some understood
trade-offs then I'll question the long-term implications of it.

JE




Re: [openstack-dev] [nova] fair standards for all hypervisor drivers

2014-07-16 Thread Johannes Erdfelt
On Wed, Jul 16, 2014, Mark McLoughlin mar...@redhat.com wrote:
 No, there are features or code paths of the libvirt 1.2.5+ driver that
 aren't as well tested as the class A designation implies. And we have
 a proposal to make sure these aren't used by default:
 
   https://review.openstack.org/107119
 
 i.e. to stray off the class A path, an operator has to opt into it by
 changing a configuration option that explains they will be enabling code
 paths which aren't yet tested upstream.

So that means the libvirt driver will be a mix of tested and untested
features, but only the tested code paths will be enabled by default?

The gate not only tests code as it gets merged, it tests to make sure it
doesn't get broken in the future by other changes.

What happens when it comes time to bump the default version_cap in the
future? It looks like there could potentially be a scramble to fix code
that has been merged but doesn't work now that it's being tested. Which
potentially further slows down development since now unrelated code
needs to be fixed.

This sounds like we're actively weakening the gate we currently have.

 However, not everything is tested now, nor is the tests we have
 foolproof. When you consider the number of configuration options we
 have, the supported distros, the ranges of library versions we claim to
 support, etc., etc. I don't think we can ever get to an everything is
 tested point.
 
 In the absence of that, I think we should aim to be more clear what *is*
 tested. The config option I suggest does that, which is a big part of
 its merit IMHO.

I like the sound of this especially since it's not clear right now at
all.

JE




Re: [openstack-dev] [nova][vmware] Convert to rescue by adding the rescue image and booting from it

2014-07-14 Thread Johannes Erdfelt
On Mon, Jul 14, 2014, Daniel P. Berrange berra...@redhat.com wrote:
 I think that I'd probably say there is an expectation that the rescue
 image will be different from the primary image the OS was booted from.

So every image would now need a corresponding rescue image?

JE




Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?

2014-06-13 Thread Johannes Erdfelt
On Fri, Jun 13, 2014, Russell Bryant rbry...@redhat.com wrote:
 On 06/13/2014 09:22 AM, Day, Phil wrote:
  I guess the question I’m really asking here is:  “Since we know resize
  down won’t work in all cases, and the failure if it does occur will be
  hard for the user to detect, should we just block it at the API layer
  and be consistent across all Hypervisors ?”
 
 +1 for consistency.

+1 for having written the code for the xenapi driver and not wishing
that on anyone else :)

JE




Re: [openstack-dev] [nova] Do any hyperviors allow disk reduction as part of resize ?

2014-06-13 Thread Johannes Erdfelt
On Fri, Jun 13, 2014, Andrew Laski andrew.la...@rackspace.com wrote:
 
 On 06/13/2014 10:53 AM, Johannes Erdfelt wrote:
 On Fri, Jun 13, 2014, Russell Bryant rbry...@redhat.com wrote:
 On 06/13/2014 09:22 AM, Day, Phil wrote:
 I guess the question I’m really asking here is:  “Since we know resize
 down won’t work in all cases, and the failure if it does occur will be
 hard for the user to detect, should we just block it at the API layer
 and be consistent across all Hypervisors ?”
 +1 for consistency.
 +1 for having written the code for the xenapi driver and not wishing
 that on anyone else :)
 
 I'm also +1.  But this is a feature that's offered by some cloud
 providers so removing it may cause some pain even with a deprecation
 cycle.

Yeah, that's the hard part about this.

On the flip side, supporting it going forward will be a pain too.

The xenapi implementation only works on ext[234] filesystems. That rules
out *BSD, Windows and Linux distributions that don't use ext[234]. RHEL7
defaults to XFS for instance.

In some cases, we couldn't even support resize down (XFS doesn't support
it).

That is to go along with all of the other problems with resize down as
it currently stands.

JE




Re: [openstack-dev] Promoting healing script to scheme migration script?

2014-06-09 Thread Johannes Erdfelt
On Mon, Jun 09, 2014, Jakub Libosvar libos...@redhat.com wrote:
 I'd like to get some opinions on following idea:
 
 Because currently we have (thanks to Ann) WIP of healing script capable
 of changing database scheme by comparing tables in the database to
 models in current codebase, I started to think whether it could be used
 generally to db upgrades instead of generating migration scripts.

Do you have a link to these healing scripts?

 If I understand correctly the purpose of migration scripts used to be to:
 1) separate changes according plugins
 2) upgrade database scheme
 3) migrate data according the changed scheme
 
 Since we dropped on conditional migrations, we can cross out no.1).
 The healing script is capable of doing no.2) without any manual effort
 and without adding migration script.
 
 That means if we will decide to go along with using script for updating
 database scheme, migration scripts will be needed only for data
 migration (no.3)) which are from my experience rare.
 
 Also other benefit would be that we won't need to store all database
 models from Icehouse release which we probably will need in case we want
 to heal database in order to achieve idempotent Icehouse database
 scheme with Juno codebase.
 
 Please share your ideas and reveal potential glitches in the proposal.

I'm actually working on a project to implement declarative schema
migrations for Nova using the existing model we currently maintain.

The main goals for our project are to reduce the amount of work
maintaining the database schema but also to reduce the amount of
downtime during software upgrades by doing schema changes online (where
possible).

I'd like to see what others have done and are working on for the future
so we don't unnecessarily duplicate work :)

JE




Re: [openstack-dev] Gate and Skipped Tests

2014-05-23 Thread Johannes Erdfelt
On Fri, May 23, 2014, Rick Harris rconradhar...@gmail.com wrote:
 On Thu, May 22, 2014 at 7:31 PM, Johannes Erdfelt
 johan...@erdfelt.com wrote:
 
  I noticed recently that some tests are being skipped in the Nova gate.
 
  Some will always be skipped, but others are conditional.
 
 I'd like to hear a bit more about why some will always be skipped.
 
 If it's a Python 2.6 vs Python 2.7 thing, perhaps we should forgo the
 conveniences of 2.7 in places so that we can avoid skipping _any_ tests.

As an example of a test skipped on Python 2.6:

image/test_glance.py:TestGlanceApiServers.test_get_ipv6_api_servers

# Python 2.6 can not parse ipv6 address correctly
@testtools.skipIf(sys.version_info < (2, 7), "py27 or greater only")

Python 2.6 certainly has ipv6 support (just tested it) so I'm guessing
this is because of a third-party library or something? Could maybe be a
problem parsing IPv6 addresses from URLs (so in httplib?).
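For what it's worth, the breakage was most likely in URL parsing rather
than the socket layer. On Python 2.7+ (Python 3 shown here), urlparse
handles bracketed IPv6 hosts correctly, which is roughly what the
skipped test exercises; the endpoint URL below is a made-up example:

```python
from urllib.parse import urlparse  # the urlparse module on Python 2

# Parse a bracketed IPv6 API endpoint -- the case Python 2.6's
# urlparse got wrong. The URL itself is illustrative.
url = urlparse("http://[fe80::dead:beef]:9292/v2/images")

assert url.hostname == "fe80::dead:beef"  # brackets stripped
assert url.port == 9292
assert url.path == "/v2/images"
```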

db/test_db_api.py:ArchiveTestCase.test_archive_deleted_rows_fk_constraint

This is an indirect Python 2.6 problem. The version of sqlite3 shipping
with Python 2.6 doesn't support foreign key constraints reliably (from
the comments).


There are some tests that get skipped if the host doesn't have IPv6
support at all:

test_service.py:TestWSGIService.test_service_random_port_with_ipv6


Some are tests are skipped only on Mac OS X. Is that a supported
configuration?

virt/test_virt_disk_vfs_localfs.py:VirtDiskVFSLocalFSTestPaths.test_check_safe_path
virt/test_virt_disk_vfs_localfs.py:VirtDiskVFSLocalFSTestPaths.test_check_unsafe_path


And of course the ZooKeeper example I mentioned:

servicegroup/test_zk_driver.py:ZKServiceGroupTestCase.setUp


There are some tests which are skipped unconditionally. These are
probably not as big of a risk since the behavior would be identical if
the test just wasn't there at all.

Some tests are skipped because of known bugs:

db/test_db_api.py:InstanceSystemMetadataTestCase.test_instance_system_metadata_update_nonexistent

This is because foreign key constraints are missing in sqlite so the
test can erroneously pass with that setup.


Some tests are skipped because of known missing features:

compute/test_compute_cells.py:CellsComputeAPITestCase.test_instance_metadata
compute/test_compute_cells.py:CellsComputeAPITestCase.test_evacuate


One skipped test is a little confusing. This test seems to be skipped
because it was too hard to write?

virt/test_virt_drivers.py:LibvirtConnTestCase.test_migrate_disk_and_power_off

  Any opinions on this?
 
 I'm in favor of asserting num_skipped_tests == 0 at the gate.
 
 A gate with a side-door that's always open isn't much of a gate.

That's a much better way of wording it :)

JE




[openstack-dev] Gate and Skipped Tests

2014-05-22 Thread Johannes Erdfelt
I noticed recently that some tests are being skipped in the Nova gate.

Some will always be skipped, but others are conditional.

In particular the ZooKeeper driver tests are being skipped because an
underlying python module is missing.

It seems to me that we should want no tests to be conditionally skipped
in the gate. This could lead to fragile behavior where an underlying
environmental problem could cause tests to be erroneously skipped and
broken code could get merged.
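One way to close that side-door is to turn environment-dependent skips
into hard failures when running in the gate, while still skipping for
local developers. A rough sketch -- the RUNNING_IN_GATE variable is
made up, and the evzookeeper module name just follows the ZooKeeper
example from this thread:

```python
import os
import unittest

try:
    import evzookeeper  # optional dependency; name is illustrative
except ImportError:
    evzookeeper = None


class ZKServiceGroupTestCase(unittest.TestCase):
    def setUp(self):
        super(ZKServiceGroupTestCase, self).setUp()
        if evzookeeper is None:
            if os.environ.get("RUNNING_IN_GATE"):
                # In the gate, a missing dependency is a setup bug,
                # not a reason to silently skip the tests.
                self.fail("evzookeeper missing in the gate")
            self.skipTest("evzookeeper not installed")

    def test_join(self):
        self.assertTrue(True)  # placeholder for a real driver test
```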

Any opinions on this?

JE




Re: [openstack-dev] [oslo] Logging exceptions and Python 3

2014-05-21 Thread Johannes Erdfelt
On Wed, May 21, 2014, John Dennis jden...@redhat.com wrote:
 But that's a bug in the logging implementation. Are we supposed to write
 perverse code just to avoid coding mistakes in other modules? Why not
 get the fundamental problem fixed?

It has been fixed, by making Python 3 :)

This is a problem in the Python 2 standard library.

I agree it kind of sucks. We've traditionally just worked around it, but
monkey patching might be a solution if the work arounds are onerous.

JE




Re: [openstack-dev] [Neutron] Consistency between models and migrations

2014-05-19 Thread Johannes Erdfelt
On Tue, May 20, 2014, Collins, Sean sean_colli...@cable.comcast.com wrote:
 I've been looking at two reviews that Ann Kamyshnikova has proposed
 
 https://review.openstack.org/#/c/82073/
 
 https://review.openstack.org/#/c/80518/
 
 I think the changes are fundamentally a Good Thing™  - they appear to
 reduce the differences between the database models and their
 corresponding migrations – as well as fixing differences in the
 generated DDL between Postgres and MySQL.
 
 The only thing I'm concerned about, is how to prevent these
 inconsistencies from sneaking into the codebase in the future. The
 one review that fixes ForeignKey constraints that are missing a name
 argument which ends up failing to create indexes in Postgres – I can
 see myself repeating that mistake, it's very subtle.
 
 Should we have some sort of HACKING for database models and
 migrations? Are these problems subtle enough that they warrant
 changes to SQLAlchemy/Alembic?

On the Nova side of things, there has been similar concerns.

There is a nova-spec that is proposing adding a unit test to check the
schema versus the model:

https://review.openstack.org/#/c/85325/

This should work, but I think the underlying problem is DRY based. We
should not need to declare a schema in a model and then a set of
imperative tasks to get to that point. All too often they get
unsynchronized.

I informally proposed a different solution, moving schema migrations to
a declarative model. I wrote a proof of concept to show how something
like this would work:

https://github.com/jerdfelt/muscle

We already have a model written (though it needs some fixes to make it
accurate wrt the existing migrations), and we should be able to emit
ALTER TABLE statements based on the existing schema to bring it into
line with the model.
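A minimal sketch of that idea: reflect the live schema, diff it against
the declared model, and emit ALTER TABLE statements for anything
missing. The dict-based model and function names here are illustrative
(not muscle's actual API), and only additive changes are handled:

```python
import sqlite3

# The "model": table name -> {column: SQL type}. In a real project this
# would come from the SQLAlchemy model; a dict keeps the sketch
# dependency-free.
MODEL = {
    "instances": {
        "id": "INTEGER PRIMARY KEY",
        "task_state": "VARCHAR(255)",
    },
}

def missing_columns(conn, table, model_columns):
    """Columns declared in the model but absent from the live schema."""
    existing = {row[1] for row in
                conn.execute("PRAGMA table_info(%s)" % table)}
    return [c for c in model_columns if c not in existing]

def additive_ddl(conn, model):
    """ALTER TABLE statements bringing the live schema up to the model.

    Only additive changes are emitted; renames and type changes are the
    corner cases that still need traditional migrations.
    """
    return ["ALTER TABLE %s ADD COLUMN %s %s" % (table, col, cols[col])
            for table, cols in model.items()
            for col in missing_columns(conn, table, cols)]
```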

This also has the added benefit of allowing certain schema migrations to
be done online, while services are still running. This can significantly
reduce downtime during deploys (a big concern for large deployments of
Nova).

There are some corner cases that do cause problems (renaming columns,
changing column types, etc). Those can either remain as traditional
migrations and/or discouraged.

Data migrations would still remain with sqlalchemy-migrate/alembic, but
there have some proposals about solving that problem too.

JE




Re: [openstack-dev] [oslo] Logging exceptions and Python 3

2014-05-17 Thread Johannes Erdfelt
On Fri, May 16, 2014, Victor Stinner victor.stin...@enovance.com wrote:
 See my documentation:
 https://wiki.openstack.org/wiki/Python3#logging_module_and_format_exceptions
 
 six.text_type(exc): always use Unicode. It may raise a unicode error
 depending on the exception, be careful. Example of such an error in
 Python 2: unicode(Exception("nonascii:\xe9")).
 
 unicode(exc) works with such exception classes:
 ---
 class MyException1(Exception):
 pass
 
 exc = MyException1()
 exc.message = u"\u20ac"
 unicode(exc) #ok
 
 class MyException2(Exception):
 def __unicode__(self):
  return u"\u20ac"
 
 exc = MyException2()
 unicode(exc) #ok
 ---
 
 If we want to format an exception as Unicode, we need a function trying 
 unicode(), or use str() and then guess the encoding. It means adding a new 
 safe function to Olso to format an exception.

This is unnecessarily complicated.

Strings should be decoded to unicode as soon as possible. When a request
is read from a user, it should be decoded to a unicode type. When it's
read from a file, it should be decoded to a unicode type.

Nothing should be stored internally as an encoded string.
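The decode-at-the-boundary rule in a nutshell (Python 3 shown; under
Python 2 the target type would be unicode rather than str):

```python
# Bytes arrive from the outside world; decode them immediately and
# keep only text internally.
raw = b"caf\xc3\xa9"          # e.g. read from a socket or file
name = raw.decode("utf-8")    # decode at the boundary

assert name == "caf\xe9"      # unicode text from here on
assert isinstance(name, str)

# Only encode again at the opposite boundary, when writing back out.
assert name.encode("utf-8") == raw
```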

If a third party library raises exceptions with strings in an encoding
other than ASCII, they should be shot :)

JE




Re: [openstack-dev] [oslo] Logging exceptions and Python 3

2014-05-17 Thread Johannes Erdfelt
On Fri, May 16, 2014, Igor Kalnitsky ikalnit...@mirantis.com wrote:
  unicode(exc) (or six.text_type(exc)) works for all exceptions, built-in or
 custom.
 
 That's too much of a statement. Sometimes exceptions implement their
 own __str__ / __unicode__ methods, which return too much rubbish
 information or not enough. What do you do in that case?

I don't understand the problem.

What are you expecting from unicode(exc)? What exceptions don't meet
that expectation?

Using str(exc) or unicode(exc) is the standard Python convention to get
a useful string out of an exception. This is what the traceback module
does (at least in Python 2.7) for the last line in the traceback output.

It tries str first, then unicode if str fails. If it uses unicode, then
it backslash escapes it back to ASCII.

The behavior to try str first and then unicode second appears to be
because of legacy reasons.

I'm not entirely sure why it backslash escapes unicode back to ASCII.
Maybe to avoid a possible second exception when printing it?

JE




Re: [openstack-dev] [oslo] Logging exceptions and Python 3

2014-05-16 Thread Johannes Erdfelt
On Fri, May 16, 2014, Igor Kalnitsky ikalnit...@mirantis.com wrote:
  According to http://legacy.python.org/dev/peps/pep-0352/ the message
  attribute of BaseException is deprecated since Python 2.6 and was
  dropped in Python 3.0.
 
 Some projects have custom exception hierarchy, with strictly defined
 attributes (e.g. message, or something else). In a previous mail, I
 mean exactly that case, not the case with a built-in exceptions.

That's a fragile assumption to make.

unicode(exc) (or six.text_type(exc)) works for all exceptions, built-in
or custom. I don't see the reason why it's being avoided.

JE




Re: [openstack-dev] [oslo] Logging exceptions and Python 3

2014-05-16 Thread Johannes Erdfelt
On Thu, May 15, 2014, Victor Stinner victor.stin...@enovance.com wrote:
 I'm trying to define some rules to port OpenStack code to Python 3. I just 
 added a section in the Port Python 2 code to Python 3 about formatting 
 exceptions and the logging module:
 https://wiki.openstack.org/wiki/Python3#logging_module_and_format_exceptions
 
 The problem is that I don't know what is the best syntax to log exceptions. 
 Some projects convert the exception to Unicode, others use str(). I also saw 
 six.u(str(exc)) which is wrong IMO (it can raise unicode error if the message 
 contains a non-ASCII character).
 
 IMO the safest option is to use str(exc). For example, use 
 LOG.debug(str(exc)).
 
 Is there a reason to log the exception as Unicode on Python 2?

Because the exception uses unicode characters?

This isn't common, but it does happen and a lot of code in nova uses
unicode(exc) as a result.

Using str(exc) is bad because it will fail with any exception with
unicode characters.

six.u(exc) is bad too since it's only for text literals. It's also
mostly obsolete since Python 3.3 restored the u prefix and I don't
think any OpenStack projects target 3.0-3.2.

six.text_type(exc) is the recommended solution.
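What that recommendation looks like in practice; the exception class is
made up, and the import fallback just lets the sketch run where six
isn't installed (on Python 3, six.text_type is simply str):

```python
try:
    from six import text_type
except ImportError:        # pure Python 3 environment
    text_type = str

class ImageNotFound(Exception):   # hypothetical exception class
    pass

exc = ImageNotFound(u"image caf\xe9 not found")

# Works for built-in and custom exceptions, ASCII or not -- unlike
# str(exc) on Python 2, which raises UnicodeEncodeError here.
assert text_type(exc) == u"image caf\xe9 not found"
assert text_type(ValueError("bad value")) == "bad value"
```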

JE




Re: [openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)

2014-03-12 Thread Johannes Erdfelt
On Wed, Mar 12, 2014, CARVER, PAUL pc2...@att.com wrote:
 I have personally witnessed someone (honestly, not me) select Terminate
 Instance when they meant Reboot Instance and that mistake is way too
 easy. I'm not sure if it was a brain mistake or mere slip of the mouse,
 but it's enough to make people really nervous in a production
 environment. If there's one thing you can count on about human beings,
 it's that they'll make mistakes sooner or later. Any system that
 assumes infallible human beings as a design criteria is making an
 invalid assumption.

I think there might be some confusion about what soft-delete we're
talking about.

Nova has two orthogonal soft-delete features:
1) Database rows are never deleted from the database. They are just
   marked as deleted via a column. This is unexposed to users and is an
   implementation detail in the current code.
2) Instance deletion can be deferred until a later time. This is called
   deferred-delete and soft-delete in the code. If the feature is
   enabled and the instance hasn't been reclaimed yet, it can be
   restored with the 'nova restore' command.

This thread is about the database soft-delete and not the instance
soft-delete.
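For concreteness, sense 1 looks roughly like this. The column names
follow Nova's convention (deleted, deleted_at), but the dict-based rows
are a stand-in for real ORM objects, not Nova's actual code:

```python
import datetime

def soft_delete(row):
    # Nova sets deleted to the row id rather than a boolean so that
    # unique constraints keep working across delete-and-recreate.
    row["deleted"] = row["id"]
    row["deleted_at"] = datetime.datetime.utcnow()

def visible_rows(rows, read_deleted="no"):
    # Normal queries filter out soft-deleted rows; admin or archival
    # code can opt in to seeing them.
    if read_deleted == "yes":
        return list(rows)
    return [r for r in rows if not r["deleted"]]
```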

JE




Re: [openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)

2014-03-11 Thread Johannes Erdfelt
On Tue, Mar 11, 2014, Mike Wilson geekinu...@gmail.com wrote:
 Undeleting things is an important use case in my opinion. We do this in our
 environment on a regular basis. In that light I'm not sure that it would be
 appropriate just to log the deletion and git rid of the row. I would like
 to see it go to an archival table where it is easily restored.

I'm curious, what are you undeleting and why?

JE




Re: [openstack-dev] [nova] changing old migrations is verboten

2013-11-01 Thread Johannes Erdfelt
On Fri, Nov 01, 2013, Sean Dague s...@dague.net wrote:
 It's trading one source of bugs for another. I'd love to say we can
 have our cake and eat it too, but we really can't. And I very much
 fall on the side of getting migrations is hard, updating past
 migrations without ever forking the universe is really really hard,
 and we've completely screwed it up in the past, so lets not do it.

I understand what you're saying, but if the result of it is that we're
never going to touch old migrations, we're going to slowly build
technical debt.

I don't think it's an acceptable solution to throw up our hands and deal
with the pain.

We need to come up with a solution that allows us to stay agile while
also ensuring we don't break things.

 So I'm going to call a straight BS on that. In at least one of the
 cases columns were shortened from 256 to 255. In the average case
 would that be an issue? Probably not. However that's a truncation,
 and a completely working system at 256 length for those fields could
 go to non working with data truncation. Data loads matter. And we
 can't assume anything about the data in those fields that isn't
 enforced by the DB schema itself.

I assume this is the review you're talking about?

https://review.openstack.org/#/c/53471/3

FWIW, the old migrations *are* functionally identical. Those strings are
still 256 characters long.

It's the new migration that truncates data.

That said, I'm not sure I see the value in this particular cleanup
considering the fact it does truncate data (even if it's unlikely to
cause problems).

 I've watched us mess this up multiple times in the past when we were
 *sure* it was good. And as has been noticed recently, one of the
 collapses changes a fk name (by accident), which broke upgrades to
 havana for a whole class of people.
 
 So I think that we really should put a moratorium on touching past
 migrations until there is some sort of automatic validation that the
 new and old path are the same, with sufficiently complicated data
 that pushes the limits of those fields.
 
 Manual inspection by one person that their environment looks fine
 has never been a sufficient threshold for merging code.

I can get completely on board with that.

Does that mean you're softening your stance that migrations should never
be touched?

JE




Re: [openstack-dev] [nova] changing old migrations is verboten

2013-10-31 Thread Johannes Erdfelt
On Thu, Oct 31, 2013, Sean Dague s...@dague.net wrote:
 So there is a series of patches starting with -
 https://review.openstack.org/#/c/53417/ that go back and radically
 change existing migration files.
 
 This is really a no-no, unless there is a critical bug fix that
 absolutely requires it. Changing past migrations should be
 considered with the same level of weight as an N-2 backport, only
 done when there is huge upside to the change.
 
 I've -2ed the first 2 patches in the series, though that review
 applies to all of them (I figured a mailing list thread was probably
 more useful than -2ing everything in the series).
 
 There needs to be really solid discussion about the trade offs here
 before contemplating something as dangerous as this.

The most important thing for DB migrations is that they remain
functionality identical.

Historically we have allowed many changes to DB migrations that kept
them functionally identical to how they were before.

Looking through the commit history, here's a sampling of changes:

- _ was no longer monkey patched, necessitating a new import
- fix bugs causing testing problems
- change copyright headers
- remove unused code (creating logger, imports, etc)
- fix bugs causing the migrations to fail to function (on PostgreSQL,
  downgrade bugs, etc)
- style changes (removing use of locals(), whitespace, etc)
- make migrations faster
- add comments to clarify code
- improve compatibility with newer versions of SQLAlchemy

The reviews you're referencing seem to fall into what we have
historically allowed.

That said, I do agree there needs to be a higher burden of proof that
the change being made is functionally identical to before.

JE




Re: [openstack-dev] Does DB schema hygiene warrant long migrations?

2013-10-24 Thread Johannes Erdfelt
On Fri, Oct 25, 2013, Michael Still mi...@stillhq.com wrote:
 Because I am a grumpy old man I have just -2'ed
 https://review.openstack.org/#/c/39685/ and I wanted to explain my
 rationale. Mostly I am hoping for a consensus to form -- if I am wrong
 then I'll happy remove my vote from this patch.
 
 This patch does the reasonably sensible thing of converting two
 columns from being text to varchar, which reduces their expense to the
 database. Given the data stored is already of limited length, it
 doesn't impact our functionality at all either.
 
 However, when I run it with medium sized (30 million instances)
 databases, the change does cause a 10 minute downtime. I don't
 personally think the change is worth such a large outage, but perhaps
 everyone else disagrees.

I'm not sure how you could have 30 million instances. That's a lot of
hardware! :)

However, in our Rackspace sized deploys (less than 30 million
instances), we've seen many migrations take longer than 10 minutes.

DB migrations are one of the biggest problems we've been facing lately,
especially since a lot of the migrations done over the past several
months ended up causing a lot of pain considering the value they bring.

For instance, migration 185 was particularly painful. It only renamed
the indexes, but it required rebuilding them. This took a long time for
such a simple task.

So I'm very interested in figuring out some sort of solution that makes
database migrations much less painful.

That said, I'm hesitant to say that cleanups like these shouldn't be
done. At a certain point we'll build a significant amount of technical
debt around the database that we're afraid to touch.

 PS: I could see a more complicated approach where we did these changes
 in flight by adding columns, using a periodic task to copy data to
 the new columns, and then dropping the old. That's a lot more
 complicated to implement though.

You mean an expand/contract style of migrations?

It's been discussed at previous summits, but it's a lot of work.

It's also at the mercy of the underlying database engine. For instance,
MySQL (depending the version and the underlying database engine) will
recreate the table when adding columns. This will grab a lock and take
a long time.
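A toy version of the expand/contract idea against SQLite; the table and
column names are invented, and MySQL's table-rewriting behavior is
exactly the caveat that complicates the expand phase in practice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, hostname TEXT)")
conn.executemany("INSERT INTO instances (hostname) VALUES (?)",
                 [("host%d" % i,) for i in range(5)])

# Phase 1 -- expand: add the new column up front. Adding a nullable
# column is cheap on most engines (older MySQL rewrites the table,
# which is the caveat raised above).
conn.execute("ALTER TABLE instances ADD COLUMN hostname_v2 VARCHAR(255)")

# Phase 2 -- migrate: a periodic task copies data over in small
# batches while the services keep running.
while True:
    cur = conn.execute(
        "UPDATE instances SET hostname_v2 = substr(hostname, 1, 255) "
        "WHERE id IN (SELECT id FROM instances "
        "WHERE hostname_v2 IS NULL LIMIT 2)")
    if cur.rowcount == 0:
        break

# Phase 3 -- contract: once nothing reads the old column any more,
# drop it in a later release (not shown; DROP COLUMN needs a recent
# SQLite).
```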

JE




Re: [openstack-dev] Does DB schema hygiene warrant long migrations?

2013-10-24 Thread Johannes Erdfelt
On Fri, Oct 25, 2013, Michael Still mi...@stillhq.com wrote:
 On Fri, Oct 25, 2013 at 8:19 AM, Johannes Erdfelt johan...@erdfelt.com 
 wrote:
  On Fri, Oct 25, 2013, Michael Still mi...@stillhq.com wrote:
 
  However, when I run it with medium sized (30 million instances)
  databases, the change does cause a 10 minute downtime. I don't
  personally think the change is worth such a large outage, but perhaps
  everyone else disagrees.
 
  I'm not sure how you could have 30 million instances. That's a lot of
  hardware! :)
 
 This has come up a couple of times on this thread, so I want to
 reinforce -- that database is a real user database. There are users
 out there _right_now_ with 30 million rows in their instance tables
 and using nova quite happily. Now, not all those instances are
 _running_, but they're still in the table.

Why no pruning?

The easiest way to reduce the amount of time migrations take to run is
to reduce the amount of rows that need to be migrated.

The amount of unnecessary data in tables has been steadily increasing.
I'm looking at you, reservations table.

JE




Re: [openstack-dev] [Nova] Frustrations with review wait times

2013-08-28 Thread Johannes Erdfelt
On Wed, Aug 28, 2013, Russell Bryant rbry...@redhat.com wrote:
 On 08/28/2013 12:25 PM, Davanum Srinivas wrote:
  If each little group had at least one active Nova core member, i think
  it would speed things up way faster IMHO. 
 
 Agreed, in theory.  However, we should not add someone just for the sake
 of having someone on the team from a certain area.  They need to be held
 to the same standards as the rest of the team.

Do you mean the nova-core standards?

I had a soft understanding that nova-core members were trusted to give
+2 and -2 reviews and that they actually needed to do reviews.

I did a quick search and didn't find anything more than that, but maybe
I missed a web page somewhere.

JE




Re: [openstack-dev] [Glance] Replacing Glance DB code to Oslo DB code.

2013-08-20 Thread Johannes Erdfelt
On Tue, Aug 20, 2013, Flavio Percoco fla...@redhat.com wrote:
 There are a couple of things that would worry me about an hypothetic
 support for NoSQL but I guess one that I'd consider very critical is
 migrations. Some could argue asking whether we'd really need them or
 not  - when talking about NoSQL databases - but we do. Using a
 schemaless database wouldn't mean we don't have a schema. Migrations
 are not trivial for some NoSQL databases, plus, this would mean
 drivers, most probably, would have to have their own implementation.

Migrations aren't always about the schema. Take migrations 015 and 017
in glance for instance. They migrate data by fixing the URI and making
sure it's quoted correctly. The schema doesn't change, but the data
does.

This shares many of the same practical problems that schema migrations
have and would apply to NoSQL databases.
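The shape of such a data-only migration, sketched with stdlib URL
handling. This is illustrative, not Glance's actual 015/017 code, and
it assumes the stored paths were unquoted:

```python
from urllib.parse import quote, urlparse, urlunparse

def fix_uri(uri):
    """Re-quote the path portion of a stored URI.

    Assumes the stored path is unquoted; running this twice would
    double-quote, which is exactly the kind of subtlety that makes
    data migrations as tricky as schema migrations.
    """
    parts = urlparse(uri)
    return urlunparse(parts._replace(path=quote(parts.path)))
```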

JE




Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)

2013-07-03 Thread Johannes Erdfelt
On Wed, Jul 03, 2013, Michael Still mi...@stillhq.com wrote:
 On Wed, Jul 3, 2013 at 3:50 AM, Boris Pavlovic bo...@pavlovic.me wrote:
 
  Question:
    Why should we put sqlalchemy-migrate monkey patches in oslo, when
  we are planning to switch to alembic?
 
  Answer:
 If we don’t put the sqlalchemy-migrate monkey patches in oslo, we
  won't be able to work on point 7 at all until points 8 and 10 are
  implemented in every project. Also, the work around point 8 is not
  finished, so we are not able to implement point 10 in any project. So
  this blocks almost all work in all projects. I think that these
  100-200 lines of code are not so big a price for saving a few cycles
  of time.
 
 We've talked in the past (Folsom summit?) about alembic, but I'm not
 aware of anyone who is actually working on it. Is someone working on
 moving us to alembic? If not, it seems unfair to block database work
 on something no one is actually working on.

I've started working on a non-alembic migration path that was discussed
at the Grizzly summit.

While alembic is better than sqlalchemy-migrate, it still requires long
downtimes when some migrations are run. We discussed moving to an
expand/contract cycle where migrations add new columns, allow migrations
to slowly (relatively speaking) migrate data over, then (possibly) remove
any old columns.

JE




Re: [openstack-dev] [oslo.config] Config files overriding CLI: The path of most surprise.

2013-07-01 Thread Johannes Erdfelt
On Mon, Jul 01, 2013, Clint Byrum cl...@fewbar.com wrote:
 I am writing today to challenge that notion, and also to suggest that even
 if that is the case, it is inappropriate to have oslo.config operate in
 such a profoundly different manner than basically any other config library
 or system software in general use. CLI options are _for config files_
 and if packagers are shipping configurations in systemd unit files,
 upstart jobs, or sysvinits, they are doing so to control the concerns
 of that particular invocation of whatever command they are running,
 and not to configure the software entirely.

I completely agree and I even replied to the mail you've referenced
expressing the same concern at the time.

I'm sorry I didn't push harder at the time to fix this trap that has
been unnecessarily set up.

If no one beats me to it by the end of the week, I'll put up a review to
fix this.
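The conventional precedence the thread argues for, sketched with stdlib
pieces. oslo.config's actual option registration looks different; this
just shows file values acting as defaults that the command line
overrides:

```python
import argparse
import configparser
import io

# A config file sets the baseline...
cfg = configparser.ConfigParser()
cfg.read_file(io.StringIO("[DEFAULT]\nworkers = 4\n"))

# ...and the command line, expressing invocation-specific intent,
# overrides it.
parser = argparse.ArgumentParser()
parser.add_argument("--workers", type=int,
                    default=cfg.getint("DEFAULT", "workers"))

assert parser.parse_args([]).workers == 4                  # file value
assert parser.parse_args(["--workers", "8"]).workers == 8  # CLI wins
```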

JE

