Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
On Wed, 2013-07-10 at 14:14 -0600, John Griffith wrote:

> Given that Cinder doesn't have anybody actively engaged in this other than what's being proposed and worked on by Boris and folks, we'd be a willing candidate for most of these changes, particularly if they're accepted in Nova to begin with. On the question of having it in oslo-incubator or not: I think that's ultimately likely to be the best thing, but as is evident from this thread there are a number of things that will have to be sorted out before that happens, and I'm not convinced that "move things to oslo first, then fix" is the right answer. In my opinion things should be pretty solid before they go into the oslo repo, but that's just my 2 cents.
>
> As is evident from the approval of the blueprints in Cinder and the reviews on the patches submitted thus far, Cinder is fine with the direction/implementations proposed by Boris. I would like to see the debate around the archiving strategy and the use of alembic settled, but regardless, on the Cinder side I would like to move forward and make progress. Since there's no other real effort to improve the DB code in Cinder (which I think is needed and very valuable), I'm fine with most of what's being proposed.

My conclusion from that (admittedly based on limited understanding) would be that everything Boris is proposing makes sense to copy from Nova to oslo-incubator so Cinder can re-use it, with the exception of the DB archiving strategy, i.e. we'd improve Nova's DB archiving strategy before having Cinder adopt it.

Cheers,
Mark

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
Mark, John, Nikola,

For now we would like to put only two functions into oslo: 1) a generic method for creating a shadow table, and 2) a generic method that checks that the columns in the shadow and main tables are the same.

The migration that adds the shadow tables can therefore be done after all the other work, once we finish improving the db-archiving utils (which move deleted rows to the shadow tables), to avoid the problems Nikola noticed. These two functions won't be affected, will be used in cinder and glance in the future, and are already used in Nova. So I don't see any problem with pushing them into oslo at this moment.

Best regards,
Boris Pavlovic

On Thu, Jul 11, 2013 at 11:25 AM, Mark McLoughlin mar...@redhat.com wrote:

> My conclusion from that (admittedly based on limited understanding) would be that everything Boris is proposing makes sense to copy from Nova to oslo-incubator so Cinder can re-use it, with the exception of the DB archiving strategy, i.e. we'd improve Nova's DB archiving strategy before having Cinder adopt it.
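[Editor's note: the two helpers Boris describes could be pictured roughly like this. It's a minimal SQLAlchemy sketch; the names `create_shadow_table` / `check_shadow_table` and the `shadow_` prefix follow Nova's convention, but the code here is illustrative, not the actual oslo implementation.]

```python
from sqlalchemy import Column, MetaData, Table, create_engine

def create_shadow_table(engine, table_name, prefix="shadow_"):
    """Create a shadow table with the same columns as the original,
    but with no indexes, unique constraints or foreign keys."""
    meta = MetaData()
    table = Table(table_name, meta, autoload_with=engine)
    # Copy the columns only; constraints are deliberately dropped so
    # that shadow rows stay independent of the live tables.
    columns = [Column(c.name, c.type) for c in table.columns]
    shadow = Table(prefix + table_name, meta, *columns)
    shadow.create(engine)
    return shadow

def check_shadow_table(engine, table_name, prefix="shadow_"):
    """Check that the shadow table has exactly the same columns
    (names and types) as the original table."""
    meta = MetaData()
    table = Table(table_name, meta, autoload_with=engine)
    shadow = Table(prefix + table_name, meta, autoload_with=engine)
    original = {c.name: str(c.type) for c in table.columns}
    mirrored = {c.name: str(c.type) for c in shadow.columns}
    if original != mirrored:
        raise RuntimeError("shadow table %s%s is out of sync with %s"
                           % (prefix, table_name, table_name))
    return True
```

Because both helpers work purely by reflection, a migration that alters a live table without altering its shadow twin is caught by the check rather than silently corrupting archived rows later.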
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
On Mon, 2013-07-08 at 14:15 +0200, Nikola Đipanov wrote:

> On 05/07/13 14:26, Boris Pavlovic wrote:
>> Hi all, I would like to explain the very high level steps of our work: 1) sync the DB work in all projects (we have what we have, let it be in one place); 2) refactor the DB work in one place (not independently in all projects). I understand that our code around the DB is not ideal, but let it be in one place first.
>
> This is fine in principle, however I don't think we should push it without considering the details (where the devil is, apparently). I am arguing that DB archiving should be re-done and is broken conceptually (example below), and I think it would be suboptimal (to say the least) to get it everywhere first and then fix it. Just saying a hand-wavy "yeah, but once it's in Oslo we can fix it" is wrong - especially for functionality that is younger than the time it will likely take it to 'graduate' from Oslo.

I'm not following this DB archiving debate closely enough to take a position either way, but I think what you're really arguing is that no other project should adopt this approach to DB archiving. I'm fine with saying that it shouldn't move into oslo-incubator if it will only be used in Nova.

So the debate to have is which projects are proposing to adopt this DB archiving strategy, and whether it makes sense for them to adopt it as-is and fix it up later, or to adopt an entirely different approach.

Cheers,
Mark
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
Hi Mark, Nikola, David,

Our work is not just about unification; it improves the situation in all projects (not only in Nova). I would like to give my opinion about DB archiving too ;) Let me start with the problem, the abstract solution, the current solution, and why that solution is OK.

*) Problem: records are never deleted from the DB at all, so our DB will die.

*) Abstract solution: we should somehow remove old records. I see only one solution: create shadow tables and have utilities that are smart enough to move data in such a way that the shadow and main tables are absolutely independent.

*) Current solution: 1) create shadow tables; 2) simple utils that move deleted records from each table to its shadow table.

*) Problems in the current solution: if we just move deleted records to the shadow table, we have to do all the joins (as in Nikola's migration). So the problem is not in the shadow-table approach; the problem is that the current utils are not smart enough. And in oslo there is only the code that creates a shadow table and checks that the shadow and main tables are synced.

One more nit: migrations like the one Nikola made are pretty rare. So I don't see any reason to block this DB archiving code in oslo, or to block this approach: it can be improved rather than replaced. Moreover, we are ready to improve it.

Best regards,
Boris Pavlovic

On Tue, Jul 9, 2013 at 3:05 PM, Mark McLoughlin mar...@redhat.com wrote:

> So the debate to have is which projects are proposing to adopt this DB archiving strategy, and whether it makes sense for them to adopt it as-is and fix it up later, or to adopt an entirely different approach.
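[Editor's note: the "simple utils" Boris mentions might look like the sketch below, assuming Nova-style soft deletion via a `deleted` column and a `shadow_` table prefix. The function name and schema are illustrative, not the real Nova code.]

```python
from sqlalchemy import MetaData, Table, create_engine, select

def archive_deleted_rows(engine, table_name, max_rows=1000):
    """Copy soft-deleted rows into shadow_<table_name>, then remove
    them from the live table. Returns the number of rows archived."""
    meta = MetaData()
    table = Table(table_name, meta, autoload_with=engine)
    shadow = Table("shadow_" + table_name, meta, autoload_with=engine)
    with engine.begin() as conn:  # one transaction: copy + delete
        rows = conn.execute(
            select(table).where(table.c.deleted != 0).limit(max_rows)
        ).mappings().all()
        if rows:
            conn.execute(shadow.insert(), [dict(r) for r in rows])
            ids = [r["id"] for r in rows]
            conn.execute(table.delete().where(table.c.id.in_(ids)))
    return len(rows)
```

Note that the copied rows carry no foreign keys or constraints in the shadow table, which is exactly the "unconstrained rows" property Nikola objects to.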
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
On 05/07/13 14:26, Boris Pavlovic wrote:

> Hi all, I would like to explain the very high level steps of our work: 1) sync the DB work in all projects (we have what we have, let it be in one place); 2) refactor the DB work in one place (not independently in all projects). I understand that our code around the DB is not ideal, but let it be in one place first.

This is fine in principle, however I don't think we should push it without considering the details (where the devil is, apparently). I am arguing that DB archiving should be re-done and is broken conceptually (example below), and I think it would be suboptimal (to say the least) to get it everywhere first and then fix it. Just saying a hand-wavy "yeah, but once it's in Oslo we can fix it" is wrong - especially for functionality that is younger than the time it will likely take it to 'graduate' from Oslo.

> -- About DB archiving --
>
> Let me describe how it works for contributors who are not familiar with it: for each table (which has columns, indexes, unique constraints, FKs, etc.) we have a shadow table that has only the columns (without the indexes, unique constraints, FKs, ...). Then we have a utility that moves records marked as deleted from the original table to the shadow table. This was done by David Ripton in Nova in Grizzly.
>
> A few months later I found that there were tons of migrations for the original tables and no migrations for the shadow tables, and implemented this BP: https://blueprints.launchpad.net/nova/+spec/db-improve-archiving. It does the following: a) syncs the shadow tables with the originals; b) adds a test that checks that 1) for each original table we have a shadow table, 2) we don't have extra shadow tables, and 3) the shadow tables have the same columns as the originals.
>
> Why is this so important? If the shadow and original tables are not synced, there are two possible results after the shadow util is run: a) it will fail; b) (worse) it will break the data in the shadow table.
>
> Also, there is no exponential growth of JOINs when we are using shadow tables. In migrations we should: a) do the same actions on columns (drop, alter) in the main and shadow tables; b) do the same actions on tables (create/drop/rename); c) do the same actions on the data in the tables. So you perform the actions on the main tables and the shadow tables separately, but after the migration the tables should be synced. And it is easier to perform the same actions twice, on the main and shadow tables in one migration, than in separate migrations.

This is only true if you have one table with no relations that need to be considered. Here is an example of when it gets tricky. Say you have a table T1, and a migration that adds a column c1 that relies on some data from table T2, and T1 has a FK that points to T2. And say, for the sake of argument, that the objects represented by rows in T1 and T2 have different lifetimes in the system (think instances and devices, groups, quotas, networks... this is common in our data model). In order to properly migrate and assign values to the newly created c1 you will need to:

* Add the column c1 to the live T1
* Join on live T2 *and* shadow T2 to get the data needed and populate the new column.
* Add the column c1 to the shadow T1
* Join on live T2 *and* shadow T2 to get the data needed and populate the new column.

Hence - exponentially more joins, as I stated in my previous email. Now, this was the *simplest* possible example - things get potentially much more complicated if the new column relies on previous state of the data (say, counters of some sort), if you need to get data from a third table (think many-to-many relationships), etc. If you need a real example, take a look at migration 186 in the current trunk.

As I said in the previous email, and based on the examples above, this design decision (unconstrained rows) makes it difficult to reason about data in the system! I personally - as a developer working on the codebase - am not happy making this trade-off in favour of archiving in this way, and would like to see some design decisions changed, or at the very least a broader consensus that the state as-is is actually OK and we don't need to worry about it.

> - About the db_sync downtime (upgrading from one DB version to another) (IRC): DB archiving just helps us to reduce this time. One possible variant (high level):
>
> 1) Move our deleted rows to the shadow_tables

This step is, in the workflow you describe here: 1) mandatory, and 2) completely defeating the purpose of unconstrained rows if, in order to migrate, we have to move *all* of them to shadow tables, which may take a non-trivial amount of time.

> 2) Copy the shadow_tables from schema to tmp_schema
> 3) Drop the data from the shadow_tables
> 4) Run the migrations on schema: a) as the shadow tables are empty, all migrations will be done really fast; b) as our original tables have only non-deleted rows, the migration will also be done much faster.
> 5) Run Nova
> 6)
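[Editor's note: Nikola's T1/T2 scenario can be made concrete with a toy sqlite example. All table and column names here are made up for illustration; the point is that the "add column c1" data migration needs the same join against both the live and the shadow copy of T2, for both copies of T1 - four passes in total.]

```python
from sqlalchemy import create_engine

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.exec_driver_sql("CREATE TABLE t2 (id INTEGER PRIMARY KEY, flavor TEXT)")
    conn.exec_driver_sql("CREATE TABLE shadow_t2 (id INTEGER, flavor TEXT)")
    conn.exec_driver_sql(
        "CREATE TABLE t1 (id INTEGER PRIMARY KEY, t2_id INTEGER REFERENCES t2(id))")
    conn.exec_driver_sql("CREATE TABLE shadow_t1 (id INTEGER, t2_id INTEGER)")
    # Because T1 and T2 rows have different lifetimes, a live t1 row may
    # point at a t2 row that has already been archived (and vice versa).
    conn.exec_driver_sql("INSERT INTO t2 VALUES (1, 'small')")
    conn.exec_driver_sql("INSERT INTO shadow_t2 VALUES (2, 'large')")
    conn.exec_driver_sql("INSERT INTO t1 VALUES (10, 1), (11, 2)")
    conn.exec_driver_sql("INSERT INTO shadow_t1 VALUES (12, 1), (13, 2)")

    # The migration must add c1 to both copies of t1, and for each copy
    # consult both copies of t2 to populate it.
    for target in ("t1", "shadow_t1"):
        conn.exec_driver_sql("ALTER TABLE %s ADD COLUMN c1 TEXT" % target)
        for source in ("t2", "shadow_t2"):
            conn.exec_driver_sql(
                "UPDATE {t} SET c1 = (SELECT flavor FROM {s} "
                "WHERE {s}.id = {t}.t2_id) WHERE c1 IS NULL"
                .format(t=target, s=source))
```

With a third related table the number of source combinations doubles again, which is the exponential growth of joins being described.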
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
Hi all,

I would like to explain the very high level steps of our work:

1) Sync the DB work in all projects (we have what we have, let it be in one place)
2) Refactor the DB work in one place (not independently in all projects)

I understand that our code around the DB is not ideal, but let it be in one place first.

-- About DB archiving --

Let me describe how it works for contributors who are not familiar with it: for each table (which has columns, indexes, unique constraints, FKs, etc.) we have a shadow table that has only the columns (without the indexes, unique constraints, FKs, ...). Then we have a utility that moves records marked as deleted from the original table to the shadow table. This was done by David Ripton in Nova in Grizzly.

A few months later I found that there were tons of migrations for the original tables and no migrations for the shadow tables, and implemented this BP: https://blueprints.launchpad.net/nova/+spec/db-improve-archiving. It does the following:

a) syncs the shadow tables with the originals
b) adds a test that checks that: 1) for each original table we have a shadow table; 2) we don't have extra shadow tables; 3) the shadow tables have the same columns as the originals.

Why is this so important? If the shadow and original tables are not synced, there are two possible results after the shadow util is run: a) it will fail; b) (worse) it will break the data in the shadow table.

Also, there is no exponential growth of JOINs when we are using shadow tables. In migrations we should:

a) Do the same actions on columns (drop, alter) in the main and shadow tables
b) Do the same actions on tables (create/drop/rename)
c) Do the same actions on the data in the tables

So you perform the actions on the main tables and the shadow tables separately, but after the migration the tables should be synced. And it is easier to perform the same actions twice, on the main and shadow tables in one migration, than in separate migrations.

- About the db_sync downtime (upgrading from one DB version to another) (IRC): DB archiving just helps us to reduce this time. One possible variant (high level):

1) Move our deleted rows to the shadow_tables
2) Copy the shadow_tables from schema to tmp_schema
3) Drop the data from the shadow_tables
4) Run the migrations on schema: a) as the shadow tables are empty, all migrations will be done really fast; b) as our original tables have only non-deleted rows, the migration will also be done much faster.
5) Run Nova
6) Run the migrations on tmp_schema
7) Copy from tmp_schema back to schema (if it is required for some reason)

So, for example, writing utilities that are able to do this will be very useful.

-- So what I think about DB archiving: it is a great thing that helps us 1) to reduce migration downtime and 2) to reduce the number of rows in the original tables and improve performance. And I think the tests that check that the original and shadow tables are synced are required here.

Best regards,
Boris Pavlovic

On Fri, Jul 5, 2013 at 3:41 PM, Nikola Đipanov ndipa...@redhat.com wrote:

> On 02/07/13 19:50, Boris Pavlovic wrote:
>> *) DB Archiving
>> a) create shadow tables
>> b) add tests that check that the shadow and main tables are synced
>> c) add code that works with the shadow tables
>
> Hi Boris, all,
>
> I have a few points regarding the DB archiving work that I am growing more concerned about, so I thought I might mention them on this thread. I pointed them out ad-hoc on a recent review (https://review.openstack.org/#/c/34643/) and there is some discussion there already, although it was not very fruitful. I feel that there were a few design oversights, and as a result it has a couple of rough edges I noticed.
>
> The first issue is that shadow tables do not present a view of the world themselves, but are just unconstrained rows copied from the live tables. This is understandably done for performance reasons while archiving (with the current design ideas in place), but it also causes issues when migrations affect more than one table. Especially if data migrations need to look at several tables at once, the actual number of table joins needed in order to consider everything grows exponentially. It could be argued that such migrations are not that common, but this is something that will make development more difficult and migrations painful once it comes up. To put it shortly: this property generally makes it harder to reason about data.
>
> The second point (and it ties in with the first one, since it makes it difficult to fix): maybe shadow table migrations should be kept separate, and made optional? Currently there is a check that will fail the tests unless the migration is done on both tables, which I think should be removed in favour of separate migrations. Developers should still migrate both, of course - but deployers should be able to choose not to, according to their needs/scale. I am sure there are people on this list who can chip in more on this subject (I've had a brief discussion with lifeless on this topic on IRC). I'm afraid that if you
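[Editor's note: the numbered db_sync workflow Boris outlines could be sketched as below, with sqlite's ATTACH standing in for the temporary schema. Table names are illustrative, and step 1 (moving deleted rows to the shadow tables) is assumed to have already run.]

```python
import sqlite3

db = sqlite3.connect(":memory:", isolation_level=None)  # autocommit
db.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, deleted INTEGER)")
db.execute("CREATE TABLE shadow_instances (id INTEGER, deleted INTEGER)")
db.execute("INSERT INTO instances VALUES (1, 0)")
db.execute("INSERT INTO shadow_instances VALUES (2, 1), (3, 1)")

# 2) Copy the shadow tables out of the schema being migrated.
db.execute("ATTACH ':memory:' AS tmp_schema")
db.execute("CREATE TABLE tmp_schema.shadow_instances AS "
           "SELECT * FROM shadow_instances")
# 3) Drop the archived data so the migration touches empty shadow tables.
db.execute("DELETE FROM shadow_instances")
# 4) The schema migration is now fast: the live table holds only
#    non-deleted rows and the shadow table is empty.
db.execute("ALTER TABLE instances ADD COLUMN task_state TEXT")
db.execute("ALTER TABLE shadow_instances ADD COLUMN task_state TEXT")
# 6)/7) Later, off the critical path, migrate the parked copy and
#       restore it into the live schema.
db.execute("ALTER TABLE tmp_schema.shadow_instances ADD COLUMN task_state TEXT")
db.execute("INSERT INTO shadow_instances "
           "SELECT * FROM tmp_schema.shadow_instances")
```

The downtime win comes from steps 2-4: the expensive part (migrating the archived bulk) is deferred until after the service is running again.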
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
On Wed, Jul 03, 2013, Michael Still mi...@stillhq.com wrote:

> On Wed, Jul 3, 2013 at 3:50 AM, Boris Pavlovic bo...@pavlovic.me wrote:
>> Question: why should we put the sqlalchemy-migrate monkey patches in oslo when we are planning to switch to alembic? Answer: if we don't put the sqlalchemy-migrate monkey patches in oslo, we won't be able to work on point 7 at all until points 8 and 10 are implemented in every project. Also, the work around point 8 is not finished, so we are not able to implement point 10 in any project. So this blocks almost all work in all projects. I think these 100-200 lines of code are not such a big price for saving a few cycles of time.
>
> We've talked in the past (Folsom summit?) about alembic, but I'm not aware of anyone who is actually working on it. Is someone working on moving us to alembic? If not, it seems unfair to block database work on something no one is actually working on.

I've started working on a non-alembic migration path that was discussed at the Grizzly summit. While alembic is better than sqlalchemy-migrate, it still requires long downtimes when some migrations are run. We discussed moving to an expand/contract cycle where migrations add new columns, allow migrations to slowly (relatively speaking) migrate data over, and then (possibly) remove any old columns.

JE
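[Editor's note: the expand/contract cycle Johannes describes might look like this on a toy table. The schema and function names are invented for illustration; they are not actual Nova migrations.]

```python
from sqlalchemy import create_engine

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.exec_driver_sql(
        "CREATE TABLE instances (id INTEGER PRIMARY KEY, hostname TEXT)")
    conn.exec_driver_sql("INSERT INTO instances VALUES (1, 'vm-1'), (2, 'vm-2')")

def expand(conn):
    # Phase 1 (expand): purely additive, so it is fast and old code
    # keeps working against the unchanged columns.
    conn.exec_driver_sql("ALTER TABLE instances ADD COLUMN display_name TEXT")

def backfill(conn, batch=100):
    # Phase 2: migrate data over slowly, in small batches, while the
    # service keeps running and both columns are readable.
    conn.exec_driver_sql(
        "UPDATE instances SET display_name = hostname "
        "WHERE display_name IS NULL AND id IN "
        "(SELECT id FROM instances WHERE display_name IS NULL LIMIT %d)" % batch)

with engine.begin() as conn:
    expand(conn)
    backfill(conn)
# Phase 3 (contract) would drop the old hostname column once nothing
# reads it any more -- the only step that needs a heavyweight ALTER.
```

The long-downtime ALTERs are thereby pushed out of the upgrade's critical path, which is the advantage over a single in-place migration regardless of whether sqlalchemy-migrate or alembic drives it.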
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
On Wed, Jul 3, 2013 at 6:50 AM, Michael Still mi...@stillhq.com wrote:

> We've talked in the past (Folsom summit?) about alembic, but I'm not aware of anyone who is actually working on it. Is someone working on moving us to alembic? If not, it seems unfair to block database work on something no one is actually working on.

That's not quite what happened. Unfortunately the conversation happened in gerrit, IRC, and email, so it's a little hard to piece together from the outside.

I had several concerns about the nature of this change, not the least of which is that it monkey-patches a third-party library to add a feature instead of just modifying that library upstream. The patch I objected to (https://review.openstack.org/#/c/31016) modifies the sqlite driver inside sqlalchemy-migrate to support some migration patterns that it does not support natively.

There's no blueprint linked from the commit message on the patch I was reviewing, so I didn't have the full background. The description of the patch, and the discussion in gerrit, initially led me to believe this was for unit tests of the migrations themselves. I pointed out that it didn't make any sense to test the migrations on a database no one would use in production, especially if we had to monkey-patch the driver to make the migrations work in the first place.

Boris clarified that the tests were the general nova tests, at which point I asked why nova was relying on the migrations to set up a database for its tests instead of just using the models. Sean cleared up the history on that point, and although I'm still not happy with the idea of putting code into oslo with the pre-declared plan to remove it (rather than considering it for graduation), I agreed that the pragmatic thing to do for now is to live with the monkey-patched version of sqlalchemy-migrate. At this point I have removed my -2 from the patch, but I haven't had a chance to fully review the code. I voted 0 to unblock it in case other reviewers had time to look at it before I was able to come back. That hasn't happened, but the patch is no longer blocked.

Somewhere during that conversation I suggested looking at alembic as an alternative, but alembic clearly states in its documentation that migrations on sqlite are not supported because of the database's limited support for ALTER statements, and that if someone wants to contribute those features, patches would be welcome. If we do need this feature to support good unit tests of SQLAlchemy-based projects, we should eventually move it out of oslo and into alembic, then move our migration scripts to use alembic. It would make the most sense to do that on a release boundary, when we normally collapse the migration scripts anyway.

Even better would be if we could make the models and migration scripts produce databases that are compatible enough for testing the main project, and then run tests for the migrations themselves against real databases as a separate step. Based on the plan Boris has posted, it sounds like he is working toward both of these goals.
Doug
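[Editor's note: the "models and migrations should produce compatible databases" idea could be checked along the lines below. This is a sketch: the "migrations" are stand-in SQL strings, and in a real project the first database would come from `metadata.create_all()` on the models.]

```python
import sqlite3

def schema_of(db):
    """Return {table: [(column, type), ...]} for a sqlite database."""
    tables = [r[0] for r in db.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {t: [(r[1], r[2]) for r in db.execute("PRAGMA table_info(%s)" % t)]
            for t in tables}

# Database 1: the schema as the models would create it in one shot.
from_models = sqlite3.connect(":memory:")
from_models.execute("CREATE TABLE volumes (id INTEGER, size INTEGER)")

# Database 2: the schema as the migration scripts build it, step by step.
migrations = [
    "CREATE TABLE volumes (id INTEGER)",
    "ALTER TABLE volumes ADD COLUMN size INTEGER",
]
from_migrations = sqlite3.connect(":memory:")
for step in migrations:
    from_migrations.execute(step)

# The sync test: both construction paths must converge on one schema.
assert schema_of(from_models) == schema_of(from_migrations)
```

With such a check in place, the main test suite can set up its database from the models (fast), while migration correctness is verified separately against real backends.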
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
On 07/03/2013 07:26 AM, Johannes Erdfelt wrote:

> I've started working on a non-alembic migration path that was discussed at the Grizzly summit. While alembic is better than sqlalchemy-migrate, it still requires long downtimes when some migrations are run. We discussed moving to an expand/contract cycle where migrations add new columns, allow migrations to slowly (relatively speaking) migrate data over, and then (possibly) remove any old columns.

I think if you're working on a non-alembic plan and Boris is working on an alembic plan, then something is going to be unhappy in the not-too-distant future. Can we get alignment on this?
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
On 07/02/2013 10:50 AM, Boris Pavlovic wrote:

> ### Goal ###
>
> We should fix the DB work, unify it in all projects, and use oslo code for all the common things.

Just wanted to say a quick word that isn't about migrations... Thank you. This is all great, and I'm thrilled someone is taking on the task of fixing what is probably one of OpenStack's biggest nightmares.

> In more words:
>
> DB API
> *) Fully cover with tests.
> *) Run the tests against all backends (now they are run only against sqlite).
> *) Unique constraints (instead of select + insert): a) provide unique constraints; b) add missing unique constraints.
> *) DB archiving: a) create shadow tables; b) add tests that check that the shadow and main tables are synced; c) add code that works with the shadow tables.
> *) DB API performance optimization: a) remove unused joins; b) 1 query instead of N (where possible); c) add methods that could improve performance; d) drop unused methods.
> *) DB reconnect: a) don't break a huge task if we lose the connection for a moment - just retry the DB query.
> *) DB session cleanup: a) do not use the session parameter in public DB API methods; b) fix places where we do N queries in N transactions instead of 1; c) fetch only the data that is used (e.g. len(query.all()) => query.count()).
>
> DB Migrations
> *) Test DB migrations against all backends and real data.
> *) Fix: DB schemas after migrations should be the same in different backends.
> *) Fix: hidden bugs caused by wrong migrations: a) fix indexes (e.g. migration 152 in Nova drops all indexes that include the deleted column); b) fix wrong types; c) drop unused tables.
> *) Switch from sqlalchemy-migrate to something that is not dead (e.g. alembic).
>
> DB Models
> *) Fix: the schema created by the models should be the same as the one produced by the migrations.
> *) Fix: unit tests should be run on a DB that was created from the models, not the migrations.
> *) Add a test that checks that the models are synced with the migrations.
>
> Oslo Code
> *) Base SQLAlchemy models.
> *) Work around the engine and session.
> *) SQLAlchemy utils that help us with migrations and tests.
> *) Test migrations base.
> *) A common test wrapper that allows us to run tests on different backends.
>
> ### Implementation ###
>
> This is a really, really huge task, and we are almost done with Nova =). In OpenStack there is only one approach for such work ("baby steps" driven development), so we are making tons of patches that can be easily reviewed. But there are also minuses to this approach: it is pretty hard to track the work at a high level, and sometimes there are misunderstandings - for example with the oslo code. In a few words, at this moment we would like to add monkey patching for sqlalchemy-migrate to oslo (for some time), and I got a reasonable question from Doug Hellmann: why? My answer is: because of our "baby steps". But if you don't have the list of baby steps, it is pretty hard to understand why our baby steps need this thing, and why we don't switch to alembic first. So I would like to describe our road map and write down the list of baby steps.
>
> --- OSLO
> *) (Merged) Base code for the models and the sqlalchemy engine (session).
> *) (On review) SQLAlchemy utils that are used to: 1. fix bugs in sqlalchemy-migrate; 2. provide base code for migrations that add unique constraints; 3. provide utils for db archiving that help us create and check shadow tables.
> *) (On review) Testtools wrapper. We should have only one testtools wrapper in all projects, and this is one of the base steps in the task of running tests against all backends.
> *) (On review) Test migrations base. Base classes that allow us to test our migrations against all backends on real data.
> *) (On review, not finished yet) DB reconnect.
> *) (Not finished) A test that checks that the schemas and models are synced.
>
> --- ${PROJECT_NAME}
> In different projects we can work absolutely simultaneously, and the first candidates are Glance and Cinder. But inside a project we can also work simultaneously. Here is the workflow:
> 1) (SYNC) Use the base code for models and sqlalchemy engines (from oslo)
> 2) (SYNC) Use the test migrations base (from oslo)
> 3) (SYNC) Use the SQLAlchemy utils (from oslo)
> 4) (1 patch) Switch to the oslo DB code
> 5) (1 patch) Remove ported test migrations
> 6) (1 migration) Provide unique constraints (change type
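[Editor's note: two items from the quoted list can be made concrete with a toy schema (not actual OpenStack code): relying on a unique constraint instead of a racy select + insert, and counting rows in the database instead of fetching them all.]

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE services (id INTEGER PRIMARY KEY, host TEXT, "
           "topic TEXT, UNIQUE (host, topic))")

def service_create(host, topic):
    # No "SELECT first to see if it exists": the constraint itself is the
    # check, so two concurrent creators cannot both succeed.
    try:
        db.execute("INSERT INTO services (host, topic) VALUES (?, ?)",
                   (host, topic))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate -- report it instead of racing

assert service_create("node1", "volume") is True
assert service_create("node1", "volume") is False

# len(query.all()) pulls every row over the wire just to count them;
# COUNT(*) (query.count() in SQLAlchemy) does the counting in the database.
(count,) = db.execute("SELECT COUNT(*) FROM services").fetchone()
assert count == 1
```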
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
Hi Monty, I think if you're working on a non-alembic plan and boris is working on an alembic plan, then something is going to be unhappy in the not-too-distant future. Can we get alignment on this? As I said before, we are preparing our DB code to move from sqlalchemy-migrate to something another. There will be a tons of work before we will be able to rewrite or migration scripts to alembic or something else. And we are not sure that we would like to use alembic=) Best regards, Boris Pavlovic On Wed, Jul 3, 2013 at 9:30 PM, Monty Taylor mord...@inaugust.com wrote: On 07/02/2013 10:50 AM, Boris Pavlovic wrote: ### Goal ### We should fix work with DB, unify it in all projects and use oslo code for all common things. Just wanted to say a quick word that isn't about migrations... Thank you. This is all great, and I'm thrilled someone is taking on the task of fixing what is probably one of OpenStack's biggest nightmares. In more words: DB API *) Fully cover by tests. *) Run tests against all backends (now they are runed only against sqlite). *) Unique constraints (instead of select + insert) a) Provide unique constraints. b) Add missing unique constraints. *) DB Archiving a) create shadow tables b) add tests that checks that shadow and main table are synced. c) add code that work with shadow tables. *) DB API performance optimization a) Remove unused joins.. b) 1 query instead of N (where it is possible). c) Add methods that could improve performance. d) Drop unused methods. *) DB reconnect a) Don’t break huge task if we lost connection for a moment.. just retry DB query. *) DB Session cleanup a) do not use session parameter in public DB API methods. b) fix places where we are doing N queries in N transactions instead of 1. c) get only data that is used (e.g. len(query.all()) = query.count()). DB Migrations *) Test DB Migrations against all backends and real data. 
*) Fix: DB schemas after migrations should be the same across different backends.
*) Fix: hidden bugs that are caused by wrong migrations:
   a) Fix indexes (e.g. migration 152 in Nova drops all indexes that have a deleted column).
   b) Fix wrong types.
   c) Drop unused tables.
*) Switch from sqlalchemy-migrate to something that is not dead (e.g. alembic).

DB Models
*) Fix: The schema that is created by Models should be the same as the one produced by migrations.
*) Fix: Unit tests should be run on a DB that was created by Models, not migrations.
*) Add a test that checks that Models are synced with migrations.

Oslo Code
*) Base SQLAlchemy Models.
*) Work around engine and session.
*) SQLAlchemy utils that help us with migrations and tests.
*) Test migrations base.
*) Use a common test wrapper that allows us to run tests on different backends.

### Implementation ###

This is a really, really huge task. And we are almost done with Nova=). In OpenStack there is only one approach for such work ("baby steps" driven development), so we are making tons of patches that can be easily reviewed. But there are also minuses to this approach: it is pretty hard to track the work at a high level, and sometimes there are misunderstandings. For example with the oslo code: in a few words, at this moment we would like to add (for some time) monkey patching for sqlalchemy-migrate into oslo. And I got a reasonable question from Doug Hellmann: why? My answer is: because of our "baby steps". But if you don't have the list of baby steps, it is pretty hard to understand why our baby steps need this thing, and why we don't switch to alembic first. So I would like to describe our road map and write down the list of baby steps.

---

OSLO
*) (Merged) Base code for Models and sqlalchemy engine (session).
*) (On review) SQLAlchemy utils that are used to:
   1. Fix bugs in sqlalchemy-migrate.
   2. Provide base code for migrations that adds unique constraints.
   3. Provide utils for DB archiving that help us to create and check shadow tables.
*) (On review) Testtools wrapper. We should have only one testtools wrapper in all projects, and this is one of the base steps in the task of running tests against all backends.
*) (On review) Test migrations base. Base classes that allow us to test our migrations against all backends on real data.
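To make the DB archiving utils above more concrete, here is a rough sketch of the two generic shadow-table helpers (one to create a shadow table, one to check it stays in sync with the main table). The function names, the `shadow_` prefix, and all details are illustrative assumptions, not the actual oslo API:

```python
# Sketch of two generic shadow-table helpers: create an empty structural
# copy of a table, and verify the copy still matches the original.
# Names and the "shadow_" prefix are illustrative, not the oslo API.
from sqlalchemy import Column, MetaData, Table


def create_shadow_table(engine, table_name, prefix="shadow_"):
    """Create an empty structural copy of table_name."""
    meta = MetaData()
    meta.reflect(bind=engine, only=[table_name])
    table = meta.tables[table_name]
    # Re-create each column so the new Table owns its own Column objects.
    columns = [Column(c.name, c.type, nullable=c.nullable)
               for c in table.columns]
    shadow = Table(prefix + table_name, meta, *columns)
    shadow.create(engine)
    return shadow


def check_shadow_table(engine, table_name, prefix="shadow_"):
    """Check that the shadow table has the same columns as the main one."""
    meta = MetaData()
    meta.reflect(bind=engine, only=[table_name, prefix + table_name])
    main_cols = {c.name: type(c.type)
                 for c in meta.tables[table_name].columns}
    shadow_cols = {c.name: type(c.type)
                   for c in meta.tables[prefix + table_name].columns}
    if main_cols != shadow_cols:
        raise ValueError("shadow table out of sync: %s" % table_name)
    return True
```

The check function is what a "shadow and main tables are synced" unit test would call for every archived table, so a migration that alters a main table but forgets its shadow fails fast.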
Re: [openstack-dev] Work around DB in OpenStack (Oslo, Nova, Cinder, Glance)
One small addition I would suggest is a step to remove the unused sqlalchemy-migrate code once this is all done. That's my main concern with moving it to Oslo right now. Also, is this a formal blueprint (or set of blueprints)? It seems like it should be.

-Ben

On 2013-07-02 12:50, Boris Pavlovic wrote:

### Goal ###

We should fix the work with the DB, unify it across all projects, and use oslo code for all common things.

In more words:

DB API
*) Fully cover by tests.
*) Run tests against all backends (now they are run only against sqlite).
*) Unique constraints (instead of select + insert)
   a) Provide unique constraints.
   b) Add missing unique constraints.
*) DB Archiving
   a) Create shadow tables.
   b) Add tests that check that shadow and main tables are synced.
   c) Add code that works with shadow tables.
*) DB API performance optimization
   a) Remove unused joins.
   b) 1 query instead of N (where possible).
   c) Add methods that could improve performance.
   d) Drop unused methods.
*) DB reconnect
   a) Don't break a huge task if we lose the connection for a moment; just retry the DB query.
*) DB Session cleanup
   a) Do not use the session parameter in public DB API methods.
   b) Fix places where we are doing N queries in N transactions instead of 1.
   c) Fetch only the data that is actually used (e.g. use query.count() instead of len(query.all())).

DB Migrations
*) Test DB migrations against all backends and real data.
*) Fix: DB schemas after migrations should be the same across different backends.
*) Fix: hidden bugs that are caused by wrong migrations:
   a) Fix indexes (e.g. migration 152 in Nova drops all indexes that have a deleted column).
   b) Fix wrong types.
   c) Drop unused tables.
*) Switch from sqlalchemy-migrate to something that is not dead (e.g. alembic).

DB Models
*) Fix: The schema that is created by Models should be the same as the one produced by migrations.
*) Fix: Unit tests should be run on a DB that was created by Models, not migrations.
*) Add a test that checks that Models are synced with migrations.

Oslo Code
*) Base SQLAlchemy Models.
*) Work around engine and session.
*) SQLAlchemy utils that help us with migrations and tests.
*) Test migrations base.
*) Use a common test wrapper that allows us to run tests on different backends.

### Implementation ###

This is a really, really huge task. And we are almost done with Nova=). In OpenStack there is only one approach for such work ("baby steps" driven development), so we are making tons of patches that can be easily reviewed. But there are also minuses to this approach: it is pretty hard to track the work at a high level, and sometimes there are misunderstandings. For example with the oslo code: in a few words, at this moment we would like to add (for some time) monkey patching for sqlalchemy-migrate into oslo. And I got a reasonable question from Doug Hellmann: why? My answer is: because of our "baby steps". But if you don't have the list of baby steps, it is pretty hard to understand why our baby steps need this thing, and why we don't switch to alembic first. So I would like to describe our road map and write down the list of baby steps.

---

OSLO
*) (Merged) Base code for Models and sqlalchemy engine (session).
*) (On review) SQLAlchemy utils that are used to:
   1. Fix bugs in sqlalchemy-migrate.
   2. Provide base code for migrations that adds unique constraints.
   3. Provide utils for DB archiving that help us to create and check shadow tables.
*) (On review) Testtools wrapper. We should have only one testtools wrapper in all projects, and this is one of the base steps in the task of running tests against all backends.
*) (On review) Test migrations base. Base classes that allow us to test our migrations against all backends on real data.
*) (On review, not finished yet) DB reconnect.
*) (Not finished) Test that checks that schemas and models are synced.

---

${PROJECT_NAME}
In different projects we could work absolutely simultaneously, and the first candidates are Glance and Cinder. But inside a project we could also work simultaneously.
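The "DB reconnect" item above — retry the query instead of failing a huge task when the connection blips — can be sketched as a simple decorator. This is a toy illustration under assumed names; real code would hook into SQLAlchemy's disconnect detection rather than a hand-rolled exception class:

```python
# Toy sketch of retry-on-disconnect.  DBConnectionError and the
# decorator name are assumptions for illustration only; production
# code would detect disconnects via the DB driver/SQLAlchemy.
import functools
import time


class DBConnectionError(Exception):
    """Raised when the database connection is lost mid-query."""


def retry_on_disconnect(retries=3, delay=0.5):
    def wrapper(func):
        @functools.wraps(func)
        def inner(*args, **kwargs):
            for attempt in range(retries + 1):
                try:
                    return func(*args, **kwargs)
                except DBConnectionError:
                    if attempt == retries:
                        raise  # out of retries: propagate the failure
                    time.sleep(delay)  # back off, then retry the query
        return inner
    return wrapper
```

A DB API method wrapped this way transparently re-runs its query a few times before giving up, so a momentary connection loss no longer kills a long-running task.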
Here is the workflow:
1) (SYNC) Use base code for Models and sqlalchemy engines (from oslo).
2) (SYNC) Use test migrations base (from oslo).
3) (SYNC) Use SQLAlchemy utils (from oslo).
4) (1 patch) Switch to the OSLO DB code.
5) (1 patch) Remove ported test migrations.
6) (1 migration) Provide unique constraints (change the type of the deleted column).
7) (1 migration) Add shadow tables.
   a) Create shadow tables.
   b) Add a test that checks that they are always synced.
8) (N migrations) UniqueConstraint/Session/Optimization workflow:
   a) (1 patch) Add/improve/refactor tests for the part of the API connected with the model.
   b) (1 patch) Fix the session.
   c) (1 patch)
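To illustrate why step 6 changes the type of the deleted column: if deleted is an integer that is 0 for live rows and is set to the row's id on soft delete, then a UNIQUE (name, deleted) constraint replaces the racy select + insert pattern while still letting a name be reused after deletion. A minimal sketch (the model and column names are illustrative, not the actual Nova/Cinder schema):

```python
# Sketch of the soft-delete + unique-constraint pattern from step 6.
# The Volume model here is illustrative only.
from sqlalchemy import Column, Integer, String, UniqueConstraint
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Volume(Base):
    __tablename__ = "volumes"
    # UNIQUE on (name, deleted): the DB, not a select + insert in Python,
    # guarantees that at most one live row has a given name.
    __table_args__ = (UniqueConstraint("name", "deleted"),)

    id = Column(Integer, primary_key=True)
    name = Column(String(255), nullable=False)
    deleted = Column(Integer, default=0)  # 0 = live; set to id once deleted

    def soft_delete(self):
        # Using the row's id keeps (name, deleted) unique even after
        # several rows with the same name have been soft-deleted.
        self.deleted = self.id
```

With a boolean deleted column this would not work: two soft-deleted rows with the same name would both be (name, True) and violate the constraint, which is exactly why the migration has to change the column's type.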