Re: [openstack-dev] [nova] Question about fixing missing soft deleted rows

2016-09-15 Thread Sylvain Bauza



On 15/09/2016 14:21, Sean Dague wrote:

> On 09/14/2016 09:21 PM, Matt Riedemann wrote:
> > I'm looking for other input on a question I have in this change:
> >
> > https://review.openstack.org/#/c/345191/4/nova/db/sqlalchemy/api.py
> >
> > We've had a few patches like this where we don't (soft) delete entries
> > related to an instance when that instance record is (soft) deleted.
> > These then cause the archive command to fail because of the referential
> > constraint.
> >
> > Then we go in and add a new entry in the instance_destroy method so we
> > start (soft) deleting *new* things, but we don't clean up anything old.
> >
> > In the change above this is working around the fact we might have
> > lingering consoles entries for an instance that's being archived.
> >
> > One suggestion I made was adding a database migration that soft deletes
> > any console entries where the related instance is deleted (deleted !=
> > 0). Is that a bad idea? It's not a schema migration, it's data cleanup
> > so archive works. We could do the same thing with a nova-manage command,
> > but we don't know that someone has run it like they do with the DB
> > migrations.
> >
> > Another idea is doing it in the nova-manage db online_data_migrations
> > command which should be run on upgrade. If we landed something like that
> > in say Ocata, then we could remove the TODO in the archive code in Pike.
> >
> > Other thoughts?
>
> Is there a reason that archive doesn't go hunt for these references
> first and delete them? I kind of assumed it would handle all the cleanup
> logic itself, including this sort of integrity issue.


That's the alternative approach I was thinking of in my email. Yeah, 
reconciling the DB when archiving the rows seems like the better 
approach, rather than a periodic call to fix that.




> The data migration would still take time, and a table lock, even though
> it's just deletes, so that feels like it should be avoided.


Well, that could be done through cursors so that the lock is not really 
a problem, but thinking about it, I tend to agree with you that the 
best approach could be fixing the archive command.
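
To illustrate, a rough sketch of what that chunked cleanup could look 
like (hypothetical code, nothing in the tree; the consoles/instances 
names and the deleted=id soft-delete convention are assumed, and 
single-table UPDATE ... LIMIT is a MySQL extension):

    import time
    import sqlalchemy as sa

    # Soft-delete orphaned consoles rows in small chunks so the
    # cleanup never holds a long table lock.
    CHUNKED_SOFT_DELETE = sa.text("""
        UPDATE consoles
           SET deleted = id, deleted_at = NOW()
         WHERE deleted = 0
           AND instance_uuid IN
               (SELECT uuid FROM instances WHERE deleted != 0)
         LIMIT 1000
    """)

    def cleanup_orphan_consoles(engine):
        # Each chunk runs in its own short transaction; stop once a
        # chunk matches no rows, so re-running is harmless.
        while True:
            with engine.begin() as conn:
                if conn.execute(CHUNKED_SOFT_DELETE).rowcount == 0:
                    return
            time.sleep(0.5)  # give other writers room between chunks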


-Sylvain



> -Sean






Re: [openstack-dev] [nova] Question about fixing missing soft deleted rows

2016-09-15 Thread Sean Dague
On 09/14/2016 09:21 PM, Matt Riedemann wrote:
> I'm looking for other input on a question I have in this change:
> 
> https://review.openstack.org/#/c/345191/4/nova/db/sqlalchemy/api.py
> 
> We've had a few patches like this where we don't (soft) delete entries
> related to an instance when that instance record is (soft) deleted.
> These then cause the archive command to fail because of the referential
> constraint.
> 
> Then we go in and add a new entry in the instance_destroy method so we
> start (soft) deleting *new* things, but we don't clean up anything old.
> 
> In the change above this is working around the fact we might have
> lingering consoles entries for an instance that's being archived.
> 
> One suggestion I made was adding a database migration that soft deletes
> any console entries where the related instance is deleted (deleted !=
> 0). Is that a bad idea? It's not a schema migration, it's data cleanup
> so archive works. We could do the same thing with a nova-manage command,
> but we don't know that someone has run it like they do with the DB
> migrations.
> 
> Another idea is doing it in the nova-manage db online_data_migrations
> command which should be run on upgrade. If we landed something like that
> in say Ocata, then we could remove the TODO in the archive code in Pike.
> 
> Other thoughts?

Is there a reason that archive doesn't go hunt for these references
first and delete them? I kind of assumed it would handle all the cleanup
logic itself, including this sort of integrity issue.
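
Something like this run per child table before archiving instances, 
for example (a hypothetical SQLAlchemy sketch, table and column names 
assumed):

    import sqlalchemy as sa

    def reconcile_orphans(conn, child, instances):
        # Soft-delete live child rows (e.g. consoles) whose instance
        # is already soft-deleted, so the FK constraint can't make the
        # archive fail; nova's deleted=id convention is assumed.
        orphaned = sa.exists().where(sa.and_(
            instances.c.uuid == child.c.instance_uuid,
            instances.c.deleted != 0))
        result = conn.execute(
            sa.update(child)
            .where(child.c.deleted == 0)
            .where(orphaned)
            .values(deleted=child.c.id, deleted_at=sa.func.now()))
        return result.rowcount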

The data migration would still take time, and a table lock, even though
it's just deletes, so that feels like it should be avoided.

-Sean

-- 
Sean Dague
http://dague.net



Re: [openstack-dev] [nova] Question about fixing missing soft deleted rows

2016-09-15 Thread Sylvain Bauza



On 15/09/2016 03:21, Matt Riedemann wrote:

> I'm looking for other input on a question I have in this change:
>
> https://review.openstack.org/#/c/345191/4/nova/db/sqlalchemy/api.py
>
> We've had a few patches like this where we don't (soft) delete entries
> related to an instance when that instance record is (soft) deleted.
> These then cause the archive command to fail because of the
> referential constraint.
>
> Then we go in and add a new entry in the instance_destroy method so we
> start (soft) deleting *new* things, but we don't clean up anything old.
>
> In the change above this is working around the fact we might have
> lingering consoles entries for an instance that's being archived.
>
> One suggestion I made was adding a database migration that soft
> deletes any console entries where the related instance is deleted
> (deleted != 0). Is that a bad idea? It's not a schema migration, it's
> data cleanup so archive works. We could do the same thing with a
> nova-manage command, but we don't know that someone has run it like
> they do with the DB migrations.
>
> Another idea is doing it in the nova-manage db online_data_migrations
> command which should be run on upgrade. If we landed something like
> that in say Ocata, then we could remove the TODO in the archive code
> in Pike.
>
> Other thoughts?



The real issue I see with a nova-manage db online_data_migrations run 
or a database migration is that it doesn't fix the root problem: 
whenever someone soft-deletes an instance, we're not also (soft or 
hard) deleting the related entries.


If we were providing that script, it would need to be callable 
idempotently, right?
If so, IMHO a single nova-manage command (or subcommand) could be 
enough, using a marker and a limit.
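
Roughly along these lines, for example (a hypothetical sketch in the 
spirit of the online_data_migrations contract; table names and the 
deleted=id convention assumed):

    import sqlalchemy as sa

    def cleanup_orphans(conn, child, instances, marker=0, max_count=50):
        # Grab the next batch of live child rows past the marker whose
        # instance is soft-deleted.
        rows = conn.execute(
            sa.select(child.c.id)
            .where(child.c.id > marker)
            .where(child.c.deleted == 0)
            .where(sa.exists().where(sa.and_(
                instances.c.uuid == child.c.instance_uuid,
                instances.c.deleted != 0)))
            .order_by(child.c.id)
            .limit(max_count)).fetchall()
        ids = [r.id for r in rows]
        if ids:
            conn.execute(sa.update(child)
                         .where(child.c.id.in_(ids))
                         .values(deleted=child.c.id,
                                 deleted_at=sa.func.now()))
        # Already-cleaned rows no longer match the filter, so calling
        # this again with the returned marker (or even 0) is idempotent.
        return len(ids), (ids[-1] if ids else marker)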


The other alternative I could see is to have the archive command heal 
the DB by deleting the entries that should have been (soft/hard) 
deleted when the related instance was soft-deleted.


-Sylvain





Re: [openstack-dev] [nova] Question about fixing missing soft deleted rows

2016-09-15 Thread Clint Byrum
Excerpts from Matt Riedemann's message of 2016-09-14 20:21:09 -0500:
> I'm looking for other input on a question I have in this change:
> 
> https://review.openstack.org/#/c/345191/4/nova/db/sqlalchemy/api.py
> 
> We've had a few patches like this where we don't (soft) delete entries 
> related to an instance when that instance record is (soft) deleted. 
> These then cause the archive command to fail because of the referential 
> constraint.
> 
> Then we go in and add a new entry in the instance_destroy method so we 
> start (soft) deleting *new* things, but we don't clean up anything old.
> 
> In the change above this is working around the fact we might have 
> lingering consoles entries for an instance that's being archived.
> 
> One suggestion I made was adding a database migration that soft deletes 
> any console entries where the related instance is deleted (deleted != 
> 0). Is that a bad idea? It's not a schema migration, it's data cleanup 
> so archive works. We could do the same thing with a nova-manage command, 
> but we don't know that someone has run it like they do with the DB 
> migrations.
> 
> Another idea is doing it in the nova-manage db online_data_migrations 
> command which should be run on upgrade. If we landed something like that 
> in say Ocata, then we could remove the TODO in the archive code in Pike.

In a former life doing highly scalable MySQL, we ditched all of the FK
checks because they were just extra work on write. Instead we introduced
workers that would walk tables and apply rules. They'd do something like
this:

SELECT * FROM book WHERE id > ? ORDER BY id LIMIT 10

And then check those 10 records for any referential integrity issues. If
they found a true orphan like you describe, they'd archive it and move
on. These workers would also sleep a bit between queries (usually about
half as long as the last query took) so they were never a constant drain
on the database. Then, after sleeping, the worker takes the last id
and passes it in, so it basically walks the table by id. If it ever
gets fewer than 10 records, it goes back to the minimum id.
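
Condensed, the worker loop looked something like this (a from-memory 
sketch, not production code; SQLAlchemy used for illustration, with 
the per-table integrity check and archiving left as callables):

    import time
    import sqlalchemy as sa

    def walk_table(engine, table, is_orphan, archive, batch=10):
        last_id = 0
        while True:
            started = time.monotonic()
            with engine.begin() as conn:
                rows = conn.execute(
                    sa.select(table)
                    .where(table.c.id > last_id)
                    .order_by(table.c.id)
                    .limit(batch)).fetchall()
                for row in rows:
                    # is_orphan/archive are per-table rules, e.g. "no
                    # live parent instance" -> move to archive table.
                    if is_orphan(conn, row):
                        archive(conn, row)
            # Sleep about half as long as the last pass took, so the
            # worker is never a constant drain on the database.
            time.sleep((time.monotonic() - started) / 2)
            # Fewer than `batch` rows means we hit the end of the
            # table: wrap around to the minimum id.
            last_id = 0 if len(rows) < batch else rows[-1].id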

Doing this with many-million-row tables was quite effective; most of
the rows archived by something like this were created by manual
manipulation of the database or by legitimate bugs in the software.

The benefit of doing this is you get to choose how much write and read
capacity you want to commit to consistency.

So, a thought: rather than one-shot db migration scripts, a worker
could be written that just crawls the various project databases and
reports on, or fixes, known issues.
> 
> Other thoughts?
> 

__
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev