+1. Let's do it.

If we need to add some extra tests to protect against regressions, then so be 
it. I will help.
I also think better use could be made of the notifications system. A properly 
defined topic namespace would go a long way to assist that.

-----Original Message-----
From: openstack-bounces+dave.haynes=hp....@lists.launchpad.net 
[mailto:openstack-bounces+dave.haynes=hp....@lists.launchpad.net] On Behalf Of 
Pitucha, Stanislaw Izaak
Sent: 02 October 2012 17:43
To: openstack@lists.launchpad.net
Subject: [Openstack] Discussion / proposal: deleted column marker

Hi all,
I'd like to open a discussion on a topic that's been bugging me for a number of 
reasons - soft deletes (by that I mean marking rows with deleted=1 in the
db) and related - actions audit. Some research and speculations first...
To be honest I could not find any reason why the feature is there in the first 
place. Here's the commit that introduced the 'deleted' columns:
https://github.com/openstack/nova/commit/ae6905b9f1ef97206ee3c8722cec3b26fc0
64f38 - unfortunately the description says only "Refactored orm to support 
atomic actions". So the guessing part starts here. These are the possible uses 
for soft-deletion of the database records that I could come up with:

1. safety net (recover data what was deleted by accident) 2. audit / log 
(preserve the information about past data) 3. some kind of micro-optimisation 
where update is more useful than deletion
- be it speed or ease of handling foreign constraints (or not handling them 
straight away more likely) 4. ... no... that's all

But I think there's a number of issues with that approach. First - what are the 
issues with the possible uses above. Then - issues that I can see otherwise. 
Point by point:

1. Soft-deletion probably makes some restoration possible, but I doubt there's 
much that could be done without full analysis of the situation.
Mainly because the database is only about metainformation - the actual data 
users care about either goes away (ephemeral disks, memory, ...) or not 
(volumes, networks, ...) and is not recoverable. Since resources like ips and 
volumes can be just reused in other instances, not all recovery is possible 
anyway. Most hardcore fixes could be done by reinserting the 
original/reconstructed data just as easily as verifying what's safe to 
undelete. Both actions require looking at existing data and locking out 
information so it doesn't get reused while we're messing with the previous 
state.

2. Soft-deleted records are not great as a source of old information. This is 
connected to the previous point - some resources are just reused / rewritten 
instead of created and deleted. For example there's no record of what happens 
with old floating ips - the information gets overwritten when the IP is 
reassigned to the new instance, so the useful bits are gone.

3. This is the only thing I could come up with related to the commit message 
itself and the "support atomic actions" part. Maybe it was sometimes easier to 
mark something as deleted rather than managing and properly ordering deletes of 
a number of related entries.

So with that out of the way, here's a number of issues related to soft-deletes 
that I run into myself:

4. Indexing all this data on a busy system is getting a bit silly. Unless you 
do your own cleanup of old entries, you will end up in a situation where 
looking up instances on a host actually looks through thousands of "deleted"
rows even if only around 20 or so can be live and interesting. I know it's not 
a huge deal, but still an unnecessary cpu cycle burning.

5. Some things are just not possible to do in a safe and portable way at the 
moment. For example adding a new network and fixed IPs (there's a bug for that 
https://bugs.launchpad.net/nova/+bug/755138). I tried to fix this situation, 
but actually discovered that this is not possible to do using only sessions and 
with the 'deleted' column in place. There are ways to do it in a specific 
database (you can lock the whole table in mysql for example), but it's not 
portable then. The best you can do easily is limit the issue and hope that two 
inserts in different sessions won't happen at the same time. This could be 
easily done with an unique constraint if the 'deleted' column wasn't there.
I haven't checked, but guess that anything that can be named (and should have a 
unique name) has the same problem - security groups, keys, instances, ...

6. The amount of data grows pretty quickly in a busy environment. It has to be 
cleaned up, but due to some constraints, it can't be done easily in one go. 
Cleanup triggers help here, but that's some additional work that needs 
maintenance during schema changes. Schema changes themselves get interesting 
when you're actually spending time converting mostly rows you really don't care 
about. There were also instances where migration over many steps failed for 
some reason on very old rows (virtual interface related, can't recall which 
step was it at the moment). 

7. Not directly related, but I'll get back to that in the summary: owners of 
bigger deployments will either want to or are required to hold some record of 
various events and customer information. For example to handle security abuse 
reports, it would be great to know who owned a specific floating IP at a 
specific moment.


So what's my point?
Any use case I can find right now is not really improved by the current schema. 
It doesn't look like there are many benefits, but there are definitely some 
downsides.

Does anyone know why soft-delete is still in place?
Are there any reasons it can't / shouldn't be removed at this time?
If it's possible to remove it, would you miss it?


If the answers are all "no", my proposal for a future release is to:
- Check if the `deleted` column can be removed. Make sure other rows are either 
disconnected (update ref to null) or just deleted as needed and all data is 
really being removed.
- Add a system similar to notifications, but for auditable events - who did 
what with which resource at what time - in some semi-structured way that allows 
reviewing and summaries (basic information as separate columns + a description 
as a text message). I saw there was some blueprint for a cloudaudit api 
(https://blueprints.launchpad.net/openstack-common/+spec/cloud-audit-api),
but not much happened with it for a year, so I'm assuming it's dead now.

This would allow both proper cleanup of the data and retention of what's really 
necessary. It would also make it possible to use unique constraints where 
they're really needed (mainly IP descriptions) to prevent silly mistakes.

Any additional external processing of deleted records would be easier to do 
because the database trigger could be just set on the delete action.
Thoughts, comments and critique welcome :) Let me know what you think about 
these issues.

Regards,
Stanisław Pitucha
Cloud Services
Hewlett Packard



_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@lists.launchpad.net
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp

Reply via email to