On 11/25/2014 09:34 PM, Mike Bayer wrote:
On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL <[email protected]> wrote:

Hi,

Le 2014-11-24 17:20, Michael Still a écrit :
Heya,

This is a new database, so its our big chance to get this right. So,
ideas welcome...

Some initial proposals:

  - we do what we do in the current nova database -- we have a deleted
column, and we set it to true when we delete the instance.

  - we have shadow tables and we move delete rows to a shadow table.

  - something else super clever I haven't thought of.

Some random thoughts that came to mind ...

1/ as far as I remember, you rarely want to delete a row
- it's usually a heavy DB operation (well, was back then)
- it's destructive (but we may want that)
- it creates fragmentation (less of a problem depending on db engine)
- it can break foreign key relations if not done the proper way

deleting records with foreign key dependencies is a known quantity.  Those 
items are all related and being able to delete everything related is a 
well-solved problem, both via ON DELETE cascades as well as standard ORM 
features.

++

2/ updating a row to 'deleted=1'
- gives an opportunity to set a useful deletion time-stamp
I would even argue that setting the deleted_at field would suffice to declare a row 
'deleted' (as in 'not NULL'). I know, "explicit is better than implicit" …

the logic that’s used is that “deleted” is set to the primary key of the 
record, this is to allow UNIQUE constraints to be set up that serve on the 
non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there 
are multiple “deleted” rows with “x”).

Indeed. Because people want to be able to name an instance one thing, delete it, and immediately name another instance the same thing. Ugh -- what an annoying use case, IMO. Better to just delete the row out of the database after archival/audit log of the operation.

- the update operation is not destructive
- an admin/DBA can decide when and how to purge/archive rows

3/ moving the row at deletion
- you want to avoid additional steps to complete an operation, thus avoid 
creating a new record while deleting one
- even if you wrap things into a transaction, not being able to create a row 
somewhere can make your delete transaction fail
- if I were to archive all deleted rows, at scale I'd probably move them to 
another db server altogether

if you’re really “archiving”, I’d just dump out a log of what occurred to a 
textual log file, then you archive the files.  There’s no need for a pure 
“audit trail” to even be in the relational DB.

Precisely. Why is the RDBMS the thing that is used for archival/audit logging? Why not a NoSQL store or a centralized log facility? All that would be needed would be for us to standardize on the format of the archival record, standardize on the things to provide with the archival record (for instance system metadata, etc), and then write a simple module that would write an archival record to some backend data store.

Then we could rid ourselves of the awfulness of the shadow tables and all of the read_deleted=yes crap.

Best,
-jay

Now, I for one would keep the current mark-as-deleted model.

I however perfectly get the problem of massive churn with instance 
creation/deletion.

is there?   inserting and updating rows is a normal thing in relational DBs.


So, let's be crazy, why not have a config option 'on_delete=mark_delete', 
'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that 
feasible ?)

I’m -1 on that.  The need for records to be soft-deleted or not, and if those 
soft-deletes need to be accessible in the application, should be decided up 
front.  Adding a multiplicity of options just makes the code that much more 
complicated and fragments its behaviors and test coverage.   The suggestion 
basically tries to avoid making a decision and I think more thought should be 
put into what is truly needed.


This would especially come handy if the admin decides the global cells database 
may not need to keep track of deleted instances, the cell-local nova database 
being the absolute reference for that.

why would an admin decide that this is, or is not, needed?   if the deleted 
data isn’t needed by the live app, it should just be dumped to an archive.  
admins can set how often that archive should be purged, but IMHO the “pipeline” 
of these records should be straight; there shouldn’t be junctions and switches 
that cause there to be multiple data paths.   It leads to too much complexity.


_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to