> On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL <ara...@iweb.com> wrote:
> 
> Hi,
> 
> Le 2014-11-24 17:20, Michael Still a écrit :
>> Heya,
>> 
>> This is a new database, so its our big chance to get this right. So,
>> ideas welcome...
>> 
>> Some initial proposals:
>> 
>>  - we do what we do in the current nova database -- we have a deleted
>> column, and we set it to true when we delete the instance.
>> 
>>  - we have shadow tables and we move delete rows to a shadow table.
>> 
>>  - something else super clever I haven't thought of.
> 
> Some random thoughts that came to mind ...
> 
> 1/ as far as I remember, you rarely want to delete a row
> - it's usually a heavy DB operation (well, was back then)
> - it's destructive (but we may want that)
> - it creates fragmentation (less of a problem depending on db engine)
> - it can break foreign key relations if not done the proper way

deleting records with foreign key dependencies is a known quantity.  Those 
items are all related and being able to delete everything related is a 
well-solved problem, both via ON DELETE cascades as well as standard ORM 
features.


> 
> 2/ updating a row to 'deleted=1'
> - gives an opportunity to set a useful deletion time-stamp
> I would even argue that setting the deleted_at field would suffice to declare 
> a row 'deleted' (as in 'not NULL'). I know, "explicit is better than 
> implicit" …

the logic that’s used is that “deleted” is set to the primary key of the 
record, this is to allow UNIQUE constraints to be set up that serve on the 
non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there 
are multiple “deleted” rows with “x”).

> - the update operation is not destructive
> - an admin/DBA can decide when and how to purge/archive rows
> 
> 3/ moving the row at deletion
> - you want to avoid additional steps to complete an operation, thus avoid 
> creating a new record while deleting one
> - even if you wrap things into a transaction, not being able to create a row 
> somewhere can make your delete transaction fail
> - if I were to archive all deleted rows, at scale I'd probably move them to 
> another db server altogether

if you’re really “archiving”, I’d just dump out a log of what occurred to a 
textual log file, then you archive the files.  There’s no need for a pure 
“audit trail” to even be in the relational DB.


> Now, I for one would keep the current mark-as-deleted model.
> 
> I however perfectly get the problem of massive churn with instance 
> creation/deletion.

is there?   inserting and updating rows is a normal thing in relational DBs.


> So, let's be crazy, why not have a config option 'on_delete=mark_delete', 
> 'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that 
> feasible ?)

I’m -1 on that.  The need for records to be soft-deleted or not, and if those 
soft-deletes need to be accessible in the application, should be decided up 
front.  Adding a multiplicity of options just makes the code that much more 
complicated and fragments its behaviors and test coverage.   The suggestion 
basically tries to avoid making a decision and I think more thought should be 
put into what is truly needed.


> This would especially come handy if the admin decides the global cells 
> database may not need to keep track of deleted instances, the cell-local nova 
> database being the absolute reference for that.

why would an admin decide that this is, or is not, needed?   if the deleted 
data isn’t needed by the live app, it should just be dumped to an archive.  
admins can set how often that archive should be purged, but IMHO the “pipeline” 
of these records should be straight; there shouldn’t be junctions and switches 
that cause there to be multiple data paths.   It leads to too much complexity.


_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

Reply via email to