On 11/25/2014 09:34 PM, Mike Bayer wrote:
On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL <[email protected]> wrote:
Hi,
Le 2014-11-24 17:20, Michael Still a écrit :
Heya,
This is a new database, so its our big chance to get this right. So,
ideas welcome...
Some initial proposals:
- we do what we do in the current nova database -- we have a deleted
column, and we set it to true when we delete the instance.
- we have shadow tables and we move delete rows to a shadow table.
- something else super clever I haven't thought of.
Some random thoughts that came to mind ...
1/ as far as I remember, you rarely want to delete a row
- it's usually a heavy DB operation (well, was back then)
- it's destructive (but we may want that)
- it creates fragmentation (less of a problem depending on db engine)
- it can break foreign key relations if not done the proper way
deleting records with foreign key dependencies is a known quantity. Those
items are all related and being able to delete everything related is a
well-solved problem, both via ON DELETE cascades as well as standard ORM
features.
++
2/ updating a row to 'deleted=1'
- gives an opportunity to set a useful deletion time-stamp
I would even argue that setting the deleted_at field would suffice to declare a row
'deleted' (as in 'not NULL'). I know, "explicit is better than implicit" …
the logic that’s used is that “deleted” is set to the primary key of the
record, this is to allow UNIQUE constraints to be set up that serve on the
non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there
are multiple “deleted” rows with “x”).
Indeed. Because people want to be able to name an instance one thing,
delete it, and immediately name another instance the same thing. Ugh --
what an annoying use case, IMO. Better to just delete the row out of the
database after archival/audit log of the operation.
- the update operation is not destructive
- an admin/DBA can decide when and how to purge/archive rows
3/ moving the row at deletion
- you want to avoid additional steps to complete an operation, thus avoid
creating a new record while deleting one
- even if you wrap things into a transaction, not being able to create a row
somewhere can make your delete transaction fail
- if I were to archive all deleted rows, at scale I'd probably move them to
another db server altogether
if you’re really “archiving”, I’d just dump out a log of what occurred to a
textual log file, then you archive the files. There’s no need for a pure
“audit trail” to even be in the relational DB.
Precisely. Why is the RDBMS the thing that is used for archival/audit
logging? Why not a NoSQL store or a centralized log facility? All that
would be needed would be for us to standardize on the format of the
archival record, standardize on the things to provide with the archival
record (for instance system metadata, etc), and then write a simple
module that would write an archival record to some backend data store.
Then we could rid ourselves of the awfulness of the shadow tables and
all of the read_deleted=yes crap.
Best,
-jay
Now, I for one would keep the current mark-as-deleted model.
I however perfectly get the problem of massive churn with instance
creation/deletion.
is there? inserting and updating rows is a normal thing in relational DBs.
So, let's be crazy, why not have a config option 'on_delete=mark_delete',
'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that
feasible ?)
I’m -1 on that. The need for records to be soft-deleted or not, and if those
soft-deletes need to be accessible in the application, should be decided up
front. Adding a multiplicity of options just makes the code that much more
complicated and fragments its behaviors and test coverage. The suggestion
basically tries to avoid making a decision and I think more thought should be
put into what is truly needed.
This would especially come handy if the admin decides the global cells database
may not need to keep track of deleted instances, the cell-local nova database
being the absolute reference for that.
why would an admin decide that this is, or is not, needed? if the deleted
data isn’t needed by the live app, it should just be dumped to an archive.
admins can set how often that archive should be purged, but IMHO the “pipeline”
of these records should be straight; there shouldn’t be junctions and switches
that cause there to be multiple data paths. It leads to too much complexity.
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
_______________________________________________
OpenStack-dev mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev