Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-28 Thread Jay Pipes

On 11/27/2014 04:20 PM, Michael Still wrote:

On Fri, Nov 28, 2014 at 2:59 AM, Jay Pipes jaypi...@gmail.com wrote:

On 11/26/2014 04:24 PM, Mike Bayer wrote:


Precisely. Why is the RDBMS the thing that is used for
archival/audit logging? Why not a NoSQL store or a centralized log
facility? All that would be needed would be for us to standardize
on the format of the archival record, standardize on the things to
provide with the archival record (for instance system metadata,
etc), and then write a simple module that would write an archival
record to some backend data store.

Then we could rid ourselves of the awfulness of the shadow tables
and all of the read_deleted=yes crap.




+1000 - if we’re really looking to “do this right”, as the original
message suggested, this would be “right”.  If you don’t need these
rows in the app (and it would be very nice if you didn’t), dump it
out to an archive file / non-relational datastore.   As mentioned
elsewhere, this is entirely acceptable for organizations that are
“obliged” to store records for auditing purposes.   Nova even already
has a dictionary format for everything set up with nova objects, so
dumping these dictionaries out as JSON would be the way to go.



OK, spec added:

https://review.openstack.org/137669


At this point I don't think we should block the cells reworking effort
on this spec. I'm happy for people to pursue this, but I think its
unlikely to be work that is completed in kilo. We can transition the
new cells databases at the same time we fix the main database.


No disagreement at all. The proposed spec is a monster one, and we can 
certainly make a lot of progress in Kilo, but I wouldn't expect it to be 
completed any time soon.


Best,
-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-27 Thread Michael Still
On Fri, Nov 28, 2014 at 2:59 AM, Jay Pipes jaypi...@gmail.com wrote:
 On 11/26/2014 04:24 PM, Mike Bayer wrote:

 Precisely. Why is the RDBMS the thing that is used for
 archival/audit logging? Why not a NoSQL store or a centralized log
 facility? All that would be needed would be for us to standardize
 on the format of the archival record, standardize on the things to
 provide with the archival record (for instance system metadata,
 etc), and then write a simple module that would write an archival
 record to some backend data store.

 Then we could rid ourselves of the awfulness of the shadow tables
 and all of the read_deleted=yes crap.



 +1000 - if we’re really looking to “do this right”, as the original
 message suggested, this would be “right”.  If you don’t need these
 rows in the app (and it would be very nice if you didn’t), dump it
 out to an archive file / non-relational datastore.   As mentioned
 elsewhere, this is entirely acceptable for organizations that are
 “obliged” to store records for auditing purposes.   Nova even already
 has a dictionary format for everything set up with nova objects, so
 dumping these dictionaries out as JSON would be the way to go.


 OK, spec added:

 https://review.openstack.org/137669

At this point I don't think we should block the cells reworking effort
on this spec. I'm happy for people to pursue this, but I think its
unlikely to be work that is completed in kilo. We can transition the
new cells databases at the same time we fix the main database.

Michael

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Andrew Laski


On 11/25/2014 11:54 AM, Solly Ross wrote:

I can't comment on other projects, but Nova definitely needs the soft
delete in the main nova database. Perhaps not for every table, but
there is definitely code in the code base which uses it right now.
Search for read_deleted=True if you're curious.

Just to save people a bit of time, it's actually `read_deleted='yes'`
or `read_deleted=yes` for many cases.

Just to give people a quick overview:

A cursory glance (no pun intended) seems to indicate that quite a few of
these are reading potentially deleted flavors.  For this case, it makes
sense to keep things in one table (as we do).

There are also quite a few that seem to be making sure deleted things
are properly cleaned up.  In this case, 'deleted' acts as a CLEANUP
state, so it makes just as much sense to keep the deleted rows in a
separate table.


For this case in particular, the concern is that operators might need
to find where an instance was running once it is deleted to be able to
diagnose issues reported by users. I think that's a valid use case of
this particular data.


This is a new database, so its our big chance to get this right. So,
ideas welcome...

Some initial proposals:

- we do what we do in the current nova database -- we have a deleted
column, and we set it to true when we delete the instance.

- we have shadow tables and we move delete rows to a shadow table.


Both approaches are viable, but as the soft-delete column is widespread, it
would be thorny for this new app to use some totally different scheme,
unless the notion is that all schemes should move to the audit table
approach (which I wouldn’t mind, but it would be a big job).FTR, the
audit table approach is usually what I prefer for greenfield development,
if all that’s needed is forensic capabilities at the database inspection
level, and not as much active GUI-based “deleted” flags.   That is, if you
really don’t need to query the history tables very often except when
debugging an issue offline.  The reason its preferable is because those
rows are still “deleted” from your main table, and they don’t get in the
way of querying.   But if you need to refer to these history rows in
context of the application, that means you need to get them mapped in such
a way that they behave like the primary rows, which overall is a more
difficult approach than just using the soft delete column.

I think it does really come down here to how you intend to use the soft-delete
functionality in Cells.  If you just are using it to debug or audit, then I 
think
the right way to go would be either the audit table (potentially can store more
lifecycle data, but could end up taking up more space) or a separate shadow
table (takes up less space).

If you are going to use the soft delete for application functionality, I would
consider differentiating between deleted and we still have things left to
clean up, since this seems to be mixing two different requirements into one.


The case that spawned this discussion is one where deleted rows are not 
needed for application functionality.  So I'm going to update the 
proposed schema there to not include a 'deleted' column. Fortunately 
there's still some time before the question of how to handle deletes 
needs to be fully sorted out.



That said, I have a lot of plans to send improvements down the way of the
existing approach of “soft delete column” into projects, from the querying
POV, so that criteria to filter out soft delete can be done in a much more
robust fashion (see
https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).
But this is still more complex and less performant than if the rows are
just gone totally, off in a history table somewhere (again, provided you
really don’t need to look at those history rows in an application context,
otherwise it gets all complicated again).

Interesting. I hadn't seen consistency between the two databases as
trumping doing this less horribly, but it sounds like its more of a
thing that I thought.

Thanks,
Michael

--
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Jay Pipes

On 11/25/2014 09:34 PM, Mike Bayer wrote:

On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL ara...@iweb.com wrote:

Hi,

Le 2014-11-24 17:20, Michael Still a écrit :

Heya,

This is a new database, so its our big chance to get this right. So,
ideas welcome...

Some initial proposals:

  - we do what we do in the current nova database -- we have a deleted
column, and we set it to true when we delete the instance.

  - we have shadow tables and we move delete rows to a shadow table.

  - something else super clever I haven't thought of.


Some random thoughts that came to mind ...

1/ as far as I remember, you rarely want to delete a row
- it's usually a heavy DB operation (well, was back then)
- it's destructive (but we may want that)
- it creates fragmentation (less of a problem depending on db engine)
- it can break foreign key relations if not done the proper way


deleting records with foreign key dependencies is a known quantity.  Those 
items are all related and being able to delete everything related is a 
well-solved problem, both via ON DELETE cascades as well as standard ORM 
features.


++


2/ updating a row to 'deleted=1'
- gives an opportunity to set a useful deletion time-stamp
I would even argue that setting the deleted_at field would suffice to declare a row 
'deleted' (as in 'not NULL'). I know, explicit is better than implicit …


the logic that’s used is that “deleted” is set to the primary key of the 
record, this is to allow UNIQUE constraints to be set up that serve on the 
non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there 
are multiple “deleted” rows with “x”).


Indeed. Because people want to be able to name an instance one thing, 
delete it, and immediately name another instance the same thing. Ugh -- 
what an annoying use case, IMO. Better to just delete the row out of the 
database after archival/audit log of the operation.



- the update operation is not destructive
- an admin/DBA can decide when and how to purge/archive rows

3/ moving the row at deletion
- you want to avoid additional steps to complete an operation, thus avoid 
creating a new record while deleting one
- even if you wrap things into a transaction, not being able to create a row 
somewhere can make your delete transaction fail
- if I were to archive all deleted rows, at scale I'd probably move them to 
another db server altogether


if you’re really “archiving”, I’d just dump out a log of what occurred to a 
textual log file, then you archive the files.  There’s no need for a pure 
“audit trail” to even be in the relational DB.


Precisely. Why is the RDBMS the thing that is used for archival/audit 
logging? Why not a NoSQL store or a centralized log facility? All that 
would be needed would be for us to standardize on the format of the 
archival record, standardize on the things to provide with the archival 
record (for instance system metadata, etc), and then write a simple 
module that would write an archival record to some backend data store.


Then we could rid ourselves of the awfulness of the shadow tables and 
all of the read_deleted=yes crap.


Best,
-jay


Now, I for one would keep the current mark-as-deleted model.

I however perfectly get the problem of massive churn with instance 
creation/deletion.


is there?   inserting and updating rows is a normal thing in relational DBs.



So, let's be crazy, why not have a config option 'on_delete=mark_delete', 
'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that 
feasible ?)


I’m -1 on that.  The need for records to be soft-deleted or not, and if those 
soft-deletes need to be accessible in the application, should be decided up 
front.  Adding a multiplicity of options just makes the code that much more 
complicated and fragments its behaviors and test coverage.   The suggestion 
basically tries to avoid making a decision and I think more thought should be 
put into what is truly needed.



This would especially come handy if the admin decides the global cells database 
may not need to keep track of deleted instances, the cell-local nova database 
being the absolute reference for that.


why would an admin decide that this is, or is not, needed?   if the deleted 
data isn’t needed by the live app, it should just be dumped to an archive.  
admins can set how often that archive should be purged, but IMHO the “pipeline” 
of these records should be straight; there shouldn’t be junctions and switches 
that cause there to be multiple data paths.   It leads to too much complexity.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Belmiro Moreira
Hi,
my experience is that soft delete is important to keep record of deleted
instances and its characteristics.
In fact in my organization we are obliged to keep these records for several
months.
However, it would be nice that after few months we were able to purge the
DB with a nova tool.

In the particular case of this cells table my major concern is that having
a delete field maybe means that top and children databases need to be
synchronized. Looking into the current cells design having duplicated
information in different databases is one of the main issues.

Belmiro


On Wed, Nov 26, 2014 at 4:40 PM, Andrew Laski andrew.la...@rackspace.com
wrote:


 On 11/25/2014 11:54 AM, Solly Ross wrote:

 I can't comment on other projects, but Nova definitely needs the soft
 delete in the main nova database. Perhaps not for every table, but
 there is definitely code in the code base which uses it right now.
 Search for read_deleted=True if you're curious.

 Just to save people a bit of time, it's actually `read_deleted='yes'`
 or `read_deleted=yes` for many cases.

 Just to give people a quick overview:

 A cursory glance (no pun intended) seems to indicate that quite a few of
 these are reading potentially deleted flavors.  For this case, it makes
 sense to keep things in one table (as we do).

 There are also quite a few that seem to be making sure deleted things
 are properly cleaned up.  In this case, 'deleted' acts as a CLEANUP
 state, so it makes just as much sense to keep the deleted rows in a
 separate table.

  For this case in particular, the concern is that operators might need
 to find where an instance was running once it is deleted to be able to
 diagnose issues reported by users. I think that's a valid use case of
 this particular data.

  This is a new database, so its our big chance to get this right. So,
 ideas welcome...

 Some initial proposals:

 - we do what we do in the current nova database -- we have a deleted
 column, and we set it to true when we delete the instance.

 - we have shadow tables and we move delete rows to a shadow table.


 Both approaches are viable, but as the soft-delete column is
 widespread, it
 would be thorny for this new app to use some totally different scheme,
 unless the notion is that all schemes should move to the audit table
 approach (which I wouldn’t mind, but it would be a big job).FTR, the
 audit table approach is usually what I prefer for greenfield
 development,
 if all that’s needed is forensic capabilities at the database inspection
 level, and not as much active GUI-based “deleted” flags.   That is, if
 you
 really don’t need to query the history tables very often except when
 debugging an issue offline.  The reason its preferable is because those
 rows are still “deleted” from your main table, and they don’t get in the
 way of querying.   But if you need to refer to these history rows in
 context of the application, that means you need to get them mapped in
 such
 a way that they behave like the primary rows, which overall is a more
 difficult approach than just using the soft delete column.

 I think it does really come down here to how you intend to use the
 soft-delete
 functionality in Cells.  If you just are using it to debug or audit, then
 I think
 the right way to go would be either the audit table (potentially can
 store more
 lifecycle data, but could end up taking up more space) or a separate
 shadow
 table (takes up less space).

 If you are going to use the soft delete for application functionality, I
 would
 consider differentiating between deleted and we still have things left
 to
 clean up, since this seems to be mixing two different requirements into
 one.


 The case that spawned this discussion is one where deleted rows are not
 needed for application functionality.  So I'm going to update the proposed
 schema there to not include a 'deleted' column. Fortunately there's still
 some time before the question of how to handle deletes needs to be fully
 sorted out.


  That said, I have a lot of plans to send improvements down the way of the
 existing approach of “soft delete column” into projects, from the
 querying
 POV, so that criteria to filter out soft delete can be done in a much
 more
 robust fashion (see
 https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-
 heuristic-inspector-event).
 But this is still more complex and less performant than if the rows are
 just gone totally, off in a history table somewhere (again, provided you
 really don’t need to look at those history rows in an application
 context,
 otherwise it gets all complicated again).

 Interesting. I hadn't seen consistency between the two databases as
 trumping doing this less horribly, but it sounds like its more of a
 thing that I thought.

 Thanks,
 Michael

 --
 Rackspace Australia

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 

Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Mike Bayer

 
 Precisely. Why is the RDBMS the thing that is used for archival/audit 
 logging? Why not a NoSQL store or a centralized log facility? All that would 
 be needed would be for us to standardize on the format of the archival 
 record, standardize on the things to provide with the archival record (for 
 instance system metadata, etc), and then write a simple module that would 
 write an archival record to some backend data store.
 
 Then we could rid ourselves of the awfulness of the shadow tables and all of 
 the read_deleted=yes crap.


+1000 - if we’re really looking to “do this right”, as the original message 
suggested, this would be “right”.  If you don’t need these rows in the app (and 
it would be very nice if you didn’t), dump it out to an archive file / 
non-relational datastore.   As mentioned elsewhere, this is entirely acceptable 
for organizations that are “obliged” to store records for auditing purposes.   
Nova even already has a dictionary format for everything set up with nova 
objects, so dumping these dictionaries out as JSON would be the way to go.





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Andrew Laski


On 11/26/2014 03:39 PM, Belmiro Moreira wrote:

Hi,
my experience is that soft delete is important to keep record of 
deleted instances and its characteristics.
In fact in my organization we are obliged to keep these records for 
several months.
However, it would be nice that after few months we were able to purge 
the DB with a nova tool.


I think that any solution for this needs to keep the deleted data 
available in some form.  Is it important for you that the deleted data 
be in the same table as non deleted rows, or could they be moved into 
another table?  And would it matter if the format of the row changed 
during a move?





In the particular case of this cells table my major concern is that 
having a delete field maybe means that top and children databases 
need to be synchronized. Looking into the current cells design having 
duplicated information in different databases is one of the main issues.


Agreed.  I think this can be solved by ensuring that instance deletion 
is only about setting the deleted column in the cell instance table.  
The instance mapping being deleted makes no statement about whether or 
not the instance is deleted, though it would be a bug to delete it 
before the instance was deleted.




Belmiro


On Wed, Nov 26, 2014 at 4:40 PM, Andrew Laski 
andrew.la...@rackspace.com mailto:andrew.la...@rackspace.com wrote:



On 11/25/2014 11:54 AM, Solly Ross wrote:

I can't comment on other projects, but Nova definitely
needs the soft
delete in the main nova database. Perhaps not for every
table, but
there is definitely code in the code base which uses it
right now.
Search for read_deleted=True if you're curious.

Just to save people a bit of time, it's actually
`read_deleted='yes'`
or `read_deleted=yes` for many cases.

Just to give people a quick overview:

A cursory glance (no pun intended) seems to indicate that
quite a few of
these are reading potentially deleted flavors.  For this case,
it makes
sense to keep things in one table (as we do).

There are also quite a few that seem to be making sure deleted
things
are properly cleaned up.  In this case, 'deleted' acts as a
CLEANUP
state, so it makes just as much sense to keep the deleted rows
in a
separate table.

For this case in particular, the concern is that operators
might need
to find where an instance was running once it is deleted
to be able to
diagnose issues reported by users. I think that's a valid
use case of
this particular data.

This is a new database, so its our big chance to
get this right. So,
ideas welcome...

Some initial proposals:

- we do what we do in the current nova database --
we have a deleted
column, and we set it to true when we delete the
instance.

- we have shadow tables and we move delete rows to
a shadow table.


Both approaches are viable, but as the soft-delete
column is widespread, it
would be thorny for this new app to use some totally
different scheme,
unless the notion is that all schemes should move to
the audit table
approach (which I wouldn’t mind, but it would be a big
job).FTR, the
audit table approach is usually what I prefer for
greenfield development,
if all that’s needed is forensic capabilities at the
database inspection
level, and not as much active GUI-based “deleted”
flags.   That is, if you
really don’t need to query the history tables very
often except when
debugging an issue offline.  The reason its preferable
is because those
rows are still “deleted” from your main table, and
they don’t get in the
way of querying.   But if you need to refer to these
history rows in
context of the application, that means you need to get
them mapped in such
a way that they behave like the primary rows, which
overall is a more
difficult approach than just using the soft delete column.

I think it does really come down here to how you intend to use
the soft-delete
functionality in Cells.  If you just are using it to debug or
audit, then I think
the right way to go would be either the audit table
(potentially can store more

Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Joshua Harlow

Mike Bayer wrote:

Precisely. Why is the RDBMS the thing that is used for archival/audit logging? 
Why not a NoSQL store or a centralized log facility? All that would be needed 
would be for us to standardize on the format of the archival record, 
standardize on the things to provide with the archival record (for instance 
system metadata, etc), and then write a simple module that would write an 
archival record to some backend data store.

Then we could rid ourselves of the awfulness of the shadow tables and all of 
the read_deleted=yes crap.



+1000 - if we’re really looking to “do this right”, as the original message 
suggested, this would be “right”.  If you don’t need these rows in the app (and 
it would be very nice if you didn’t), dump it out to an archive file / 
non-relational datastore.   As mentioned elsewhere, this is entirely acceptable 
for organizations that are “obliged” to store records for auditing purposes.   
Nova even already has a dictionary format for everything set up with nova 
objects, so dumping these dictionaries out as JSON would be the way to go.




+ 1001; dump it out to some data warehouse, put it to HDFS, do something 
else with long term storage IMHO; just let's avoid continuing to turn a 
database into a data warehouse, they are really not the same thing and 
don't have the same requirements, constraints ...


I've always been confused why some of the openstack tables tried to do 
both roles with a deleted=1|0 field. The part that has also been 
confusing to me is has anyone actually tried switching a deleted=1 field 
back to deleted=0 without application logic to do this; if so how did u 
manage to pull that off correctly without knowing the inner details of 
the application itself (how did u do this atomically so that the users 
*actively* running against the api would not start to receive weird 
responses and failures...)?






___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-25 Thread Ahmed RAHAL

Hi,

Le 2014-11-24 17:20, Michael Still a écrit :

Heya,

This is a new database, so its our big chance to get this right. So,
ideas welcome...

Some initial proposals:

  - we do what we do in the current nova database -- we have a deleted
column, and we set it to true when we delete the instance.

  - we have shadow tables and we move delete rows to a shadow table.

  - something else super clever I haven't thought of.


Some random thoughts that came to mind ...

1/ as far as I remember, you rarely want to delete a row
- it's usually a heavy DB operation (well, was back then)
- it's destructive (but we may want that)
- it creates fragmentation (less of a problem depending on db engine)
- it can break foreign key relations if not done the proper way

2/ updating a row to 'deleted=1'
- gives an opportunity to set a useful deletion time-stamp
I would even argue that setting the deleted_at field would suffice to 
declare a row 'deleted' (as in 'not NULL'). I know, explicit is better 
than implicit ...

- the update operation is not destructive
- an admin/DBA can decide when and how to purge/archive rows

3/ moving the row at deletion
- you want to avoid additional steps to complete an operation, thus 
avoid creating a new record while deleting one
- even if you wrap things into a transaction, not being able to create a 
row somewhere can make your delete transaction fail
- if I were to archive all deleted rows, at scale I'd probably move them 
to another db server altogether



Now, I for one would keep the current mark-as-deleted model.

I however perfectly get the problem of massive churn with instance 
creation/deletion.
So, let's be crazy, why not have a config option 
'on_delete=mark_delete', 'on_delete=purge' or 'on_delete=archive' and 
let the admin choose ? (is that feasible ?)


This would especially come handy if the admin decides the global cells 
database may not need to keep track of deleted instances, the cell-local 
nova database being the absolute reference for that.


HTH,

Ahmed.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-25 Thread Mike Bayer

 On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL ara...@iweb.com wrote:
 
 Hi,
 
 Le 2014-11-24 17:20, Michael Still a écrit :
 Heya,
 
 This is a new database, so its our big chance to get this right. So,
 ideas welcome...
 
 Some initial proposals:
 
  - we do what we do in the current nova database -- we have a deleted
 column, and we set it to true when we delete the instance.
 
  - we have shadow tables and we move delete rows to a shadow table.
 
  - something else super clever I haven't thought of.
 
 Some random thoughts that came to mind ...
 
 1/ as far as I remember, you rarely want to delete a row
 - it's usually a heavy DB operation (well, was back then)
 - it's destructive (but we may want that)
 - it creates fragmentation (less of a problem depending on db engine)
 - it can break foreign key relations if not done the proper way

deleting records with foreign key dependencies is a known quantity.  Those 
items are all related and being able to delete everything related is a 
well-solved problem, both via ON DELETE cascades as well as standard ORM 
features.


 
 2/ updating a row to 'deleted=1'
 - gives an opportunity to set a useful deletion time-stamp
 I would even argue that setting the deleted_at field would suffice to declare 
 a row 'deleted' (as in 'not NULL'). I know, explicit is better than 
 implicit …

the logic that’s used is that “deleted” is set to the primary key of the 
record, this is to allow UNIQUE constraints to be set up that serve on the 
non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there 
are multiple “deleted” rows with “x”).

 - the update operation is not destructive
 - an admin/DBA can decide when and how to purge/archive rows
 
 3/ moving the row at deletion
 - you want to avoid additional steps to complete an operation, thus avoid 
 creating a new record while deleting one
 - even if you wrap things into a transaction, not being able to create a row 
 somewhere can make your delete transaction fail
 - if I were to archive all deleted rows, at scale I'd probably move them to 
 another db server altogether

if you’re really “archiving”, I’d just dump out a log of what occurred to a 
textual log file, then you archive the files.  There’s no need for a pure 
“audit trail” to even be in the relational DB.


 Now, I for one would keep the current mark-as-deleted model.
 
 I however perfectly get the problem of massive churn with instance 
 creation/deletion.

is there?   inserting and updating rows is a normal thing in relational DBs.


 So, let's be crazy, why not have a config option 'on_delete=mark_delete', 
 'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that 
 feasible ?)

I’m -1 on that.  The need for records to be soft-deleted or not, and if those 
soft-deletes need to be accessible in the application, should be decided up 
front.  Adding a multiplicity of options just makes the code that much more 
complicated and fragments its behaviors and test coverage.   The suggestion 
basically tries to avoid making a decision and I think more thought should be 
put into what is truly needed.


 This would especially come handy if the admin decides the global cells 
 database may not need to keep track of deleted instances, the cell-local nova 
 database being the absolute reference for that.

why would an admin decide that this is, or is not, needed?   if the deleted 
data isn’t needed by the live app, it should just be dumped to an archive.  
admins can set how often that archive should be purged, but IMHO the “pipeline” 
of these records should be straight; there shouldn’t be junctions and switches 
that cause there to be multiple data paths.   It leads to too much complexity.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-24 Thread Michael Still
Heya,

Review https://review.openstack.org/#/c/135644/4 proposes the addition
of a new database for our improved implementation of cells in Nova.
However, there's an outstanding question about how to handle soft
delete of rows -- we believe that we need to soft delete for forensic
purposes.

This is a new database, so its our big chance to get this right. So,
ideas welcome...

Some initial proposals:

 - we do what we do in the current nova database -- we have a deleted
column, and we set it to true when we delete the instance.

 - we have shadow tables and we move delete rows to a shadow table.

 - something else super clever I haven't thought of.

Ideas?

Michael

-- 
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-24 Thread Michael Still
On Tue, Nov 25, 2014 at 11:14 AM, Mike Bayer mba...@redhat.com wrote:

 On Nov 24, 2014, at 5:20 PM, Michael Still mi...@stillhq.com wrote:

 Heya,

 Review https://review.openstack.org/#/c/135644/4 proposes the addition
 of a new database for our improved implementation of cells in Nova.
 However, there's an outstanding question about how to handle soft
 delete of rows -- we believe that we need to soft delete for forensic
 purposes.

 Everytime I talk to people about the soft delete thing, I hear the usual 
 refrain “we thought we needed it, but we didn’t and now it’s just overbuilt 
 cruft we want to get rid of”.

 Not saying you don’t have a need here but you definitely have this need, not 
 just following the herd right?   Soft delete makes things a lot less 
 convenient.

I can't comment on other projects, but Nova definitely needs the soft
delete in the main nova database. Perhaps not for every table, but
there is definitely code in the code base which uses it right now.
Search for read_deleted=True if you're curious.

For this case in particular, the concern is that operators might need
to find where an instance was running once it is deleted to be able to
diagnose issues reported by users. I think that's a valid use case of
this particular data.

 This is a new database, so its our big chance to get this right. So,
 ideas welcome...

 Some initial proposals:

 - we do what we do in the current nova database -- we have a deleted
 column, and we set it to true when we delete the instance.

 - we have shadow tables and we move delete rows to a shadow table.


 Both approaches are viable, but as the soft-delete column is widespread, it 
 would be thorny for this new app to use some totally different scheme, unless 
 the notion is that all schemes should move to the audit table approach (which 
 I wouldn’t mind, but it would be a big job).FTR, the audit table approach 
 is usually what I prefer for greenfield development, if all that’s needed is 
 forensic capabilities at the database inspection level, and not as much 
 active GUI-based “deleted” flags.   That is, if you really don’t need to 
 query the history tables very often except when debugging an issue offline.  
 The reason its preferable is because those rows are still “deleted” from your 
 main table, and they don’t get in the way of querying.   But if you need to 
 refer to these history rows in context of the application, that means you 
 need to get them mapped in such a way that they behave like the primary rows, 
 which overall is a more difficult approach than just using the soft delete 
 column.

 That said, I have a lot of plans to send improvements down the way of the 
 existing approach of “soft delete column” into projects, from the querying 
 POV, so that criteria to filter out soft delete can be done in a much more 
 robust fashion (see 
 https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).
But this is still more complex and less performant than if the rows are 
 just gone totally, off in a history table somewhere (again, provided you 
 really don’t need to look at those history rows in an application context, 
 otherwise it gets all complicated again).

Interesting. I hadn't seen consistency between the two databases as
trumping doing this less horribly, but it sounds like its more of a
thing that I thought.

Thanks,
Michael

-- 
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-24 Thread Mike Bayer

 On Nov 24, 2014, at 7:32 PM, Michael Still mi...@stillhq.com wrote:
 
 Interesting. I hadn't seen consistency between the two databases as
 trumping doing this less horribly, but it sounds like its more of a
 thing that I thought.

it really depends on what you need to do.  if you need to get a result set of 
all entities, deleted or not, consider the difference between a SELECT for all 
rows from a single table, easy, vs. doing a UNION from primary table to history 
table, matching up all the columns that hopefully do in fact match up 
(awkward), and then dealing with joining out to related tables if you need that 
as well (very awkward from a UNION).

if you have any plans to consume these rows in the app i’d advise just doing it 
like all the other tables.  if we want to change that approach, we’d do it 
en-masse at some point and you’d get it for free.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev