Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-28 Thread Jay Pipes

On 11/27/2014 04:20 PM, Michael Still wrote:

On Fri, Nov 28, 2014 at 2:59 AM, Jay Pipes  wrote:

On 11/26/2014 04:24 PM, Mike Bayer wrote:


Precisely. Why is the RDBMS the thing that is used for
archival/audit logging? Why not a NoSQL store or a centralized log
facility? All that would be needed would be for us to standardize
on the format of the archival record, standardize on the things to
provide with the archival record (for instance system metadata,
etc), and then write a simple module that would write an archival
record to some backend data store.

Then we could rid ourselves of the awfulness of the shadow tables
and all of the read_deleted=yes crap.




+1000 - if we’re really looking to “do this right”, as the original
message suggested, this would be “right”.  If you don’t need these
rows in the app (and it would be very nice if you didn’t), dump it
out to an archive file / non-relational datastore.   As mentioned
elsewhere, this is entirely acceptable for organizations that are
“obliged” to store records for auditing purposes.   Nova even already
has a dictionary format for everything set up with nova objects, so
dumping these dictionaries out as JSON would be the way to go.



OK, spec added:

https://review.openstack.org/137669


At this point I don't think we should block the cells reworking effort
on this spec. I'm happy for people to pursue this, but I think its
unlikely to be work that is completed in kilo. We can transition the
new cells databases at the same time we fix the main database.


No disagreement at all. The proposed spec is a monster one, and we can 
certainly make a lot of progress in Kilo, but I wouldn't expect it to be 
completed any time soon.


Best,
-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-27 Thread Michael Still
On Fri, Nov 28, 2014 at 2:59 AM, Jay Pipes  wrote:
> On 11/26/2014 04:24 PM, Mike Bayer wrote:
>>>
>>> Precisely. Why is the RDBMS the thing that is used for
>>> archival/audit logging? Why not a NoSQL store or a centralized log
>>> facility? All that would be needed would be for us to standardize
>>> on the format of the archival record, standardize on the things to
>>> provide with the archival record (for instance system metadata,
>>> etc), and then write a simple module that would write an archival
>>> record to some backend data store.
>>>
>>> Then we could rid ourselves of the awfulness of the shadow tables
>>> and all of the read_deleted=yes crap.
>>
>>
>>
>> +1000 - if we’re really looking to “do this right”, as the original
>> message suggested, this would be “right”.  If you don’t need these
>> rows in the app (and it would be very nice if you didn’t), dump it
>> out to an archive file / non-relational datastore.   As mentioned
>> elsewhere, this is entirely acceptable for organizations that are
>> “obliged” to store records for auditing purposes.   Nova even already
>> has a dictionary format for everything set up with nova objects, so
>> dumping these dictionaries out as JSON would be the way to go.
>
>
> OK, spec added:
>
> https://review.openstack.org/137669

At this point I don't think we should block the cells reworking effort
on this spec. I'm happy for people to pursue this, but I think its
unlikely to be work that is completed in kilo. We can transition the
new cells databases at the same time we fix the main database.

Michael

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-27 Thread Jay Pipes

On 11/26/2014 04:24 PM, Mike Bayer wrote:

Precisely. Why is the RDBMS the thing that is used for
archival/audit logging? Why not a NoSQL store or a centralized log
facility? All that would be needed would be for us to standardize
on the format of the archival record, standardize on the things to
provide with the archival record (for instance system metadata,
etc), and then write a simple module that would write an archival
record to some backend data store.

Then we could rid ourselves of the awfulness of the shadow tables
and all of the read_deleted=yes crap.



+1000 - if we’re really looking to “do this right”, as the original
message suggested, this would be “right”.  If you don’t need these
rows in the app (and it would be very nice if you didn’t), dump it
out to an archive file / non-relational datastore.   As mentioned
elsewhere, this is entirely acceptable for organizations that are
“obliged” to store records for auditing purposes.   Nova even already
has a dictionary format for everything set up with nova objects, so
dumping these dictionaries out as JSON would be the way to go.


OK, spec added:

https://review.openstack.org/137669

Best,
-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Joshua Harlow

Mike Bayer wrote:

Precisely. Why is the RDBMS the thing that is used for archival/audit logging? 
Why not a NoSQL store or a centralized log facility? All that would be needed 
would be for us to standardize on the format of the archival record, 
standardize on the things to provide with the archival record (for instance 
system metadata, etc), and then write a simple module that would write an 
archival record to some backend data store.

Then we could rid ourselves of the awfulness of the shadow tables and all of 
the read_deleted=yes crap.



+1000 - if we’re really looking to “do this right”, as the original message 
suggested, this would be “right”.  If you don’t need these rows in the app (and 
it would be very nice if you didn’t), dump it out to an archive file / 
non-relational datastore.   As mentioned elsewhere, this is entirely acceptable 
for organizations that are “obliged” to store records for auditing purposes.   
Nova even already has a dictionary format for everything set up with nova 
objects, so dumping these dictionaries out as JSON would be the way to go.




+ 1001; dump it out to some data warehouse, put it to HDFS, do something 
else with long term storage IMHO; just let's avoid continuing to turn a 
database into a data warehouse, they are really not the same thing and 
don't have the same requirements, constraints ...


I've always been confused why some of the openstack tables tried to do 
both roles with a deleted=1|0 field. The part that has also been 
confusing to me is has anyone actually tried switching a deleted=1 field 
back to deleted=0 without application logic to do this; if so how did u 
manage to pull that off correctly without knowing the inner details of 
the application itself (how did u do this atomically so that the users 
*actively* running against the api would not start to receive weird 
responses and failures...)?






___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Andrew Laski


On 11/26/2014 03:39 PM, Belmiro Moreira wrote:

Hi,
my experience is that "soft delete" is important to keep record of 
deleted instances and its characteristics.
In fact in my organization we are obliged to keep these records for 
several months.
However, it would be nice that after few months we were able to purge 
the DB with a nova tool.


I think that any solution for this needs to keep the deleted data 
available in some form.  Is it important for you that the deleted data 
be in the same table as non deleted rows, or could they be moved into 
another table?  And would it matter if the format of the row changed 
during a move?





In the particular case of this cells table my major concern is that 
having a "delete" field maybe means that top and children databases 
need to be synchronized. Looking into the current cells design having 
duplicated information in different databases is one of the main issues.


Agreed.  I think this can be solved by ensuring that instance deletion 
is only about setting the deleted column in the cell instance table.  
The instance mapping being deleted makes no statement about whether or 
not the instance is deleted, though it would be a bug to delete it 
before the instance was deleted.




Belmiro


On Wed, Nov 26, 2014 at 4:40 PM, Andrew Laski 
mailto:andrew.la...@rackspace.com>> wrote:



On 11/25/2014 11:54 AM, Solly Ross wrote:

I can't comment on other projects, but Nova definitely
needs the soft
delete in the main nova database. Perhaps not for every
table, but
there is definitely code in the code base which uses it
right now.
Search for read_deleted=True if you're curious.

Just to save people a bit of time, it's actually
`read_deleted='yes'`
or `read_deleted="yes"` for many cases.

Just to give people a quick overview:

A cursory glance (no pun intended) seems to indicate that
quite a few of
these are reading potentially deleted flavors.  For this case,
it makes
sense to keep things in one table (as we do).

There are also quite a few that seem to be making sure deleted
"things"
are properly cleaned up.  In this case, 'deleted' acts as a
"CLEANUP"
state, so it makes just as much sense to keep the deleted rows
in a
separate table.

For this case in particular, the concern is that operators
might need
to find where an instance was running once it is deleted
to be able to
diagnose issues reported by users. I think that's a valid
use case of
this particular data.

This is a new database, so its our big chance to
get this right. So,
ideas welcome...

Some initial proposals:

- we do what we do in the current nova database --
we have a deleted
column, and we set it to true when we delete the
instance.

- we have shadow tables and we move delete rows to
a shadow table.


Both approaches are viable, but as the soft-delete
column is widespread, it
would be thorny for this new app to use some totally
different scheme,
unless the notion is that all schemes should move to
the audit table
approach (which I wouldn’t mind, but it would be a big
job).FTR, the
audit table approach is usually what I prefer for
greenfield development,
if all that’s needed is forensic capabilities at the
database inspection
level, and not as much active GUI-based “deleted”
flags.   That is, if you
really don’t need to query the history tables very
often except when
debugging an issue offline.  The reason its preferable
is because those
rows are still “deleted” from your main table, and
they don’t get in the
way of querying.   But if you need to refer to these
history rows in
context of the application, that means you need to get
them mapped in such
a way that they behave like the primary rows, which
overall is a more
difficult approach than just using the soft delete column.

I think it does really come down here to how you intend to use
the soft-delete
functionality in Cells.  If you just are using it to debug or
audit, then I think
the right way to go would be either the audit table
(potentially can store more
lifecyc

Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Mike Bayer

> 
> Precisely. Why is the RDBMS the thing that is used for archival/audit 
> logging? Why not a NoSQL store or a centralized log facility? All that would 
> be needed would be for us to standardize on the format of the archival 
> record, standardize on the things to provide with the archival record (for 
> instance system metadata, etc), and then write a simple module that would 
> write an archival record to some backend data store.
> 
> Then we could rid ourselves of the awfulness of the shadow tables and all of 
> the read_deleted=yes crap.


+1000 - if we’re really looking to “do this right”, as the original message 
suggested, this would be “right”.  If you don’t need these rows in the app (and 
it would be very nice if you didn’t), dump it out to an archive file / 
non-relational datastore.   As mentioned elsewhere, this is entirely acceptable 
for organizations that are “obliged” to store records for auditing purposes.   
Nova even already has a dictionary format for everything set up with nova 
objects, so dumping these dictionaries out as JSON would be the way to go.





___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Belmiro Moreira
Hi,
my experience is that "soft delete" is important to keep record of deleted
instances and its characteristics.
In fact in my organization we are obliged to keep these records for several
months.
However, it would be nice that after few months we were able to purge the
DB with a nova tool.

In the particular case of this cells table my major concern is that having
a "delete" field maybe means that top and children databases need to be
synchronized. Looking into the current cells design having duplicated
information in different databases is one of the main issues.

Belmiro


On Wed, Nov 26, 2014 at 4:40 PM, Andrew Laski 
wrote:

>
> On 11/25/2014 11:54 AM, Solly Ross wrote:
>
>> I can't comment on other projects, but Nova definitely needs the soft
>>> delete in the main nova database. Perhaps not for every table, but
>>> there is definitely code in the code base which uses it right now.
>>> Search for read_deleted=True if you're curious.
>>>
>> Just to save people a bit of time, it's actually `read_deleted='yes'`
>> or `read_deleted="yes"` for many cases.
>>
>> Just to give people a quick overview:
>>
>> A cursory glance (no pun intended) seems to indicate that quite a few of
>> these are reading potentially deleted flavors.  For this case, it makes
>> sense to keep things in one table (as we do).
>>
>> There are also quite a few that seem to be making sure deleted "things"
>> are properly cleaned up.  In this case, 'deleted' acts as a "CLEANUP"
>> state, so it makes just as much sense to keep the deleted rows in a
>> separate table.
>>
>>  For this case in particular, the concern is that operators might need
>>> to find where an instance was running once it is deleted to be able to
>>> diagnose issues reported by users. I think that's a valid use case of
>>> this particular data.
>>>
>>>  This is a new database, so its our big chance to get this right. So,
> ideas welcome...
>
> Some initial proposals:
>
> - we do what we do in the current nova database -- we have a deleted
> column, and we set it to true when we delete the instance.
>
> - we have shadow tables and we move delete rows to a shadow table.
>

 Both approaches are viable, but as the soft-delete column is
 widespread, it
 would be thorny for this new app to use some totally different scheme,
 unless the notion is that all schemes should move to the audit table
 approach (which I wouldn’t mind, but it would be a big job).FTR, the
 audit table approach is usually what I prefer for greenfield
 development,
 if all that’s needed is forensic capabilities at the database inspection
 level, and not as much active GUI-based “deleted” flags.   That is, if
 you
 really don’t need to query the history tables very often except when
 debugging an issue offline.  The reason its preferable is because those
 rows are still “deleted” from your main table, and they don’t get in the
 way of querying.   But if you need to refer to these history rows in
 context of the application, that means you need to get them mapped in
 such
 a way that they behave like the primary rows, which overall is a more
 difficult approach than just using the soft delete column.

>>> I think it does really come down here to how you intend to use the
>> soft-delete
>> functionality in Cells.  If you just are using it to debug or audit, then
>> I think
>> the right way to go would be either the audit table (potentially can
>> store more
>> lifecycle data, but could end up taking up more space) or a separate
>> shadow
>> table (takes up less space).
>>
>> If you are going to use the soft delete for application functionality, I
>> would
>> consider differentiating between "deleted" and "we still have things left
>> to
>> clean up", since this seems to be mixing two different requirements into
>> one.
>>
>
> The case that spawned this discussion is one where deleted rows are not
> needed for application functionality.  So I'm going to update the proposed
> schema there to not include a 'deleted' column. Fortunately there's still
> some time before the question of how to handle deletes needs to be fully
> sorted out.
>
>
>  That said, I have a lot of plans to send improvements down the way of the
 existing approach of “soft delete column” into projects, from the
 querying
 POV, so that criteria to filter out soft delete can be done in a much
 more
 robust fashion (see
 https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-
 heuristic-inspector-event).
 But this is still more complex and less performant than if the rows are
 just gone totally, off in a history table somewhere (again, provided you
 really don’t need to look at those history rows in an application
 context,
 otherwise it gets all complicated again).

>>> Interesting. I hadn't seen consistency between the two databases as
>>> trumping doing this less horr

Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Jay Pipes

On 11/25/2014 09:34 PM, Mike Bayer wrote:

On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL  wrote:

Hi,

Le 2014-11-24 17:20, Michael Still a écrit :

Heya,

This is a new database, so its our big chance to get this right. So,
ideas welcome...

Some initial proposals:

  - we do what we do in the current nova database -- we have a deleted
column, and we set it to true when we delete the instance.

  - we have shadow tables and we move delete rows to a shadow table.

  - something else super clever I haven't thought of.


Some random thoughts that came to mind ...

1/ as far as I remember, you rarely want to delete a row
- it's usually a heavy DB operation (well, was back then)
- it's destructive (but we may want that)
- it creates fragmentation (less of a problem depending on db engine)
- it can break foreign key relations if not done the proper way


deleting records with foreign key dependencies is a known quantity.  Those 
items are all related and being able to delete everything related is a 
well-solved problem, both via ON DELETE cascades as well as standard ORM 
features.


++


2/ updating a row to 'deleted=1'
- gives an opportunity to set a useful deletion time-stamp
I would even argue that setting the deleted_at field would suffice to declare a row 
'deleted' (as in 'not NULL'). I know, "explicit is better than implicit" …


the logic that’s used is that “deleted” is set to the primary key of the 
record, this is to allow UNIQUE constraints to be set up that serve on the 
non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there 
are multiple “deleted” rows with “x”).


Indeed. Because people want to be able to name an instance one thing, 
delete it, and immediately name another instance the same thing. Ugh -- 
what an annoying use case, IMO. Better to just delete the row out of the 
database after archival/audit log of the operation.



- the update operation is not destructive
- an admin/DBA can decide when and how to purge/archive rows

3/ moving the row at deletion
- you want to avoid additional steps to complete an operation, thus avoid 
creating a new record while deleting one
- even if you wrap things into a transaction, not being able to create a row 
somewhere can make your delete transaction fail
- if I were to archive all deleted rows, at scale I'd probably move them to 
another db server altogether


if you’re really “archiving”, I’d just dump out a log of what occurred to a 
textual log file, then you archive the files.  There’s no need for a pure 
“audit trail” to even be in the relational DB.


Precisely. Why is the RDBMS the thing that is used for archival/audit 
logging? Why not a NoSQL store or a centralized log facility? All that 
would be needed would be for us to standardize on the format of the 
archival record, standardize on the things to provide with the archival 
record (for instance system metadata, etc), and then write a simple 
module that would write an archival record to some backend data store.


Then we could rid ourselves of the awfulness of the shadow tables and 
all of the read_deleted=yes crap.


Best,
-jay


Now, I for one would keep the current mark-as-deleted model.

I however perfectly get the problem of massive churn with instance 
creation/deletion.


is there?   inserting and updating rows is a normal thing in relational DBs.



So, let's be crazy, why not have a config option 'on_delete=mark_delete', 
'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that 
feasible ?)


I’m -1 on that.  The need for records to be soft-deleted or not, and if those 
soft-deletes need to be accessible in the application, should be decided up 
front.  Adding a multiplicity of options just makes the code that much more 
complicated and fragments its behaviors and test coverage.   The suggestion 
basically tries to avoid making a decision and I think more thought should be 
put into what is truly needed.



This would especially come handy if the admin decides the global cells database 
may not need to keep track of deleted instances, the cell-local nova database 
being the absolute reference for that.


why would an admin decide that this is, or is not, needed?   if the deleted 
data isn’t needed by the live app, it should just be dumped to an archive.  
admins can set how often that archive should be purged, but IMHO the “pipeline” 
of these records should be straight; there shouldn’t be junctions and switches 
that cause there to be multiple data paths.   It leads to too much complexity.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-26 Thread Andrew Laski


On 11/25/2014 11:54 AM, Solly Ross wrote:

I can't comment on other projects, but Nova definitely needs the soft
delete in the main nova database. Perhaps not for every table, but
there is definitely code in the code base which uses it right now.
Search for read_deleted=True if you're curious.

Just to save people a bit of time, it's actually `read_deleted='yes'`
or `read_deleted="yes"` for many cases.

Just to give people a quick overview:

A cursory glance (no pun intended) seems to indicate that quite a few of
these are reading potentially deleted flavors.  For this case, it makes
sense to keep things in one table (as we do).

There are also quite a few that seem to be making sure deleted "things"
are properly cleaned up.  In this case, 'deleted' acts as a "CLEANUP"
state, so it makes just as much sense to keep the deleted rows in a
separate table.


For this case in particular, the concern is that operators might need
to find where an instance was running once it is deleted to be able to
diagnose issues reported by users. I think that's a valid use case of
this particular data.


This is a new database, so its our big chance to get this right. So,
ideas welcome...

Some initial proposals:

- we do what we do in the current nova database -- we have a deleted
column, and we set it to true when we delete the instance.

- we have shadow tables and we move delete rows to a shadow table.


Both approaches are viable, but as the soft-delete column is widespread, it
would be thorny for this new app to use some totally different scheme,
unless the notion is that all schemes should move to the audit table
approach (which I wouldn’t mind, but it would be a big job).FTR, the
audit table approach is usually what I prefer for greenfield development,
if all that’s needed is forensic capabilities at the database inspection
level, and not as much active GUI-based “deleted” flags.   That is, if you
really don’t need to query the history tables very often except when
debugging an issue offline.  The reason its preferable is because those
rows are still “deleted” from your main table, and they don’t get in the
way of querying.   But if you need to refer to these history rows in
context of the application, that means you need to get them mapped in such
a way that they behave like the primary rows, which overall is a more
difficult approach than just using the soft delete column.

I think it does really come down here to how you intend to use the soft-delete
functionality in Cells.  If you just are using it to debug or audit, then I 
think
the right way to go would be either the audit table (potentially can store more
lifecycle data, but could end up taking up more space) or a separate shadow
table (takes up less space).

If you are going to use the soft delete for application functionality, I would
consider differentiating between "deleted" and "we still have things left to
clean up", since this seems to be mixing two different requirements into one.


The case that spawned this discussion is one where deleted rows are not 
needed for application functionality.  So I'm going to update the 
proposed schema there to not include a 'deleted' column. Fortunately 
there's still some time before the question of how to handle deletes 
needs to be fully sorted out.



That said, I have a lot of plans to send improvements down the way of the
existing approach of “soft delete column” into projects, from the querying
POV, so that criteria to filter out soft delete can be done in a much more
robust fashion (see
https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).
But this is still more complex and less performant than if the rows are
just gone totally, off in a history table somewhere (again, provided you
really don’t need to look at those history rows in an application context,
otherwise it gets all complicated again).

Interesting. I hadn't seen consistency between the two databases as
trumping doing this less horribly, but it sounds like its more of a
thing that I thought.

Thanks,
Michael

--
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-25 Thread Mike Bayer

> On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL  wrote:
> 
> Hi,
> 
> Le 2014-11-24 17:20, Michael Still a écrit :
>> Heya,
>> 
>> This is a new database, so its our big chance to get this right. So,
>> ideas welcome...
>> 
>> Some initial proposals:
>> 
>>  - we do what we do in the current nova database -- we have a deleted
>> column, and we set it to true when we delete the instance.
>> 
>>  - we have shadow tables and we move delete rows to a shadow table.
>> 
>>  - something else super clever I haven't thought of.
> 
> Some random thoughts that came to mind ...
> 
> 1/ as far as I remember, you rarely want to delete a row
> - it's usually a heavy DB operation (well, was back then)
> - it's destructive (but we may want that)
> - it creates fragmentation (less of a problem depending on db engine)
> - it can break foreign key relations if not done the proper way

deleting records with foreign key dependencies is a known quantity.  Those 
items are all related and being able to delete everything related is a 
well-solved problem, both via ON DELETE cascades as well as standard ORM 
features.


> 
> 2/ updating a row to 'deleted=1'
> - gives an opportunity to set a useful deletion time-stamp
> I would even argue that setting the deleted_at field would suffice to declare 
> a row 'deleted' (as in 'not NULL'). I know, "explicit is better than 
> implicit" …

the logic that’s used is that “deleted” is set to the primary key of the 
record, this is to allow UNIQUE constraints to be set up that serve on the 
non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there 
are multiple “deleted” rows with “x”).

> - the update operation is not destructive
> - an admin/DBA can decide when and how to purge/archive rows
> 
> 3/ moving the row at deletion
> - you want to avoid additional steps to complete an operation, thus avoid 
> creating a new record while deleting one
> - even if you wrap things into a transaction, not being able to create a row 
> somewhere can make your delete transaction fail
> - if I were to archive all deleted rows, at scale I'd probably move them to 
> another db server altogether

if you’re really “archiving”, I’d just dump out a log of what occurred to a 
textual log file, then you archive the files.  There’s no need for a pure 
“audit trail” to even be in the relational DB.


> Now, I for one would keep the current mark-as-deleted model.
> 
> I however perfectly get the problem of massive churn with instance 
> creation/deletion.

is there?   inserting and updating rows is a normal thing in relational DBs.


> So, let's be crazy, why not have a config option 'on_delete=mark_delete', 
> 'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that 
> feasible ?)

I’m -1 on that.  The need for records to be soft-deleted or not, and if those 
soft-deletes need to be accessible in the application, should be decided up 
front.  Adding a multiplicity of options just makes the code that much more 
complicated and fragments its behaviors and test coverage.   The suggestion 
basically tries to avoid making a decision and I think more thought should be 
put into what is truly needed.


> This would especially come handy if the admin decides the global cells 
> database may not need to keep track of deleted instances, the cell-local nova 
> database being the absolute reference for that.

why would an admin decide that this is, or is not, needed?   if the deleted 
data isn’t needed by the live app, it should just be dumped to an archive.  
admins can set how often that archive should be purged, but IMHO the “pipeline” 
of these records should be straight; there shouldn’t be junctions and switches 
that cause there to be multiple data paths.   It leads to too much complexity.


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-25 Thread Ahmed RAHAL

Hi,

Le 2014-11-24 17:20, Michael Still a écrit :

Heya,

This is a new database, so its our big chance to get this right. So,
ideas welcome...

Some initial proposals:

  - we do what we do in the current nova database -- we have a deleted
column, and we set it to true when we delete the instance.

  - we have shadow tables and we move delete rows to a shadow table.

  - something else super clever I haven't thought of.


Some random thoughts that came to mind ...

1/ as far as I remember, you rarely want to delete a row
- it's usually a heavy DB operation (well, was back then)
- it's destructive (but we may want that)
- it creates fragmentation (less of a problem depending on db engine)
- it can break foreign key relations if not done the proper way

2/ updating a row to 'deleted=1'
- gives an opportunity to set a useful deletion time-stamp
I would even argue that setting the deleted_at field would suffice to 
declare a row 'deleted' (as in 'not NULL'). I know, "explicit is better 
than implicit" ...

- the update operation is not destructive
- an admin/DBA can decide when and how to purge/archive rows

3/ moving the row at deletion
- you want to avoid additional steps to complete an operation, thus 
avoid creating a new record while deleting one
- even if you wrap things into a transaction, not being able to create a 
row somewhere can make your delete transaction fail
- if I were to archive all deleted rows, at scale I'd probably move them 
to another db server altogether



Now, I for one would keep the current mark-as-deleted model.

I however perfectly get the problem of massive churn with instance 
creation/deletion.
So, let's be crazy, why not have a config option 
'on_delete=mark_delete', 'on_delete=purge' or 'on_delete=archive' and 
let the admin choose ? (is that feasible ?)


This would especially come handy if the admin decides the global cells 
database may not need to keep track of deleted instances, the cell-local 
nova database being the absolute reference for that.


HTH,

Ahmed.

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-25 Thread Solly Ross
> I can't comment on other projects, but Nova definitely needs the soft
> delete in the main nova database. Perhaps not for every table, but
> there is definitely code in the code base which uses it right now.
> Search for read_deleted=True if you're curious.

Just to save people a bit of time, it's actually `read_deleted='yes'`
or `read_deleted="yes"` for many cases.

Just to give people a quick overview:

A cursory glance (no pun intended) seems to indicate that quite a few of
these are reading potentially deleted flavors.  For this case, it makes
sense to keep things in one table (as we do).

There are also quite a few that seem to be making sure deleted "things"
are properly cleaned up.  In this case, 'deleted' acts as a "CLEANUP"
state, so it makes just as much sense to keep the deleted rows in a
separate table.

> 
> For this case in particular, the concern is that operators might need
> to find where an instance was running once it is deleted to be able to
> diagnose issues reported by users. I think that's a valid use case of
> this particular data.
> 
> >> This is a new database, so its our big chance to get this right. So,
> >> ideas welcome...
> >>
> >> Some initial proposals:
> >>
> >> - we do what we do in the current nova database -- we have a deleted
> >> column, and we set it to true when we delete the instance.
> >>
> >> - we have shadow tables and we move delete rows to a shadow table.
> >
> >
> > Both approaches are viable, but as the soft-delete column is widespread, it
> > would be thorny for this new app to use some totally different scheme,
> > unless the notion is that all schemes should move to the audit table
> > approach (which I wouldn’t mind, but it would be a big job).FTR, the
> > audit table approach is usually what I prefer for greenfield development,
> > if all that’s needed is forensic capabilities at the database inspection
> > level, and not as much active GUI-based “deleted” flags.   That is, if you
> > really don’t need to query the history tables very often except when
> > debugging an issue offline.  The reason its preferable is because those
> > rows are still “deleted” from your main table, and they don’t get in the
> > way of querying.   But if you need to refer to these history rows in
> > context of the application, that means you need to get them mapped in such
> > a way that they behave like the primary rows, which overall is a more
> > difficult approach than just using the soft delete column.

I think it does really come down here to how you intend to use the soft-delete
functionality in Cells.  If you just are using it to debug or audit, then I 
think
the right way to go would be either the audit table (potentially can store more
lifecycle data, but could end up taking up more space) or a separate shadow
table (takes up less space).

If you are going to use the soft delete for application functionality, I would
consider differentiating between "deleted" and "we still have things left to
clean up", since this seems to be mixing two different requirements into one.

> >
> > That said, I have a lot of plans to send improvements down the way of the
> > existing approach of “soft delete column” into projects, from the querying
> > POV, so that criteria to filter out soft delete can be done in a much more
> > robust fashion (see
> > https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).
> > But this is still more complex and less performant than if the rows are
> > just gone totally, off in a history table somewhere (again, provided you
> > really don’t need to look at those history rows in an application context,
> > otherwise it gets all complicated again).
> 
> Interesting. I hadn't seen consistency between the two databases as
> trumping doing this less horribly, but it sounds like its more of a
> thing that I thought.
> 
> Thanks,
> Michael
> 
> --
> Rackspace Australia
> 
> ___
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-24 Thread Mike Bayer

> On Nov 24, 2014, at 7:32 PM, Michael Still  wrote:
> 
> Interesting. I hadn't seen consistency between the two databases as
> trumping doing this less horribly, but it sounds like its more of a
> thing that I thought.

it really depends on what you need to do.  if you need to get a result set of 
all entities, deleted or not, consider the difference between a SELECT for all 
rows from a single table, easy, vs. doing a UNION from primary table to history 
table, matching up all the columns that hopefully do in fact match up 
(awkward), and then dealing with joining out to related tables if you need that 
as well (very awkward from a UNION).

if you have any plans to consume these rows in the app i’d advise just doing it 
like all the other tables.  if we want to change that approach, we’d do it 
en-masse at some point and you’d get it for free.



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-24 Thread Michael Still
On Tue, Nov 25, 2014 at 11:14 AM, Mike Bayer  wrote:
>
>> On Nov 24, 2014, at 5:20 PM, Michael Still  wrote:
>>
>> Heya,
>>
>> Review https://review.openstack.org/#/c/135644/4 proposes the addition
>> of a new database for our improved implementation of cells in Nova.
>> However, there's an outstanding question about how to handle soft
>> delete of rows -- we believe that we need to soft delete for forensic
>> purposes.
>
> Everytime I talk to people about the soft delete thing, I hear the usual 
> refrain “we thought we needed it, but we didn’t and now it’s just overbuilt 
> cruft we want to get rid of”.
>
> Not saying you don’t have a need here but you definitely have this need, not 
> just following the herd right?   Soft delete makes things a lot less 
> convenient.

I can't comment on other projects, but Nova definitely needs the soft
delete in the main nova database. Perhaps not for every table, but
there is definitely code in the code base which uses it right now.
Search for read_deleted=True if you're curious.

For this case in particular, the concern is that operators might need
to find where an instance was running once it is deleted to be able to
diagnose issues reported by users. I think that's a valid use case of
this particular data.

>> This is a new database, so its our big chance to get this right. So,
>> ideas welcome...
>>
>> Some initial proposals:
>>
>> - we do what we do in the current nova database -- we have a deleted
>> column, and we set it to true when we delete the instance.
>>
>> - we have shadow tables and we move delete rows to a shadow table.
>
>
> Both approaches are viable, but as the soft-delete column is widespread, it 
> would be thorny for this new app to use some totally different scheme, unless 
> the notion is that all schemes should move to the audit table approach (which 
> I wouldn’t mind, but it would be a big job).FTR, the audit table approach 
> is usually what I prefer for greenfield development, if all that’s needed is 
> forensic capabilities at the database inspection level, and not as much 
> active GUI-based “deleted” flags.   That is, if you really don’t need to 
> query the history tables very often except when debugging an issue offline.  
> The reason its preferable is because those rows are still “deleted” from your 
> main table, and they don’t get in the way of querying.   But if you need to 
> refer to these history rows in context of the application, that means you 
> need to get them mapped in such a way that they behave like the primary rows, 
> which overall is a more difficult approach than just using the soft delete 
> column.
>
> That said, I have a lot of plans to send improvements down the way of the 
> existing approach of “soft delete column” into projects, from the querying 
> POV, so that criteria to filter out soft delete can be done in a much more 
> robust fashion (see 
> https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).
>But this is still more complex and less performant than if the rows are 
> just gone totally, off in a history table somewhere (again, provided you 
> really don’t need to look at those history rows in an application context, 
> otherwise it gets all complicated again).

Interesting. I hadn't seen consistency between the two databases as
trumping doing this less horribly, but it sounds like its more of a
thing that I thought.

Thanks,
Michael

-- 
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-24 Thread Mike Bayer

> On Nov 24, 2014, at 5:20 PM, Michael Still  wrote:
> 
> Heya,
> 
> Review https://review.openstack.org/#/c/135644/4 proposes the addition
> of a new database for our improved implementation of cells in Nova.
> However, there's an outstanding question about how to handle soft
> delete of rows -- we believe that we need to soft delete for forensic
> purposes.

Everytime I talk to people about the soft delete thing, I hear the usual 
refrain “we thought we needed it, but we didn’t and now it’s just overbuilt 
cruft we want to get rid of”.

Not saying you don’t have a need here but you definitely have this need, not 
just following the herd right?   Soft delete makes things a lot less convenient.

> 
> This is a new database, so its our big chance to get this right. So,
> ideas welcome...
> 
> Some initial proposals:
> 
> - we do what we do in the current nova database -- we have a deleted
> column, and we set it to true when we delete the instance.
> 
> - we have shadow tables and we move delete rows to a shadow table.


Both approaches are viable, but as the soft-delete column is widespread, it 
would be thorny for this new app to use some totally different scheme, unless 
the notion is that all schemes should move to the audit table approach (which I 
wouldn’t mind, but it would be a big job).FTR, the audit table approach is 
usually what I prefer for greenfield development, if all that’s needed is 
forensic capabilities at the database inspection level, and not as much active 
GUI-based “deleted” flags.   That is, if you really don’t need to query the 
history tables very often except when debugging an issue offline.  The reason 
its preferable is because those rows are still “deleted” from your main table, 
and they don’t get in the way of querying.   But if you need to refer to these 
history rows in context of the application, that means you need to get them 
mapped in such a way that they behave like the primary rows, which overall is a 
more difficult approach than just using the soft delete column.

That said, I have a lot of plans to send improvements down the way of the 
existing approach of “soft delete column” into projects, from the querying POV, 
so that criteria to filter out soft delete can be done in a much more robust 
fashion (see 
https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).
   But this is still more complex and less performant than if the rows are just 
gone totally, off in a history table somewhere (again, provided you really 
don’t need to look at those history rows in an application context, otherwise 
it gets all complicated again).



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database

2014-11-24 Thread Kevin L. Mitchell
On Tue, 2014-11-25 at 09:20 +1100, Michael Still wrote:
>  - we do what we do in the current nova database -- we have a deleted
> column, and we set it to true when we delete the instance.

Actually, current nova uses the
oslo.db.sqlalchemy.models.SoftDeleteMixin class, which defines the
columns 'deleted_at' (a DateTime) and 'deleted' (an *integer*).  It also
defines a 'soft_delete()' method, which sets the 'deleted' column to the
row 'id'.  As I understand it, this is to keep from breaking uniqueness
constraints; you factor in 'deleted' in your uniqueness constraint, and
you can have as many identical deleted records as you want…

>  - we have shadow tables and we move delete rows to a shadow table.
> 
>  - something else super clever I haven't thought of.

Well, one thought might be to create a single 'audit' table with a
couple of columns—a timestamp, say, and some sort of description of the
change, perhaps as a JSON object.  On a 'delete' operation, you could
store the values of the row into this audit table.

From an operator's standpoint, this could provide the required auditing
and perhaps even a limited DR solution, while centralizing the data you
need to monitor in a single location, which makes it easier to trim the
data at intervals as needed.  While I've proposed this as a soft-delete
solution, it would also provide the ability to record other changes to
objects; one could even include a column to record who performed the
change.  And of course I've suggested this as a DB table, but we could
also consider the merits of ditching the table and doing the same thing
as some sort of notification through the notifications system…
-- 
Kevin L. Mitchell 
Rackspace


___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev