Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
On 11/27/2014 04:20 PM, Michael Still wrote:

On Fri, Nov 28, 2014 at 2:59 AM, Jay Pipes wrote:

On 11/26/2014 04:24 PM, Mike Bayer wrote:

Precisely. Why is the RDBMS the thing that is used for archival/audit logging? Why not a NoSQL store or a centralized log facility? All that would be needed would be for us to standardize on the format of the archival record, standardize on the things to provide with the archival record (for instance system metadata, etc), and then write a simple module that would write an archival record to some backend data store.

Then we could rid ourselves of the awfulness of the shadow tables and all of the read_deleted=yes crap.

+1000 - if we’re really looking to “do this right”, as the original message suggested, this would be “right”. If you don’t need these rows in the app (and it would be very nice if you didn’t), dump it out to an archive file / non-relational datastore. As mentioned elsewhere, this is entirely acceptable for organizations that are “obliged” to store records for auditing purposes. Nova even already has a dictionary format for everything set up with nova objects, so dumping these dictionaries out as JSON would be the way to go.

OK, spec added:

https://review.openstack.org/137669

At this point I don't think we should block the cells reworking effort on this spec. I'm happy for people to pursue this, but I think it's unlikely to be work that is completed in kilo. We can transition the new cells databases at the same time we fix the main database.

No disagreement at all. The proposed spec is a monster one, and we can certainly make a lot of progress in Kilo, but I wouldn't expect it to be completed any time soon.

Best,
-jay

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
On Fri, Nov 28, 2014 at 2:59 AM, Jay Pipes wrote:
> On 11/26/2014 04:24 PM, Mike Bayer wrote:
>>>
>>> Precisely. Why is the RDBMS the thing that is used for
>>> archival/audit logging? Why not a NoSQL store or a centralized log
>>> facility? All that would be needed would be for us to standardize
>>> on the format of the archival record, standardize on the things to
>>> provide with the archival record (for instance system metadata,
>>> etc), and then write a simple module that would write an archival
>>> record to some backend data store.
>>>
>>> Then we could rid ourselves of the awfulness of the shadow tables
>>> and all of the read_deleted=yes crap.
>>
>> +1000 - if we’re really looking to “do this right”, as the original
>> message suggested, this would be “right”. If you don’t need these
>> rows in the app (and it would be very nice if you didn’t), dump it
>> out to an archive file / non-relational datastore. As mentioned
>> elsewhere, this is entirely acceptable for organizations that are
>> “obliged” to store records for auditing purposes. Nova even already
>> has a dictionary format for everything set up with nova objects, so
>> dumping these dictionaries out as JSON would be the way to go.
>
> OK, spec added:
>
> https://review.openstack.org/137669

At this point I don't think we should block the cells reworking effort on this spec. I'm happy for people to pursue this, but I think it's unlikely to be work that is completed in kilo. We can transition the new cells databases at the same time we fix the main database.

Michael
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
On 11/26/2014 04:24 PM, Mike Bayer wrote:

Precisely. Why is the RDBMS the thing that is used for archival/audit logging? Why not a NoSQL store or a centralized log facility? All that would be needed would be for us to standardize on the format of the archival record, standardize on the things to provide with the archival record (for instance system metadata, etc), and then write a simple module that would write an archival record to some backend data store.

Then we could rid ourselves of the awfulness of the shadow tables and all of the read_deleted=yes crap.

+1000 - if we’re really looking to “do this right”, as the original message suggested, this would be “right”. If you don’t need these rows in the app (and it would be very nice if you didn’t), dump it out to an archive file / non-relational datastore. As mentioned elsewhere, this is entirely acceptable for organizations that are “obliged” to store records for auditing purposes. Nova even already has a dictionary format for everything set up with nova objects, so dumping these dictionaries out as JSON would be the way to go.

OK, spec added:

https://review.openstack.org/137669

Best,
-jay
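Nova objects already render as dicts, so the "simple module" described above can be sketched in a few lines. Everything here is a hypothetical stand-in: `archive_record` is not a real nova function, and the JSON-lines file stands in for whatever backend store (NoSQL, log facility) gets standardized on.

```python
import json
import time

def archive_record(path, table, row, extra=None):
    """Append one archival record as a JSON line (hypothetical helper).

    `row` is the dict form of the object being deleted; `extra` carries
    the standardized additions such as system metadata.
    """
    record = {
        "archived_at": time.time(),
        "table": table,
        "row": row,
        "extra": extra or {},
    }
    with open(path, "a") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")

# On instance delete: archive the dict form of the row, then hard-delete
# it from the relational database instead of setting a deleted flag.
archive_record("nova-archive.jsonl", "instances",
               {"uuid": "abc-123", "host": "compute01"},
               extra={"system_metadata": {"image": "img-9"}})
```

With the archival record out of the RDBMS, retention becomes a matter of rotating and purging files rather than schema design.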
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
Mike Bayer wrote:

Precisely. Why is the RDBMS the thing that is used for archival/audit logging? Why not a NoSQL store or a centralized log facility? All that would be needed would be for us to standardize on the format of the archival record, standardize on the things to provide with the archival record (for instance system metadata, etc), and then write a simple module that would write an archival record to some backend data store.

Then we could rid ourselves of the awfulness of the shadow tables and all of the read_deleted=yes crap.

+1000 - if we’re really looking to “do this right”, as the original message suggested, this would be “right”. If you don’t need these rows in the app (and it would be very nice if you didn’t), dump it out to an archive file / non-relational datastore. As mentioned elsewhere, this is entirely acceptable for organizations that are “obliged” to store records for auditing purposes. Nova even already has a dictionary format for everything set up with nova objects, so dumping these dictionaries out as JSON would be the way to go.

+1001; dump it out to some data warehouse, put it in HDFS, do something else with long-term storage IMHO; just let's avoid continuing to turn a database into a data warehouse. They are really not the same thing and don't have the same requirements, constraints... I've always been confused why some of the OpenStack tables tried to do both roles with a deleted=1|0 field.

The part that has also been confusing to me is: has anyone actually tried switching a deleted=1 field back to deleted=0 without application logic to do this? If so, how did you manage to pull that off correctly without knowing the inner details of the application itself (how did you do this atomically, so that the users *actively* running against the API would not start to receive weird responses and failures...)?
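The atomicity question above has a narrow database-level answer, sketched here with sqlite standing in for the real database: a single conditional UPDATE flips the row back only if it is still soft-deleted, so concurrent readers see the old state or the new one, never a half-restored mix. As the question implies, though, this covers none of the application-level cleanup (quotas, ports, volumes, ...) that makes un-deletion genuinely hard.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances "
             "(id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL, vm_state TEXT)")
conn.execute("INSERT INTO instances VALUES (1, 1, 'deleted')")

# Restore only if the row is still soft-deleted. The WHERE clause makes
# this a compare-and-swap: if another writer already changed the row,
# rowcount is 0 and nothing is touched.
cur = conn.execute(
    "UPDATE instances SET deleted = 0, vm_state = 'stopped' "
    "WHERE id = ? AND deleted != 0", (1,))
restored = cur.rowcount == 1
```

Checking `rowcount` tells the caller whether the restore actually won the race, which is the piece that naive `UPDATE ... SET deleted = 0` scripts get wrong.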
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
On 11/26/2014 03:39 PM, Belmiro Moreira wrote:

Hi, my experience is that "soft delete" is important to keep a record of deleted instances and their characteristics. In fact in my organization we are obliged to keep these records for several months. However, it would be nice if after a few months we were able to purge the DB with a nova tool.

I think that any solution for this needs to keep the deleted data available in some form. Is it important for you that the deleted data be in the same table as non-deleted rows, or could they be moved into another table? And would it matter if the format of the row changed during a move?

In the particular case of this cells table my major concern is that having a "delete" field maybe means that top and children databases need to be synchronized. Looking into the current cells design, having duplicated information in different databases is one of the main issues.

Agreed. I think this can be solved by ensuring that instance deletion is only about setting the deleted column in the cell instance table. The instance mapping being deleted makes no statement about whether or not the instance is deleted, though it would be a bug to delete it before the instance was deleted.

Belmiro

On Wed, Nov 26, 2014 at 4:40 PM, Andrew Laski <andrew.la...@rackspace.com> wrote:

On 11/25/2014 11:54 AM, Solly Ross wrote:

I can't comment on other projects, but Nova definitely needs the soft delete in the main nova database. Perhaps not for every table, but there is definitely code in the code base which uses it right now. Search for read_deleted=True if you're curious.

Just to save people a bit of time, it's actually `read_deleted='yes'` or `read_deleted="yes"` for many cases.

Just to give people a quick overview:

A cursory glance (no pun intended) seems to indicate that quite a few of these are reading potentially deleted flavors. For this case, it makes sense to keep things in one table (as we do).
There are also quite a few that seem to be making sure deleted "things" are properly cleaned up. In this case, 'deleted' acts as a "CLEANUP" state, so it makes just as much sense to keep the deleted rows in a separate table.

For this case in particular, the concern is that operators might need to find where an instance was running once it is deleted to be able to diagnose issues reported by users. I think that's a valid use case of this particular data.

This is a new database, so it's our big chance to get this right. So, ideas welcome...

Some initial proposals:

- we do what we do in the current nova database -- we have a deleted column, and we set it to true when we delete the instance.

- we have shadow tables and we move delete rows to a shadow table.

Both approaches are viable, but as the soft-delete column is widespread, it would be thorny for this new app to use some totally different scheme, unless the notion is that all schemes should move to the audit table approach (which I wouldn’t mind, but it would be a big job). FTR, the audit table approach is usually what I prefer for greenfield development, if all that’s needed is forensic capabilities at the database inspection level, and not as much active GUI-based “deleted” flags. That is, if you really don’t need to query the history tables very often except when debugging an issue offline. The reason it's preferable is because those rows are still “deleted” from your main table, and they don’t get in the way of querying. But if you need to refer to these history rows in context of the application, that means you need to get them mapped in such a way that they behave like the primary rows, which overall is a more difficult approach than just using the soft delete column.

I think it does really come down here to how you intend to use the soft-delete functionality in Cells. If you just are using it to debug or audit, then I think the right way to go would be either the audit table (potentially can store more lifecycle data, but could end up taking up more space) or a separate shadow table (takes up less space).
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
> Precisely. Why is the RDBMS the thing that is used for archival/audit
> logging? Why not a NoSQL store or a centralized log facility? All that would
> be needed would be for us to standardize on the format of the archival
> record, standardize on the things to provide with the archival record (for
> instance system metadata, etc), and then write a simple module that would
> write an archival record to some backend data store.
>
> Then we could rid ourselves of the awfulness of the shadow tables and all of
> the read_deleted=yes crap.

+1000 - if we’re really looking to “do this right”, as the original message suggested, this would be “right”. If you don’t need these rows in the app (and it would be very nice if you didn’t), dump it out to an archive file / non-relational datastore. As mentioned elsewhere, this is entirely acceptable for organizations that are “obliged” to store records for auditing purposes. Nova even already has a dictionary format for everything set up with nova objects, so dumping these dictionaries out as JSON would be the way to go.
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
Hi, my experience is that "soft delete" is important to keep a record of deleted instances and their characteristics. In fact in my organization we are obliged to keep these records for several months. However, it would be nice if after a few months we were able to purge the DB with a nova tool.

In the particular case of this cells table my major concern is that having a "delete" field maybe means that top and children databases need to be synchronized. Looking into the current cells design, having duplicated information in different databases is one of the main issues.

Belmiro

On Wed, Nov 26, 2014 at 4:40 PM, Andrew Laski wrote:
>
> On 11/25/2014 11:54 AM, Solly Ross wrote:
>
>>> I can't comment on other projects, but Nova definitely needs the soft
>>> delete in the main nova database. Perhaps not for every table, but
>>> there is definitely code in the code base which uses it right now.
>>> Search for read_deleted=True if you're curious.
>>
>> Just to save people a bit of time, it's actually `read_deleted='yes'`
>> or `read_deleted="yes"` for many cases.
>>
>> Just to give people a quick overview:
>>
>> A cursory glance (no pun intended) seems to indicate that quite a few of
>> these are reading potentially deleted flavors. For this case, it makes
>> sense to keep things in one table (as we do).
>>
>> There are also quite a few that seem to be making sure deleted "things"
>> are properly cleaned up. In this case, 'deleted' acts as a "CLEANUP"
>> state, so it makes just as much sense to keep the deleted rows in a
>> separate table.
>>
>>> For this case in particular, the concern is that operators might need
>>> to find where an instance was running once it is deleted to be able to
>>> diagnose issues reported by users. I think that's a valid use case of
>>> this particular data.
>>>
>>> This is a new database, so it's our big chance to get this right. So,
>>> ideas welcome...
>>> Some initial proposals:
>>>
>>> - we do what we do in the current nova database -- we have a deleted
>>> column, and we set it to true when we delete the instance.
>>>
>>> - we have shadow tables and we move delete rows to a shadow table.

Both approaches are viable, but as the soft-delete column is widespread, it would be thorny for this new app to use some totally different scheme, unless the notion is that all schemes should move to the audit table approach (which I wouldn’t mind, but it would be a big job). FTR, the audit table approach is usually what I prefer for greenfield development, if all that’s needed is forensic capabilities at the database inspection level, and not as much active GUI-based “deleted” flags. That is, if you really don’t need to query the history tables very often except when debugging an issue offline. The reason it's preferable is because those rows are still “deleted” from your main table, and they don’t get in the way of querying. But if you need to refer to these history rows in context of the application, that means you need to get them mapped in such a way that they behave like the primary rows, which overall is a more difficult approach than just using the soft delete column.

>> I think it does really come down here to how you intend to use the soft-delete
>> functionality in Cells. If you just are using it to debug or audit, then I think
>> the right way to go would be either the audit table (potentially can store more
>> lifecycle data, but could end up taking up more space) or a separate shadow
>> table (takes up less space).
>>
>> If you are going to use the soft delete for application functionality, I would
>> consider differentiating between "deleted" and "we still have things left to
>> clean up", since this seems to be mixing two different requirements into one.
>
> The case that spawned this discussion is one where deleted rows are not
> needed for application functionality.
> So I'm going to update the proposed schema there to not include a
> 'deleted' column. Fortunately there's still some time before the question
> of how to handle deletes needs to be fully sorted out.

That said, I have a lot of plans to send improvements down the way of the existing approach of “soft delete column” into projects, from the querying POV, so that criteria to filter out soft delete can be done in a much more robust fashion (see https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event). But this is still more complex and less performant than if the rows are just gone totally, off in a history table somewhere (again, provided you really don’t need to look at those history rows in an application context, otherwise it gets all complicated again).

Interesting. I hadn't seen consistency between the two databases as trumping doing this less horribly, but it sounds like it's more of a thing than I thought.
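Belmiro's retention requirement earlier in the thread (keep soft-deleted records for several months, then purge with a tool) reduces to a small conditional DELETE keyed on `deleted_at`. This sketch uses sqlite and a hypothetical `purge_deleted` helper, not any real nova-manage command:

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances "
             "(id INTEGER PRIMARY KEY, deleted INTEGER NOT NULL, deleted_at TEXT)")
now = datetime(2014, 11, 28)
rows = [(1, 1, (now - timedelta(days=300)).isoformat()),  # past retention
        (2, 2, (now - timedelta(days=30)).isoformat()),   # still retained
        (3, 0, None)]                                     # live instance
conn.executemany("INSERT INTO instances VALUES (?, ?, ?)", rows)

def purge_deleted(conn, cutoff):
    """Hard-delete soft-deleted rows whose deleted_at is before the cutoff.

    ISO-8601 strings compare correctly as text, so the comparison works
    without date parsing inside the database.
    """
    cur = conn.execute(
        "DELETE FROM instances WHERE deleted != 0 AND deleted_at < ?",
        (cutoff.isoformat(),))
    return cur.rowcount

purged = purge_deleted(conn, now - timedelta(days=180))  # ~6 month retention
```

Only the row past the retention window is removed; live rows (`deleted = 0`) are never touched regardless of age.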
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
On 11/25/2014 09:34 PM, Mike Bayer wrote:

On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL wrote:

Hi,

On 2014-11-24 17:20, Michael Still wrote:

Heya,

This is a new database, so it's our big chance to get this right. So, ideas welcome...

Some initial proposals:

- we do what we do in the current nova database -- we have a deleted column, and we set it to true when we delete the instance.

- we have shadow tables and we move delete rows to a shadow table.

- something else super clever I haven't thought of.

Some random thoughts that came to mind ...

1/ as far as I remember, you rarely want to delete a row
- it's usually a heavy DB operation (well, was back then)
- it's destructive (but we may want that)
- it creates fragmentation (less of a problem depending on db engine)
- it can break foreign key relations if not done the proper way

deleting records with foreign key dependencies is a known quantity. Those items are all related and being able to delete everything related is a well-solved problem, both via ON DELETE cascades as well as standard ORM features.

++

2/ updating a row to 'deleted=1'
- gives an opportunity to set a useful deletion time-stamp
I would even argue that setting the deleted_at field would suffice to declare a row 'deleted' (as in 'not NULL'). I know, "explicit is better than implicit" …

the logic that’s used is that “deleted” is set to the primary key of the record, this is to allow UNIQUE constraints to be set up that serve on the non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there are multiple “deleted” rows with “x”).

Indeed. Because people want to be able to name an instance one thing, delete it, and immediately name another instance the same thing. Ugh -- what an annoying use case, IMO. Better to just delete the row out of the database after archival/audit log of the operation.

- the update operation is not destructive
- an admin/DBA can decide when and how to purge/archive rows

3/ moving the row at deletion
- you want to avoid additional steps to complete an operation, thus avoid creating a new record while deleting one
- even if you wrap things into a transaction, not being able to create a row somewhere can make your delete transaction fail
- if I were to archive all deleted rows, at scale I'd probably move them to another db server altogether

if you’re really “archiving”, I’d just dump out a log of what occurred to a textual log file, then you archive the files. There’s no need for a pure “audit trail” to even be in the relational DB.

Precisely. Why is the RDBMS the thing that is used for archival/audit logging? Why not a NoSQL store or a centralized log facility? All that would be needed would be for us to standardize on the format of the archival record, standardize on the things to provide with the archival record (for instance system metadata, etc), and then write a simple module that would write an archival record to some backend data store.

Then we could rid ourselves of the awfulness of the shadow tables and all of the read_deleted=yes crap.

Best,
-jay

Now, I for one would keep the current mark-as-deleted model. I however perfectly get the problem of massive churn with instance creation/deletion.

is there? inserting and updating rows is a normal thing in relational DBs.

So, let's be crazy, why not have a config option 'on_delete=mark_delete', 'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that feasible ?)

I’m -1 on that. The need for records to be soft-deleted or not, and if those soft-deletes need to be accessible in the application, should be decided up front. Adding a multiplicity of options just makes the code that much more complicated and fragments its behaviors and test coverage. The suggestion basically tries to avoid making a decision and I think more thought should be put into what is truly needed.

This would especially come handy if the admin decides the global cells database may not need to keep track of deleted instances, the cell-local nova database being the absolute reference for that.

why would an admin decide that this is, or is not, needed? if the deleted data isn’t needed by the live app, it should just be dumped to an archive. admins can set how often that archive should be purged, but IMHO the “pipeline” of these records should be straight; there shouldn’t be junctions and switches that cause there to be multiple data paths. It leads to too much complexity.
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
On 11/25/2014 11:54 AM, Solly Ross wrote:

I can't comment on other projects, but Nova definitely needs the soft delete in the main nova database. Perhaps not for every table, but there is definitely code in the code base which uses it right now. Search for read_deleted=True if you're curious.

Just to save people a bit of time, it's actually `read_deleted='yes'` or `read_deleted="yes"` for many cases.

Just to give people a quick overview:

A cursory glance (no pun intended) seems to indicate that quite a few of these are reading potentially deleted flavors. For this case, it makes sense to keep things in one table (as we do).

There are also quite a few that seem to be making sure deleted "things" are properly cleaned up. In this case, 'deleted' acts as a "CLEANUP" state, so it makes just as much sense to keep the deleted rows in a separate table.

For this case in particular, the concern is that operators might need to find where an instance was running once it is deleted to be able to diagnose issues reported by users. I think that's a valid use case of this particular data.

This is a new database, so it's our big chance to get this right. So, ideas welcome...

Some initial proposals:

- we do what we do in the current nova database -- we have a deleted column, and we set it to true when we delete the instance.

- we have shadow tables and we move delete rows to a shadow table.

Both approaches are viable, but as the soft-delete column is widespread, it would be thorny for this new app to use some totally different scheme, unless the notion is that all schemes should move to the audit table approach (which I wouldn’t mind, but it would be a big job). FTR, the audit table approach is usually what I prefer for greenfield development, if all that’s needed is forensic capabilities at the database inspection level, and not as much active GUI-based “deleted” flags. That is, if you really don’t need to query the history tables very often except when debugging an issue offline.

The reason it's preferable is because those rows are still “deleted” from your main table, and they don’t get in the way of querying. But if you need to refer to these history rows in context of the application, that means you need to get them mapped in such a way that they behave like the primary rows, which overall is a more difficult approach than just using the soft delete column.

I think it does really come down here to how you intend to use the soft-delete functionality in Cells. If you just are using it to debug or audit, then I think the right way to go would be either the audit table (potentially can store more lifecycle data, but could end up taking up more space) or a separate shadow table (takes up less space).

If you are going to use the soft delete for application functionality, I would consider differentiating between "deleted" and "we still have things left to clean up", since this seems to be mixing two different requirements into one.

The case that spawned this discussion is one where deleted rows are not needed for application functionality. So I'm going to update the proposed schema there to not include a 'deleted' column. Fortunately there's still some time before the question of how to handle deletes needs to be fully sorted out.

That said, I have a lot of plans to send improvements down the way of the existing approach of “soft delete column” into projects, from the querying POV, so that criteria to filter out soft delete can be done in a much more robust fashion (see https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event). But this is still more complex and less performant than if the rows are just gone totally, off in a history table somewhere (again, provided you really don’t need to look at those history rows in an application context, otherwise it gets all complicated again).

Interesting. I hadn't seen consistency between the two databases as trumping doing this less horribly, but it sounds like it's more of a thing than I thought.

Thanks,
Michael

--
Rackspace Australia
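For readers unfamiliar with the `read_deleted` switch discussed above, a toy version of its behavior (a simplification, not nova's actual `model_query` implementation) looks like this, with sqlite standing in for the real database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, deleted INTEGER)")
# One live row and one soft-deleted row (deleted set to the primary key).
conn.executemany("INSERT INTO instances VALUES (?, ?)", [(1, 0), (2, 2)])

def model_query(conn, read_deleted="no"):
    """Mimic the read_deleted switch: 'no' hides soft-deleted rows,
    'yes' returns everything, 'only' returns just the deleted ones."""
    where = {"no": "WHERE deleted = 0",
             "yes": "",
             "only": "WHERE deleted != 0"}[read_deleted]
    return conn.execute("SELECT id FROM instances " + where).fetchall()

live = model_query(conn)                            # [(1,)]
everything = model_query(conn, read_deleted="yes")  # [(1,), (2,)]
```

The point of the thread is that every query in the code base has to route through a filter like this; rows moved to a shadow table would simply not be there to filter.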
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
> On Nov 25, 2014, at 8:15 PM, Ahmed RAHAL wrote:
>
> Hi,
>
> On 2014-11-24 17:20, Michael Still wrote:
>> Heya,
>>
>> This is a new database, so it's our big chance to get this right. So,
>> ideas welcome...
>>
>> Some initial proposals:
>>
>> - we do what we do in the current nova database -- we have a deleted
>> column, and we set it to true when we delete the instance.
>>
>> - we have shadow tables and we move delete rows to a shadow table.
>>
>> - something else super clever I haven't thought of.
>
> Some random thoughts that came to mind ...
>
> 1/ as far as I remember, you rarely want to delete a row
> - it's usually a heavy DB operation (well, was back then)
> - it's destructive (but we may want that)
> - it creates fragmentation (less of a problem depending on db engine)
> - it can break foreign key relations if not done the proper way

deleting records with foreign key dependencies is a known quantity. Those items are all related and being able to delete everything related is a well-solved problem, both via ON DELETE cascades as well as standard ORM features.

> 2/ updating a row to 'deleted=1'
> - gives an opportunity to set a useful deletion time-stamp
> I would even argue that setting the deleted_at field would suffice to declare
> a row 'deleted' (as in 'not NULL'). I know, "explicit is better than
> implicit" …

the logic that’s used is that “deleted” is set to the primary key of the record, this is to allow UNIQUE constraints to be set up that serve on the non-deleted rows only (e.g. UNIQUE on “x” + “deleted” is possible when there are multiple “deleted” rows with “x”).

> - the update operation is not destructive
> - an admin/DBA can decide when and how to purge/archive rows
>
> 3/ moving the row at deletion
> - you want to avoid additional steps to complete an operation, thus avoid
> creating a new record while deleting one
> - even if you wrap things into a transaction, not being able to create a row
> somewhere can make your delete transaction fail
> - if I were to archive all deleted rows, at scale I'd probably move them to
> another db server altogether

if you’re really “archiving”, I’d just dump out a log of what occurred to a textual log file, then you archive the files. There’s no need for a pure “audit trail” to even be in the relational DB.

> Now, I for one would keep the current mark-as-deleted model.
>
> I however perfectly get the problem of massive churn with instance
> creation/deletion.

is there? inserting and updating rows is a normal thing in relational DBs.

> So, let's be crazy, why not have a config option 'on_delete=mark_delete',
> 'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that
> feasible ?)

I’m -1 on that. The need for records to be soft-deleted or not, and if those soft-deletes need to be accessible in the application, should be decided up front. Adding a multiplicity of options just makes the code that much more complicated and fragments its behaviors and test coverage. The suggestion basically tries to avoid making a decision and I think more thought should be put into what is truly needed.

> This would especially come handy if the admin decides the global cells
> database may not need to keep track of deleted instances, the cell-local nova
> database being the absolute reference for that.

why would an admin decide that this is, or is not, needed? if the deleted data isn’t needed by the live app, it should just be dumped to an archive. admins can set how often that archive should be purged, but IMHO the “pipeline” of these records should be straight; there shouldn’t be junctions and switches that cause there to be multiple data paths. It leads to too much complexity.
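The point above about “deleted” being set to the primary key is easy to demonstrate: a UNIQUE constraint on (name, deleted) permits any number of soft-deleted rows with the same name, which a boolean deleted=1|0 column would not. A minimal sketch with sqlite standing in for the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE instances (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0,
    UNIQUE (name, deleted))""")

conn.execute("INSERT INTO instances (id, name) VALUES (1, 'web')")
# Soft delete sets deleted := id, freeing the (name, 0) slot for reuse.
conn.execute("UPDATE instances SET deleted = id WHERE id = 1")
conn.execute("INSERT INTO instances (id, name) VALUES (2, 'web')")
conn.execute("UPDATE instances SET deleted = id WHERE id = 2")
# The two deleted rows coexist as ('web', 1) and ('web', 2); with a
# boolean flag both would be ('web', 1) and the second soft delete
# would violate the constraint.
conn.execute("INSERT INTO instances (id, name) VALUES (3, 'web')")
```

This is exactly the delete-then-reuse-the-name use case Jay grumbles about elsewhere in the thread.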
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
Hi,

On 2014-11-24 17:20, Michael Still wrote:

Heya,

This is a new database, so it's our big chance to get this right. So, ideas welcome...

Some initial proposals:

- we do what we do in the current nova database -- we have a deleted column, and we set it to true when we delete the instance.

- we have shadow tables and we move delete rows to a shadow table.

- something else super clever I haven't thought of.

Some random thoughts that came to mind ...

1/ as far as I remember, you rarely want to delete a row
- it's usually a heavy DB operation (well, was back then)
- it's destructive (but we may want that)
- it creates fragmentation (less of a problem depending on db engine)
- it can break foreign key relations if not done the proper way

2/ updating a row to 'deleted=1'
- gives an opportunity to set a useful deletion time-stamp
I would even argue that setting the deleted_at field would suffice to declare a row 'deleted' (as in 'not NULL'). I know, "explicit is better than implicit" ...
- the update operation is not destructive
- an admin/DBA can decide when and how to purge/archive rows

3/ moving the row at deletion
- you want to avoid additional steps to complete an operation, thus avoid creating a new record while deleting one
- even if you wrap things into a transaction, not being able to create a row somewhere can make your delete transaction fail
- if I were to archive all deleted rows, at scale I'd probably move them to another db server altogether

Now, I for one would keep the current mark-as-deleted model. I however perfectly get the problem of massive churn with instance creation/deletion.

So, let's be crazy, why not have a config option 'on_delete=mark_delete', 'on_delete=purge' or 'on_delete=archive' and let the admin choose ? (is that feasible ?)

This would especially come handy if the admin decides the global cells database may not need to keep track of deleted instances, the cell-local nova database being the absolute reference for that.

HTH,
Ahmed.
___ OpenStack-dev mailing list OpenStack-dev@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
> I can't comment on other projects, but Nova definitely needs the soft
> delete in the main nova database. Perhaps not for every table, but
> there is definitely code in the code base which uses it right now.
> Search for read_deleted=True if you're curious.

Just to save people a bit of time, it's actually `read_deleted='yes'` or `read_deleted="yes"` in many cases.

Just to give people a quick overview: a cursory glance (no pun intended) seems to indicate that quite a few of these are reading potentially deleted flavors. For this case, it makes sense to keep things in one table (as we do). There are also quite a few that seem to be making sure deleted "things" are properly cleaned up. In this case, 'deleted' acts as a "CLEANUP" state, so it makes just as much sense to keep the deleted rows in a separate table.

> For this case in particular, the concern is that operators might need
> to find where an instance was running once it is deleted to be able to
> diagnose issues reported by users. I think that's a valid use case of
> this particular data.
>
> >> This is a new database, so it's our big chance to get this right. So,
> >> ideas welcome...
> >>
> >> Some initial proposals:
> >>
> >> - we do what we do in the current nova database -- we have a deleted
> >> column, and we set it to true when we delete the instance.
> >>
> >> - we have shadow tables and we move deleted rows to a shadow table.
> >
> > Both approaches are viable, but as the soft-delete column is widespread, it
> > would be thorny for this new app to use some totally different scheme,
> > unless the notion is that all schemes should move to the audit table
> > approach (which I wouldn't mind, but it would be a big job). FTR, the
> > audit table approach is usually what I prefer for greenfield development,
> > if all that's needed is forensic capabilities at the database inspection
> > level, and not as much active GUI-based "deleted" flags. That is, if you
> > really don't need to query the history tables very often except when
> > debugging an issue offline. The reason it's preferable is that those
> > rows are still "deleted" from your main table, and they don't get in the
> > way of querying. But if you need to refer to these history rows in the
> > context of the application, that means you need to get them mapped in such
> > a way that they behave like the primary rows, which overall is a more
> > difficult approach than just using the soft delete column.

I think it really does come down to how you intend to use the soft-delete functionality in Cells. If you are just using it to debug or audit, then the right way to go would be either the audit table (which can potentially store more lifecycle data, but could end up taking more space) or a separate shadow table (which takes less space). If you are going to use soft delete for application functionality, I would consider differentiating between "deleted" and "we still have things left to clean up", since this seems to be mixing two different requirements into one.

> > That said, I have a lot of plans to send improvements down the way of the
> > existing approach of "soft delete column" into projects, from the querying
> > POV, so that criteria to filter out soft delete can be done in a much more
> > robust fashion (see
> > https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).
> > But this is still more complex and less performant than if the rows are
> > just gone totally, off in a history table somewhere (again, provided you
> > really don't need to look at those history rows in an application context,
> > otherwise it gets all complicated again).
>
> Interesting. I hadn't seen consistency between the two databases as
> trumping doing this less horribly, but it sounds like it's more of a
> thing than I thought.
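The `read_deleted` behaviour discussed above boils down to whether a filter on the 'deleted' column is applied at all. The sketch below is an illustration of that idea only, not Nova's actual query helper; the 'flavors' schema is invented.

```python
# Illustrative sketch (not Nova's actual code) of read_deleted semantics
# when soft-deleted rows live in the same table: 'yes' simply drops the
# filter on the 'deleted' column. Table and column names are made up.
import sqlite3


def query_flavors(conn, read_deleted="no"):
    """Return flavor ids, optionally including soft-deleted rows."""
    sql = "SELECT id FROM flavors"
    if read_deleted == "no":
        sql += " WHERE deleted = 0"   # default: hide soft-deleted rows
    elif read_deleted != "yes":       # 'yes' reads live and deleted alike
        raise ValueError("unknown read_deleted mode: %r" % read_deleted)
    return [row[0] for row in conn.execute(sql + " ORDER BY id")]
```

Note that with a shadow-table model this one-liner switch is not available: reading deleted rows means querying a different table entirely.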
>
> Thanks,
> Michael
>
> --
> Rackspace Australia
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
> On Nov 24, 2014, at 7:32 PM, Michael Still wrote:
>
> Interesting. I hadn't seen consistency between the two databases as
> trumping doing this less horribly, but it sounds like it's more of a
> thing than I thought.

It really depends on what you need to do. If you need to get a result set of all entities, deleted or not, consider the difference between a SELECT for all rows from a single table (easy) versus a UNION from the primary table to the history table, matching up all the columns that hopefully do in fact match up (awkward), and then dealing with joining out to related tables if you need that as well (very awkward from a UNION).

If you have any plans to consume these rows in the app, I'd advise just doing it like all the other tables. If we want to change that approach, we'd do it en masse at some point and you'd get it for free.
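The query-shape difference Mike describes can be seen in miniature below: with a soft-delete column one SELECT covers everything, while the shadow-table model needs a UNION whose column lists must be kept matched up by hand. The schemas are illustrative only.

```python
# Miniature comparison: one SELECT over a soft-delete table vs. a UNION
# over primary + shadow tables. Schemas are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- soft-delete-column model: live and deleted rows share one table
    CREATE TABLE instances (id INTEGER, uuid TEXT, deleted INTEGER);
    INSERT INTO instances VALUES (1, 'live', 0), (2, 'gone', 2);

    -- shadow-table model: deleted rows are moved aside
    CREATE TABLE instances2 (id INTEGER, uuid TEXT);
    CREATE TABLE shadow_instances2 (id INTEGER, uuid TEXT);
    INSERT INTO instances2 VALUES (1, 'live');
    INSERT INTO shadow_instances2 VALUES (2, 'gone');
""")

# All entities, deleted or not: trivial with the soft-delete column...
soft = conn.execute("SELECT id, uuid FROM instances ORDER BY id").fetchall()

# ...but a UNION with the shadow model, where the two column lists must
# stay in lock-step by hand, and further joins on top get awkward.
shadow = conn.execute(
    "SELECT id, uuid FROM instances2"
    " UNION ALL "
    "SELECT id, uuid FROM shadow_instances2 ORDER BY id").fetchall()
```

Both queries return the same rows here, but only because the shadow table's columns happen to line up; any schema drift between the two tables breaks the UNION, which is the maintenance cost being described.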
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
On Tue, Nov 25, 2014 at 11:14 AM, Mike Bayer wrote:
>
>> On Nov 24, 2014, at 5:20 PM, Michael Still wrote:
>>
>> Heya,
>>
>> Review https://review.openstack.org/#/c/135644/4 proposes the addition
>> of a new database for our improved implementation of cells in Nova.
>> However, there's an outstanding question about how to handle soft
>> delete of rows -- we believe that we need to soft delete for forensic
>> purposes.
>
> Every time I talk to people about the soft delete thing, I hear the usual
> refrain "we thought we needed it, but we didn't and now it's just overbuilt
> cruft we want to get rid of".
>
> Not saying you don't have a need here, but you definitely have this need,
> and are not just following the herd, right? Soft delete makes things a lot
> less convenient.

I can't comment on other projects, but Nova definitely needs the soft delete in the main nova database. Perhaps not for every table, but there is definitely code in the code base which uses it right now. Search for read_deleted=True if you're curious.

For this case in particular, the concern is that operators might need to find where an instance was running once it is deleted to be able to diagnose issues reported by users. I think that's a valid use case of this particular data.

>> This is a new database, so it's our big chance to get this right. So,
>> ideas welcome...
>>
>> Some initial proposals:
>>
>> - we do what we do in the current nova database -- we have a deleted
>> column, and we set it to true when we delete the instance.
>>
>> - we have shadow tables and we move deleted rows to a shadow table.
>
> Both approaches are viable, but as the soft-delete column is widespread, it
> would be thorny for this new app to use some totally different scheme,
> unless the notion is that all schemes should move to the audit table
> approach (which I wouldn't mind, but it would be a big job). FTR, the
> audit table approach is usually what I prefer for greenfield development,
> if all that's needed is forensic capabilities at the database inspection
> level, and not as much active GUI-based "deleted" flags. That is, if you
> really don't need to query the history tables very often except when
> debugging an issue offline. The reason it's preferable is that those rows
> are still "deleted" from your main table, and they don't get in the way of
> querying. But if you need to refer to these history rows in the context of
> the application, that means you need to get them mapped in such a way that
> they behave like the primary rows, which overall is a more difficult
> approach than just using the soft delete column.
>
> That said, I have a lot of plans to send improvements down the way of the
> existing approach of "soft delete column" into projects, from the querying
> POV, so that criteria to filter out soft delete can be done in a much more
> robust fashion (see
> https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event).
> But this is still more complex and less performant than if the rows are
> just gone totally, off in a history table somewhere (again, provided you
> really don't need to look at those history rows in an application context,
> otherwise it gets all complicated again).

Interesting. I hadn't seen consistency between the two databases as trumping doing this less horribly, but it sounds like it's more of a thing than I thought.

Thanks,
Michael

--
Rackspace Australia
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
> On Nov 24, 2014, at 5:20 PM, Michael Still wrote:
>
> Heya,
>
> Review https://review.openstack.org/#/c/135644/4 proposes the addition
> of a new database for our improved implementation of cells in Nova.
> However, there's an outstanding question about how to handle soft
> delete of rows -- we believe that we need to soft delete for forensic
> purposes.

Every time I talk to people about the soft delete thing, I hear the usual refrain "we thought we needed it, but we didn't and now it's just overbuilt cruft we want to get rid of".

Not saying you don't have a need here, but you definitely have this need, and are not just following the herd, right? Soft delete makes things a lot less convenient.

> This is a new database, so it's our big chance to get this right. So,
> ideas welcome...
>
> Some initial proposals:
>
> - we do what we do in the current nova database -- we have a deleted
> column, and we set it to true when we delete the instance.
>
> - we have shadow tables and we move deleted rows to a shadow table.

Both approaches are viable, but as the soft-delete column is widespread, it would be thorny for this new app to use some totally different scheme, unless the notion is that all schemes should move to the audit table approach (which I wouldn't mind, but it would be a big job). FTR, the audit table approach is usually what I prefer for greenfield development, if all that's needed is forensic capabilities at the database inspection level, and not as much active GUI-based "deleted" flags. That is, if you really don't need to query the history tables very often except when debugging an issue offline. The reason it's preferable is that those rows are still "deleted" from your main table, and they don't get in the way of querying. But if you need to refer to these history rows in the context of the application, that means you need to get them mapped in such a way that they behave like the primary rows, which overall is a more difficult approach than just using the soft delete column.

That said, I have a lot of plans to send improvements down the way of the existing approach of "soft delete column" into projects, from the querying POV, so that criteria to filter out soft delete can be done in a much more robust fashion (see https://bitbucket.org/zzzeek/sqlalchemy/issue/3225/query-heuristic-inspector-event). But this is still more complex and less performant than if the rows are just gone totally, off in a history table somewhere (again, provided you really don't need to look at those history rows in an application context, otherwise it gets all complicated again).
Re: [openstack-dev] [Nova] Handling soft delete for instance rows in a new cells database
On Tue, 2014-11-25 at 09:20 +1100, Michael Still wrote:
> - we do what we do in the current nova database -- we have a deleted
> column, and we set it to true when we delete the instance.

Actually, current nova uses the oslo.db.sqlalchemy.models.SoftDeleteMixin class, which defines the columns 'deleted_at' (a DateTime) and 'deleted' (an *integer*). It also defines a soft_delete() method, which sets the 'deleted' column to the row's 'id'. As I understand it, this is to keep from breaking uniqueness constraints: you factor 'deleted' into your uniqueness constraint, and you can have as many identical deleted records as you want...

> - we have shadow tables and we move deleted rows to a shadow table.
>
> - something else super clever I haven't thought of.

Well, one thought might be to create a single 'audit' table with a couple of columns: a timestamp, say, and some sort of description of the change, perhaps as a JSON object. On a 'delete' operation, you could store the values of the row in this audit table. From an operator's standpoint, this could provide the required auditing, and perhaps even a limited DR solution, while centralizing the data you need to monitor in a single location, which makes it easier to trim the data at intervals as needed.

While I've proposed this as a soft-delete solution, it would also provide the ability to record other changes to objects; one could even include a column recording who performed the change. And of course I've suggested this as a DB table, but we could also consider the merits of ditching the table and doing the same thing as some sort of notification through the notifications system...

--
Kevin L. Mitchell
Rackspace
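The uniqueness trick described above can be sketched in plain SQL: 'deleted' defaults to 0 and soft_delete() sets it to the row's id, so a UNIQUE(uuid, deleted) constraint permits any number of deleted rows per uuid but only one live row. The DDL below is an illustration of the idea, not oslo.db's actual SoftDeleteMixin.

```python
# Sketch of the SoftDeleteMixin uniqueness trick: soft-deleting a row moves
# it out of the (uuid, deleted=0) "live" slot by setting deleted = id.
# The schema is illustrative, not Nova's or oslo.db's actual DDL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE instances (
        id INTEGER PRIMARY KEY,
        uuid TEXT,
        deleted INTEGER NOT NULL DEFAULT 0,
        deleted_at TEXT,
        UNIQUE (uuid, deleted)
    )
""")


def soft_delete(conn, row_id):
    # Mirrors the mixin's behaviour: deleted = id, plus a timestamp.
    conn.execute(
        "UPDATE instances SET deleted = id, deleted_at = datetime('now')"
        " WHERE id = ?", (row_id,))


conn.execute("INSERT INTO instances (uuid) VALUES ('abc')")
soft_delete(conn, 1)
# With row 1 soft-deleted, a fresh live 'abc' row is allowed again.
conn.execute("INSERT INTO instances (uuid) VALUES ('abc')")
```

Inserting a second *live* row with the same uuid while one already exists would violate the constraint, which is exactly the behaviour the integer (rather than boolean) 'deleted' column buys.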