I think we need to split the scenarios and focus on the end user experience
with the cloud .... a few come to my mind from the CERN experience (but this
may not be all):

1. Accidental deletion of an object (including meta data)
2. Multi-level consistency (such as between Cell API and child instances)
3. Auditing

CERN has the scenario 1 at a reasonable frequency. Ultimately, it is due to
error by --

A - the openstack administrators themselves
B - the delegated project administrators
C - users with a non-optimised scope for administrative action
D - users who make mistakes

It seems that we should handle these as different cases

3 - make sure there is a log entry (ideally off the box) for all operations
2 - up to the component implementers but with the aim to expire deleted
    entries as soon as reasonable consistency is achieved
1[A-D] - how can we recover from operator/project admin/user error ?

I understand that there are differing perspectives from cloud to server
consolidation but my cloud users expect that if they create a one-off virtual
desktop running Windows for software testing and install a set of software, I
don't tell them it was accidentally deleted due to operator error (1A or 1B),
you need to re-create it.

Tim

> -----Original Message-----
> From: Jay Pipes [mailto:[email protected]]
> Sent: 14 March 2014 16:55
> To: [email protected]
> Subject: Re: [openstack-dev] [all][db][performance] Proposal: Get rid of soft deletion (step by step)
>
> On Fri, 2014-03-14 at 08:37 +0100, Radomir Dopieralski wrote:
> > Hello,
> >
> > I also think that this thread is going in the wrong direction, but I
> > don't think the direction Boris wants is the correct one either.
> > Frankly I'm a little surprised that nobody mentioned another advantage
> > that soft delete gives us, the one that I think it was actually used for
> > originally.
> >
> > You see, soft delete is an optimization. It's there to make the system
> > work faster as a whole, have less code and be simpler to maintain and debug.
> >
> > How does it do it, when, as clearly shown in the first post in this
> > thread, it makes the queries slower, requires additional indices in
> > the database and more logic in the queries?
>
> I feel it isn't an optimization if:
>
> * It slows down the code base
> * Makes the code harder to read and understand
> * Deliberately obscures the actions of removing and restoring resources
> * Encourages the idea that everything in the system is "undoable", like the
>   cloud is a Word doc.
>
> > The answer is, by doing more
> > with those queries, by making you write less code, execute fewer
> > queries to the databases and avoid duplicating the same data in multiple
> > places.
>
> Fewer queries do not always make faster code, nor do they lead to
> inherently race-free code.
>
> > OpenStack is a big, distributed system of multiple databases that
> > sometimes rely on each other and cross-reference their records. It's
> > not uncommon to have some long-running operation started, that uses
> > some data, and then, in the middle of its execution, have that data deleted.
> > With soft delete, that's not a problem -- the operation can continue
> > safely and proceed as scheduled, with the data it was started with in
> > the first place -- it still has access to the deleted records as if
> > nothing happened.
>
> I believe a better solution would be to use Boris' solution and implement
> safeguards around the delete operation. For instance, not being able to
> delete an instance that has tasks still running against it. Either that, or
> implement true task abortion logic that can notify distributed components
> about the need to stop a running task because either the user wants to
> delete a resource or simply cancel the operation they began.
>
> > You simply won't be able to schedule another operation like that with
> > the same data, because it has been soft-deleted and won't pass the
> > validation at the beginning (or even won't appear in the UI or CLI).
> > This solves a lot of race conditions, error handling, additional
> > checks to make sure the record still exists, etc.
>
> Sorry, I disagree here. Components that rely on the soft-delete behavior to
> get the resource data from the database should instead respond to a
> NotFound that gets raised by aborting their running task.
>
> > Without soft delete, you need to write custom code every time to
> > handle the case of a record being deleted mid-operation, including all
> > the possible combinations of which record and when.
>
> Not custom code. Explicit code paths for explicit actions.
>
> > Or you need to copy all
> > the relevant data in advance over to whatever is executing that
> > operation.
>
> This is already happening.
>
> > This cannot be abstracted away entirely (although tools like TaskFlow
> > help), as this is specific to the case you are handling. And it's not
> > easy to find all the places where you can have a race condition like
> > that -- especially when you are modifying existing code that has been
> > relying on soft delete before. You can have bugs undetected for years,
> > that only appear in production, on very large deployments, and are
> > impossible to reproduce reliably.
> >
> > There are more similar cases like that, including cascading deletes
> > and more advanced stuff, but I think this single case already shows
> > that the advantages of soft delete outweigh its disadvantages.
>
> I respectfully disagree :) I think the benefits of explicit code paths and
> increased performance of the database outweigh the costs of changing
> existing code.
>
> Best,
> -jay
>
> > On 13/03/14 19:52, Boris Pavlovic wrote:
> > > Hi all,
> > >
> > >
> > > I would like to fix direction of this thread. Cause it is going in
> > > wrong direction.
> > >
> > > To assume:
> > > 1) Yes restoring already deleted resources could be useful.
> > > 2) Current approach with soft deletion is broken by design and we
> > > should get rid of them.
> > >
> > > More about why I think that it is broken:
> > > 1) When you are restoring some resource you should restore N records
> > > from N tables (e.g. VM)
> > > 2) Restoring sometimes means not only restoring DB records.
> > > 3) Not all resources should be restorable (e.g. why would I need to
> > > restore fixed_ip? or key-pairs?)
> > >
> > >
> > > So what we should think about is:
> > > 1) How to implement restoring functionality in a common way (e.g.
> > > framework that will be in oslo)
> > > 2) Split of work of getting rid of soft deletion in steps (that I
> > > already mentioned):
> > > a) remove soft deletion from places where we are not using it
> > > b) replace internal code where we are using soft deletion to that
> > > framework
> > > c) replace API stuff using ceilometer (for logs) or this framework
> > > (for restorable stuff)
> > >
> > >
> > > To put in a nutshell: Restoring Deleted resources / Delayed Deletion
> > > != Soft deletion.
> > >
> > >
> > > Best regards,
> > > Boris Pavlovic
> > >
> > >
> > >
> > > On Thu, Mar 13, 2014 at 9:21 PM, Mike Wilson <[email protected]
> > > <mailto:[email protected]>> wrote:
> > >
> > > For some guests we use the LVM imagebackend and there are times when
> > > the guest is deleted by accident. Humans, being what they are, don't
> > > back up their files and don't take care of important data, so it is
> > > not uncommon to use lvrestore and "undelete" an instance so that
> > > people can get their data. Of course, this is not always possible if
> > > the data has been subsequently overwritten. But it is common enough
> > > that I imagine most of our operators are familiar with how to do it.
> > > So I guess my saying that we do it on a regular basis is not quite
> > > accurate. Probably would be better to say that it is not uncommon to
> > > do this, but definitely not a daily task or something of that ilk.
> > >
> > > I have personally "undeleted" an instance a few times after
> > > accidental deletion also.
> > > I can't remember the specifics, but I do remember doing it :-).
> > >
> > > -Mike
> > >
> > >
> > > On Tue, Mar 11, 2014 at 12:46 PM, Johannes Erdfelt
> > > <[email protected] <mailto:[email protected]>> wrote:
> > >
> > > On Tue, Mar 11, 2014, Mike Wilson <[email protected]
> > > <mailto:[email protected]>> wrote:
> > > > Undeleting things is an important use case in my opinion. We
> > > > do this in our environment on a regular basis. In that light
> > > > I'm not sure that it would be appropriate just to log the
> > > > deletion and get rid of the row. I would like to see it go to
> > > > an archival table where it is easily restored.
> > >
> > > I'm curious, what are you undeleting and why?
> > >
> > > JE
> > >
> > >
> > > _______________________________________________
> > > OpenStack-dev mailing list
> > > [email protected]
> > > <mailto:[email protected]>
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
