Well, that stuff was the main reason we built an additional layer of 'outside of xapi' tools in dom0. E.g. right now we use an 'absolute kill' function that kills a domain immediately, without queueing behind the long list of timeout-waiting graceful shutdown requests to the domain...
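Just to illustrate the idea, a minimal sketch of such an 'absolute kill' helper (assuming some 'xl destroy'-style low-level destroy command is available in dom0; the real tool may differ):

(* Hypothetical 'absolute kill': bypass the graceful-shutdown queue and
   destroy the domain directly. *)
let absolute_kill domid =
  match Unix.system (Printf.sprintf "xl destroy %d" domid) with
  | Unix.WEXITED 0 -> true
  | _ -> false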
And about cancellation of SR operations...
There is a set of scenarios to think about.
1. A normally short operation, performed in normal mode. We can simply say 'no cancel if execution time is less than X'. That means if we unplug a VBD quickly, everything is fine and the user has no real chance to cancel the operation (if by luck they succeed, we can say 'oops, your request was too late'). Simple implementation: if the operation is 'normally quick', we wait for a small timeout before processing the cancellation request (see the sketch after #2 below). If the operation has succeeded by that time, there is nothing to cancel and everything is fine. If the operation is still in progress, see #3.
2. A normally long operation, performed in normal mode. We want to cancel a vdi-copy, for example. I think this can easily be done by sending a kill to 'sparse_dd' and removing the new VDI.
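Roughly, the deferred cancel from #1 could look like this (just a sketch; grace_period and op_finished are made-up names, nothing from the current API):

(* Give a 'normally quick' operation [grace_period] seconds to finish
   before the cancel request is even considered.  [op_finished] is a
   caller-supplied predicate, re-checked on every iteration. *)
let handle_cancel ~grace_period ~op_finished =
  let start = Unix.gettimeofday () in
  let rec wait () =
    if op_finished () then `Nothing_to_cancel       (* too late to cancel *)
    else if Unix.gettimeofday () -. start >= grace_period then `Still_running  (* see #3 *)
    else (Thread.delay 0.1; wait ())
  in
  wait ()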
Now the hard part.
Before proposing any behaviour, here is a really bad scenario I have seen in my XCP practice: the storage server goes offline while the PBD is still plugged. There is no way to say 'lvchange' to LVM with an unplugged PV, no way to do anything with NFS without the NFS server, and so on. We cannot do vbd-unplug, pbd-unplug and so on. The situation gets worse if we get stuck with an innocent SR (e.g. on VM reboot) while dealing with a dead (or dying) SR attached to the same host. For example, I saw that once with a dying SFP with a massive error rate. I was unable to migrate domains away, no shutdown, and my only solution was to reboot the host and do a manual power-state reset (or, actually, wait for the host to come back online and mark those machines as 'down').
I think SMs should provide some way to say 'nope, that stuff is dead' and allow forceful VBD/PBD unplug operations.
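Something along these lines (purely hypothetical; there is no such SM call today):

(* Hypothetical SM-level verdict: let the storage backend declare an SR
   dead, so that vbd-unplug/pbd-unplug may be forced instead of waiting
   on storage that will never answer. *)
type sr_health = Alive | Dead of string   (* reason *)

let force_unplug_allowed = function
  | Dead _ -> true    (* 'nope, that stuff is dead': skip graceful teardown *)
  | Alive  -> false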
That requires the concept of a 'compromised host'. Such a host normally does not accept new VMs, and every VM going through a reboot is actually shut down (see below) and started on the next available non-compromised host. Only two operations are allowed for a VM (apart from casual memory-set/rename and so on):
1) Shutdown/reboot (which actually restarts the VM on a different host)
2) Urgent migration.
Both of them behave differently compared to the normal operations: they do not require the domain to be _DESTROY_ed (tapdisk unplugged and so on). They try to destroy the domain, and put it into the 'paused' state if that is not possible (e.g. a hung tapdisk does not free shared memory, or prevents the VM from really disappearing in some other way). Those 'paused' domains change their UUIDs to 'deadbeef' (like xapi already marks unkillable stray domains during startup). The main idea: we allow VM migration even if killing the original domain fails. We migrate the domain and put the original into an endlessly paused state with the '-d-'ying flag. Same for shutdown/reboot: we report the VM as 'off' even if the domain has not completely died. After all domains are migrated/rebooted/shut down we can freely perform an (even self-initiated) urgent reboot.
One more note: during that state xapi should be able to restart and continue to operate (within the 'compromised' limits).
I have been in a situation where I had one domain that would not die, a bunch of normal domains, and decided to restart xapi. Of course xapi was not able to start (it found an unkillable deadbeef) and I was forced to reboot some good VMs because of one bad one.
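A rough sketch of the 'compromised host' policy described above (the names are invented; nothing like this exists in xapi today):

(* Only 'liberating' and harmless operations are allowed on a compromised
   host, so it can be evacuated; everything else is refused. *)
type vm_op = Start | Shutdown | Reboot | Migrate | Memory_set | Rename

let allowed_on_compromised_host = function
  | Shutdown | Reboot        (* actually restarts the VM on another host *)
  | Migrate                  (* urgent migration *)
  | Memory_set | Rename -> true
  | Start -> false           (* a compromised host accepts no new VMs *)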
OK, now to the second part.
3. Normally short operations that run for too long. If that happens we can't cancel them, e.g. because lvs simply hangs on every call and we cannot do anything with LVM. We allow marking such a task as 'forcefully cancelled' only if the host is marked as degraded. In that case we allow only the 'liberating' calls (reboot/shutdown/migrate) for VMs, and the situation is resolved. In other words, we reject cancellation of those operations in normal mode, but allow simply 'forgetting' about them for an urgent evacuation/reboot.
4. Normally long operations we can't kill. If our kill to sparse_dd or some other long-running command does not succeed (e.g. we do sr-create of LVMoISCSI, but the dd over the first 100MB hangs), we mark the host as 'bad'. Here we put a long timeout in place (e.g. 30s: if the program does not react to kill -9 for 30s, it is hung in a syscall) before doing this (see the sketch below).
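A sketch covering #3/#4, assuming we spawned the helper ourselves and know its pid (the timeout value and the mark_host_bad hook are placeholders): SIGKILL the stuck helper, give it a while to disappear, and if it is still around, mark the host degraded; only then does the task become 'forcefully cancellable'.

(* Kill a stuck helper (e.g. sparse_dd) and wait up to [timeout] seconds
   (say 30) for it to disappear.  If it survives kill -9 it is stuck in a
   syscall on dead storage, so [mark_host_bad] is called. *)
let kill_or_degrade ~timeout ~mark_host_bad pid =
  (try Unix.kill pid Sys.sigkill with Unix.Unix_error (_, _, _) -> ());
  let start = Unix.gettimeofday () in
  let rec wait () =
    match Unix.waitpid [ Unix.WNOHANG ] pid with
    | 0, _ ->                                   (* not reaped yet *)
        if Unix.gettimeofday () -. start > timeout
        then mark_host_bad ()
        else (Thread.delay 1.; wait ())
    | _ -> ()                                   (* the process is gone *)
    | exception Unix.Unix_error (Unix.ECHILD, _, _) -> ()
  in
  wait ()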
... And I know that stuff is dirty and ugly. But all block devices can behave like crap sometimes. E.g. not long ago there was a nasty bug in linux-raid which caused raid10 to deadlock, meaning every IO simply went in and never returned. The same kind of bug is now in LVM (with a large number of disks; I have not reported it yet because I could not reproduce it reliably).
And a virtualization platform should be able to overcome all of THAT.
On 10.07.2012 18:36, Dave Scott wrote:
Hopefully in the future the whole stack will support cancellation -- so the user can apply their own timeout values in their code instead of us doing it one-size-fits-all. A lot of the domain-level stuff can now be cancelled (which may cause the domain to crash if it happens at a bad time.. but this does at least cause things to unwind usually). Most of the storage interface is uncancellable, which is a big problem since it involves off-box RPCs. We either need to fix that directly or offer users the big red button labeled "driver domain restart" which will unstick things.

One bad thing about not supporting cancellation is that it encourages people to close connections and walk away, unaware that a large amount of resources (and locks) are still being consumed server-side. One good thing to do would be to send heartbeats to any running CLIs and auto-cancel when the connection is broken unless some "--async" option is given which would return immediately with a Task.

In the meantime we always tune the timeouts to fail eventually if the system gets truly stuck under high load. This leads to fairly long timeouts, which isn't ideal for everyone. There's a tension between high timeouts for stress testing and low timeouts for user experience -- we can't do both :(

Cheers,
Dave
-----Original Message-----
From: Anil Madhavapeddy [mailto:[email protected]]
Sent: 10 July 2012 15:24
To: Dave Scott
Cc: [email protected]
Subject: Re: [Xen-API] timing loops

How do you decide on a reasonable value of n, given that real timeouts shift so dramatically with dom0 system load? Or rather, what areas of xapi aren't fully event-driven and require such timeouts? I can imagine the device/udev layer being icky in this regard, but a good way to wrap all such instances might be to have a single event-dispatch daemon which combines all the system events and timeouts, and coordinates the remainder of the xapi process cluster (which will not need arbitrary timeouts as a result). Or is it just too impractical since there are so many places where such timeouts are required?

-anil

On 10 Jul 2012, at 15:18, Dave Scott wrote:
Hi,

With all the recent xapi disaggregation work, are we now more vulnerable to failures induced by moving the system clock around, affecting timeout logic in our async-style interfaces where we wait for 'n' seconds for an event notification?

I've recently added 'oclock' as a dependency which gives us access to a monotonic clock source, which is perfect (I believe) for reliably 'timing out'. I started a patch to convert the whole codebase over but it was getting much too big and hard to test because sometimes we really do want a calendar date, and other times we really want a point in time.

Maybe I should make a subset of my patch which fixes all the new timing loops that have been introduced. What do you think? Would you like to confess to having written:
let start = Unix.gettimeofday () in
while not (p ()) && (Unix.gettimeofday () -. start < timeout) do
  Thread.delay 1.
done
I've got a nice higher-order function to replace this which does:
let until p timeout interval =
  (* p : unit -> bool, re-evaluated each iteration *)
  let start = Oclock.gettime Oclock.monotonic in
  let elapsed () =
    Int64.(to_float (sub (Oclock.gettime Oclock.monotonic) start)) /. 1e9 in
  while not (p ()) && elapsed () < timeout do
    Thread.delay interval
  done
I believe this is one of many things that lwt (and JS core) does a
nice job of.
Cheers,
Dave
_______________________________________________
Xen-api mailing list
[email protected]
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api