Well, that stuff was the main reason we built an additional layer of 'outside of xapi' tools in dom0. E.g. right now we use an 'absolute kill' function that kills a domain immediately, without queueing behind the long list of timeout-waiting graceful shutdown requests to the domain...
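Just to illustrate the idea, a minimal sketch of such an 'absolute kill' helper (assuming some 'xl destroy'-style low-level destroy command is available in dom0; the real tool may differ):

(* Hypothetical 'absolute kill': bypass the graceful-shutdown queue and
   destroy the domain directly. *)
let absolute_kill domid =
  match Unix.system (Printf.sprintf "xl destroy %d" domid) with
  | Unix.WEXITED 0 -> true
  | _ -> false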
And about cancellation of SR operations...
There is a set of scenarios to think about.
1. A normally short operation, performed in normal mode. We can simply say 'no cancel if execution time is less than X'. That means if we unplug a VBD quickly, everything is fine and the user has no real chance to cancel the operation (if by luck they succeed, we can say 'oops, your request was too late'). Simple implementation: if the operation is 'normally quick', we wait for a small timeout before processing the cancellation request (see the sketch after #2 below). If the operation has succeeded by that time, there is nothing to cancel and everything is fine. If the operation is still in progress, see #3.
2. A normally long operation, performed in normal mode. We want to cancel a vdi-copy, for example. I think this can easily be done by sending a kill to 'sparse_dd' and removing the new VDI.
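Roughly, the deferred cancel from #1 could look like this (just a sketch; grace_period and op_finished are made-up names, nothing from the current API):

(* Give a 'normally quick' operation [grace_period] seconds to finish
   before the cancel request is even considered.  [op_finished] is a
   caller-supplied predicate, re-checked on every iteration. *)
let handle_cancel ~grace_period ~op_finished =
  let start = Unix.gettimeofday () in
  let rec wait () =
    if op_finished () then `Nothing_to_cancel       (* too late to cancel *)
    else if Unix.gettimeofday () -. start >= grace_period then `Still_running  (* see #3 *)
    else (Thread.delay 0.1; wait ())
  in
  wait ()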
Now the hard part.
Before proposing any behaviour, here is a really bad scenario I have seen in my XCP practice: the storage server goes offline while the PBD is still plugged. There is no way to say 'lvchange' to LVM with an unplugged PV, no way to do anything with NFS without the NFS server, and so on. We cannot do vbd-unplug, pbd-unplug and so on. The situation gets worse if we get stuck with an innocent SR (e.g. on VM reboot) while dealing with a dead (or dying) SR attached to the same host. For example, I saw that once with a dying SFP with a massive error rate. I was unable to migrate domains away, no shutdown, and my only solution was to reboot the host and do a manual power-state reset (or, actually, wait for the host to come back online and mark those machines as 'down').
I think SMs should provide some way to say 'nope, that stuff is dead' and allow forceful VBD/PBD unplug operations.
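Something along these lines (purely hypothetical; there is no such SM call today):

(* Hypothetical SM-level verdict: let the storage backend declare an SR
   dead, so that vbd-unplug/pbd-unplug may be forced instead of waiting
   on storage that will never answer. *)
type sr_health = Alive | Dead of string   (* reason *)

let force_unplug_allowed = function
  | Dead _ -> true    (* 'nope, that stuff is dead': skip graceful teardown *)
  | Alive  -> false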
That requires the concept of a 'compromised host'. Such a host normally does not accept new VMs, and every VM going through a reboot is actually shut down (see below) and started on the next available non-compromised host. Only two operations are allowed for a VM (apart from casual memory-set/rename and so on):
1) Shutdown/reboot (which actually restarts the VM on a different host)
2) Urgent migration.
Both of them behave differently compared to the normal operations: they do not require the domain to be _DESTROY_ed (tapdisk unplugged and so on). They try to destroy the domain, and put it into the 'paused' state if that is not possible (e.g. a hung tapdisk does not free shared memory, or prevents the VM from really disappearing in some other way). Those 'paused' domains change their UUIDs to 'deadbeef' (like xapi already marks unkillable stray domains during startup). The main idea: we allow VM migration even if killing the original domain fails. We migrate the domain and put the original into an endlessly paused state with the '-d-'ying flag. Same for shutdown/reboot: we report the VM as 'off' even if the domain has not completely died. After all domains are migrated/rebooted/shut down we can freely perform an (even self-initiated) urgent reboot.
One more note: during that state xapi should be able to restart and continue to operate (within the 'compromised' limits).
I have been in a situation where I had one domain that would not die, a bunch of normal domains, and decided to restart xapi. Of course xapi was not able to start (it found an unkillable deadbeef) and I was forced to reboot some good VMs because of one bad one.
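A rough sketch of the 'compromised host' policy described above (the names are invented; nothing like this exists in xapi today):

(* Only 'liberating' and harmless operations are allowed on a compromised
   host, so it can be evacuated; everything else is refused. *)
type vm_op = Start | Shutdown | Reboot | Migrate | Memory_set | Rename

let allowed_on_compromised_host = function
  | Shutdown | Reboot        (* actually restarts the VM on another host *)
  | Migrate                  (* urgent migration *)
  | Memory_set | Rename -> true
  | Start -> false           (* a compromised host accepts no new VMs *)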
OK, now to the second part.
3. Normally short operations that run for too long. If that happens we can't cancel them, e.g. because lvs simply hangs on every call and we cannot do anything with LVM. We allow marking such a task as 'forcefully cancelled' only if the host is marked as degraded. In that case we allow only the 'liberating' calls (reboot/shutdown/migrate) for VMs, and the situation is resolved. In other words, we reject cancellation of those operations in normal mode, but allow simply 'forgetting' about them for an urgent evacuation/reboot.
4. Normally long operations we can't kill. If our kill to sparse_dd or some other long-running command does not succeed (e.g. we do sr-create of LVMoISCSI, but the dd over the first 100MB hangs), we mark the host as 'bad'. Here we put a long timeout in place (e.g. 30s: if the program does not react to kill -9 for 30s, it is hung in a syscall) before doing this (see the sketch below).
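A sketch covering #3/#4, assuming we spawned the helper ourselves and know its pid (the timeout value and the mark_host_bad hook are placeholders): SIGKILL the stuck helper, give it a while to disappear, and if it is still around, mark the host degraded; only then does the task become 'forcefully cancellable'.

(* Kill a stuck helper (e.g. sparse_dd) and wait up to [timeout] seconds
   (say 30) for it to disappear.  If it survives kill -9 it is stuck in a
   syscall on dead storage, so [mark_host_bad] is called. *)
let kill_or_degrade ~timeout ~mark_host_bad pid =
  (try Unix.kill pid Sys.sigkill with Unix.Unix_error (_, _, _) -> ());
  let start = Unix.gettimeofday () in
  let rec wait () =
    match Unix.waitpid [ Unix.WNOHANG ] pid with
    | 0, _ ->                                   (* not reaped yet *)
        if Unix.gettimeofday () -. start > timeout
        then mark_host_bad ()
        else (Thread.delay 1.; wait ())
    | _ -> ()                                   (* the process is gone *)
    | exception Unix.Unix_error (Unix.ECHILD, _, _) -> ()
  in
  wait ()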
... And I know that stuff is dirty and ugly. But all block devices can behave like crap sometimes. E.g. not long ago there was a nasty bug in linux-raid which caused raid10 to deadlock, meaning every IO simply went in and never returned. The same kind of bug is now in LVM (with a large number of disks; I have not reported it yet because I could not reproduce it reliably).
And a virtualization platform should be able to overcome all of THAT.
On 10.07.2012 18:36, Dave Scott wrote:
Hopefully in the future the whole stack will support cancellation -- so the user can apply their own timeout values in their code instead of us doing it one-size-fits-all. A lot of the domain-level stuff can now be cancelled (which may cause the domain to crash if it happens at a bad time.. but this does at least cause things to unwind usually). Most of the storage interface is uncancellable, which is a big problem since it involves off-box RPCs. We either need to fix that directly or offer users the big red button labeled "driver domain restart" which will unstick things.

One bad thing about not supporting cancellation is that it encourages people to close connections and walk away, unaware that a large amount of resources (and locks) are still being consumed server-side. One good thing to do would be to send heartbeats to any running CLIs and auto-cancel when the connection is broken unless some "--async" option is given which would return immediately with a Task.

In the meantime we always tune the timeouts to fail eventually if the system gets truly stuck under high load. This leads to fairly long timeouts, which isn't ideal for everyone. There's a tension between high timeouts for stress testing and low timeouts for user experience -- we can't do both :(

Cheers,
Dave
-----Original Message-----
From: Anil Madhavapeddy [mailto:[email protected]]
Sent: 10 July 2012 15:24
To: Dave Scott
Cc: [email protected]
Subject: Re: [Xen-API] timing loops

How do you decide on a reasonable value of n, given that real timeouts shift so dramatically with dom0 system load? Or rather, what areas of xapi aren't fully event-driven and require such timeouts? I can imagine the device/udev layer being icky in this regard, but a good way to wrap all such instances might be to have a single event-dispatch daemon which combines all the system events and timeouts, and coordinates the remainder of the xapi process cluster (which will not need arbitrary timeouts as a result). Or is it just too impractical since there are so many places where such timeouts are required?

-anil

On 10 Jul 2012, at 15:18, Dave Scott wrote:
Hi,

With all the recent xapi disaggregation work, are we now more vulnerable to failures induced by moving the system clock around, affecting timeout logic in our async-style interfaces where we wait for 'n' seconds for an event notification?

I've recently added 'oclock' as a dependency which gives us access to a monotonic clock source, which is perfect (I believe) for reliably 'timing out'. I started a patch to convert the whole codebase over but it was getting much too big and hard to test because sometimes we really do want a calendar date, and other times we really want a point in time.

Maybe I should make a subset of my patch which fixes all the new timing loops that have been introduced. What do you think? Would you like to confess to having written:
let start = Unix.gettimeofday () in
while not (p ()) && (Unix.gettimeofday () -. start < timeout) do
  Thread.delay 1.
done
I've got a nice higher-order function to replace this which does:
let until p timeout interval =
  (* p : unit -> bool, re-evaluated each iteration *)
  let start = Oclock.gettime Oclock.monotonic in
  let elapsed () =
    Int64.(to_float (sub (Oclock.gettime Oclock.monotonic) start)) /. 1e9 in
  while not (p ()) && elapsed () < timeout do
    Thread.delay interval
  done
I believe this is one of many things that lwt (and JS core) does a
nice job of.
Cheers,
Dave
_______________________________________________
Xen-api mailing list
[email protected]
http://lists.xen.org/cgi-bin/mailman/listinfo/xen-api