On Tue, Jan 26, 2016 at 10:31:19PM +0000, Mark Cave-Ayland wrote:
> On 25/01/16 11:10, David Gibson wrote:
>
> > Um.. so the migration duration is a complete red herring, regardless
> > of the units.
> >
> > Remember, we only ever compute the guest timebase value at the moment
> > the guest requests it - actually maintaining a current timebase value
> > makes sense in hardware, but would be nuts in software.
> >
> > The timebase is a function of real, wall-clock time, and the migration
> > destination has a notion of wall-clock time without reference to the
> > source.
> >
> > So what you need to transmit for migration is enough information to
> > compute the guest timebase from real-time - essentially just an offset
> > between real-time and the timebase.
> >
> > The guest can potentially observe the migration duration as a jump in
> > timebase values, but qemu doesn't need to do any calculations with it.
>
> Thanks for more pointers - I think I'm slowly getting there. My current
> thoughts are that the basic migration algorithm is doing the right thing
> in that it works out the number of host ticks different between source
> and destination.
Sorry, I've taken a while to reply to this.  I realised the tb
migration didn't work the way I thought it did, so I've had to get my
head around what's actually going on.

I had thought that it transferred only meta-information telling the
destination how to calculate the timebase, without actually working
out the timebase value at any particular moment.  In fact, what it
sends is basically the tuple of (timebase, realtime) at the point of
sending the migration stream.  The destination then uses that to work
out how to compute the timebase from realtime there.  I'm not
convinced this is a great approach, but it should basically work.
However, as you've seen, there are also some Just Plain Bugs in the
logic for this.

> I have a slight query with this section of code though:
>
>     migration_duration_tb = muldiv64(migration_duration_ns, freq,
>                                      NANOSECONDS_PER_SECOND);
>
> This is not technically correct on TCG x86 since the timebase is the x86
> TSC which is running somewhere in the GHz range, compared to freq which
> is hard-coded to 16MHz.

Um.. what?  AFAICT that line doesn't have any reference to the TSC
speed - just ns and the (guest) tb.

Also, 16MHz is only for the oldworld Macs - modern ppc cpus have the
TB frequency architected as 512MHz.

> However this doesn't seem to matter because the
> timebase adjustment is limited to a maximum of 1s. Why should this be if
> the timebase is supposed to be free running as you mentioned in a
> previous email?

AFAICT, what it's doing here is assuming that if the migration
duration is >1s (or appears to be >1s) then it's because the host
clocks are out of sync, and so just capping the elapsed tb time at 1s.

That's just wrong, IMO.  1s is a long downtime for a live migration,
but it's not impossible, and it will happen nearly always in the
scenario you've discussed of manually loading the migration stream
from a file.

But more to the point, trying to maintain correctness of the timebase
when the hosts are out of sync is basically futile.
There's no other reference we can use, so all we can achieve is
getting a different wrong value from what we'd get by blindly trusting
the host clock.

We do need to constrain the tb from going backwards, because that will
cause chaos in the guest, but otherwise we should just trust the host
clock and ditch that 1s clamp.  If the hosts are out of sync, then
guest time will jump, but that was always going to happen.

> AFAICT the main problem on TCG x86 is that post-migration the timebase
> calculated by cpu_ppc_get_tb() is incorrect:
>
>     uint64_t cpu_ppc_get_tb(ppc_tb_t *tb_env, uint64_t vmclk, int64_t tb_offset)
>     {
>         /* TB time in tb periods */
>         return muldiv64(vmclk, tb_env->tb_freq, get_ticks_per_sec()) +
>                tb_offset;
>     }

So the problem here is that get_ticks_per_sec() (which always returns
1,000,000,000) is not talking about the same ticks as
cpu_get_host_ticks().  That may not have been true when this code was
written.

> For a typical savevm/loadvm pair I see something like this:
>
> savevm:
>
>     tb->guest_timebase = 26281306490558
>     qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) = 7040725511
>
> loadvm:
>
>     cpu_get_host_ticks() = 26289847005259
>     tb_off_adj = -8540514701
>     qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) = 7040725511
>     cpu_ppc_get_tb() = -15785159386
>
> But as cpu_ppc_get_tb() uses QEMU_CLOCK_VIRTUAL for vmclk we end up with
> a negative number for the timebase since the virtual clock is dwarfed by
> the number of TSC ticks calculated for tb_off_adj. This will work on a
> PPC host though since cpu_get_host_ticks() is also derived from the
> timebase.

Yeah, we shouldn't be using cpu_get_host_ticks() at all - or anything
else which depends on a host frequency.  We should only be using qemu
interfaces which work in real time units (nanoseconds, usually).
> Another question I have is cpu_ppc_load_tbl():
>
>     uint64_t cpu_ppc_load_tbl (CPUPPCState *env)
>     {
>         ppc_tb_t *tb_env = env->tb_env;
>         uint64_t tb;
>
>         if (kvm_enabled()) {
>             return env->spr[SPR_TBL];
>         }
>
>         tb = cpu_ppc_get_tb(tb_env, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL),
>                             tb_env->tb_offset);
>         LOG_TB("%s: tb %016" PRIx64 "\n", __func__, tb);
>
>         return tb;
>     }
>
> Compared with cpu_ppc_load_tbu(), it is returning uint64_t rather than
> uint32_t and doesn't appear to mask the bottom 32 bits of the timebase
> value?

That's because in 64-bit mode loading the "TBL" is actually a load of
the whole 64-bit timebase.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson