On 04/14/2014 05:46 PM, Alexander Graf wrote: > > On 14.04.14 09:33, Alexey Kardashevskiy wrote: >> On 04/14/2014 05:08 PM, Alexander Graf wrote: >>> On 12.04.14 17:52, Alexey Kardashevskiy wrote: >>>> This allows guests to have a different timebase origin from the host. >>>> >>>> This is needed for migration, where a guest can migrate from one host >>>> to another and the two hosts might have a different timebase origin. >>>> However, the timebase seen by the guest must not go backwards, and >>>> should go forwards only by a small amount corresponding to the time >>>> taken for the migration. >>>> >>>> This is only supported for recent POWER hardware which has the TBU40 >>>> (timebase upper 40 bits) register. That includes POWER6, 7, 8 but not >>>> 970. >>>> >>>> This adds kvm_access_one_reg() to access a special register which is not >>>> in env->spr. >>>> >>>> The feature must be present in the host kernel. >>>> >>>> Signed-off-by: Alexey Kardashevskiy <a...@ozlabs.ru> >>>> --- >>>> Changes: >>>> v5: >>>> * fixed multiple comments in cpu_ppc_get_adjusted_tb and merged it >>>> into timebase_post_load() >>>> * removed round_up(1<<24) as KVM is expected to do this anyway >>>> * removed @freq from migration stream >>>> * renamed PPCTimebaseOffset to PPCTimebase >>>> * CLOCKS_PER_SEC is used as a constant which 1000000us/s (man clock) >>>> >>>> v4: >>>> * made it per machine timebase offser rather than per CPU >>>> >>>> v3: >>>> * kvm_access_one_reg moved out to a separate patch >>>> * tb_offset and host_timebase were replaced with guest_timebase as >>>> the destionation does not really care of offset on the source >>>> >>>> v2: >>>> * bumped the vmstate_ppc_cpu version >>>> * defined version for the env.tb_env field >>>> --- >>>> hw/ppc/ppc.c | 76 >>>> ++++++++++++++++++++++++++++++++++++++++++++++++++ >>>> hw/ppc/spapr.c | 4 +-- >>>> include/hw/ppc/spapr.h | 1 + >>>> target-ppc/cpu-qom.h | 15 ++++++++++ >>>> target-ppc/kvm.c | 5 ++++ >>>> trace-events | 3 ++ >>>> 6 files changed, 102 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c >>>> index 71df471..bf61a0a 100644 >>>> --- a/hw/ppc/ppc.c >>>> +++ b/hw/ppc/ppc.c >>>> @@ -29,9 +29,11 @@ >>>> #include "sysemu/cpus.h" >>>> #include "hw/timer/m48t59.h" >>>> #include "qemu/log.h" >>>> +#include "qemu/error-report.h" >>>> #include "hw/loader.h" >>>> #include "sysemu/kvm.h" >>>> #include "kvm_ppc.h" >>>> +#include "trace.h" >>>> //#define PPC_DEBUG_IRQ >>>> //#define PPC_DEBUG_TB >>>> @@ -829,6 +831,80 @@ static void cpu_ppc_set_tb_clk (void *opaque, >>>> uint32_t freq) >>>> cpu_ppc_store_purr(cpu, 0x0000000000000000ULL); >>>> } >>>> +static void timebase_pre_save(void *opaque) >>>> +{ >>>> + PPCTimebase *tb = opaque; >>>> + uint64_t ticks = cpu_get_real_ticks(); >>>> + PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu); >>>> + >>>> + if (!first_ppc_cpu->env.tb_env) { >>>> + error_report("No timebase object"); >>>> + return; >>>> + } >>>> + >>>> + tb->time_of_the_day = get_clock_realtime() / 1000; >>> It'd be good to indicate in the field name that we're in us units. In fact >>> it probably makes sense to use ns throughout and not convert :). >>> Nanoseconds are QEMU's "native" internal time format. >> Ooookay, I'll make it "ns", then no indicator is needed, right? > > I still prefer to keep an indicator. Makes it more readable. > >> >> >>>> + /* >>>> + * tb_offset is only expected to be changed by migration so >>>> + * there is no need to update it from KVM here >>>> + */ >>>> + tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset; >>>> +} >>>> + >>>> +static int timebase_post_load(void *opaque, int version_id) >>>> +{ >>>> + PPCTimebase *tb = opaque; >>>> + CPUState *cpu; >>>> + PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu); >>>> + int64_t tb_off_adj, tb_off; >>>> + int64_t migration_duration_us, migration_duration_tb, guest_tb, >>>> host_us; >>>> + unsigned long freq; >>>> + >>>> + if (!first_ppc_cpu->env.tb_env) { >>>> + error_report("No timebase object"); >>>> + return -1; >>>> + } >>>> + >>>> + freq = first_ppc_cpu->env.tb_env->tb_freq; >>>> + /* >>>> + * Calculate timebase on the destination side of migration. >>>> + * The destination timebase must be not less than the source >>>> timebase. >>>> + * We try to adjust timebase by downtime if host clocks are not >>>> + * too much out of sync (1 second for now). >>>> + */ >>>> + host_us = get_clock_realtime() / 1000; >>>> + migration_duration_us = MIN(CLOCKS_PER_SEC, host_us - >>>> tb->time_of_the_day); >>> CLOCKS_PER_SEC only make sense with the return value of clock(). I wouldn't >>> be surprised if this calculation goes totally bonkers on BSD or Windows >>> systems. >> >> http://linux.die.net/man/3/clock >> C89, C99, POSIX.1-2001. POSIX requires that CLOCKS_PER_SEC equals 1000000 >> independent of the actual resolution. >> >> This should be valid for BSD too. Ok, this has no effect on Windows. >> Replace definition with get_ticks_per_sec()? Defining USEC_PER_SEC (or >> NSEC_xxx) does not make much sense. > > I'd just define NSEC_PER_SEC. > >> >>> I still don't understand why we're doing a MIN() here. I would understand a >>> MAX() to guard against unsynchronized times on source/destination. >> If I added time difference between out-of-sync hosts, the delta would be >> huge and the destination guest would behave really weird until this time >> difference is over. Guests can cope with accidental timebase jump of >> seconds but not minutes. > > Exactly. That's MAX(), right?
Sorry, I am not following you here :( MAX gives bigger number than MIN and bigger number is what I want to avoid. I even tried both, MIN works, MAX does not. -- Alexey