On Mon, 07 Oct 2024 14:53:54 +0100 David Woodhouse <dw...@infradead.org> wrote:
> From: David Woodhouse <d...@amazon.co.uk> > > The vmclock device addresses the problem of live migration with > precision clocks. The tolerances of a hardware counter (e.g. TSC) are > typically around ±50PPM. A guest will use NTP/PTP/PPS to discipline that > counter against an external source of 'real' time, and track the precise > frequency of the counter as it changes with environmental conditions. > > When a guest is live migrated, anything it knows about the frequency of > the underlying counter becomes invalid. It may move from a host where > the counter running at -50PPM of its nominal frequency, to a host where > it runs at +50PPM. There will also be a step change in the value of the > counter, as the correctness of its absolute value at migration is > limited by the accuracy of the source and destination host's time > synchronization. > > The device exposes a shared memory region to guests, which can be mapped > all the way to userspace. In the first phase, this merely advertises a > 'disruption_marker', which indicates that the guest should throw away any > NTP synchronization it thinks it has, and start again. > > Because the region can be exposed all the way to userspace, applications > can still use time from a fast vDSO 'system call', and check the > disruption marker to be sure that their timestamp is indeed truthful. > > The structure also allows for the precise time, as known by the host, to > be exposed directly to guests so that they don't have to wait for NTP to > resync from scratch. > > The values and fields are based on the nascent virtio-rtc specification, > and the intent is that a version (hopefully precisely this version) of > this structure will be included as an optional part of that spec. In the > meantime, a simple ACPI device along the lines of VMGENID is perfectly > sufficient and is compatible with what's being shipped in certain > commercial hypervisors. > > Signed-off-by: David Woodhouse <d...@amazon.co.uk> Hi David, A couple of drive by comments (as we all love those!) Jonathan > --- > diff --git a/hw/acpi/vmclock-abi.h b/hw/acpi/vmclock-abi.h > new file mode 100644 > index 0000000000..19cbf85efd > --- /dev/null > +++ b/hw/acpi/vmclock-abi.h > @@ -0,0 +1,186 @@ > +#endif /* __VMCLOCK_ABI_H__ */ > diff --git a/hw/acpi/vmclock.c b/hw/acpi/vmclock.c > new file mode 100644 > index 0000000000..e7df047c33 > --- /dev/null > +++ b/hw/acpi/vmclock.c > + > +void vmclock_build_acpi(VmclockState *vms, GArray *table_data, > + BIOSLinker *linker, const char *oem_id) > +{ > + Aml *ssdt, *dev, *scope, *method, *addr, *crs; > + AcpiTable table = { .sig = "SSDT", .rev = 1, > + .oem_id = oem_id, .oem_table_id = "VMCLOCK" }; > + > + /* Put VMCLOCK into a separate SSDT table */ > + acpi_table_begin(&table, table_data); > + ssdt = init_aml_allocator(); > + > + scope = aml_scope("\\_SB"); > + dev = aml_device("VCLK"); > + aml_append(dev, aml_name_decl("_HID", aml_string("AMZNC10C"))); Nice _HID :) > + aml_append(dev, aml_name_decl("_CID", aml_string("VMCLOCK"))); > + aml_append(dev, aml_name_decl("_DDN", aml_string("VMCLOCK"))); > + > + /* Simple status method */ > + method = aml_method("_STA", 0, AML_NOTSERIALIZED); > + addr = aml_local(0); > + aml_append(method, aml_store(aml_int(0xf), addr)); > + aml_append(method, aml_return(addr)); > + aml_append(dev, method); It's static so you should be able to do: aml_append(dev, aml_name_decl("_STA", aml_int(0xf))); > + > + crs = aml_resource_template(); > + aml_append(crs, aml_qword_memory(AML_POS_DECODE, > + AML_MIN_FIXED, AML_MAX_FIXED, > + AML_CACHEABLE, AML_READ_ONLY, > + 0xffffffffffffffffULL, > + vms->physaddr, > + vms->physaddr + VMCLOCK_SIZE - 1, > + 0, VMCLOCK_SIZE)); > + aml_append(dev, aml_name_decl("_CRS", crs)); > + aml_append(scope, dev); > + aml_append(ssdt, scope); > + > + g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len); > + acpi_table_end(linker, &table); > + free_aml_allocator(); > +} > +static Property vmclock_device_properties[] = { > + DEFINE_PROP_END_OF_LIST(), > +}; > + > +static void vmclock_device_class_init(ObjectClass *klass, void *data) > +{ > + DeviceClass *dc = DEVICE_CLASS(klass); > + > + dc->vmsd = &vmstate_vmclock; > + dc->realize = vmclock_realize; > + device_class_set_props(dc, vmclock_device_properties); Probably a silly question but why register no properties? Is that any different to not registering them? + a quick look suggests not all class_init functions set them. > + dc->hotpluggable = false; > + set_bit(DEVICE_CATEGORY_MISC, dc->categories); > +} > + > +static const TypeInfo vmclock_device_info = { > + .name = TYPE_VMCLOCK, > + .parent = TYPE_DEVICE, > + .instance_size = sizeof(VmclockState), > + .class_init = vmclock_device_class_init, > +}; > + > +static void vmclock_register_types(void) > +{ > + type_register_static(&vmclock_device_info); > +} > + > +type_init(vmclock_register_types)