[PATCH 3.16.y-ckt 061/126] MIPS: KVM: Fix CACHE immediate offset sign extension
3.16.7-ckt22 -stable review patch. If anyone has any objections, please let me know.

--

From: James Hogan

commit c5c2a3b998f1ff5a586f9d37e154070b8d550d17 upstream.

The immediate field of the CACHE instruction is signed, so ensure that it gets sign extended by casting it to an int16_t rather than just masking the low 16 bits.

Fixes: e685c689f3a8 ("KVM/MIPS32: Privileged instruction/target branch emulation.")
Signed-off-by: James Hogan
Cc: Ralf Baechle
Cc: Paolo Bonzini
Cc: Gleb Natapov
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini
[ luis: backported to 3.16:
  - file rename: emulate.c -> kvm_mips_emul.c ]
Signed-off-by: Luis Henriques
---
 arch/mips/kvm/kvm_mips_emul.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kvm/kvm_mips_emul.c b/arch/mips/kvm/kvm_mips_emul.c
index 18b4e2fdae33..950229176c2f 100644
--- a/arch/mips/kvm/kvm_mips_emul.c
+++ b/arch/mips/kvm/kvm_mips_emul.c
@@ -1434,7 +1434,7 @@ kvm_mips_emulate_cache(uint32_t inst, uint32_t *opc, uint32_t cause,
 	base = (inst >> 21) & 0x1f;
 	op_inst = (inst >> 16) & 0x1f;
-	offset = inst & 0xffff;
+	offset = (int16_t)inst;
 	cache = (inst >> 16) & 0x3;
 	op = (inst >> 18) & 0x7;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
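The difference between the buggy and fixed decode is easy to show in isolation. A minimal userspace sketch (the instruction word below is a made-up example, not a real assembled CACHE opcode; the helper names are illustrative):

```c
#include <stdint.h>

/* Decode the 16-bit immediate of a MIPS instruction word. The CACHE
 * offset is signed, so the (int16_t) cast is the correct decode; the
 * masking variant is the pre-patch bug. */
static int32_t decode_offset_buggy(uint32_t inst)
{
    return inst & 0xffff;   /* loses the sign: result is always >= 0 */
}

static int32_t decode_offset_fixed(uint32_t inst)
{
    return (int16_t)inst;   /* truncates to 16 bits, then sign-extends bit 15 */
}
```

With an immediate of 0xfff8 (-8), the masked decode yields 65528 while the cast yields -8, which is exactly the difference the patch fixes.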
[PATCH 3.16.y-ckt 060/126] MIPS: KVM: Fix ASID restoration logic
3.16.7-ckt22 -stable review patch. If anyone has any objections, please let me know.

--

From: James Hogan

commit 002374f371bd02df864cce1fe85d90dc5b292837 upstream.

ASID restoration on guest resume should determine the guest execution mode based on the guest Status register rather than bit 30 of the guest PC.

Fix the two places in locore.S that do this, loading the guest status from the cop0 area. Note, this assembly is specific to the trap & emulate implementation of KVM, so it doesn't need to check the supervisor bit as that mode is not implemented in the guest.

Fixes: b680f70fc111 ("KVM/MIPS32: Entry point for trampolining to...")
Signed-off-by: James Hogan
Cc: Ralf Baechle
Cc: Paolo Bonzini
Cc: Gleb Natapov
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini
[ luis: backported to 3.16:
  - file rename: locore.S -> kvm_locore.S ]
Signed-off-by: Luis Henriques
---
 arch/mips/kvm/kvm_locore.S | 16 ++++++++++------
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/arch/mips/kvm/kvm_locore.S b/arch/mips/kvm/kvm_locore.S
index 17376cd838e6..fc24acb3a837 100644
--- a/arch/mips/kvm/kvm_locore.S
+++ b/arch/mips/kvm/kvm_locore.S
@@ -159,9 +159,11 @@ FEXPORT(__kvm_mips_vcpu_run)
 FEXPORT(__kvm_mips_load_asid)
 	/* Set the ASID for the Guest Kernel */
-	INT_SLL	t0, t0, 1	/* with kseg0 @ 0x40000000, kernel */
-				/* addresses shift to 0x80000000 */
-	bltz	t0, 1f		/* If kernel */
+	PTR_L	t0, VCPU_COP0(k1)
+	LONG_L	t0, COP0_STATUS(t0)
+	andi	t0, KSU_USER | ST0_ERL | ST0_EXL
+	xori	t0, KSU_USER
+	bnez	t0, 1f		/* If kernel */
 	INT_ADDIU	t1, k1, VCPU_GUEST_KERNEL_ASID	/* (BD) */
 	INT_ADDIU	t1, k1, VCPU_GUEST_USER_ASID	/* else user */
1:
@@ -438,9 +440,11 @@ __kvm_mips_return_to_guest:
 	mtc0	t0, CP0_EPC

 	/* Set the ASID for the Guest Kernel */
-	INT_SLL	t0, t0, 1	/* with kseg0 @ 0x40000000, kernel */
-				/* addresses shift to 0x80000000 */
-	bltz	t0, 1f		/* If kernel */
+	PTR_L	t0, VCPU_COP0(k1)
+	LONG_L	t0, COP0_STATUS(t0)
+	andi	t0, KSU_USER | ST0_ERL | ST0_EXL
+	xori	t0, KSU_USER
+	bnez	t0, 1f		/* If kernel */
 	INT_ADDIU	t1, k1, VCPU_GUEST_KERNEL_ASID	/* (BD) */
 	INT_ADDIU	t1, k1, VCPU_GUEST_USER_ASID	/* else user */
1:
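The new andi/xori/bnez sequence implements a simple predicate on the guest Status register. A C rendering of that logic, for reference only (the field values match the MIPS architecture definitions in arch/mips/include/asm/mipsregs.h, but the helper name is made up):

```c
#include <stdint.h>

/* MIPS CP0 Status register fields. */
#define KSU_USER 0x00000010  /* KSU field: user mode */
#define ST0_ERL  0x00000004  /* error level */
#define ST0_EXL  0x00000002  /* exception level */

/* The guest runs in user mode only if KSU == user AND neither EXL nor
 * ERL is set; any other combination means the kernel ASID must be used.
 * This mirrors the patched sequence: andi masks the three fields, xori
 * clears the pure-user pattern, bnez branches to the kernel case. */
static int guest_uses_kernel_asid(uint32_t guest_status)
{
    uint32_t t = guest_status & (KSU_USER | ST0_ERL | ST0_EXL);
    return (t ^ KSU_USER) != 0;
}
```

Note how a Status with EXL or ERL set selects the kernel ASID even when the KSU field still reads "user", which is the case the old PC-bit-30 test got wrong.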
[PATCH 3.16.y-ckt 062/126] MIPS: KVM: Uninit VCPU in vcpu_create error path
3.16.7-ckt22 -stable review patch. If anyone has any objections, please let me know.

--

From: James Hogan

commit 585bb8f9a5e592f2ce7abbe5ed3112d5438d2754 upstream.

If either of the memory allocations in kvm_arch_vcpu_create() fail, the vcpu which has been allocated and kvm_vcpu_init'd doesn't get uninit'd in the error handling path. Add a call to kvm_vcpu_uninit() to fix this.

Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: James Hogan
Cc: Ralf Baechle
Cc: Paolo Bonzini
Cc: Gleb Natapov
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Signed-off-by: Paolo Bonzini
[ luis: backported to 3.16:
  - file rename: mips.c -> kvm_mips.c ]
Signed-off-by: Luis Henriques
---
 arch/mips/kvm/kvm_mips.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
index cc721a3c8996..2c81c2c9e8dc 100644
--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -307,7 +307,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 	if (!gebase) {
 		err = -ENOMEM;
-		goto out_free_cpu;
+		goto out_uninit_cpu;
 	}
 	kvm_debug("Allocated %d bytes for KVM Exception Handlers @ %p\n",
 		  ALIGN(size, PAGE_SIZE), gebase);
@@ -368,6 +368,9 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, unsigned int id)
 out_free_gebase:
 	kfree(gebase);

+out_uninit_cpu:
+	kvm_vcpu_uninit(vcpu);
+
 out_free_cpu:
 	kfree(vcpu);
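The pattern being fixed is the usual reverse-order goto unwinding: each label undoes exactly the steps that succeeded before the failure. A simplified, self-contained sketch (the names are stand-ins for the real KVM functions, and the failure is injected via a flag):

```c
#include <stdlib.h>

struct vcpu { int initialized; };

/* Stand-ins for kvm_vcpu_init()/kvm_vcpu_uninit(). */
static int  vcpu_init(struct vcpu *v)   { v->initialized = 1; return 0; }
static void vcpu_uninit(struct vcpu *v) { v->initialized = 0; }

static struct vcpu *vcpu_create(int fail_alloc)
{
    struct vcpu *v = calloc(1, sizeof(*v));
    void *gebase;

    if (!v)
        return NULL;
    if (vcpu_init(v))
        goto out_free_cpu;

    gebase = fail_alloc ? NULL : malloc(4096);
    if (!gebase)
        goto out_uninit_cpu;  /* NOT out_free_cpu: undo vcpu_init() first */

    /* ... further setup would go here ... */
    free(gebase);             /* simplification for the sketch */
    return v;

out_uninit_cpu:
    vcpu_uninit(v);           /* the step the patch adds */
out_free_cpu:
    free(v);
    return NULL;
}
```

The bug was precisely that the allocation failure jumped to the equivalent of out_free_cpu, skipping the uninit step.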
Re: [PATCH v1] kvm/x86: Hyper-V tsc page setup
On 01/06/2016 12:48 AM, Peter Hornyack wrote:
On Thu, Dec 24, 2015 at 1:33 AM, Andrey Smetanin wrote:

Lately the tsc page was implemented but filled with empty values. This patch sets up the tsc page scale and offset based on the vcpu tsc, tsc_khz and the HV_X64_MSR_TIME_REF_COUNT value. A valid tsc page drops the HV_X64_MSR_TIME_REF_COUNT MSR read count to zero, which potentially improves performance.

The patch applies on top of 'kvm: Make vcpu->requests as 64 bit bitmap' previously sent.

Signed-off-by: Andrey Smetanin
CC: Paolo Bonzini
CC: Gleb Natapov
CC: Roman Kagan
CC: Denis V. Lunev
CC: qemu-de...@nongnu.org

Reviewed-by: Peter Hornyack

---
 arch/x86/kvm/hyperv.c    | 117 +--
 arch/x86/kvm/hyperv.h    |   2 +
 arch/x86/kvm/x86.c       |  12 +
 include/linux/kvm_host.h |   1 +
 4 files changed, 117 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index d50675a..504fdc7 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -753,6 +753,105 @@ static int kvm_hv_msr_set_crash_data(struct kvm_vcpu *vcpu,
 	return 0;
 }

+static u64 calc_tsc_page_scale(u32 tsc_khz)
+{
+	/*
+	 * reftime (in 100ns) = tsc * tsc_scale / 2^64 + tsc_offset
+	 * so reftime_delta = (tsc_delta * tsc_scale) / 2^64
+	 * so tsc_scale = (2^64 * reftime_delta)/tsc_delta
+	 * so tsc_scale = (2^64 * 10 * 10^6) / tsc_hz = (2^64 * 10^4) / tsc_khz
+	 * so tsc_scale = (2^63 * 2 * 10^4) / tsc_khz
+	 */
+	return mul_u64_u32_div(1ULL << 63, 2 * 10000, tsc_khz);
+}
+
+static int write_tsc_page(struct kvm *kvm, u64 gfn,
+			  PHV_REFERENCE_TSC_PAGE tsc_ref)
+{
+	if (kvm_write_guest(kvm, gfn_to_gpa(gfn),
+			    tsc_ref, sizeof(*tsc_ref)))
+		return 1;
+	mark_page_dirty(kvm, gfn);
+	return 0;
+}
+
+static int read_tsc_page(struct kvm *kvm, u64 gfn,
+			 PHV_REFERENCE_TSC_PAGE tsc_ref)
+{
+	if (kvm_read_guest(kvm, gfn_to_gpa(gfn),
+			   tsc_ref, sizeof(*tsc_ref)))
+		return 1;
+	return 0;
+}
+
+static u64 calc_tsc_page_time(struct kvm_vcpu *vcpu,
+			      PHV_REFERENCE_TSC_PAGE tsc_ref)
+{
+	u64 tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+
+	return mul_u64_u64_shr(tsc, tsc_ref->tsc_scale, 64) +
+		tsc_ref->tsc_offset;
+}
+
+static int setup_blank_tsc_page(struct kvm_vcpu *vcpu, u64 gfn)
+{
+	HV_REFERENCE_TSC_PAGE tsc_ref;
+
+	memset(&tsc_ref, 0, sizeof(tsc_ref));
+	return write_tsc_page(vcpu->kvm, gfn, &tsc_ref);
+}
+
+int kvm_hv_setup_tsc_page(struct kvm_vcpu *vcpu)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_hv *hv = &kvm->arch.hyperv;
+	HV_REFERENCE_TSC_PAGE tsc_ref;
+	u32 tsc_khz;
+	int r;
+	u64 gfn, ref_time, tsc_scale, tsc_offset, tsc;
+
+	if (WARN_ON_ONCE(!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)))
+		return -EINVAL;
+
+	gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT;
+	vcpu_debug(vcpu, "tsc page gfn 0x%llx\n", gfn);
+
+	tsc_khz = vcpu->arch.virtual_tsc_khz;
+	if (!tsc_khz) {
+		vcpu_unimpl(vcpu, "no tsc khz\n");
+		return setup_blank_tsc_page(vcpu, gfn);
+	}
+
+	r = read_tsc_page(kvm, gfn, &tsc_ref);
+	if (r) {
+		vcpu_err(vcpu, "can't access tsc page gfn 0x%llx\n", gfn);
+		return r;
+	}
+
+	tsc_scale = calc_tsc_page_scale(tsc_khz);
+	ref_time = get_time_ref_counter(kvm);
+	tsc = kvm_read_l1_tsc(vcpu, rdtsc());
+
+	/* tsc_offset = reftime - tsc * tsc_scale / 2^64 */
+	tsc_offset = ref_time - mul_u64_u64_shr(tsc, tsc_scale, 64);
+	vcpu_debug(vcpu, "tsc khz %u tsc %llu scale %llu offset %llu\n",
+		   tsc_khz, tsc, tsc_scale, tsc_offset);
+
+	tsc_ref.tsc_sequence++;
+	if (tsc_ref.tsc_sequence == 0)

Also avoid tsc_sequence == 0xffffffff here. In the Hyper-V TLFS 4.0 (Win2012 R2) 0xffffffff is the special sequence number to disable the reference TSC page.

we already discussed with Microsoft that the documentation contains the wrong sequence number - 0xffffffff (instead of 0). please take a look into details here: https://lkml.org/lkml/2015/11/2/655

+		tsc_ref.tsc_sequence = 1;
+
+	tsc_ref.tsc_scale = tsc_scale;
+	tsc_ref.tsc_offset = tsc_offset;
+
+	vcpu_debug(vcpu, "tsc page calibration time %llu vs. reftime %llu\n",
+		   calc_tsc_page_time(vcpu, &tsc_ref),
+		   get_time_ref_counter(kvm));
+
+	return write_tsc_page(kvm, gfn, &tsc_ref);
+}
+
 static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data,
			     bool host)
 {
@@
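The fixed-point arithmetic in the patch can be checked in userspace. A sketch using GCC/Clang 128-bit integers in place of the kernel's mul_u64_u32_div()/mul_u64_u64_shr() helpers (the function names mirror the patch, but this is a model, not the kernel code):

```c
#include <stdint.h>

/* tsc_scale = (2^63 * 2 * 10^4) / tsc_khz, as in the patch comment. */
static uint64_t calc_tsc_page_scale(uint32_t tsc_khz)
{
    return (uint64_t)(((unsigned __int128)1 << 63) * 2 * 10000 / tsc_khz);
}

/* reftime_delta (in 100ns units) = tsc_delta * tsc_scale / 2^64. */
static uint64_t ref_time_delta(uint64_t tsc_delta, uint64_t tsc_scale)
{
    return (uint64_t)(((unsigned __int128)tsc_delta * tsc_scale) >> 64);
}
```

Sanity check: for a 20 MHz TSC (tsc_khz = 20000), the scale is exactly 2^63, so one second of TSC ticks (2*10^7) converts to exactly 10^7 units of 100 ns, i.e. one second of reference time. For non-power-of-two frequencies the flooring in the scale can lose at most one 100 ns unit per conversion.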
Re: [RFC PATCH v2 3/3] vfio-pci: Allow to mmap MSI-X table if EEH is supported
On 2016/1/5 5:42, Benjamin Herrenschmidt wrote:
On Mon, 2016-01-04 at 14:07 -0700, Alex Williamson wrote:
On Thu, 2015-12-31 at 16:50 +0800, Yongji Xie wrote:

The current vfio-pci implementation disallows mmap of the MSI-X table in case the user gets to touch it directly. However, the EEH mechanism can ensure that a given pci device can only shoot the MSIs assigned for its PE. So we think it's safe to expose the MSI-X table to userspace, because the exposed MSI-X table can't be used to do harm to other memory space. And with the MSI-X table mmapped, some performance issues which are caused when PCI adapters have critical registers in the same page as the MSI-X table can also be resolved.

So this patch adds a Kconfig option, VFIO_PCI_MMAP_MSIX, to support mmapping the MSI-X table.

Signed-off-by: Yongji Xie
---
 drivers/vfio/pci/Kconfig    | 4 ++++
 drivers/vfio/pci/vfio_pci.c | 6 ++++--
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig
index 02912f1..67b0a2c 100644
--- a/drivers/vfio/pci/Kconfig
+++ b/drivers/vfio/pci/Kconfig
@@ -23,6 +23,10 @@ config VFIO_PCI_MMAP
 	depends on VFIO_PCI
 	def_bool y if !S390

+config VFIO_PCI_MMAP_MSIX
+	depends on VFIO_PCI_MMAP
+	def_bool y if EEH

Does CONFIG_EEH necessarily mean that EEH is enabled? Could the system not support EEH, or could EEH be disabled via kernel command-line options?

EEH is definitely the wrong thing to test here anyway. What needs to be tested is that the PCI host bridge supports filtering of MSIs, so ideally this should be some kind of host bridge attribute set by the architecture backend.

So do you mean this attribute can be added in pci_host_bridge like this:

--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -412,6 +412,7 @@ struct pci_host_bridge {
 	void (*release_fn)(struct pci_host_bridge *);
 	void *release_data;
 	unsigned int ignore_reset_delay:1;	/* for entire hierarchy */
+	unsigned int msix_filtered:1;		/* support filtering of MSIs */
 	/* Resource alignment requirements */
 	resource_size_t (*align_resource)(struct pci_dev *dev,
			const struct resource *res,

I can surely do it if there is no objection from PCI folks. Thanks.

Regards,
Yongji Xie

This can happen with or without CONFIG_EEH and you are right, CONFIG_EEH can be enabled and the machine not support it.

Any IODA bridge will support this.

Cheers,
Ben.
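The shape of the proposal can be sketched as follows. Everything here is hypothetical: msix_filtered is the attribute under discussion (not an existing kernel field), the struct is reduced to the relevant bits, and the helper name is made up to show where vfio-pci would consult the flag:

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduced model of the proposed pci_host_bridge change. */
struct pci_host_bridge {
    unsigned int ignore_reset_delay:1; /* for entire hierarchy */
    unsigned int msix_filtered:1;      /* bridge filters MSIs per device/PE */
};

/* vfio-pci would allow mmap of the MSI-X table only when the host bridge
 * guarantees a device can raise only the MSIs assigned to its own PE,
 * replacing the blanket CONFIG_EEH test criticized above. */
static bool vfio_pci_can_mmap_msix(const struct pci_host_bridge *bridge)
{
    return bridge != NULL && bridge->msix_filtered;
}
```

The point of the attribute is that it is a runtime per-bridge property set by the architecture backend, rather than a compile-time Kconfig symbol that may not reflect the actual hardware.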
Re: How to reserve guest physical region for ACPI
On Tue, 5 Jan 2016 18:22:33 +0100 Laszlo Ersek wrote: > On 01/05/16 18:08, Igor Mammedov wrote: > > On Mon, 4 Jan 2016 21:17:31 +0100 > > Laszlo Ersek wrote: > > > >> Michael CC'd me on the grandparent of the email below. I'll try to add > >> my thoughts in a single go, with regard to OVMF. > >> > >> On 12/30/15 20:52, Michael S. Tsirkin wrote: > >>> On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote: > On Mon, 28 Dec 2015 14:50:15 +0200 > "Michael S. Tsirkin" wrote: > > > On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote: > >> > >> Hi Michael, Paolo, > >> > >> Now it is the time to return to the challenge of how to reserve a guest > >> physical region internally used by ACPI. > >> > >> Igor suggested that: > >> | An alternative place to allocate reserve from could be high memory. > >> | For pc we have "reserved-memory-end" which currently makes sure > >> | that hotpluggable memory range isn't used by firmware > >> (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html) > >> > >> > >> OVMF has no support for the "reserved-memory-end" fw_cfg file. The > >> reason is that nobody wrote that patch, nor asked for the patch to be > >> written. (Not implying that just requesting the patch would be > >> sufficient for the patch to be written.) > >> > > I don't want to tie things to reserved-memory-end because this > > does not scale: next time we need to reserve memory, > > we'll need to find yet another way to figure out what is where. > Could you elaborate a bit more on the problem you're seeing? > > To me it looks like it scales rather well. > For example let's imagine that we are adding a device > that has some on-device memory that should be mapped into GPA; > code to do so would look like: > > pc_machine_device_plug_cb(dev) > { > ...
> if (dev == OUR_NEW_DEVICE_TYPE) { > memory_region_add_subregion(as, current_reserved_end, &dev->mr); > set_new_reserved_end(current_reserved_end + > memory_region_size(&dev->mr)); > } > } > > we can practically add any number of new devices that way. > >>> > >>> Yes but we'll have to build a host side allocator for these, and that's > >>> nasty. We'll also have to maintain these addresses indefinitely (at > >>> least per machine version) as they are guest visible. > >>> Not only that, there's no way for guest to know if we move things > >>> around, so basically we'll never be able to change addresses. > >>> > >>> > > > I would like ./hw/acpi/bios-linker-loader.c interface to be extended to > > support 64 bit RAM instead > >> > >> This looks quite doable in OVMF, as long as the blob to allocate from > >> high memory contains *zero* ACPI tables. > >> > >> ( > >> Namely, each ACPI table is installed from the containing fw_cfg blob > >> with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its > >> own allocation policy for the *copies* of ACPI tables it installs. > >> > >> This allocation policy is left unspecified in the section of the UEFI > >> spec that governs EFI_ACPI_TABLE_PROTOCOL. > >> > >> The current policy in edk2 (= the reference implementation) seems to be > >> "allocate from under 4GB". It is currently being changed to "try to > >> allocate from under 4GB, and if that fails, retry from high memory". (It > >> is motivated by Aarch64 machines that may have no DRAM at all under 4GB.) > >> ) > >> > > (and maybe a way to allocate and > > zero-initialize buffer without loading it through fwcfg), > >> > >> Sounds reasonable. > >> > > this way bios > > does the allocation, and addresses can be patched into acpi. > and then guest side needs to parse/execute some AML that would > initialize QEMU side so it would know where to write data. > >>> > >>> Well not really - we can put it in a data table, by itself > >>> so it's easy to find.
> >> > >> Do you mean acpi_tb_find_table(), acpi_get_table_by_index() / > >> acpi_get_table_with_size()? > >> > >>> > >>> AML is only needed if access from ACPI is desired. > >>> > >>> > bios-linker-loader is a great interface for initializing some > guest owned data and linking it together but I think it adds > unnecessary complexity and is misused if it's used to handle > device owned data/on device memory in this and VMGID cases. > >>> > >>> I want a generic interface for guest to enumerate these things. linker > >>> seems quite reasonable but if you see a reason why it won't do, or want > >>> to propose a better interface, fine. > >> > >> * The guest could do the following: > >> - while processing the ALLOCATE commands, it would make a note where in > >> GPA space each fw_cfg blob gets allocated > >> - at the
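Michael's "host side allocator" objection can be made concrete with a toy version of the scheme Igor sketches in pc_machine_device_plug_cb(): a bump allocator over the reserved region. Every result depends on the order and size of all previous reservations, which is exactly why the addresses become guest-visible ABI that must be kept stable per machine version. (The function name and the 4 GiB base are illustrative, not QEMU code.)

```c
#include <stdint.h>

/* Toy bump allocator over the reserved GPA region above 4 GiB. */
static uint64_t current_reserved_end = 0x100000000ULL; /* assumed base */

static uint64_t reserve_gpa_region(uint64_t size, uint64_t align)
{
    /* align must be a power of two for this rounding trick */
    uint64_t base = (current_reserved_end + align - 1) & ~(align - 1);

    current_reserved_end = base + size;
    return base;
}
```

Inserting a new device between two existing ones, or changing a size, shifts every later address, so once guests can see these addresses they can never move.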
Re: [PATCH 3/6] nvdimm acpi: introduce patched dsm memory
On 01/06/2016 11:23 PM, Igor Mammedov wrote:
On Tue, 5 Jan 2016 02:52:05 +0800 Xiao Guangrong wrote:

The dsm memory is used to save the input parameters and store the dsm result which is filled by QEMU.

The address of dsm memory is decided by bios and patched into the int64 object returned by the "MEMA" method.

Signed-off-by: Xiao Guangrong
---
 hw/acpi/aml-build.c         | 12 ++++++++++++
 hw/acpi/nvdimm.c            | 24 ++++++++++++++++++++++--
 include/hw/acpi/aml-build.h |  1 +
 3 files changed, 35 insertions(+), 2 deletions(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 78e1290..83eadb3 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -394,6 +394,18 @@ Aml *aml_int(const uint64_t val)
 }

 /*
+ * ACPI 1.0b: 16.2.3 Data Objects Encoding:
+ * encode: QWordConst
+ */
+Aml *aml_int64(const uint64_t val)
+{
+    Aml *var = aml_alloc();
+    build_append_byte(var->buf, 0x0E); /* QWordPrefix */
+    build_append_int_noprefix(var->buf, val, 8);
+    return var;
+}
+
+/*
  * helper to construct NameString, which returns Aml object
  * for using with aml_append or other aml_* terms
  */
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index bc7cd8f..a72104c 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -28,6 +28,7 @@

 #include "hw/acpi/acpi.h"
 #include "hw/acpi/aml-build.h"
+#include "hw/acpi/bios-linker-loader.h"
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"

@@ -402,7 +403,8 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
                             state->dsm_mem->len);
 }

-#define NVDIMM_COMMON_DSM "NCAL"
+#define NVDIMM_GET_DSM_MEM "MEMA"
+#define NVDIMM_COMMON_DSM "NCAL"

 static void nvdimm_build_common_dsm(Aml *dev)
 {
@@ -468,7 +470,8 @@ static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
                               GArray *table_data, GArray *linker,
                               uint8_t revision)
 {
-    Aml *ssdt, *sb_scope, *dev;
+    Aml *ssdt, *sb_scope, *dev, *method;
+    int offset;

     acpi_add_table(table_offsets, table_data);

@@ -499,9 +502,26 @@ static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,

     aml_append(sb_scope, dev);

+    /*
+     * leave it at the end of ssdt so that we can conveniently get the
+     * offset of int64 object returned by the function which will be
+     * patched with the real address of the dsm memory by BIOS.
+     */
+    method = aml_method(NVDIMM_GET_DSM_MEM, 0, AML_NOTSERIALIZED);
+    aml_append(method, aml_return(aml_int64(0x0)));

there is no need in dedicated aml_int64(), you can use aml_int(0x64) trick

We can not do this due to the trick in bios_linker_loader_add_pointer() which will issue a COMMAND_ADD_POINTER to BIOS, however, this request does:

/*
 * COMMAND_ADD_POINTER - patch the table (originating from
 * @dest_file) at @pointer.offset, by adding a pointer to the table
 * originating from @src_file. 1,2,4 or 8 byte unsigned
 * addition is used depending on @pointer.size.
 */

that means the new-offset = old-offset + the address of the new table allocated by BIOS. So we expect 0 offset here.

+    aml_append(sb_scope, method);
     aml_append(ssdt, sb_scope);
     /* copy AML table into ACPI tables blob and patch header there */
     g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
+
+    offset = table_data->len - 8;
+
+    bios_linker_loader_alloc(linker, NVDIMM_DSM_MEM_FILE, TARGET_PAGE_SIZE,
+                             false /* high memory */);
+    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
+                                   NVDIMM_DSM_MEM_FILE, table_data,
+                                   table_data->data + offset,
+                                   sizeof(uint64_t));

this offset magic will break badly as soon as someone adds something to the end of the SSDT.

Yes, it is, so don't do that, :) and this is why we made the comment here:

+    /*
+     * leave it at the end of ssdt so that we can conveniently get the
+     * offset of int64 object returned by the function which will be
+     * patched with the real address of the dsm memory by BIOS.
+     */
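What the firmware does for a COMMAND_ADD_POINTER entry can be shown in a few lines. This is a simplified model of the operation described in the comment quoted above, not the actual SeaBIOS/OVMF code; it also shows why the AML placeholder must encode 0 - the patch *adds* the allocated base to whatever value is already stored at the offset:

```c
#include <stdint.h>
#include <string.h>

/* Model of COMMAND_ADD_POINTER with pointer.size == 8: read the 8-byte
 * field at @offset in the destination blob, add the base address where
 * the source blob was allocated, and write it back. */
static void add_pointer(uint8_t *dest_blob, unsigned offset,
                        uint64_t src_blob_base)
{
    uint64_t val;

    memcpy(&val, dest_blob + offset, sizeof(val)); /* placeholder value */
    val += src_blob_base;                          /* unsigned addition */
    memcpy(dest_blob + offset, &val, sizeof(val));
}
```

A zero placeholder therefore yields the blob's base address, while a non-zero placeholder acts as an offset into the allocated blob, which is the behavior Xiao relies on when rejecting the aml_int() suggestion.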
Re: How to reserve guest physical region for ACPI
On 01/06/16 14:39, Igor Mammedov wrote: > On Tue, 5 Jan 2016 18:22:33 +0100 > Laszlo Ersek wrote: > >> On 01/05/16 18:08, Igor Mammedov wrote: >>> On Mon, 4 Jan 2016 21:17:31 +0100 >>> Laszlo Ersek wrote: >>> Michael CC'd me on the grandparent of the email below. I'll try to add my thoughts in a single go, with regard to OVMF. On 12/30/15 20:52, Michael S. Tsirkin wrote: > On Wed, Dec 30, 2015 at 04:55:54PM +0100, Igor Mammedov wrote: >> On Mon, 28 Dec 2015 14:50:15 +0200 >> "Michael S. Tsirkin" wrote: >> >>> On Mon, Dec 28, 2015 at 10:39:04AM +0800, Xiao Guangrong wrote: Hi Michael, Paolo, Now it is the time to return to the challenge of how to reserve a guest physical region internally used by ACPI. Igor suggested that: | An alternative place to allocate reserve from could be high memory. | For pc we have "reserved-memory-end" which currently makes sure | that hotpluggable memory range isn't used by firmware (https://lists.nongnu.org/archive/html/qemu-devel/2015-11/msg00926.html) OVMF has no support for the "reserved-memory-end" fw_cfg file. The reason is that nobody wrote that patch, nor asked for the patch to be written. (Not implying that just requesting the patch would be sufficient for the patch to be written.) >>> I don't want to tie things to reserved-memory-end because this >>> does not scale: next time we need to reserve memory, >>> we'll need to find yet another way to figure out what is where. >> Could you elaborate a bit more on the problem you're seeing? >> >> To me it looks like it scales rather well. >> For example let's imagine that we are adding a device >> that has some on-device memory that should be mapped into GPA; >> code to do so would look like: >> >> pc_machine_device_plug_cb(dev) >> { >>... >>if (dev == OUR_NEW_DEVICE_TYPE) { >>memory_region_add_subregion(as, current_reserved_end, &dev->mr); >>set_new_reserved_end(current_reserved_end + >> memory_region_size(&dev->mr)); >>} >> } >> >> we can practically add any number of new devices that way.
> > Yes but we'll have to build a host side allocator for these, and that's > nasty. We'll also have to maintain these addresses indefinitely (at > least per machine version) as they are guest visible. > Not only that, there's no way for guest to know if we move things > around, so basically we'll never be able to change addresses. > > >> >>> I would like ./hw/acpi/bios-linker-loader.c interface to be extended to >>> support 64 bit RAM instead This looks quite doable in OVMF, as long as the blob to allocate from high memory contains *zero* ACPI tables. ( Namely, each ACPI table is installed from the containing fw_cfg blob with EFI_ACPI_TABLE_PROTOCOL.InstallAcpiTable(), and the latter has its own allocation policy for the *copies* of ACPI tables it installs. This allocation policy is left unspecified in the section of the UEFI spec that governs EFI_ACPI_TABLE_PROTOCOL. The current policy in edk2 (= the reference implementation) seems to be "allocate from under 4GB". It is currently being changed to "try to allocate from under 4GB, and if that fails, retry from high memory". (It is motivated by Aarch64 machines that may have no DRAM at all under 4GB.) ) >>> (and maybe a way to allocate and >>> zero-initialize buffer without loading it through fwcfg), Sounds reasonable. >>> this way bios >>> does the allocation, and addresses can be patched into acpi. >> and then guest side needs to parse/execute some AML that would >> initialize QEMU side so it would know where to write data. > > Well not really - we can put it in a data table, by itself > so it's easy to find. Do you mean acpi_tb_find_table(), acpi_get_table_by_index() / acpi_get_table_with_size()? > > AML is only needed if access from ACPI is desired. > > >> bios-linker-loader is a great interface for initializing some >> guest owned data and linking it together but I think it adds >> unnecessary complexity and is misused if it's used to handle >> device owned data/on device memory in this and VMGID cases. 
> > I want a generic interface for guest to enumerate these things. linker > seems quite reasonable but if you see a reason why it won't do, or want > to propose a better interface, fine. * The guest could do the following: - while processing the ALLOCATE commands, it would make a note where in GPA space
Re: [PATCH 3/6] nvdimm acpi: introduce patched dsm memory
On Tue, 5 Jan 2016 02:52:05 +0800 Xiao Guangrong wrote:

> The dsm memory is used to save the input parameters and store
> the dsm result which is filled by QEMU.
>
> The address of dsm memory is decided by bios and patched into
> int64 object returned by "MEMA" method
>
> Signed-off-by: Xiao Guangrong
> ---
> hw/acpi/aml-build.c         | 12 ++++++++++++
> hw/acpi/nvdimm.c            | 24 ++++++++++++++++++++++--
> include/hw/acpi/aml-build.h |  1 +
> 3 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 78e1290..83eadb3 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -394,6 +394,18 @@ Aml *aml_int(const uint64_t val)
> }
>
> /*
> + * ACPI 1.0b: 16.2.3 Data Objects Encoding:
> + * encode: QWordConst
> + */
> +Aml *aml_int64(const uint64_t val)
> +{
> +    Aml *var = aml_alloc();
> +    build_append_byte(var->buf, 0x0E); /* QWordPrefix */
> +    build_append_int_noprefix(var->buf, val, 8);
> +    return var;
> +}
> +
> +/*
> * helper to construct NameString, which returns Aml object
> * for using with aml_append or other aml_* terms
> */
> diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
> index bc7cd8f..a72104c 100644
> --- a/hw/acpi/nvdimm.c
> +++ b/hw/acpi/nvdimm.c
> @@ -28,6 +28,7 @@
>
> #include "hw/acpi/acpi.h"
> #include "hw/acpi/aml-build.h"
> +#include "hw/acpi/bios-linker-loader.h"
> #include "hw/nvram/fw_cfg.h"
> #include "hw/mem/nvdimm.h"
>
> @@ -402,7 +403,8 @@ void nvdimm_init_acpi_state(AcpiNVDIMMState *state, MemoryRegion *io,
>                              state->dsm_mem->len);
> }
>
> -#define NVDIMM_COMMON_DSM "NCAL"
> +#define NVDIMM_GET_DSM_MEM "MEMA"
> +#define NVDIMM_COMMON_DSM "NCAL"
>
> static void nvdimm_build_common_dsm(Aml *dev)
> {
> @@ -468,7 +470,8 @@ static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
>                               GArray *table_data, GArray *linker,
>                               uint8_t revision)
> {
> -    Aml *ssdt, *sb_scope, *dev;
> +    Aml *ssdt, *sb_scope, *dev, *method;
> +    int offset;
>
>     acpi_add_table(table_offsets, table_data);
>
> @@ -499,9 +502,26 @@ static void nvdimm_build_ssdt(GSList *device_list, GArray *table_offsets,
>
>     aml_append(sb_scope, dev);
>
> +    /*
> +     * leave it at the end of ssdt so that we can conveniently get the
> +     * offset of int64 object returned by the function which will be
> +     * patched with the real address of the dsm memory by BIOS.
> +     */
> +    method = aml_method(NVDIMM_GET_DSM_MEM, 0, AML_NOTSERIALIZED);
> +    aml_append(method, aml_return(aml_int64(0x0)));

there is no need in dedicated aml_int64(), you can use aml_int(0x64) trick

> +    aml_append(sb_scope, method);
>     aml_append(ssdt, sb_scope);
>     /* copy AML table into ACPI tables blob and patch header there */
>     g_array_append_vals(table_data, ssdt->buf->data, ssdt->buf->len);
> +
> +    offset = table_data->len - 8;
> +
> +    bios_linker_loader_alloc(linker, NVDIMM_DSM_MEM_FILE, TARGET_PAGE_SIZE,
> +                             false /* high memory */);
> +    bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
> +                                   NVDIMM_DSM_MEM_FILE, table_data,
> +                                   table_data->data + offset,
> +                                   sizeof(uint64_t));

this offset magic will break badly as soon as someone adds something to the end of the SSDT.

>     build_header(linker, table_data,
>         (void *)(table_data->data + table_data->len - ssdt->buf->len),
>         "SSDT", ssdt->buf->len, revision, "NVDIMM");
> diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
> index ef44d02..b4726a4 100644
> --- a/include/hw/acpi/aml-build.h
> +++ b/include/hw/acpi/aml-build.h
> @@ -246,6 +246,7 @@ Aml *aml_name(const char *name_format, ...) GCC_FMT_ATTR(1, 2);
> Aml *aml_name_decl(const char *name, Aml *val);
> Aml *aml_return(Aml *val);
> Aml *aml_int(const uint64_t val);
> +Aml *aml_int64(const uint64_t val);
> Aml *aml_arg(int pos);
> Aml *aml_to_integer(Aml *arg);
> Aml *aml_to_hexstring(Aml *src, Aml *dst);
[Bug 110441] New: KVM guests randomly get I/O errors on VirtIO based devices
https://bugzilla.kernel.org/show_bug.cgi?id=110441

            Bug ID: 110441
           Summary: KVM guests randomly get I/O errors on VirtIO based devices
           Product: Virtualization
           Version: unspecified
    Kernel Version: 3.16.7-ckt11-1+deb8u5
          Hardware: All
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: normal
          Priority: P1
         Component: kvm
          Assignee: virtualization_...@kernel-bugs.osdl.org
          Reporter: jordi.mall...@collabora.co.uk
        Regression: No

We've been seeing a strange bug in KVM guests hosted by a Debian jessie box (running 3.16.7-ckt11-1+deb8u5 on x86-64). Basically, we are getting random VirtIO errors inside our guests, resulting in stuff like this:

[4735406.568235] blk_update_request: I/O error, dev vda, sector 142339584
[4735406.572008] EXT4-fs warning (device dm-0): ext4_end_bio:317: I/O error -5 writing to inode 1184437 (offset 0 size 208896 starting block 17729472)
[4735406.572008] Buffer I/O error on device dm-0, logical block 17729472
[ ... ]
[4735406.572008] Buffer I/O error on device dm-0, logical block 17729481
[4735406.643486] blk_update_request: I/O error, dev vda, sector 142356480
[ ... ]
[4735406.748456] blk_update_request: I/O error, dev vda, sector 38587480
[4735411.020309] Buffer I/O error on dev dm-0, logical block 12640808, lost sync page write
[4735411.055184] Aborting journal on device dm-0-8.
[4735411.056148] Buffer I/O error on dev dm-0, logical block 12615680, lost sync page write
[4735411.057626] JBD2: Error -5 detected when updating journal superblock for dm-0-8.
[4735411.057936] Buffer I/O error on dev dm-0, logical block 0, lost sync page write
[4735411.057946] EXT4-fs error (device dm-0): ext4_journal_check_start:56: Detected aborted journal
[4735411.057948] EXT4-fs (dm-0): Remounting filesystem read-only
[4735411.057949] EXT4-fs (dm-0): previous I/O error to superblock detected

(From an Ubuntu 15.04 guest, EXT4 on LVM2)

Or,

Jan 06 03:39:11 titanium kernel: end_request: I/O error, dev vda, sector 1592467904
Jan 06 03:39:11 titanium kernel: EXT4-fs warning (device vda3): ext4_end_bio:317: I/O error -5 writing to inode 31169653 (offset 0 size 0 starting block 199058492)
Jan 06 03:39:11 titanium kernel: Buffer I/O error on device vda3, logical block 198899256
[...]
Jan 06 03:39:12 titanium kernel: Aborting journal on device vda3-8.
Jan 06 03:39:12 titanium kernel: Buffer I/O error on device vda3, logical block 99647488

(From a Debian jessie guest, EXT4 directly on a VirtIO-based block device)

When this happens, it affects multiple guests on the host at the same time. Normally the errors are severe enough that the guests end up with a read-only file system, but we've seen a few survive with a non-fatal I/O error. The host's dmesg has nothing interesting to see.

We've seen this happen with quite heterogeneous guests:

Debian 6, 7 and 8 (Debian kernels 2.6.32, 3.2 and 3.16)
Ubuntu 14.09 and 15.04 (Ubuntu kernels)
32 bit and 64 bit installs.

In short, we haven't seen a clear characteristic in any guest, other than the affected hosts being the ones with some sustained I/O load (build machines, cgit servers, PostgreSQL RDBMs...). Most of the time, hosts that just sit there doing nothing with their disks are not affected.

The host is a stock Debian jessie install that manages libvirt-based QEMU guests. All the guests have their block devices using virtio drivers, some of them on spinning media based on LSI RAID (it was a 3ware card before; it got replaced as we were very suspicious about it, but we are getting the same results), and some of them based on PCIe SSD storage.

We have 3 other hosts with a similar setup except that they run Debian wheezy (and honestly we're not too keen on upgrading them yet, just in case); none of them has ever shown this kind of problem.

We've been seeing this since last summer, and haven't found a pattern that tells us where these I/O error bugs are coming from. Google isn't revealing other people with a similar problem, and we're finding that quite surprising as our setup is quite basic.

This has also been reported downstream at the Debian BTS as Bug#810121 (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=810121).

--
You are receiving this mail because: You are watching the assignee of the bug.
What's kvmclock's custom sched_clock for?
AFAICT KVM reliably passes a monotonic TSC through to guests, even if the host suspends. That's all that sched_clock needs, I think. So why does kvmclock have a custom sched_clock?

On a related note, KVM doesn't pass the "invariant TSC" feature through to guests on my machine even though "invtsc" is set in QEMU and the kernel host code appears to support it. What gives?

--Andy
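For reference, a guest can check what it was actually given: the invariant-TSC bit is CPUID.80000007H:EDX[8], which is plain architectural x86, not anything KVM-specific. A small userspace probe using the GCC/Clang cpuid.h helper:

```c
#include <cpuid.h>

/* Returns 1 if the "invariant TSC" bit (CPUID leaf 0x80000007, EDX bit 8)
 * is set; inside a VM this reports whatever the hypervisor exposed. */
static int has_invariant_tsc(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(0x80000007, &eax, &ebx, &ecx, &edx))
        return 0;  /* extended leaf not available */
    return (edx >> 8) & 1;
}
```

Running this inside the guest versus on the host makes it easy to confirm whether the bit is being filtered by the hypervisor rather than missing from the hardware.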
Re: [PATCH v3 0/4] scsi: cleanup ioctl headers and provide UAPI versions
>>>>> "Paolo" == Paolo Bonzini writes:

>> This is v3 of the series to provide an "official" sg.h header (and
>> scsi_ioctl.h too, though it's basically obsolete) together with the
>> other userspace API definitions. The change from v2 to v3 is that
>> defaults for sg.c are not exported in include/uapi/linux/sg.c.

Paolo> What happened to these patches?...

They predate me being patch monkey. Please repost with any review tags or acks you may have received.

--
Martin K. Petersen	Oracle Linux Engineering