Re: [Qemu-devel] [PATCH 3/5] Add migration functions for VFIO devices
It's nice to see that cloud vendors are also quite interested in the VFIO migration interfaces and functions. From what Yan said and from Huawei's requirements, most devices do not have private memory; the GPU may be almost the only one that does.

As VFIO is nowadays a generic user-space device control interface in the kernel and may become a standard in future, I guess we also need to think more about a generic framework and about how to let non-GPU devices step into VFIO easily.

From the perspective of the device vendors and the cloud vendors who want to build their migration support on top of VFIO, it would be nice to have a simple and friendly path for them.

Thanks,
Zhi.

On 12/18/18 9:12 PM, Zhao Yan wrote:

Right, a capabilities field in struct vfio_device_migration_info can avoid pushing the iteration APIs and migration states into every vendor driver. Many of those drivers do not actually require those APIs and would simply do nothing or return 0 in response to them.

 struct vfio_device_migration_info {
         __u32 device_state;         /* VFIO device state */
+        __u32 capabilities;         /* VFIO device capabilities */
         struct {
                 __u64 precopy_only;
                 __u64 compatible;
                 __u64 postcopy_only;
                 __u64 threshold_size;
         } pending;
         ...
 };

So only devices that need the iteration APIs, like a GPU with standalone video memory, set the flag VFIO_MIGRATION_HAS_ITERTATION in this capabilities field. Callbacks like save_live_iterate(), is_active_iterate() and save_live_pending() then check the flag VFIO_MIGRATION_HAS_ITERTATION in the capabilities field before sending requests to the vendor driver.

Simple devices that only use system memory, like IGD and NIC, do not set the flag VFIO_MIGRATION_HAS_ITERTATION. As a result, their vendor drivers need not handle requests like "Get buffer", "Set buffer" and "Get pending bytes" triggered by the QEMU iteration callbacks, and need not care about the detailed migration states.

Thanks to Gonglei for providing this idea and the details. Feel free to comment on the above description.

On Mon, Dec 17, 2018 at 11:19:49AM +, Gonglei (Arei) wrote:

Hi,

It's great to see this patch series, which is a very important step, although it currently only considers GPU mdev devices for live migration. However, this is based on the VFIO framework after all, so we expect that this live migration framework can be made more general.

For example, the vfio_save_pending() callback is used to obtain the pending device memory (such as GPU memory). But what if the device (such as a network card) has no dedicated device memory, only system memory? It is too much to perform a null operation for this kind of device by writing to the vendor driver in kernel space.

I think we can query the capability from the vendor driver before using this callback. If there is device memory that needs iterative copying, the vendor driver returns true, otherwise false. QEMU then runs the specific logic only in the first case and otherwise returns directly. It would be just like getting the capability list of the KVM module. Can we do that?

Regards,
-Gonglei

-----Original Message-----
From: Qemu-devel [mailto:qemu-devel-bounces+arei.gonglei=huawei@nongnu.org] On Behalf Of Kirti Wankhede
Sent: Wednesday, November 21, 2018 4:40 AM
To: alex.william...@redhat.com; c...@nvidia.com
Cc: zhengxiao...@alibaba-inc.com; kevin.t...@intel.com; yi.l@intel.com; eskul...@redhat.com; ziye.y...@intel.com; qemu-devel@nongnu.org; coh...@redhat.com; shuangtai@alibaba-inc.com; dgilb...@redhat.com; zhi.a.w...@intel.com; mlevi...@redhat.com; pa...@linux.ibm.com; a...@ozlabs.ru; Kirti Wankhede ; eau...@redhat.com; fel...@nutanix.com; jonathan.dav...@nutanix.com; changpeng@intel.com; ken@amd.com
Subject: [Qemu-devel] [PATCH 3/5] Add migration functions for VFIO devices

- Migration functions are implemented for VFIO_DEVICE_TYPE_PCI devices.
- Added SaveVMHandlers and implemented all basic functions required for live migration.
- Added VM state change handler to know running or stopped state of VM.
- Added migration state change notifier to get notification on migration state change. This state is translated to VFIO device state and conveyed to vendor driver.
- Whether a VFIO device supports migration is decided based on the migration region query. If the migration region query is successful then migration is supported, else migration is blocked.
- Structure vfio_device_migration_info is mapped at 0th offset of the migration region and should always be trapped by the VFIO device's driver. Added support for both types of access, trapped or mmapped, for the data section of the region.
- To save device state, read the data offset and size using structure vfio_device_migration_info.data, and copy data from the region accordingly.
- To restore device state, write the data offset and size in the structure and write the data in the region.
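
The capability check proposed earlier in this thread can be sketched roughly as follows. This is only an illustration: the flag bit value, the reduced struct, and the function names ending in _sketch are assumptions, not part of the posted patch or of any kernel header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit value; the thread names the flag but not its encoding. */
#define SKETCH_MIGRATION_HAS_ITERATION (1u << 0)

/* Reduced stand-in for the struct vfio_device_migration_info quoted above. */
struct migration_info_sketch {
    uint32_t device_state;
    uint32_t capabilities;
    uint64_t pending_precopy_only;
};

/* True only for devices that announce device memory needing iteration. */
static bool needs_iteration_sketch(const struct migration_info_sketch *info)
{
    return (info->capabilities & SKETCH_MIGRATION_HAS_ITERATION) != 0;
}

/*
 * Pending-bytes query: devices without the flag report 0 immediately,
 * so no request has to reach the vendor driver.
 */
static uint64_t save_live_pending_sketch(const struct migration_info_sketch *info)
{
    if (!needs_iteration_sketch(info)) {
        return 0;
    }
    return info->pending_precopy_only;
}
```

With this shape, a NIC-like device that leaves capabilities at 0 never sees the iteration requests, while a GPU-like device that sets the bit is queried as before.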
Re: [Qemu-devel] travis failures
On 2018-12-21 04:06, Alexey Kardashevskiy wrote:
> Hi
>
> I am trying https://travis-ci.org/aik/qemu/ and that thing fails every
> time I am not so sure why.
>
> One example:
> https://travis-ci.org/aik/qemu/jobs/470796318
>
> The errors are like this:
>
> GTESTER check-qtest-unicore32
> GTESTER check-qtest-x86_64
> Could not access KVM kernel module: No such file or directory
> qemu-system-x86_64: failed to initialize KVM: No such file or directory
> qemu-system-x86_64: Back to tcg accelerator
>
>
> Does anyone else see those? How do we fix them? Thanks.

Some tests explicitly request "-M accel=kvm:tcg", which causes these messages if KVM is not available. We could maybe silence them if qtest_enabled()?

 Thomas
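
A rough sketch of the fallback implied by "kvm:tcg": walk the colon-separated list and take the first accelerator that is usable. The function name, the buffer handling, and the assumption that TCG is always available are illustrative choices; this is not QEMU's actual option parser.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copies the first usable accelerator name from a colon-separated list
 * into out. Returns 1 on success, 0 if no entry is usable.
 * Assumption: "tcg" is always usable, "kvm" only if kvm_available.
 */
static int pick_from_accel_list_sketch(const char *list, bool kvm_available,
                                       char *out, size_t outlen)
{
    const char *p = list;

    while (*p != '\0') {
        size_t n = strcspn(p, ":");   /* length of the current entry */
        bool is_kvm = (n == 3 && strncmp(p, "kvm", 3) == 0);
        bool is_tcg = (n == 3 && strncmp(p, "tcg", 3) == 0);

        if ((is_tcg || (is_kvm && kvm_available)) && n < outlen) {
            memcpy(out, p, n);
            out[n] = '\0';
            return 1;
        }
        p += n;
        if (*p == ':') {
            p++;                      /* skip the separator */
        }
    }
    return 0;
}
```

On a CI host without /dev/kvm this picks "tcg" for the list "kvm:tcg", which matches the "Back to tcg accelerator" message in the log above.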
[Qemu-devel] [PATCH 15/15] spapr: add hotplug hooks for PHB hotplug
From: Michael Roth

Hotplugging PHBs is a machine-level operation, but PHBs reside on the
main system bus, so we register spapr machine as the handler for the
main system bus.

We re-get the phandle of the interrupt controller systematically for
simplicity.

Signed-off-by: Michael Roth
Signed-off-by: Greg Kurz
---
 hw/ppc/spapr.c         |  147
 hw/ppc/spapr_drc.c     |    1
 hw/ppc/spapr_pci.c     |   16 -
 include/hw/ppc/spapr.h |    1
 4 files changed, 149 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5c405a5fafca..065c9f19700e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2923,6 +2923,10 @@ static void spapr_machine_init(MachineState *machine)
     register_savevm_live(NULL, "spapr/htab", -1, 1,
                          &savevm_htab_handlers, spapr);
 
+    if (smc->dr_phb_enabled) {
+        qbus_set_hotplug_handler(sysbus_get_default(), OBJECT(machine), NULL);
+    }
+
     qemu_register_boot_set(spapr_boot_set, spapr);
 
     if (kvm_enabled()) {
@@ -3716,6 +3720,135 @@ out:
     error_propagate(errp, local_err);
 }
 
+static void spapr_phb_pre_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                               Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
+    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(dev);
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    const unsigned windows_supported = spapr_phb_windows_supported(sphb);
+
+    if (sphb->index == (uint32_t)-1) {
+        error_setg(errp, "\"index\" for PAPR PHB is mandatory");
+        return;
+    }
+
+    /*
+     * This will check that sphb->index doesn't exceed the maximum number of
+     * PHBs for the current machine type.
+     */
+    smc->phb_placement(spapr, sphb->index,
+                       &sphb->buid, &sphb->io_win_addr,
+                       &sphb->mem_win_addr, &sphb->mem64_win_addr,
+                       windows_supported, sphb->dma_liobn, errp);
+}
+
+static void spapr_phb_plug(HotplugHandler *hotplug_dev, DeviceState *dev,
+                           Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(OBJECT(hotplug_dev));
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(dev);
+    void *fdt = NULL;
+    int fdt_start_offset;
+    int fdt_size;
+    Error *local_err = NULL;
+    sPAPRDRConnector *drc;
+    int ret;
+    bool hotplugged = spapr_drc_hotplugged(dev);
+    int offset, phandle = 0;
+    gchar *nodename = NULL;
+
+    if (!smc->dr_phb_enabled) {
+        return;
+    }
+
+    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PHB, sphb->index);
+    /* hotplug hooks should check it's enabled before getting this far */
+    assert(drc);
+
+    if (hotplugged) {
+        if (spapr->fdt_blob) {
+            /*
+             * SLOF might have pushed an updated FDT with new phandle values.
+             * Re-get the one of our interrupt controller.
+             */
+            nodename = spapr->irq->get_nodename(spapr);
+
+            offset = fdt_subnode_offset(spapr->fdt_blob, 0, nodename);
+            if (offset < 0) {
+                error_setg(errp, "Can't find node \"%s\": %s",
+                           nodename, fdt_strerror(offset));
+                goto out;
+            }
+
+            phandle = fdt_get_phandle(spapr->fdt_blob, offset);
+            if (phandle < 0) {
+                error_setg(errp, "Can't get phandle of node \"%s\": %s",
+                           nodename, fdt_strerror(offset));
+                goto out;
+            }
+        }
+        DEVICE_GET_CLASS(dev)->reset(dev);
+    }
+
+    /* For cold-plugged at initial boot and fallback for hotplug */
+    if (!phandle) {
+        phandle = PHANDLE_XICP;
+    }
+
+    fdt = create_device_tree(&fdt_size);
+    ret = spapr_populate_pci_dt(sphb, phandle, fdt, spapr->irq->nr_msis,
+                                &fdt_start_offset);
+    if (ret < 0) {
+        error_setg(&local_err, "unable to create FDT for %sPHB",
+                   dev->hotplugged ? "hotplugged " : "");
+        goto out;
+    }
+
+    if (hotplugged) {
+        /* generally SLOF creates these, for hotplug it's up to QEMU */
+        _FDT(fdt_setprop_string(fdt, fdt_start_offset, "name", "pci"));
+    }
+
+    spapr_drc_attach(drc, DEVICE(dev), fdt, fdt_start_offset, &local_err);
+
+out:
+    g_free(nodename);
+
+    if (local_err) {
+        error_propagate(errp, local_err);
+        g_free(fdt);
+        return;
+    }
+
+    if (hotplugged) {
+        spapr_hotplug_req_add_by_index(drc);
+    } else if (drc) {
+        spapr_drc_reset(drc);
+    }
+}
+
+void spapr_phb_release(DeviceState *dev)
+{
+    object_unparent(OBJECT(dev));
+}
+
+static void spapr_phb_unplug_request(HotplugHandler *hotplug_dev,
+                                     DeviceState *dev, Error **errp)
+{
+    sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(dev);
+    sPAPRDRConnector *drc;
+
+    drc =
[Qemu-devel] [PATCH 14/15] spapr: Expose the name of the interrupt controller node
This will be needed by PHB hotplug in order to access the phandle
property.

Signed-off-by: Greg Kurz
---
 hw/intc/spapr_xive.c        |    9 +++--
 hw/intc/xics_spapr.c        |    9 -
 hw/ppc/spapr_irq.c          |    3 +++
 include/hw/ppc/spapr_irq.h  |    1 +
 include/hw/ppc/spapr_xive.h |    1 +
 include/hw/ppc/xics.h       |    1 +
 6 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 87424de26c1c..0540aac88d2a 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -1410,6 +1410,12 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_INT_RESET, h_int_reset);
 }
 
+gchar *spapr_xive_get_nodename(sPAPRMachineState *spapr)
+{
+    return g_strdup_printf("interrupt-controller@%" PRIx64,
+                           spapr->xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
+}
+
 void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle)
 {
@@ -1444,8 +1450,7 @@ void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                            XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
     timas[3] = cpu_to_be64(1ull << TM_SHIFT);
 
-    nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
-                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
+    nodename = spapr_xive_get_nodename(spapr);
     _FDT(node = fdt_add_subnode(fdt, 0, nodename));
     g_free(nodename);
 
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index f67d3c80bf3a..75d40daf518d 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -244,6 +244,13 @@ void xics_spapr_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_IPOLL, h_ipoll);
 }
 
+#define NODENAME "interrupt-controller"
+
+gchar *spapr_xics_get_nodename(sPAPRMachineState *spapr)
+{
+    return g_strdup(NODENAME);
+}
+
 void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle)
 {
@@ -252,7 +259,7 @@ void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
     };
     int node;
 
-    _FDT(node = fdt_add_subnode(fdt, 0, "interrupt-controller"));
+    _FDT(node = fdt_add_subnode(fdt, 0, NODENAME));
 
     _FDT(fdt_setprop_string(fdt, node, "device_type",
                             "PowerPC-External-Interrupt-Presentation"));
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index 0999a2b2d69c..703c3a3c20d5 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -223,6 +223,7 @@ sPAPRIrq spapr_irq_xics = {
     .qirq        = spapr_qirq_xics,
     .print_info  = spapr_irq_print_info_xics,
     .dt_populate = spapr_dt_xics,
+    .get_nodename = spapr_xics_get_nodename,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
     .post_load   = spapr_irq_post_load_xics,
 };
@@ -349,6 +350,7 @@ sPAPRIrq spapr_irq_xive = {
     .qirq        = spapr_qirq_xive,
     .print_info  = spapr_irq_print_info_xive,
     .dt_populate = spapr_dt_xive,
+    .get_nodename = spapr_xive_get_nodename,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xive,
     .post_load   = spapr_irq_post_load_xive,
     .reset       = spapr_irq_reset_xive,
@@ -462,6 +464,7 @@ sPAPRIrq spapr_irq_xics_legacy = {
     .qirq        = spapr_qirq_xics,
     .print_info  = spapr_irq_print_info_xics,
     .dt_populate = spapr_dt_xics,
+    .get_nodename = spapr_xics_get_nodename,
     .cpu_intc_create = spapr_irq_cpu_intc_create_xics,
     .post_load   = spapr_irq_post_load_xics,
 };
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index b34d5a00381b..59a1cf8bbc1d 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -42,6 +42,7 @@ typedef struct sPAPRIrq {
     void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
     void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
                         void *fdt, uint32_t phandle);
+    gchar *(*get_nodename)(sPAPRMachineState *spapr);
     Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu,
                                Error **errp);
     int (*post_load)(sPAPRMachineState *spapr, int version_id);
diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
index 728735dbcfbe..d280310ed587 100644
--- a/include/hw/ppc/spapr_xive.h
+++ b/include/hw/ppc/spapr_xive.h
@@ -47,6 +47,7 @@ typedef struct sPAPRMachineState sPAPRMachineState;
 void spapr_xive_hcall_init(sPAPRMachineState *spapr);
 void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
                    uint32_t phandle);
+gchar *spapr_xive_get_nodename(sPAPRMachineState *spapr);
 void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx);
 
 #endif /* PPC_SPAPR_XIVE_H */
diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
index 14afda198cdb..eafb6428787f 100644
--- a/include/hw/ppc/xics.h
+++ b/include/hw/ppc/xics.h
@@ -204,6 +204,7 @@ typedef struct sPAPRMachineState
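
To make the naming difference between the two backends concrete, here is a minimal sketch of how each node name could be built. It uses plain snprintf() instead of GLib's g_strdup_printf(), and the function names and the example address are made up for illustration.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* XICS: fixed node name without a unit address, as in the patch. */
static int xics_nodename_sketch(char *buf, size_t len)
{
    return snprintf(buf, len, "interrupt-controller");
}

/* XIVE: the node name carries the address of the user TIMA page in hex. */
static int xive_nodename_sketch(char *buf, size_t len, uint64_t tima_user_addr)
{
    return snprintf(buf, len, "interrupt-controller@%" PRIx64, tima_user_addr);
}
```

Because the XIVE name depends on a runtime address, a caller that needs to look the node up later cannot hardcode it, which is presumably why the patch exposes a per-backend get_nodename hook.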
Re: [Qemu-devel] is the "tcg translation" necessary when the "kvm acceleration" emulation mode enabled?
> scenario 1: did the tcg translation need to be done in this case now
> that the host and target arch is the same? or let the kvm emulation
> the system with the original instructions without the TCG
> translation

TCG is turned off when KVM is enabled. The code for TCG does not run at all.

> scenario 2: the pre condition is same with scenario 1 except the kvm
> is disable? so ,in this scenario, the TCG must be used in order to
> the pure software emulation without acceleration?

Correct. In this case KVM is turned off. The capabilities of the emulated CPU will _not_ be what you have in your host processor, but rather what QEMU implements. For example, you will get no AVX.

> scenario 3: in this scenario, the host and target arch is not the
> same, so how to use the "kvm mechanism" in this case? so the
> instructions feed to the kvm module to run must be translated By TCG
> module? right?

KVM cannot be used in this case; KVM is only enabled when the host and the target are the same.

Paolo

On 21/12/18 02:57, tugouxp wrote:
> hi folks:
> i am very puzzled about the relationship between "target cpu instruction"
> translated to host instructions through TCG module and the "kvm"
> acceleration" mode.
>
>
> think about three scenario of emulation:
> scenario 1, 2 and 3 as follows:
>
> 1. target cpu: x86_64,
> host cpu: x86_64,
> emulation OS: ubuntu_desktop_amd64.iso
> kvm: enabled.
>
>
> 2. target cpu: x86_64,
>
> host cpu: x86_64,
>
> emulation OS: ubuntu_desktop_amd64.iso
> kvm: disabled.
>
>
> 3. target cpu x86_64,
> host cpu: armv8
> emulationOS: ubuntu_desktop_amd64.iso
> kvm: enabled
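
The three answers above boil down to a small decision table, sketched here in C. The enum, the function name, and the architecture strings are illustrative assumptions; what QEMU actually does when KVM is requested but unusable depends on the command line, so the sketch only reports that case.

```c
#include <stdbool.h>
#include <string.h>

enum exec_engine_sketch {
    ENGINE_KVM,          /* guest code runs natively, TCG is not used */
    ENGINE_TCG,          /* guest code is translated by TCG */
    ENGINE_KVM_UNUSABLE  /* KVM was requested but host and target differ */
};

static enum exec_engine_sketch pick_engine_sketch(const char *host_arch,
                                                  const char *target_arch,
                                                  bool kvm_requested)
{
    if (!kvm_requested) {
        return ENGINE_TCG;            /* scenario 2 */
    }
    if (strcmp(host_arch, target_arch) != 0) {
        return ENGINE_KVM_UNUSABLE;   /* scenario 3 */
    }
    return ENGINE_KVM;                /* scenario 1 */
}
```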
[Qemu-devel] [PATCH] i386: remove the 'INTEL_PT' CPUID bit from named CPU models
From: Robert Hoo

Processor tracing is not yet implemented for KVM and it will be an
opt-in feature requiring a special module parameter. Disable it,
because it is wrong to enable it by default and it is impossible that
anyone has ever used it.

Cc: qemu-sta...@nongnu.org
Signed-off-by: Paolo Bonzini
---
 target/i386/cpu.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index dae46f0319..9c54c41e7a 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -2493,8 +2493,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
             CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLWB |
             CPUID_7_0_EBX_AVX512F | CPUID_7_0_EBX_AVX512DQ |
             CPUID_7_0_EBX_AVX512BW | CPUID_7_0_EBX_AVX512CD |
-            CPUID_7_0_EBX_AVX512VL | CPUID_7_0_EBX_CLFLUSHOPT |
-            CPUID_7_0_EBX_INTEL_PT,
+            CPUID_7_0_EBX_AVX512VL | CPUID_7_0_EBX_CLFLUSHOPT,
         .features[FEAT_7_0_ECX] =
             CPUID_7_0_ECX_PKU | CPUID_7_0_ECX_OSPKE |
             CPUID_7_0_ECX_AVX512VNNI,
@@ -2546,7 +2545,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
             CPUID_7_0_EBX_HLE | CPUID_7_0_EBX_AVX2 | CPUID_7_0_EBX_SMEP |
             CPUID_7_0_EBX_BMI2 | CPUID_7_0_EBX_ERMS | CPUID_7_0_EBX_INVPCID |
             CPUID_7_0_EBX_RTM | CPUID_7_0_EBX_RDSEED | CPUID_7_0_EBX_ADX |
-            CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_INTEL_PT,
+            CPUID_7_0_EBX_SMAP,
         .features[FEAT_7_0_ECX] =
             CPUID_7_0_ECX_VBMI | CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU |
             CPUID_7_0_ECX_OSPKE | CPUID_7_0_ECX_VBMI2 | CPUID_7_0_ECX_GFNI |
@@ -2604,8 +2603,7 @@ static X86CPUDefinition builtin_x86_defs[] = {
             CPUID_7_0_EBX_SMAP | CPUID_7_0_EBX_CLWB |
             CPUID_7_0_EBX_AVX512F | CPUID_7_0_EBX_AVX512DQ |
             CPUID_7_0_EBX_AVX512BW | CPUID_7_0_EBX_AVX512CD |
-            CPUID_7_0_EBX_AVX512VL | CPUID_7_0_EBX_CLFLUSHOPT |
-            CPUID_7_0_EBX_INTEL_PT,
+            CPUID_7_0_EBX_AVX512VL | CPUID_7_0_EBX_CLFLUSHOPT,
         .features[FEAT_7_0_ECX] =
             CPUID_7_0_ECX_VBMI | CPUID_7_0_ECX_UMIP | CPUID_7_0_ECX_PKU |
             CPUID_7_0_ECX_OSPKE | CPUID_7_0_ECX_VBMI2 | CPUID_7_0_ECX_GFNI |
-- 
2.20.1
Re: [Qemu-devel] [PATCH 1/2] i386: remove the new CPUID 'PCONFIG' from Icelake-Server CPU model
On 20/12/18 13:50, Robert Hoo wrote:
> On Thu, 2018-12-20 at 13:38 +0100, Paolo Bonzini wrote:
>> On 20/12/18 01:18, Robert Hoo wrote:
>>> I think the sooner, the better. Take the time window that Icelake CPU
>>> model has just shipped with QEMU 3.1.0 and is not publicly/widely used
>>> yet.
>>
>> We should still leave it in the 3.1 machine types. I've just sent a
>> patch to do the same with MPX.
>>
> I took a look your patch of "Disable MPX support on named CPU models".
> Seems you do the same as I do to PCONFIG. So you agree with my above
> patch?:-)
>
> I won't object that keep it in 3.1 machine type as you do to MPX.

Sorry Robert, I changed my mind. If no hypervisor exists that enables PCONFIG for guests (using the PCONFIG_ENABLE processor control), effectively no one can ever have used it. We should disable it in all machine types and Cc qemu-stable.

In fact, the same is true for INTEL_PT, which is not supported by any released kernel version and, even when it is, will be available only with a module parameter.

This is not the same as MPX, which did work even though probably nobody was using it. So this series is correct and I will follow up with one for INTEL_PT; however, this raises the question of how the patches are being tested.

Paolo
[Qemu-devel] [PULL 39/40] spapr: change default CPU type to POWER9
From: Cédric Le Goater

Signed-off-by: Cédric Le Goater
Signed-off-by: David Gibson
---
 hw/ppc/spapr.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 65c6065602..19a07c5c9d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3931,7 +3931,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
     hc->unplug = spapr_machine_device_unplug;
 
     smc->dr_lmb_enabled = true;
-    mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");
+    mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0");
     mc->has_hotpluggable_cpus = true;
     smc->resize_hpt_default = SPAPR_RESIZE_HPT_ENABLED;
     fwc->get_dev_path = spapr_get_fw_dev_path;
@@ -4028,6 +4028,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc)
 {
     spapr_machine_4_0_class_options(mc);
     SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_3_1);
+    mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0");
 }
 
 DEFINE_SPAPR_MACHINE(3_1, "3.1", false);
-- 
2.19.2
[Qemu-devel] [PULL 31/40] spapr: introduce a new machine IRQ backend for XIVE
From: Cédric Le Goater

The XIVE IRQ backend uses the same layout as the new XICS backend but
covers the full range of the IRQ number space. The IRQ numbers for the
CPU IPIs are allocated at the bottom of this space, below 4K, to
preserve compatibility with XICS which does not use that range.

This should be enough given that the maximum number of CPUs is 1024
for the sPAPR machine under QEMU. For the record, the biggest POWER8
or POWER9 system has a maximum of 1536 HW threads (16 sockets, 192
cores, SMT8).

Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: David Gibson
---
 hw/ppc/spapr_irq.c         | 93 ++
 include/hw/ppc/spapr.h     |  2 +
 include/hw/ppc/spapr_irq.h |  2 +
 3 files changed, 97 insertions(+)

diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index f8b651de0e..1f5aac55d3 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -12,6 +12,7 @@
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "hw/ppc/spapr.h"
+#include "hw/ppc/spapr_xive.h"
 #include "hw/ppc/xics.h"
 #include "sysemu/kvm.h"
 
@@ -205,6 +206,98 @@ sPAPRIrq spapr_irq_xics = {
     .print_info  = spapr_irq_print_info_xics,
 };
 
+/*
+ * XIVE IRQ backend.
+ */
+static void spapr_irq_init_xive(sPAPRMachineState *spapr, Error **errp)
+{
+    MachineState *machine = MACHINE(spapr);
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+    uint32_t nr_servers = spapr_max_server_number(spapr);
+    DeviceState *dev;
+    int i;
+
+    /* KVM XIVE device not yet available */
+    if (kvm_enabled()) {
+        if (machine_kernel_irqchip_required(machine)) {
+            error_setg(errp, "kernel_irqchip requested. no KVM XIVE support");
+            return;
+        }
+    }
+
+    dev = qdev_create(NULL, TYPE_SPAPR_XIVE);
+    qdev_prop_set_uint32(dev, "nr-irqs", smc->irq->nr_irqs);
+    /*
+     * 8 XIVE END structures per CPU. One for each available priority
+     */
+    qdev_prop_set_uint32(dev, "nr-ends", nr_servers << 3);
+    qdev_init_nofail(dev);
+
+    spapr->xive = SPAPR_XIVE(dev);
+
+    /* Enable the CPU IPIs */
+    for (i = 0; i < nr_servers; ++i) {
+        spapr_xive_irq_claim(spapr->xive, SPAPR_IRQ_IPI + i, false);
+    }
+}
+
+static int spapr_irq_claim_xive(sPAPRMachineState *spapr, int irq, bool lsi,
+                                Error **errp)
+{
+    if (!spapr_xive_irq_claim(spapr->xive, irq, lsi)) {
+        error_setg(errp, "IRQ %d is invalid", irq);
+        return -1;
+    }
+    return 0;
+}
+
+static void spapr_irq_free_xive(sPAPRMachineState *spapr, int irq, int num)
+{
+    int i;
+
+    for (i = irq; i < irq + num; ++i) {
+        spapr_xive_irq_free(spapr->xive, i);
+    }
+}
+
+static qemu_irq spapr_qirq_xive(sPAPRMachineState *spapr, int irq)
+{
+    return spapr_xive_qirq(spapr->xive, irq);
+}
+
+static void spapr_irq_print_info_xive(sPAPRMachineState *spapr,
+                                      Monitor *mon)
+{
+    CPUState *cs;
+
+    CPU_FOREACH(cs) {
+        PowerPCCPU *cpu = POWERPC_CPU(cs);
+
+        xive_tctx_pic_print_info(XIVE_TCTX(cpu->intc), mon);
+    }
+
+    spapr_xive_pic_print_info(spapr->xive, mon);
+}
+
+/*
+ * XIVE uses the full IRQ number space. Set it to 8K to be compatible
+ * with XICS.
+ */
+
+#define SPAPR_IRQ_XIVE_NR_IRQS     0x2000
+#define SPAPR_IRQ_XIVE_NR_MSIS     (SPAPR_IRQ_XIVE_NR_IRQS - SPAPR_IRQ_MSI)
+
+sPAPRIrq spapr_irq_xive = {
+    .nr_irqs     = SPAPR_IRQ_XIVE_NR_IRQS,
+    .nr_msis     = SPAPR_IRQ_XIVE_NR_MSIS,
+
+    .init        = spapr_irq_init_xive,
+    .claim       = spapr_irq_claim_xive,
+    .free        = spapr_irq_free_xive,
+    .qirq        = spapr_qirq_xive,
+    .print_info  = spapr_irq_print_info_xive,
+};
+
 /*
  * sPAPR IRQ frontend routines for devices
  */
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 198764066d..cb3082d319 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -16,6 +16,7 @@ typedef struct sPAPREventLogEntry sPAPREventLogEntry;
 typedef struct sPAPREventSource sPAPREventSource;
 typedef struct sPAPRPendingHPT sPAPRPendingHPT;
 typedef struct ICSState ICSState;
+typedef struct sPAPRXive sPAPRXive;
 
 #define HPTE64_V_HPTE_DIRTY     0x0040ULL
 #define SPAPR_ENTRY_POINT       0x100
@@ -175,6 +176,7 @@ struct sPAPRMachineState {
     const char *icp_type;
     int32_t irq_map_nr;
     unsigned long *irq_map;
+    sPAPRXive *xive;
 
     bool cmd_line_caps[SPAPR_CAP_NUM];
     sPAPRCapabilities def, eff, mig;
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index bd7301e6d9..23cdb51b87 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -13,6 +13,7 @@
 /*
  * IRQ range offsets per device type
  */
+#define SPAPR_IRQ_IPI        0x0
 #define SPAPR_IRQ_EPOW       0x1000  /* XICS_IRQ_BASE offset */
 #define SPAPR_IRQ_HOTPLUG    0x1001
 #define SPAPR_IRQ_VIO        0x1100  /* 256 VIO devices */
@@ -42,6 +43,7 @@ typedef struct sPAPRIrq {
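
The IRQ number arithmetic described in this patch can be checked with a small sketch. The three constants are copied from the diff above; the helper names are hypothetical and only restate the layout (IPIs at the bottom, device IRQs from 0x1000, eight ENDs per server).

```c
#include <stdbool.h>
#include <stdint.h>

#define SPAPR_IRQ_IPI          0x0     /* from the patch */
#define SPAPR_IRQ_EPOW         0x1000  /* from the patch: first device IRQ */
#define SPAPR_IRQ_XIVE_NR_IRQS 0x2000  /* from the patch: 8K IRQ numbers */

/* 8 END structures per server, one per priority ("nr_servers << 3"). */
static uint32_t xive_nr_ends_sketch(uint32_t nr_servers)
{
    return nr_servers << 3;
}

/* IRQ number used for the IPI of a given server. */
static uint32_t ipi_irq_sketch(uint32_t server)
{
    return SPAPR_IRQ_IPI + server;
}

/* True if all IPIs stay below the range used by device IRQs. */
static bool ipis_fit_below_devices_sketch(uint32_t nr_servers)
{
    return SPAPR_IRQ_IPI + nr_servers <= SPAPR_IRQ_EPOW;
}
```

For the 1024-CPU maximum quoted in the commit message, the IPIs occupy IRQ numbers 0 to 1023, well below 0x1000, and the device needs 8192 ENDs.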
[Qemu-devel] [PULL 38/40] spapr: introduce an 'ic-mode' machine option
From: Cédric Le Goater

This option is used to select the interrupt controller mode (XICS or
XIVE) with which the machine will operate, XICS being the default
mode for now.

When running a machine with the XIVE interrupt mode backend, the guest
OS is required to have support for the XIVE exploitation mode. In the
case of legacy OS, the mode selected by CAS should be XICS and the OS
should fail to boot. However, QEMU could possibly detect it, terminate
the boot process and reset to stop in the SLOF firmware. This is not
yet handled.

Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: David Gibson
---
 hw/ppc/spapr.c          | 50 +++--
 hw/ppc/spapr_cpu_core.c |  3 +--
 hw/ppc/spapr_irq.c      | 34 +---
 include/hw/ppc/spapr.h  |  1 +
 4 files changed, 55 insertions(+), 33 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2f87c8ba19..65c6065602 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1104,10 +1104,9 @@ static void spapr_dt_ov5_platform_support(sPAPRMachineState *spapr, void *fdt,
                                           int chosen)
 {
     PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
-    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 
     char val[2 * 4] = {
-        23, smc->irq->ov5, /* Xive mode. */
+        23, spapr->irq->ov5, /* Xive mode. */
         24, 0x00, /* Hash/Radix, filled in below. */
         25, 0x00, /* Hash options: Segment Tables == no, GTSE == no. */
         26, 0x40, /* Radix options: GTSE == yes. */
@@ -1276,7 +1275,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2));
 
     /* /interrupt controller */
-    smc->irq->dt_populate(spapr, spapr_max_server_number(spapr), fdt,
+    spapr->irq->dt_populate(spapr, spapr_max_server_number(spapr), fdt,
                           PHANDLE_XICP);
 
     ret = spapr_populate_memory(spapr, fdt);
@@ -1297,7 +1296,8 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
     }
 
     QLIST_FOREACH(phb, &spapr->phbs, list) {
-        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, smc->irq->nr_msis);
+        ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt,
+                                    spapr->irq->nr_msis);
         if (ret < 0) {
             error_report("couldn't setup PCI devices in fdt");
             exit(1);
@@ -2633,7 +2633,7 @@ static void spapr_machine_init(MachineState *machine)
     spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2);
 
     /* advertise XIVE on POWER9 machines */
-    if (smc->irq->ov5 & SPAPR_OV5_XIVE_EXPLOIT) {
+    if (spapr->irq->ov5 & SPAPR_OV5_XIVE_EXPLOIT) {
         if (ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00,
                                   0, spapr->max_compat_pvr)) {
             spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT);
@@ -3053,9 +3053,38 @@ static void spapr_set_vsmt(Object *obj, Visitor *v, const char *name,
     visit_type_uint32(v, name, (uint32_t *)opaque, errp);
 }
 
+static char *spapr_get_ic_mode(Object *obj, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    if (spapr->irq == &spapr_irq_xics_legacy) {
+        return g_strdup("legacy");
+    } else if (spapr->irq == &spapr_irq_xics) {
+        return g_strdup("xics");
+    } else if (spapr->irq == &spapr_irq_xive) {
+        return g_strdup("xive");
+    }
+    g_assert_not_reached();
+}
+
+static void spapr_set_ic_mode(Object *obj, const char *value, Error **errp)
+{
+    sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+
+    /* The legacy IRQ backend can not be set */
+    if (strcmp(value, "xics") == 0) {
+        spapr->irq = &spapr_irq_xics;
+    } else if (strcmp(value, "xive") == 0) {
+        spapr->irq = &spapr_irq_xive;
+    } else {
+        error_setg(errp, "Bad value for \"ic-mode\" property");
+    }
+}
+
 static void spapr_instance_init(Object *obj)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 
     spapr->htab_fd = -1;
     spapr->use_hotplug_event_source = true;
@@ -3089,6 +3118,14 @@ static void spapr_instance_init(Object *obj)
                                     " the host's SMT mode", &error_abort);
     object_property_add_bool(obj, "vfio-no-msix-emulation",
                              spapr_get_msix_emulation, NULL, NULL);
+
+    /* The machine class defines the default interrupt controller mode */
+    spapr->irq = smc->irq;
+    object_property_add_str(obj, "ic-mode", spapr_get_ic_mode,
+                            spapr_set_ic_mode, NULL);
+    object_property_set_description(obj, "ic-mode",
+                 "Specifies the interrupt controller mode (xics, xive)",
+                 NULL);
 }
 
 static void spapr_machine_finalizefn(Object *obj)
@@ -3811,9 +3848,8 @@ static void spapr_pic_print_info(InterruptStatsProvider *obj,
                                  Monitor *mon)
 {
     sPAPRMachineState *spapr = SPAPR_MACHINE(obj);
-
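
The getter/setter pair added by this patch maps a property string to a backend and back. A standalone sketch of that mapping, with a plain enum standing in for the sPAPRIrq backend pointers, could look like this; the enum and function names are assumptions made for the example.

```c
#include <string.h>

/* Stand-in for the backend pointers; "legacy" is left out because the
 * patch comment says it cannot be selected through the property. */
enum ic_mode_sketch { IC_MODE_XICS, IC_MODE_XIVE };

/* Returns 0 and updates *mode on success, -1 on a bad value
 * (leaving *mode untouched, like the setter in the patch). */
static int parse_ic_mode_sketch(const char *value, enum ic_mode_sketch *mode)
{
    if (strcmp(value, "xics") == 0) {
        *mode = IC_MODE_XICS;
    } else if (strcmp(value, "xive") == 0) {
        *mode = IC_MODE_XIVE;
    } else {
        return -1;
    }
    return 0;
}

/* Inverse mapping, as the getter would report it. */
static const char *ic_mode_name_sketch(enum ic_mode_sketch mode)
{
    return mode == IC_MODE_XIVE ? "xive" : "xics";
}
```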
[Qemu-devel] [PULL 33/40] spapr: add device tree support for the XIVE exploitation mode
From: Cédric Le Goater

The XIVE interface for the guest is described in the device tree under
the "interrupt-controller" node. A couple of new properties are
specific to XIVE :

 - "reg"

   contains the base address and size of the thread interrupt
   management areas (TIMA), for the User level and for the Guest OS
   level. Only the Guest OS level is taken into account today.

 - "ibm,xive-eq-sizes"

   the size of the event queues. One cell per size supported, contains
   log2 of size, in ascending order.

 - "ibm,xive-lisn-ranges"

   the IRQ interrupt number ranges assigned to the guest for the IPIs.

and also under the root node :

 - "ibm,plat-res-int-priorities"

   contains a list of priorities that the hypervisor has reserved for
   its own use. OPAL uses the priority 7 queue to automatically
   escalate interrupts for all other queues (DD2.X POWER9). So only
   priorities [0..6] are allowed for the guest.

Extend the sPAPR IRQ backend with a new handler to populate the DT
with the appropriate "interrupt-controller" node.

Signed-off-by: Cédric Le Goater
[dwg: Fix style nits]
Signed-off-by: David Gibson
---
 hw/intc/spapr_xive.c        | 67 +
 hw/intc/xics_spapr.c        |  3 +-
 hw/ppc/spapr.c              |  3 +-
 hw/ppc/spapr_irq.c          |  3 ++
 include/hw/ppc/spapr_irq.h  |  2 ++
 include/hw/ppc/spapr_xive.h |  2 ++
 include/hw/ppc/xics.h       |  4 +--
 7 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
index 9f2820039c..682c192268 100644
--- a/hw/intc/spapr_xive.c
+++ b/hw/intc/spapr_xive.c
@@ -14,6 +14,7 @@
 #include "target/ppc/cpu.h"
 #include "sysemu/cpus.h"
 #include "monitor/monitor.h"
+#include "hw/ppc/fdt.h"
 #include "hw/ppc/spapr.h"
 #include "hw/ppc/spapr_xive.h"
 #include "hw/ppc/xive.h"
@@ -1400,3 +1401,69 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_INT_SYNC, h_int_sync);
     spapr_register_hypercall(H_INT_RESET, h_int_reset);
 }
+
+void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
+                   uint32_t phandle)
+{
+    sPAPRXive *xive = spapr->xive;
+    int node;
+    uint64_t timas[2 * 2];
+    /* Interrupt number ranges for the IPIs */
+    uint32_t lisn_ranges[] = {
+        cpu_to_be32(0),
+        cpu_to_be32(nr_servers),
+    };
+    /*
+     * EQ size - the sizes of pages supported by the system 4K, 64K,
+     * 2M, 16M. We only advertise 64K for the moment.
+     */
+    uint32_t eq_sizes[] = {
+        cpu_to_be32(16), /* 64K */
+    };
+    /*
+     * The following array is in sync with the reserved priorities
+     * defined by the 'spapr_xive_priority_is_reserved' routine.
+     */
+    uint32_t plat_res_int_priorities[] = {
+        cpu_to_be32(7),    /* start */
+        cpu_to_be32(0xf8), /* count */
+    };
+    gchar *nodename;
+
+    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
+    timas[0] = cpu_to_be64(xive->tm_base +
+                           XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
+    timas[1] = cpu_to_be64(1ull << TM_SHIFT);
+    timas[2] = cpu_to_be64(xive->tm_base +
+                           XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
+    timas[3] = cpu_to_be64(1ull << TM_SHIFT);
+
+    nodename = g_strdup_printf("interrupt-controller@%" PRIx64,
+                           xive->tm_base + XIVE_TM_USER_PAGE * (1 << TM_SHIFT));
+    _FDT(node = fdt_add_subnode(fdt, 0, nodename));
+    g_free(nodename);
+
+    _FDT(fdt_setprop_string(fdt, node, "device_type", "power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "reg", timas, sizeof(timas)));
+
+    _FDT(fdt_setprop_string(fdt, node, "compatible", "ibm,power-ivpe"));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-eq-sizes", eq_sizes,
+                     sizeof(eq_sizes)));
+    _FDT(fdt_setprop(fdt, node, "ibm,xive-lisn-ranges", lisn_ranges,
+                     sizeof(lisn_ranges)));
+
+    /* For Linux to link the LSIs to the interrupt controller. */
+    _FDT(fdt_setprop(fdt, node, "interrupt-controller", NULL, 0));
+    _FDT(fdt_setprop_cell(fdt, node, "#interrupt-cells", 2));
+
+    /* For SLOF */
+    _FDT(fdt_setprop_cell(fdt, node, "linux,phandle", phandle));
+    _FDT(fdt_setprop_cell(fdt, node, "phandle", phandle));
+
+    /*
+     * The "ibm,plat-res-int-priorities" property defines the priority
+     * ranges reserved by the hypervisor
+     */
+    _FDT(fdt_setprop(fdt, 0, "ibm,plat-res-int-priorities",
+                     plat_res_int_priorities, sizeof(plat_res_int_priorities)));
+}
diff --git a/hw/intc/xics_spapr.c b/hw/intc/xics_spapr.c
index 2e27b92b87..f67d3c80bf 100644
--- a/hw/intc/xics_spapr.c
+++ b/hw/intc/xics_spapr.c
@@ -244,7 +244,8 @@ void xics_spapr_init(sPAPRMachineState *spapr)
     spapr_register_hypercall(H_IPOLL, h_ipoll);
 }
 
-void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle)
+void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void
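
Two of the property encodings described in this patch are easy to check by hand: the event queue size is stored as log2 of the size in a big-endian cell, and the reserved priorities are given as a (start, count) pair. The helpers below are an illustrative sketch with made-up names; they do not use QEMU's cpu_to_be32() or libfdt.

```c
#include <stdint.h>

/* log2 of a power-of-two queue size, as stored in "ibm,xive-eq-sizes". */
static uint32_t eq_size_log2_sketch(uint64_t size_bytes)
{
    uint32_t shift = 0;

    while (size_bytes > 1) {
        size_bytes >>= 1;
        shift++;
    }
    return shift;
}

/* Store a 32-bit cell in big-endian order, independent of host endianness. */
static void put_be32_sketch(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

/* True if prio falls inside a reserved (start, count) range. */
static int priority_in_range_sketch(uint32_t prio, uint32_t start,
                                    uint32_t count)
{
    return prio >= start && prio - start < count;
}
```

With the values from the patch (start 7, count 0xf8), priority 6 is outside the reserved range and priority 7 is inside it, which matches the "[0..6] are allowed for the guest" statement in the commit message.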
[Qemu-devel] [PULL 35/40] spapr: extend the sPAPR IRQ backend for XICS migration
From: Cédric Le Goater Introduce a new sPAPR IRQ handler to handle resend after migration when the machine is using a KVM XICS interrupt controller model. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: David Gibson --- hw/ppc/spapr.c | 13 + hw/ppc/spapr_irq.c | 27 +++ include/hw/ppc/spapr_irq.h | 2 ++ 3 files changed, 34 insertions(+), 8 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index dfb617e580..0b09a88753 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1730,14 +1730,6 @@ static int spapr_post_load(void *opaque, int version_id) return err; } -if (!object_dynamic_cast(OBJECT(spapr->ics), TYPE_ICS_KVM)) { -CPUState *cs; -CPU_FOREACH(cs) { -PowerPCCPU *cpu = POWERPC_CPU(cs); -icp_resend(ICP(cpu->intc)); -} -} - /* In earlier versions, there was no separate qdev for the PAPR * RTC, so the RTC offset was stored directly in sPAPREnvironment. * So when migrating from those versions, poke the incoming offset @@ -1758,6 +1750,11 @@ static int spapr_post_load(void *opaque, int version_id) } } +err = spapr_irq_post_load(spapr, version_id); +if (err) { +return err; +} + return err; } diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c index fdcc7795e4..292c448a15 100644 --- a/hw/ppc/spapr_irq.c +++ b/hw/ppc/spapr_irq.c @@ -197,6 +197,18 @@ static Object *spapr_irq_cpu_intc_create_xics(sPAPRMachineState *spapr, return icp_create(cpu, spapr->icp_type, XICS_FABRIC(spapr), errp); } +static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id) +{ +if (!object_dynamic_cast(OBJECT(spapr->ics), TYPE_ICS_KVM)) { +CPUState *cs; +CPU_FOREACH(cs) { +PowerPCCPU *cpu = POWERPC_CPU(cs); +icp_resend(ICP(cpu->intc)); +} +} +return 0; +} + #define SPAPR_IRQ_XICS_NR_IRQS 0x1000 #define SPAPR_IRQ_XICS_NR_MSIS \ (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI) @@ -212,6 +224,7 @@ sPAPRIrq spapr_irq_xics = { .print_info = spapr_irq_print_info_xics, .dt_populate = spapr_dt_xics, .cpu_intc_create = 
spapr_irq_cpu_intc_create_xics, +.post_load = spapr_irq_post_load_xics, }; /* @@ -295,6 +308,11 @@ static Object *spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr, return xive_tctx_create(cpu, XIVE_ROUTER(spapr->xive), errp); } +static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id) +{ +return 0; +} + /* * XIVE uses the full IRQ number space. Set it to 8K to be compatible * with XICS. @@ -314,6 +332,7 @@ sPAPRIrq spapr_irq_xive = { .print_info = spapr_irq_print_info_xive, .dt_populate = spapr_dt_xive, .cpu_intc_create = spapr_irq_cpu_intc_create_xive, +.post_load = spapr_irq_post_load_xive, }; /* @@ -352,6 +371,13 @@ qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq) return smc->irq->qirq(spapr, irq); } +int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id) +{ +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); + +return smc->irq->post_load(spapr, version_id); +} + /* * XICS legacy routines - to deprecate one day */ @@ -420,4 +446,5 @@ sPAPRIrq spapr_irq_xics_legacy = { .print_info = spapr_irq_print_info_xics, .dt_populate = spapr_dt_xics, .cpu_intc_create = spapr_irq_cpu_intc_create_xics, +.post_load = spapr_irq_post_load_xics, }; diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h index 13db0428ab..84a25ffb6c 100644 --- a/include/hw/ppc/spapr_irq.h +++ b/include/hw/ppc/spapr_irq.h @@ -43,6 +43,7 @@ typedef struct sPAPRIrq { void *fdt, uint32_t phandle); Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu, Error **errp); +int (*post_load)(sPAPRMachineState *spapr, int version_id); } sPAPRIrq; extern sPAPRIrq spapr_irq_xics; @@ -53,6 +54,7 @@ void spapr_irq_init(sPAPRMachineState *spapr, Error **errp); int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp); void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num); qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq); +int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id); /* * XICS legacy 
routines -- 2.19.2
[Qemu-devel] [PULL 36/40] spapr: add a 'reset' method to the sPAPR IRQ backend
From: Cédric Le Goater For the time being, the XIVE reset handler updates the OS CAM line of the vCPU as it is done under a real hypervisor when a vCPU is scheduled to run on a HW thread. This will let the XIVE presenter engine find a match among the NVTs dispatched on the HW threads. This handler will become even more useful when we introduce the machine supporting both interrupt modes, XIVE and XICS. In this machine, the interrupt mode is chosen by the CAS negotiation process and activated after a reset. Signed-off-by: Cédric Le Goater [dwg: Fix style nits] Signed-off-by: David Gibson --- hw/intc/spapr_xive.c| 17 + hw/ppc/spapr.c | 6 ++ hw/ppc/spapr_irq.c | 31 ++- include/hw/ppc/spapr_irq.h | 2 ++ include/hw/ppc/spapr_xive.h | 1 + 5 files changed, 56 insertions(+), 1 deletion(-) diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c index 682c192268..0e39c90cbd 100644 --- a/hw/intc/spapr_xive.c +++ b/hw/intc/spapr_xive.c @@ -179,6 +179,23 @@ static void spapr_xive_map_mmio(sPAPRXive *xive) sysbus_mmio_map(SYS_BUS_DEVICE(xive), 2, xive->tm_base); } +/* + * When a Virtual Processor is scheduled to run on a HW thread, the + * hypervisor pushes its identifier in the OS CAM line. Emulate the + * same behavior under QEMU. + */ +void spapr_xive_set_tctx_os_cam(XiveTCTX *tctx) +{ +uint8_t nvt_blk; +uint32_t nvt_idx; +uint32_t nvt_cam; + +spapr_xive_cpu_to_nvt(POWERPC_CPU(tctx->cs), &nvt_blk, &nvt_idx); + +nvt_cam = cpu_to_be32(TM_QW1W2_VO | xive_nvt_cam_line(nvt_blk, nvt_idx)); +memcpy(&tctx->regs[TM_QW1_OS + TM_WORD2], &nvt_cam, 4); +} + static void spapr_xive_end_reset(XiveEND *end) { memset(end, 0, sizeof(*end)); diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 0b09a88753..487f80e940 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1619,6 +1619,12 @@ static void spapr_machine_reset(void) qemu_devices_reset(); +/* + * This is fixing some of the default configuration of the XIVE + * devices. To be called after the reset of the machine devices.
+ */ +spapr_irq_reset(spapr, &error_fatal); + /* DRC reset may cause a device to be unplugged. This will cause troubles * if this device is used by another device (eg, a running vhost backend * will crash QEMU if the DIMM holding the vring goes away). To avoid such diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c index 292c448a15..9ecbf47329 100644 --- a/hw/ppc/spapr_irq.c +++ b/hw/ppc/spapr_irq.c @@ -305,7 +305,14 @@ static void spapr_irq_print_info_xive(sPAPRMachineState *spapr, static Object *spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr, Object *cpu, Error **errp) { -return xive_tctx_create(cpu, XIVE_ROUTER(spapr->xive), errp); +Object *obj = xive_tctx_create(cpu, XIVE_ROUTER(spapr->xive), errp); + +/* + * (TCG) Set the OS CAM line early for hotplugged CPUs as they + * don't benefit from the reset of the XIVE IRQ backend + */ +spapr_xive_set_tctx_os_cam(XIVE_TCTX(obj)); +return obj; } static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id) @@ -313,6 +320,18 @@ static int spapr_irq_post_load_xive(sPAPRMachineState *spapr, int version_id) return 0; } +static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp) +{ +CPUState *cs; + +CPU_FOREACH(cs) { +PowerPCCPU *cpu = POWERPC_CPU(cs); + +/* (TCG) Set the OS CAM line of the thread interrupt context. */ +spapr_xive_set_tctx_os_cam(XIVE_TCTX(cpu->intc)); +} +} + /* * XIVE uses the full IRQ number space. Set it to 8K to be compatible * with XICS.
@@ -333,6 +352,7 @@ sPAPRIrq spapr_irq_xive = { .dt_populate = spapr_dt_xive, .cpu_intc_create = spapr_irq_cpu_intc_create_xive, .post_load = spapr_irq_post_load_xive, +.reset = spapr_irq_reset_xive, }; /* @@ -378,6 +398,15 @@ int spapr_irq_post_load(sPAPRMachineState *spapr, int version_id) return smc->irq->post_load(spapr, version_id); } +void spapr_irq_reset(sPAPRMachineState *spapr, Error **errp) +{ +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); + +if (smc->irq->reset) { +smc->irq->reset(spapr, errp); +} +} + /* * XICS legacy routines - to deprecate one day */ diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h index 84a25ffb6c..63061a009b 100644 --- a/include/hw/ppc/spapr_irq.h +++ b/include/hw/ppc/spapr_irq.h @@ -44,6 +44,7 @@ typedef struct sPAPRIrq { Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu, Error **errp); int (*post_load)(sPAPRMachineState *spapr, int version_id); +void (*reset)(sPAPRMachineState *spapr, Error **errp); } sPAPRIrq; extern sPAPRIrq spapr_irq_xics; @@ -55,6 +56,7 @@ int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error
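For readers following the backend changes in patches 35 and 36, here is a minimal standalone sketch of the dispatch pattern they use: a mandatory post_load hook called unconditionally and an optional reset hook guarded by a NULL check. All names below (struct irq_backend, struct machine, machine_irq_reset, ...) are hypothetical and are not the QEMU types; the resets counter exists only to make the behavior observable.

```c
#include <stddef.h>

struct machine;

/* Hypothetical ops table, loosely modeled on the sPAPRIrq descriptor. */
struct irq_backend {
    int  (*post_load)(struct machine *m, int version_id); /* mandatory */
    void (*reset)(struct machine *m);                      /* optional, may be NULL */
};

struct machine {
    const struct irq_backend *irq;
    int resets; /* counts reset hook invocations, for illustration only */
};

static int backend_post_load_noop(struct machine *m, int version_id)
{
    (void)m;
    (void)version_id;
    return 0;
}

static void backend_reset_count(struct machine *m)
{
    m->resets++;
}

/* A backend that implements both hooks. */
static const struct irq_backend backend_with_reset = {
    .post_load = backend_post_load_noop,
    .reset     = backend_reset_count,
};

/* A backend that leaves the optional reset hook NULL. */
static const struct irq_backend backend_without_reset = {
    .post_load = backend_post_load_noop,
};

/* Mandatory hook: called without a NULL check. */
static int machine_irq_post_load(struct machine *m, int version_id)
{
    return m->irq->post_load(m, version_id);
}

/* Optional hook: skipped when the backend does not provide it. */
static void machine_irq_reset(struct machine *m)
{
    if (m->irq->reset) {
        m->irq->reset(m);
    }
}
```

The asymmetry mirrors the patches: every backend in the series sets .post_load, while only the XIVE backend sets .reset.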
[Qemu-devel] [PULL 30/40] spapr-iommu: Always advertise the maximum possible DMA window size
From: Alexey Kardashevskiy

When deciding on the size of the huge DMA window, the typical Linux pseries guest uses the maximum allowed RAM size as the upper limit. We did the same on the QEMU side to match that logic. Now we are going to support GPU RAM passthrough, where the GPU RAM is not available at guest boot time because it requires interaction with the guest driver. As a result, the guest requests a smaller window than it should. Therefore the guest needs to be patched to understand this new memory, and so does QEMU.

Instead of reimplementing here whatever solution we choose for the guest, this advertises the biggest possible window size limited by 32 bit (as defined by LoPAPR). Since the window size has to be power-of-two (the create rtas call receives a window shift, not a size), this uses 0x8000. as the maximum number of TCEs possible (rather than 32bit maximum of 0x.).

This is safe as:
1. The guest visible emulated table is allocated in KVM (actual pages are allocated in page fault handler) and QEMU (actual pages are allocated when updated);
2. The hardware table (and corresponding userspace address table) supports sparse allocation and also checks for locked_vm limit so it is unable to cause the host any damage.
Signed-off-by: Alexey Kardashevskiy Signed-off-by: David Gibson --- hw/ppc/spapr_rtas_ddw.c | 19 +++ 1 file changed, 3 insertions(+), 16 deletions(-) diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c index 329feb148f..cb8a410359 100644 --- a/hw/ppc/spapr_rtas_ddw.c +++ b/hw/ppc/spapr_rtas_ddw.c @@ -96,9 +96,8 @@ static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu, uint32_t nret, target_ulong rets) { sPAPRPHBState *sphb; -uint64_t buid, max_window_size; +uint64_t buid; uint32_t avail, addr, pgmask = 0; -MachineState *machine = MACHINE(spapr); if ((nargs != 3) || (nret != 5)) { goto param_error_exit; @@ -114,27 +113,15 @@ static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu, /* Translate page mask to LoPAPR format */ pgmask = spapr_page_mask_to_query_mask(sphb->page_size_mask); -/* - * This is "Largest contiguous block of TCEs allocated specifically - * for (that is, are reserved for) this PE". - * Return the maximum number as maximum supported RAM size was in 4K pages. - */ -if (machine->ram_size == machine->maxram_size) { -max_window_size = machine->ram_size; -} else { -max_window_size = machine->device_memory->base + - memory_region_size(>device_memory->mr); -} - avail = SPAPR_PCI_DMA_MAX_WINDOWS - spapr_phb_get_active_win_num(sphb); rtas_st(rets, 0, RTAS_OUT_SUCCESS); rtas_st(rets, 1, avail); -rtas_st(rets, 2, max_window_size >> SPAPR_TCE_PAGE_SHIFT); +rtas_st(rets, 2, 0x8000); /* The largest window we can possibly have */ rtas_st(rets, 3, pgmask); rtas_st(rets, 4, 0); /* DMA migration mask, not supported */ -trace_spapr_iommu_ddw_query(buid, addr, avail, max_window_size, pgmask); +trace_spapr_iommu_ddw_query(buid, addr, avail, 0x8000, pgmask); return; param_error_exit: -- 2.19.2
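The power-of-two argument in the commit message can be sketched as follows. This is only an illustration under stated assumptions: the advertised limit is taken to be the largest power of two that fits in a 32-bit field, and the helper names (max_pow2_u32, window_tce_count, window_fits) are hypothetical, not part of QEMU or the RTAS interface.

```c
#include <stdint.h>

/* Largest power of two representable in a 32-bit field (assumed limit). */
static uint32_t max_pow2_u32(void)
{
    return UINT32_C(1) << 31;
}

/*
 * Number of TCEs needed for a window of 2^window_shift bytes when each
 * TCE maps a page of 2^page_shift bytes. Returns 0 for nonsensical input.
 */
static uint64_t window_tce_count(unsigned window_shift, unsigned page_shift)
{
    if (window_shift < page_shift || window_shift - page_shift >= 64) {
        return 0;
    }
    return UINT64_C(1) << (window_shift - page_shift);
}

/* Does a window created from a shift stay within the advertised limit? */
static int window_fits(unsigned window_shift, unsigned page_shift)
{
    uint64_t n = window_tce_count(window_shift, page_shift);

    return n != 0 && n <= max_pow2_u32();
}
```

Because a window is always created from a shift, its TCE count is itself a power of two, so an all-ones 32-bit maximum could never be reached exactly; a power-of-two limit is the tightest bound a guest can actually use.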
[Qemu-devel] [PULL 37/40] spapr: add an extra OV5 field to the sPAPR IRQ backend
From: Cédric Le Goater

The interrupt modes supported by the hypervisor are advertised to the guest with new bit definitions in option vector 5 of the "ibm,arch-vec-5-platform-support" property. Bits 0-1 of byte 23 of the OV5 are defined as follows:

  0b00 PAPR 2.7 and earlier (Legacy systems)
  0b01 XIVE Exploitation mode only
  0b10 Either available

If the client/guest selects the XIVE interrupt mode, it informs the hypervisor by returning the value 0b01 in byte 23 bits 0-1. A 0b00 value indicates the use of the XICS interrupt mode (Legacy systems).

The sPAPR IRQ backend is extended with these definitions and the values are directly used to populate the "ibm,arch-vec-5-platform-support" property. The interrupt mode is advertised under TCG and under KVM. Although a KVM XIVE device is not yet available, the machine can still operate with kernel_irqchip=off. However, we apply a restriction on the CPU, which is required to be a POWER9 when a XIVE interrupt controller is in use.

Signed-off-by: Cédric Le Goater Signed-off-by: David Gibson --- hw/ppc/spapr.c | 33 ++--- hw/ppc/spapr_irq.c | 3 +++ include/hw/ppc/spapr.h | 6 ++ include/hw/ppc/spapr_irq.h | 1 + 4 files changed, 36 insertions(+), 7 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 487f80e940..2f87c8ba19 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1095,15 +1095,19 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt) spapr_dt_rtas_tokens(fdt, rtas); } -/* Prepare ibm,arch-vec-5-platform-support, which indicates the MMU features - * that the guest may request and thus the valid values for bytes 24..26 of - * option vector 5: */ -static void spapr_dt_ov5_platform_support(void *fdt, int chosen) +/* + * Prepare ibm,arch-vec-5-platform-support, which indicates the MMU + * and the XIVE features that the guest may request and thus the valid + * values for bytes 23..26 of option vector 5: + */ +static void spapr_dt_ov5_platform_support(sPAPRMachineState *spapr, void *fdt, + int chosen) {
PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu); +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); char val[2 * 4] = { -23, 0x00, /* Xive mode, filled in below. */ +23, smc->irq->ov5, /* Xive mode. */ 24, 0x00, /* Hash/Radix, filled in below. */ 25, 0x00, /* Hash options: Segment Tables == no, GTSE == no. */ 26, 0x40, /* Radix options: GTSE == yes. */ @@ -,7 +1115,11 @@ static void spapr_dt_ov5_platform_support(void *fdt, int chosen) if (!ppc_check_compat(first_ppc_cpu, CPU_POWERPC_LOGICAL_3_00, 0, first_ppc_cpu->compat_pvr)) { -/* If we're in a pre POWER9 compat mode then the guest should do hash */ +/* + * If we're in a pre POWER9 compat mode then the guest should + * do hash and use the legacy interrupt mode + */ +val[1] = 0x00; /* XICS */ val[3] = 0x00; /* Hash */ } else if (kvm_enabled()) { if (kvmppc_has_cap_mmu_radix() && kvmppc_has_cap_mmu_hash_v3()) { @@ -1189,7 +1197,7 @@ static void spapr_dt_chosen(sPAPRMachineState *spapr, void *fdt) _FDT(fdt_setprop_string(fdt, chosen, "stdout-path", stdout_path)); } -spapr_dt_ov5_platform_support(fdt, chosen); +spapr_dt_ov5_platform_support(spapr, fdt, chosen); g_free(stdout_path); g_free(bootlist); @@ -2624,6 +2632,17 @@ static void spapr_machine_init(MachineState *machine) /* advertise support for ibm,dyamic-memory-v2 */ spapr_ovec_set(spapr->ov5, OV5_DRMEM_V2); +/* advertise XIVE on POWER9 machines */ +if (smc->irq->ov5 & SPAPR_OV5_XIVE_EXPLOIT) { +if (ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, + 0, spapr->max_compat_pvr)) { +spapr_ovec_set(spapr->ov5, OV5_XIVE_EXPLOIT); +} else { +error_report("XIVE-only machines require a POWER9 CPU"); +exit(1); +} +} + /* init CPUs */ spapr_init_cpus(spapr); diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c index 9ecbf47329..9e3aa85b6d 100644 --- a/hw/ppc/spapr_irq.c +++ b/hw/ppc/spapr_irq.c @@ -216,6 +216,7 @@ static int spapr_irq_post_load_xics(sPAPRMachineState *spapr, int version_id) sPAPRIrq spapr_irq_xics = { .nr_irqs = 
SPAPR_IRQ_XICS_NR_IRQS, .nr_msis = SPAPR_IRQ_XICS_NR_MSIS, +.ov5 = SPAPR_OV5_XIVE_LEGACY, .init= spapr_irq_init_xics, .claim = spapr_irq_claim_xics, @@ -343,6 +344,7 @@ static void spapr_irq_reset_xive(sPAPRMachineState *spapr, Error **errp) sPAPRIrq spapr_irq_xive = { .nr_irqs = SPAPR_IRQ_XIVE_NR_IRQS, .nr_msis = SPAPR_IRQ_XIVE_NR_MSIS, +.ov5 = SPAPR_OV5_XIVE_EXPLOIT, .init= spapr_irq_init_xive, .claim =
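As an aside on the encoding described in the commit message, a small sketch of how the two-bit mode maps onto byte 23. It assumes IBM (MSB-0) bit numbering, so "bits 0-1" are the two most significant bits of the byte; the enum and function names are hypothetical and do not come from the patch.

```c
#include <stdint.h>

/* Two-bit interrupt mode values as listed in the commit message. */
enum ov5_xive_mode {
    OV5_MODE_LEGACY    = 0x0, /* 0b00: PAPR 2.7 and earlier (XICS) */
    OV5_MODE_XIVE_ONLY = 0x1, /* 0b01: XIVE exploitation mode only */
    OV5_MODE_EITHER    = 0x2, /* 0b10: either available */
};

/* Place the mode in bits 0-1 (MSB-0 numbering) of OV5 byte 23. */
static uint8_t ov5_encode_mode(unsigned mode)
{
    return (uint8_t)((mode & 0x3u) << 6);
}

/* Extract the mode from OV5 byte 23. */
static unsigned ov5_decode_mode(uint8_t byte23)
{
    return (byte23 >> 6) & 0x3u;
}
```

Under that assumption, XIVE exploitation mode corresponds to a byte value of 0x40 and legacy mode to 0x00.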
[Qemu-devel] [PULL 18/40] ppc/xive: introduce the XiveRouter model
From: Cédric Le Goater The XiveRouter models the second sub-engine of the XIVE architecture: the Interrupt Virtualization Routing Engine (IVRE). The IVRE handles event notifications of the IVSE and performs the interrupt routing process. For this purpose, it uses a set of tables stored in system memory, the first of which is the Event Assignment Structure (EAS) table. The EAT associates an interrupt source number with an Event Notification Descriptor (END) which will be used in a second phase of the routing process to identify a Notification Virtual Target. The XiveRouter is an abstract class which needs to be inherited from to define a storage for the EAT, and other upcoming tables. Signed-off-by: Cédric Le Goater [dwg: Folded in parts of a later fix by Cédric fixing field access] [dwg: Fix style nits] Signed-off-by: David Gibson --- hw/intc/xive.c | 77 ++ include/hw/ppc/xive.h | 31 +++ include/hw/ppc/xive_regs.h | 62 ++ 3 files changed, 170 insertions(+) create mode 100644 include/hw/ppc/xive_regs.h diff --git a/hw/intc/xive.c b/hw/intc/xive.c index 8d5434d6bd..8878abc317 100644 --- a/hw/intc/xive.c +++ b/hw/intc/xive.c @@ -444,6 +444,82 @@ static const TypeInfo xive_source_info = { .class_init= xive_source_class_init, }; +/* + * XIVE Router (aka. Virtualization Controller or IVRE) + */ + +int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx, +XiveEAS *eas) +{ +XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr); + +return xrc->get_eas(xrtr, eas_blk, eas_idx, eas); +} + +static void xive_router_notify(XiveNotifier *xn, uint32_t lisn) +{ +XiveRouter *xrtr = XIVE_ROUTER(xn); +uint8_t eas_blk = XIVE_SRCNO_BLOCK(lisn); +uint32_t eas_idx = XIVE_SRCNO_INDEX(lisn); +XiveEAS eas; + +/* EAS cache lookup */ +if (xive_router_get_eas(xrtr, eas_blk, eas_idx, &eas)) { +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: Unknown LISN %x\n", lisn); +return; +} + +/* + * The IVRE checks the State Bit Cache at this point.
We skip the + * SBC lookup because the state bits of the sources are modeled + * internally in QEMU. + */ + +if (!xive_eas_is_valid(&eas)) { +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid LISN %x\n", lisn); +return; +} + +if (xive_eas_is_masked(&eas)) { +/* Notification completed */ +return; +} +} + +static void xive_router_class_init(ObjectClass *klass, void *data) +{ +DeviceClass *dc = DEVICE_CLASS(klass); +XiveNotifierClass *xnc = XIVE_NOTIFIER_CLASS(klass); + +dc->desc= "XIVE Router Engine"; +xnc->notify = xive_router_notify; +} + +static const TypeInfo xive_router_info = { +.name = TYPE_XIVE_ROUTER, +.parent= TYPE_SYS_BUS_DEVICE, +.abstract = true, +.class_size= sizeof(XiveRouterClass), +.class_init= xive_router_class_init, +.interfaces= (InterfaceInfo[]) { +{ TYPE_XIVE_NOTIFIER }, +{ } +} +}; + +void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon) +{ +if (!xive_eas_is_valid(eas)) { +return; +} + +monitor_printf(mon, " %08x %s end:%02x/%04x data:%08x\n", + lisn, xive_eas_is_masked(eas) ? 
"M" : " ", + (uint8_t) xive_get_field64(EAS_END_BLOCK, eas->w), + (uint32_t) xive_get_field64(EAS_END_INDEX, eas->w), + (uint32_t) xive_get_field64(EAS_END_DATA, eas->w)); +} + /* * XIVE Fabric */ @@ -457,6 +533,7 @@ static void xive_register_types(void) { type_register_static(_source_info); type_register_static(_fabric_info); +type_register_static(_router_info); } type_init(xive_register_types) diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h index 436f1bf756..527aa73366 100644 --- a/include/hw/ppc/xive.h +++ b/include/hw/ppc/xive.h @@ -141,6 +141,8 @@ #define PPC_XIVE_H #include "hw/qdev-core.h" +#include "hw/sysbus.h" +#include "hw/ppc/xive_regs.h" /* * XIVE Fabric (Interface between Source and Router) @@ -297,4 +299,33 @@ static inline void xive_source_irq_set(XiveSource *xsrc, uint32_t srcno, } } +/* + * XIVE Router + */ + +typedef struct XiveRouter { +SysBusDeviceparent; +} XiveRouter; + +#define TYPE_XIVE_ROUTER "xive-router" +#define XIVE_ROUTER(obj)\ +OBJECT_CHECK(XiveRouter, (obj), TYPE_XIVE_ROUTER) +#define XIVE_ROUTER_CLASS(klass)\ +OBJECT_CLASS_CHECK(XiveRouterClass, (klass), TYPE_XIVE_ROUTER) +#define XIVE_ROUTER_GET_CLASS(obj) \ +OBJECT_GET_CLASS(XiveRouterClass, (obj), TYPE_XIVE_ROUTER) + +typedef struct XiveRouterClass { +SysBusDeviceClass parent; + +/* XIVE table accessors */ +int (*get_eas)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx, +
[Qemu-devel] [PULL 27/40] ppc/xive: notify the CPU when the interrupt priority is more privileged
From: Cédric Le Goater After the event data has been enqueued in the O/S Event Queue, the IVPE raises the bit corresponding to the priority of the pending interrupt in the Interrupt Pending Buffer (IPB) register to indicate there is an event pending in one of the 8 priority queues. The Pending Interrupt Priority Register (PIPR) is also updated using the IPB. This register represents the priority of the most favored pending notification. The PIPR is then compared to the Current Processor Priority Register (CPPR). If it is more favored (numerically less than), the CPU interrupt line is raised and the EO bit of the Notification Source Register (NSR) is updated to notify the presence of an exception for the O/S. The check needs to be done whenever the PIPR or the CPPR is changed. The O/S acknowledges the interrupt with a special load in the Thread Interrupt Management Area. If the EO bit of the NSR is set, the CPPR takes the value of the PIPR. The bit in the IPB corresponding to the priority of the pending interrupt is reset and so is the EO bit of the NSR. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson [dwg: Fix style nits] Signed-off-by: David Gibson --- hw/intc/xive.c | 96 +- 1 file changed, 95 insertions(+), 1 deletion(-) diff --git a/hw/intc/xive.c b/hw/intc/xive.c index 1d737346c3..607e74acd2 100644 --- a/hw/intc/xive.c +++ b/hw/intc/xive.c @@ -22,9 +22,75 @@ * XIVE Thread Interrupt Management context */ +/* + * Convert a priority number to an Interrupt Pending Buffer (IPB) + * register, which indicates a pending interrupt at the priority + * corresponding to the bit number + */ +static uint8_t priority_to_ipb(uint8_t priority) +{ +return priority > XIVE_PRIORITY_MAX ? +0 : 1 << (XIVE_PRIORITY_MAX - priority); +} + +/* + * Convert an Interrupt Pending Buffer (IPB) register to a Pending + * Interrupt Priority Register (PIPR), which contains the priority of + * the most favored pending notification.
+ */ +static uint8_t ipb_to_pipr(uint8_t ibp) +{ +return ibp ? clz32((uint32_t)ibp << 24) : 0xff; +} + +static void ipb_update(uint8_t *regs, uint8_t priority) +{ +regs[TM_IPB] |= priority_to_ipb(priority); +regs[TM_PIPR] = ipb_to_pipr(regs[TM_IPB]); +} + +static uint8_t exception_mask(uint8_t ring) +{ +switch (ring) { +case TM_QW1_OS: +return TM_QW1_NSR_EO; +default: +g_assert_not_reached(); +} +} + static uint64_t xive_tctx_accept(XiveTCTX *tctx, uint8_t ring) { -return 0; +uint8_t *regs = &tctx->regs[ring]; +uint8_t nsr = regs[TM_NSR]; +uint8_t mask = exception_mask(ring); + +qemu_irq_lower(tctx->output); + +if (regs[TM_NSR] & mask) { +uint8_t cppr = regs[TM_PIPR]; + +regs[TM_CPPR] = cppr; + +/* Reset the pending buffer bit */ +regs[TM_IPB] &= ~priority_to_ipb(cppr); +regs[TM_PIPR] = ipb_to_pipr(regs[TM_IPB]); + +/* Drop Exception bit */ +regs[TM_NSR] &= ~mask; +} + +return (nsr << 8) | regs[TM_CPPR]; +} + +static void xive_tctx_notify(XiveTCTX *tctx, uint8_t ring) +{ +uint8_t *regs = &tctx->regs[ring]; + +if (regs[TM_PIPR] < regs[TM_CPPR]) { +regs[TM_NSR] |= exception_mask(ring); +qemu_irq_raise(tctx->output); +} } static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr) @@ -34,6 +100,9 @@ static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr) } tctx->regs[ring + TM_CPPR] = cppr; + +/* CPPR has changed, check if we need to raise a pending exception */ +xive_tctx_notify(tctx, ring); } /* @@ -189,6 +258,17 @@ static void xive_tm_set_os_cppr(XiveTCTX *tctx, hwaddr offset, xive_tctx_set_cppr(tctx, TM_QW1_OS, value & 0xff); } +/* + * Adjust the IPB to allow a CPU to process event queues of other + * priorities during one physical interrupt cycle. 
+ */ +static void xive_tm_set_os_pending(XiveTCTX *tctx, hwaddr offset, + uint64_t value, unsigned size) +{ +ipb_update(&tctx->regs[TM_QW1_OS], value & 0xff); +xive_tctx_notify(tctx, TM_QW1_OS); +} + /* * Define a mapping of "special" operations depending on the TIMA page * offset and the size of the operation. @@ -211,6 +291,7 @@ static const XiveTmOp xive_tm_operations[] = { /* MMIOs above 2K : special operations with side effects */ { XIVE_TM_OS_PAGE, TM_SPC_ACK_OS_REG, 2, NULL, xive_tm_ack_os_reg }, +{ XIVE_TM_OS_PAGE, TM_SPC_SET_OS_PENDING, 1, xive_tm_set_os_pending, NULL }, }; static const XiveTmOp *xive_tm_find_op(hwaddr offset, unsigned size, bool write) @@ -373,6 +454,13 @@ static void xive_tctx_reset(void *dev) tctx->regs[TM_QW1_OS + TM_LSMFB] = 0xFF; tctx->regs[TM_QW1_OS + TM_ACK_CNT] = 0xFF; tctx->regs[TM_QW1_OS + TM_AGE] = 0xFF; + +/* + * Initialize PIPR to 0xFF to avoid phantom interrupts when the + * CPPR is first set. + */ +
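The priority arithmetic in this patch can be tried outside QEMU with a standalone sketch. It assumes 8 priorities (0 most favored, 7 least), matching the "8 priority queues" in the commit message, and replaces QEMU's clz32() with a portable loop; PRIORITY_MAX and needs_notify are names chosen here for illustration, not QEMU identifiers.

```c
#include <stdint.h>

#define PRIORITY_MAX 7 /* assumed: priorities 0..7, 0 is the most favored */

/* One bit per priority; priority 0 maps to the most significant bit. */
static uint8_t priority_to_ipb(uint8_t priority)
{
    return priority > PRIORITY_MAX ?
        0 : (uint8_t)(1u << (PRIORITY_MAX - priority));
}

/*
 * The PIPR is the number of leading zero bits in the 8-bit IPB, i.e. the
 * most favored pending priority, or 0xff when nothing is pending.
 */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    uint8_t prio = 0;

    if (!ipb) {
        return 0xff;
    }
    while (!(ipb & 0x80)) {
        ipb = (uint8_t)(ipb << 1);
        prio++;
    }
    return prio;
}

/* An exception is signalled only when the PIPR is more favored than the CPPR. */
static int needs_notify(uint8_t pipr, uint8_t cppr)
{
    return pipr < cppr;
}
```

With nothing pending the PIPR is 0xff, which can never be numerically less than any CPPR, so no exception is raised; that is the same reasoning as the reset comment about phantom interrupts.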
[Qemu-devel] [PULL 19/40] ppc/xive: introduce the XIVE Event Notification Descriptors
From: Cédric Le Goater To complete the event routing, the IVRE sub-engine uses a second table containing Event Notification Descriptor (END) structures. An END specifies on which Event Queue (EQ) the event notification data, defined in the associated EAS, should be posted when an exception occurs. It also defines which Notification Virtual Target (NVT) should be notified. The Event Queue is a memory page provided by the O/S defining a circular buffer, one per server and priority couple, containing Event Queue entries. These are 4 bytes long, the first bit being a 'generation' bit and the 31 following bits the END Data field. They are pulled by the O/S when the exception occurs. The END Data field is a way to set an invariant logical event source number for an IRQ. On sPAPR machines, it is set with the H_INT_SET_SOURCE_CONFIG hcall when the EISN flag is used. Signed-off-by: Cédric Le Goater [dwg: Fold in a later fix from Cédric fixing field accessors] Signed-off-by: David Gibson --- hw/intc/xive.c | 174 + include/hw/ppc/xive.h | 18 include/hw/ppc/xive_regs.h | 67 ++ 3 files changed, 259 insertions(+) diff --git a/hw/intc/xive.c b/hw/intc/xive.c index 8878abc317..9ec841f741 100644 --- a/hw/intc/xive.c +++ b/hw/intc/xive.c @@ -444,6 +444,95 @@ static const TypeInfo xive_source_info = { .class_init= xive_source_class_init, }; +/* + * XiveEND helpers + */ + +void xive_end_queue_pic_print_info(XiveEND *end, uint32_t width, Monitor *mon) +{ +uint64_t qaddr_base = (uint64_t) be32_to_cpu(end->w2 & 0x0fff) << 32 +| be32_to_cpu(end->w3); +uint32_t qsize = xive_get_field32(END_W0_QSIZE, end->w0); +uint32_t qindex = xive_get_field32(END_W1_PAGE_OFF, end->w1); +uint32_t qentries = 1 << (qsize + 10); +int i; + +/* + * print out the [ (qindex - (width - 1)) .. 
(qindex + 1)] window + */ +monitor_printf(mon, " [ "); +qindex = (qindex - (width - 1)) & (qentries - 1); +for (i = 0; i < width; i++) { +uint64_t qaddr = qaddr_base + (qindex << 2); +uint32_t qdata = -1; + +if (dma_memory_read(&address_space_memory, qaddr, &qdata, +sizeof(qdata))) { +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to read EQ @0x%" + HWADDR_PRIx "\n", qaddr); +return; +} +monitor_printf(mon, "%s%08x ", i == width - 1 ? "^" : "", + be32_to_cpu(qdata)); +qindex = (qindex + 1) & (qentries - 1); +} +} + +void xive_end_pic_print_info(XiveEND *end, uint32_t end_idx, Monitor *mon) +{ +uint64_t qaddr_base = (uint64_t) be32_to_cpu(end->w2 & 0x0fff) << 32 +| be32_to_cpu(end->w3); +uint32_t qindex = xive_get_field32(END_W1_PAGE_OFF, end->w1); +uint32_t qgen = xive_get_field32(END_W1_GENERATION, end->w1); +uint32_t qsize = xive_get_field32(END_W0_QSIZE, end->w0); +uint32_t qentries = 1 << (qsize + 10); + +uint32_t nvt = xive_get_field32(END_W6_NVT_INDEX, end->w6); +uint8_t priority = xive_get_field32(END_W7_F0_PRIORITY, end->w7); + +if (!xive_end_is_valid(end)) { +return; +} + +monitor_printf(mon, " %08x %c%c%c%c%c prio:%d nvt:%04x eq:@%08"PRIx64 + "% 6d/%5d ^%d", end_idx, + xive_end_is_valid(end)? 'v' : '-', + xive_end_is_enqueue(end) ? 'q' : '-', + xive_end_is_notify(end) ? 'n' : '-', + xive_end_is_backlog(end) ? 'b' : '-', + xive_end_is_escalate(end) ? 
'e' : '-', + priority, nvt, qaddr_base, qindex, qentries, qgen); + +xive_end_queue_pic_print_info(end, 6, mon); +monitor_printf(mon, "]\n"); +} + +static void xive_end_enqueue(XiveEND *end, uint32_t data) +{ +uint64_t qaddr_base = (uint64_t) be32_to_cpu(end->w2 & 0x0fff) << 32 +| be32_to_cpu(end->w3); +uint32_t qsize = xive_get_field32(END_W0_QSIZE, end->w0); +uint32_t qindex = xive_get_field32(END_W1_PAGE_OFF, end->w1); +uint32_t qgen = xive_get_field32(END_W1_GENERATION, end->w1); + +uint64_t qaddr = qaddr_base + (qindex << 2); +uint32_t qdata = cpu_to_be32((qgen << 31) | (data & 0x7fff)); +uint32_t qentries = 1 << (qsize + 10); + +if (dma_memory_write(&address_space_memory, qaddr, &qdata, sizeof(qdata))) { +qemu_log_mask(LOG_GUEST_ERROR, "XIVE: failed to write END data @0x%" + HWADDR_PRIx "\n", qaddr); +return; +} + +qindex = (qindex + 1) & (qentries - 1); +if (qindex == 0) { +qgen ^= 1; +end->w1 = xive_set_field32(END_W1_GENERATION, end->w1, qgen); +} +end->w1 = xive_set_field32(END_W1_PAGE_OFF, end->w1, qindex); +} + /* * XIVE Router (aka. Virtualization Controller or IVRE) */ @@ -456,6 +545,83 @@ int xive_router_get_eas(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx, return xrc->get_eas(xrtr, eas_blk,
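To illustrate the event queue layout from the commit message (4-byte entries, a leading generation bit followed by 31 bits of END data, in a circular buffer), here is a simplified in-memory sketch. It is not the QEMU code: the queue lives in a plain array instead of guest memory, endianness conversion is left out, and struct eq, EQ_ENTRIES and eq_push are hypothetical names.

```c
#include <stdint.h>

#define EQ_ENTRIES 4u /* must be a power of two; tiny for illustration */

struct eq {
    uint32_t entries[EQ_ENTRIES];
    uint32_t index; /* next slot to write */
    uint32_t gen;   /* current generation bit, 0 or 1 */
};

/*
 * Post one event: the generation bit goes in the top bit, the 31-bit
 * data field in the rest. The generation flips each time the index
 * wraps, so a consumer can tell fresh entries from stale ones.
 */
static void eq_push(struct eq *q, uint32_t data)
{
    q->entries[q->index] = (q->gen << 31) | (data & 0x7fffffffu);
    q->index = (q->index + 1) & (EQ_ENTRIES - 1);
    if (q->index == 0) {
        q->gen ^= 1;
    }
}
```

A consumer that remembers the generation it expects can poll the next slot and stop as soon as the top bit does not match, without the producer ever having to clear consumed entries.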
[Qemu-devel] [PULL 32/40] spapr: add hcalls support for the XIVE exploitation interrupt mode
From: Cédric Le Goater

The different XIVE virtualization structures (sources and event queues) are configured with a set of Hypervisor calls:

- H_INT_GET_SOURCE_INFO is used to obtain the address of the MMIO page of the Event State Buffer (ESB) entry associated with the source.
- H_INT_SET_SOURCE_CONFIG assigns a source to a "target".
- H_INT_GET_SOURCE_CONFIG determines which "target" and "priority" are assigned to a source.
- H_INT_GET_QUEUE_INFO returns the address of the notification management page associated with the specified "target" and "priority".
- H_INT_SET_QUEUE_CONFIG sets or resets the event queue for a given "target" and "priority". It is also used to set the notification configuration associated with the queue; only unconditional notification is supported for the moment. Reset is performed with a queue size of 0, and queueing is disabled in that case.
- H_INT_GET_QUEUE_CONFIG returns the queue settings for a given "target" and "priority".
- H_INT_RESET resets all of the guest's internal interrupt structures to their initial state, losing all configuration set via the hcalls H_INT_SET_SOURCE_CONFIG and H_INT_SET_QUEUE_CONFIG.
- H_INT_SYNC issues a synchronisation on a source to make sure all notifications have reached their queue.

Calls that still need to be addressed:

  H_INT_SET_OS_REPORTING_LINE
  H_INT_GET_OS_REPORTING_LINE

See the code for more documentation on each hcall.

Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson [dwg: Folded in fix for field accessors] Signed-off-by: David Gibson --- hw/intc/spapr_xive.c| 982 hw/ppc/spapr_irq.c | 2 + include/hw/ppc/spapr.h | 15 +- include/hw/ppc/spapr_xive.h | 4 + 4 files changed, 1002 insertions(+), 1 deletion(-) diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c index d6291c6470..9f2820039c 100644 --- a/hw/intc/spapr_xive.c +++ b/hw/intc/spapr_xive.c @@ -38,6 +38,13 @@ #define SPAPR_XIVE_NVT_BASE 0x400 +/* + * The sPAPR machine has a unique XIVE IC device. 
Assign a fixed value + * to the controller block id value. It can nevertheless be changed + * for testing purpose. + */ +#define SPAPR_XIVE_BLOCK_ID 0x0 + /* * sPAPR NVT and END indexing helpers */ @@ -46,6 +53,64 @@ static uint32_t spapr_xive_nvt_to_target(uint8_t nvt_blk, uint32_t nvt_idx) return nvt_idx - SPAPR_XIVE_NVT_BASE; } +static void spapr_xive_cpu_to_nvt(PowerPCCPU *cpu, + uint8_t *out_nvt_blk, uint32_t *out_nvt_idx) +{ +assert(cpu); + +if (out_nvt_blk) { +*out_nvt_blk = SPAPR_XIVE_BLOCK_ID; +} + +if (out_nvt_idx) { +*out_nvt_idx = SPAPR_XIVE_NVT_BASE + cpu->vcpu_id; +} +} + +static int spapr_xive_target_to_nvt(uint32_t target, +uint8_t *out_nvt_blk, uint32_t *out_nvt_idx) +{ +PowerPCCPU *cpu = spapr_find_cpu(target); + +if (!cpu) { +return -1; +} + +spapr_xive_cpu_to_nvt(cpu, out_nvt_blk, out_nvt_idx); +return 0; +} + +/* + * sPAPR END indexing uses a simple mapping of the CPU vcpu_id, 8 + * priorities per CPU + */ +static void spapr_xive_cpu_to_end(PowerPCCPU *cpu, uint8_t prio, + uint8_t *out_end_blk, uint32_t *out_end_idx) +{ +assert(cpu); + +if (out_end_blk) { +*out_end_blk = SPAPR_XIVE_BLOCK_ID; +} + +if (out_end_idx) { +*out_end_idx = (cpu->vcpu_id << 3) + prio; +} +} + +static int spapr_xive_target_to_end(uint32_t target, uint8_t prio, +uint8_t *out_end_blk, uint32_t *out_end_idx) +{ +PowerPCCPU *cpu = spapr_find_cpu(target); + +if (!cpu) { +return -1; +} + +spapr_xive_cpu_to_end(cpu, prio, out_end_blk, out_end_idx); +return 0; +} + /* * On sPAPR machines, use a simplified output for the XIVE END * structure dumping only the information related to the OS EQ. 
@@ -418,3 +483,920 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn) return xive_source_qirq(xsrc, lisn); } + +/* + * XIVE hcalls + * + * The terminology used by the XIVE hcalls is the following : + * + * TARGET vCPU number + * EQ Event Queue assigned by OS to receive event data + * ESBpage for source interrupt management + * LISN Logical Interrupt Source Number identifying a source in the + * machine + * EISN Effective Interrupt Source Number used by guest OS to + * identify source in the guest + * + * The EAS, END, NVT structures are not exposed. + */ + +/* + * Linux hosts under OPAL reserve priority 7 for their own escalation + * interrupts (DD2.X POWER9). So we only allow the guest to use + * priorities [0..6]. + */ +static bool spapr_xive_priority_is_reserved(uint8_t priority) +{ +switch (priority) { +case 0 ... 6: +return false; +case 7: /* OPAL escalation queue */ +default: +
[Qemu-devel] [PULL 29/40] spapr/xive: use the VCPU id as a NVT identifier
From: Cédric Le Goater The IVPE scans the O/S CAM line of the XIVE thread interrupt contexts to find a matching Notification Virtual Target (NVT) among the NVTs dispatched on the HW processor threads. On a real system, the thread interrupt contexts are updated by the hypervisor when a Virtual Processor is scheduled to run on a HW thread. Under QEMU, the model will emulate the same behavior by hardwiring the NVT identifier in the thread context registers at reset. The NVT identifier used by the sPAPRXive model is the VCPU id. The END identifier is also derived from the VCPU id. A set of helpers doing the conversion between identifiers are provided for the hcalls configuring the sources and the ENDs. The model does not need a NVT table but the XiveRouter NVT operations are provided to perform some extra checks in the routing algorithm. Signed-off-by: Cédric Le Goater Signed-off-by: David Gibson --- hw/intc/spapr_xive.c | 56 +++- 1 file changed, 55 insertions(+), 1 deletion(-) diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c index 5f03adca56..d6291c6470 100644 --- a/hw/intc/spapr_xive.c +++ b/hw/intc/spapr_xive.c @@ -26,6 +26,26 @@ #define SPAPR_XIVE_VC_BASE 0x00060100ull #define SPAPR_XIVE_TM_BASE 0x000603020318ull +/* + * The allocation of VP blocks is a complex operation in OPAL and the + * VP identifiers have a relation with the number of HW chips, the + * size of the VP blocks, VP grouping, etc. The QEMU sPAPR XIVE + * controller model does not have the same constraints and can use a + * simple mapping scheme of the CPU vcpu_id + * + * These identifiers are never returned to the OS. + */ + +#define SPAPR_XIVE_NVT_BASE 0x400 + +/* + * sPAPR NVT and END indexing helpers + */ +static uint32_t spapr_xive_nvt_to_target(uint8_t nvt_blk, uint32_t nvt_idx) +{ +return nvt_idx - SPAPR_XIVE_NVT_BASE; +} + /* * On sPAPR machines, use a simplified output for the XIVE END * structure dumping only the information related to the OS EQ. 
@@ -40,7 +60,8 @@ static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end, uint32_t nvt = xive_get_field32(END_W6_NVT_INDEX, end->w6); uint8_t priority = xive_get_field32(END_W7_F0_PRIORITY, end->w7); -monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt, +monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", + spapr_xive_nvt_to_target(0, nvt), priority, qindex, qentries, qgen); xive_end_queue_pic_print_info(end, 6, mon); @@ -246,6 +267,37 @@ static int spapr_xive_write_end(XiveRouter *xrtr, uint8_t end_blk, return 0; } +static int spapr_xive_get_nvt(XiveRouter *xrtr, + uint8_t nvt_blk, uint32_t nvt_idx, XiveNVT *nvt) +{ +uint32_t vcpu_id = spapr_xive_nvt_to_target(nvt_blk, nvt_idx); +PowerPCCPU *cpu = spapr_find_cpu(vcpu_id); + +if (!cpu) { +/* TODO: should we assert() if we can find a NVT ? */ +return -1; +} + +/* + * sPAPR does not maintain a NVT table. Return that the NVT is + * valid if we have found a matching CPU + */ +nvt->w0 = cpu_to_be32(NVT_W0_VALID); +return 0; +} + +static int spapr_xive_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, +uint32_t nvt_idx, XiveNVT *nvt, +uint8_t word_number) +{ +/* + * We don't need to write back to the NVTs because the sPAPR + * machine should never hit a non-scheduled NVT. It should never + * get called. + */ +g_assert_not_reached(); +} + static const VMStateDescription vmstate_spapr_xive_end = { .name = TYPE_SPAPR_XIVE "/end", .version_id = 1, @@ -308,6 +360,8 @@ static void spapr_xive_class_init(ObjectClass *klass, void *data) xrc->get_eas = spapr_xive_get_eas; xrc->get_end = spapr_xive_get_end; xrc->write_end = spapr_xive_write_end; +xrc->get_nvt = spapr_xive_get_nvt; +xrc->write_nvt = spapr_xive_write_nvt; } static const TypeInfo spapr_xive_info = { -- 2.19.2
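For readers following the identifier scheme, here is a minimal standalone sketch (not QEMU code) of the mapping this patch describes. The NVT index is the VCPU id offset by SPAPR_XIVE_NVT_BASE (0x400), as in the patch. The END formula, (vcpu_id << 3) + prio with 8 priorities per CPU, is taken from the hcall patch of the same series. The names NVT_BASE, PRIO_PER_CPU and the three helper functions are illustrative and do not exist in QEMU.

```c
#include <assert.h>
#include <stdint.h>

#define NVT_BASE     0x400u  /* mirrors SPAPR_XIVE_NVT_BASE from the patch */
#define PRIO_PER_CPU 8u      /* assumption: 8 priorities per CPU */

/* VCPU id -> NVT index: a fixed offset, no table lookup needed. */
static uint32_t vcpu_to_nvt_idx(uint32_t vcpu_id)
{
    return NVT_BASE + vcpu_id;
}

/* NVT index -> VCPU id ("target" in hcall terminology). */
static uint32_t nvt_idx_to_vcpu(uint32_t nvt_idx)
{
    assert(nvt_idx >= NVT_BASE);
    return nvt_idx - NVT_BASE;
}

/* (VCPU id, priority) -> END index: 8 consecutive ENDs per VCPU. */
static uint32_t vcpu_to_end_idx(uint32_t vcpu_id, uint8_t prio)
{
    assert(prio < PRIO_PER_CPU);
    return (vcpu_id << 3) + prio;
}
```

With this scheme VCPU 2 at priority 3 lands on END index 19, and converting a VCPU id to an NVT index and back is the identity, which is why the model can do without an NVT table.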
[Qemu-devel] [PULL 17/40] ppc/xive: introduce the XiveNotifier interface
From: Cédric Le Goater

The XiveNotifier offers a simple interface, between the XiveSource
object and the main interrupt controller of the machine. It will
forward event notifications to the XIVE Interrupt Virtualization
Routing Engine (IVRE).

Signed-off-by: Cédric Le Goater
[dwg: Adjust type name string for XiveNotifier]
Signed-off-by: David Gibson
---
 hw/intc/xive.c        | 25 +
 include/hw/ppc/xive.h | 23 +++
 2 files changed, 48 insertions(+)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 4998f128e7..8d5434d6bd 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -156,7 +156,11 @@ static bool xive_source_esb_eoi(XiveSource *xsrc, uint32_t srcno)
  */
 static void xive_source_notify(XiveSource *xsrc, int srcno)
 {
+    XiveNotifierClass *xnc = XIVE_NOTIFIER_GET_CLASS(xsrc->xive);

+    if (xnc->notify) {
+        xnc->notify(xsrc->xive, srcno);
+    }
 }

 /*
@@ -363,6 +367,17 @@ static void xive_source_reset(void *dev)
 static void xive_source_realize(DeviceState *dev, Error **errp)
 {
     XiveSource *xsrc = XIVE_SOURCE(dev);
+    Object *obj;
+    Error *local_err = NULL;
+
+    obj = object_property_get_link(OBJECT(dev), "xive", &local_err);
+    if (!obj) {
+        error_propagate(errp, local_err);
+        error_prepend(errp, "required link 'xive' not found: ");
+        return;
+    }
+
+    xsrc->xive = XIVE_NOTIFIER(obj);

     if (!xsrc->nr_irqs) {
         error_setg(errp, "Number of interrupt needs to be greater than 0");
@@ -429,9 +444,19 @@ static const TypeInfo xive_source_info = {
     .class_init    = xive_source_class_init,
 };

+/*
+ * XIVE Fabric
+ */
+static const TypeInfo xive_fabric_info = {
+    .name = TYPE_XIVE_NOTIFIER,
+    .parent = TYPE_INTERFACE,
+    .class_size = sizeof(XiveNotifierClass),
+};
+
 static void xive_register_types(void)
 {
     type_register_static(&xive_source_info);
+    type_register_static(&xive_fabric_info);
 }

 type_init(xive_register_types)
diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 7cebc32eba..436f1bf756 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -142,6 +142,27 @@

 #include "hw/qdev-core.h"

+/*
+ * XIVE Fabric (Interface between Source and Router)
+ */
+
+typedef struct XiveNotifier {
+    Object parent;
+} XiveNotifier;
+
+#define TYPE_XIVE_NOTIFIER "xive-notifier"
+#define XIVE_NOTIFIER(obj)                                     \
+    OBJECT_CHECK(XiveNotifier, (obj), TYPE_XIVE_NOTIFIER)
+#define XIVE_NOTIFIER_CLASS(klass)                             \
+    OBJECT_CLASS_CHECK(XiveNotifierClass, (klass), TYPE_XIVE_NOTIFIER)
+#define XIVE_NOTIFIER_GET_CLASS(obj)                           \
+    OBJECT_GET_CLASS(XiveNotifierClass, (obj), TYPE_XIVE_NOTIFIER)
+
+typedef struct XiveNotifierClass {
+    InterfaceClass parent;
+    void (*notify)(XiveNotifier *xn, uint32_t lisn);
+} XiveNotifierClass;
+
 /*
  * XIVE Interrupt Source
  */
@@ -171,6 +192,8 @@ typedef struct XiveSource {
     uint64_t        esb_flags;
     uint32_t        esb_shift;
     MemoryRegion    esb_mmio;
+
+    XiveNotifier    *xive;
 } XiveSource;

 /*
-- 
2.19.2
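The shape of this interface can be illustrated outside of QOM. The following is a rough sketch in plain C, assuming a simple table of function pointers in place of QEMU's interface classes. Notifier, NotifierOps, Source, router_notify and the two ops tables are hypothetical names for illustration only. The one behaviour borrowed from the patch is that the source forwards an event only when a notify handler is installed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Notifier Notifier;

/* Stand-in for XiveNotifierClass: one optional callback. */
typedef struct NotifierOps {
    void (*notify)(Notifier *n, uint32_t lisn);  /* may be NULL */
} NotifierOps;

/* Stand-in for the controller implementing the interface. */
struct Notifier {
    const NotifierOps *ops;
    uint32_t last_lisn;   /* last source number received */
    unsigned count;       /* number of notifications received */
};

/* Stand-in for XiveSource: it only holds a link to its notifier. */
typedef struct Source {
    Notifier *xive;
} Source;

/* Same shape as xive_source_notify(): forward if a handler exists. */
static void source_notify(Source *src, uint32_t srcno)
{
    const NotifierOps *ops = src->xive->ops;

    if (ops->notify) {
        ops->notify(src->xive, srcno);
    }
}

/* Example handler: record the event, as a router would start routing. */
static void router_notify(Notifier *n, uint32_t lisn)
{
    n->last_lisn = lisn;
    n->count++;
}

static const NotifierOps router_ops = { .notify = router_notify };
static const NotifierOps silent_ops = { .notify = NULL };
```

The source never needs to know what kind of controller sits behind the link, which is the decoupling the patch is after.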
[Qemu-devel] [PULL 34/40] spapr: allocate the interrupt thread context under the CPU core
From: Cédric Le Goater Each interrupt mode has its own specific interrupt presenter object, that we store under the CPU object, one for XICS and one for XIVE. Extend the sPAPR IRQ backend with a new handler to support them both. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: David Gibson --- hw/intc/xive.c | 22 ++ hw/ppc/spapr_cpu_core.c| 5 ++--- hw/ppc/spapr_irq.c | 15 +++ include/hw/ppc/spapr_irq.h | 2 ++ include/hw/ppc/xive.h | 1 + 5 files changed, 42 insertions(+), 3 deletions(-) diff --git a/hw/intc/xive.c b/hw/intc/xive.c index 607e74acd2..ea33494338 100644 --- a/hw/intc/xive.c +++ b/hw/intc/xive.c @@ -528,6 +528,28 @@ static const TypeInfo xive_tctx_info = { .class_init= xive_tctx_class_init, }; +Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp) +{ +Error *local_err = NULL; +Object *obj; + +obj = object_new(TYPE_XIVE_TCTX); +object_property_add_child(cpu, TYPE_XIVE_TCTX, obj, _abort); +object_unref(obj); +object_property_add_const_link(obj, "cpu", cpu, _abort); +object_property_set_bool(obj, true, "realized", _err); +if (local_err) { +goto error; +} + +return obj; + +error: +object_unparent(obj); +error_propagate(errp, local_err); +return NULL; +} + /* * XIVE ESB helpers */ diff --git a/hw/ppc/spapr_cpu_core.c b/hw/ppc/spapr_cpu_core.c index 2398ce62c0..1811cd48db 100644 --- a/hw/ppc/spapr_cpu_core.c +++ b/hw/ppc/spapr_cpu_core.c @@ -11,7 +11,6 @@ #include "hw/ppc/spapr_cpu_core.h" #include "target/ppc/cpu.h" #include "hw/ppc/spapr.h" -#include "hw/ppc/xics.h" /* for icp_create() - to be removed */ #include "hw/boards.h" #include "qapi/error.h" #include "sysemu/cpus.h" @@ -215,6 +214,7 @@ static void spapr_cpu_core_unrealize(DeviceState *dev, Error **errp) static void spapr_realize_vcpu(PowerPCCPU *cpu, sPAPRMachineState *spapr, sPAPRCPUCore *sc, Error **errp) { +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr); CPUPPCState *env = >env; CPUState *cs = CPU(cpu); Error *local_err = NULL; @@ -233,8 +233,7 @@ 
static void spapr_realize_vcpu(PowerPCCPU *cpu, sPAPRMachineState *spapr, qemu_register_reset(spapr_cpu_reset, cpu); spapr_cpu_reset(cpu); -cpu->intc = icp_create(OBJECT(cpu), spapr->icp_type, XICS_FABRIC(spapr), - _err); +cpu->intc = smc->irq->cpu_intc_create(spapr, OBJECT(cpu), _err); if (local_err) { goto error_unregister; } diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c index 975954dc27..fdcc7795e4 100644 --- a/hw/ppc/spapr_irq.c +++ b/hw/ppc/spapr_irq.c @@ -191,6 +191,12 @@ static void spapr_irq_print_info_xics(sPAPRMachineState *spapr, Monitor *mon) ics_pic_print_info(spapr->ics, mon); } +static Object *spapr_irq_cpu_intc_create_xics(sPAPRMachineState *spapr, + Object *cpu, Error **errp) +{ +return icp_create(cpu, spapr->icp_type, XICS_FABRIC(spapr), errp); +} + #define SPAPR_IRQ_XICS_NR_IRQS 0x1000 #define SPAPR_IRQ_XICS_NR_MSIS \ (XICS_IRQ_BASE + SPAPR_IRQ_XICS_NR_IRQS - SPAPR_IRQ_MSI) @@ -205,6 +211,7 @@ sPAPRIrq spapr_irq_xics = { .qirq= spapr_qirq_xics, .print_info = spapr_irq_print_info_xics, .dt_populate = spapr_dt_xics, +.cpu_intc_create = spapr_irq_cpu_intc_create_xics, }; /* @@ -282,6 +289,12 @@ static void spapr_irq_print_info_xive(sPAPRMachineState *spapr, spapr_xive_pic_print_info(spapr->xive, mon); } +static Object *spapr_irq_cpu_intc_create_xive(sPAPRMachineState *spapr, + Object *cpu, Error **errp) +{ +return xive_tctx_create(cpu, XIVE_ROUTER(spapr->xive), errp); +} + /* * XIVE uses the full IRQ number space. Set it to 8K to be compatible * with XICS. 
@@ -300,6 +313,7 @@ sPAPRIrq spapr_irq_xive = { .qirq= spapr_qirq_xive, .print_info = spapr_irq_print_info_xive, .dt_populate = spapr_dt_xive, +.cpu_intc_create = spapr_irq_cpu_intc_create_xive, }; /* @@ -405,4 +419,5 @@ sPAPRIrq spapr_irq_xics_legacy = { .qirq= spapr_qirq_xics, .print_info = spapr_irq_print_info_xics, .dt_populate = spapr_dt_xics, +.cpu_intc_create = spapr_irq_cpu_intc_create_xics, }; diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h index e51e9f052f..13db0428ab 100644 --- a/include/hw/ppc/spapr_irq.h +++ b/include/hw/ppc/spapr_irq.h @@ -41,6 +41,8 @@ typedef struct sPAPRIrq { void (*print_info)(sPAPRMachineState *spapr, Monitor *mon); void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt, uint32_t phandle); +Object *(*cpu_intc_create)(sPAPRMachineState *spapr, Object *cpu, + Error **errp); } sPAPRIrq; extern sPAPRIrq spapr_irq_xics; diff
[Qemu-devel] [PULL 21/40] spapr: introduce a spapr_irq_init() routine
From: Cédric Le Goater

Initialize the MSI bitmap from it as this will be necessary for the
sPAPR IRQ backend for XIVE.

Signed-off-by: Cédric Le Goater
Reviewed-by: David Gibson
Signed-off-by: David Gibson
---
 hw/ppc/spapr.c             |  2 +-
 hw/ppc/spapr_irq.c         | 16 +++-
 include/hw/ppc/spapr_irq.h |  1 +
 3 files changed, 13 insertions(+), 6 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 2b2df6b848..c1c0e75fcd 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2593,7 +2593,7 @@ static void spapr_machine_init(MachineState *machine)
     spapr_set_vsmt_mode(spapr, &error_fatal);

     /* Set up Interrupt Controller before we create the VCPUs */
-    smc->irq->init(spapr, &error_fatal);
+    spapr_irq_init(spapr, &error_fatal);

     /* Set up containers for ibm,client-architecture-support negotiated
        options */
diff --git a/hw/ppc/spapr_irq.c b/hw/ppc/spapr_irq.c
index e77b94cc68..f8b651de0e 100644
--- a/hw/ppc/spapr_irq.c
+++ b/hw/ppc/spapr_irq.c
@@ -97,11 +97,6 @@ static void spapr_irq_init_xics(sPAPRMachineState *spapr, Error **errp)
     int nr_irqs = smc->irq->nr_irqs;
     Error *local_err = NULL;

-    /* Initialize the MSI IRQ allocator. */
-    if (!SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
-        spapr_irq_msi_init(spapr, smc->irq->nr_msis);
-    }
-
     if (kvm_enabled()) {
         if (machine_kernel_irqchip_allowed(machine) &&
             !xics_kvm_init(spapr, &local_err)) {
@@ -213,6 +208,17 @@ sPAPRIrq spapr_irq_xics = {
 /*
  * sPAPR IRQ frontend routines for devices
  */
+void spapr_irq_init(sPAPRMachineState *spapr, Error **errp)
+{
+    sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+
+    /* Initialize the MSI IRQ allocator. */
+    if (!SPAPR_MACHINE_GET_CLASS(spapr)->legacy_irq_allocation) {
+        spapr_irq_msi_init(spapr, smc->irq->nr_msis);
+    }
+
+    smc->irq->init(spapr, errp);
+}

 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp)
 {
diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
index a467ce696e..bd7301e6d9 100644
--- a/include/hw/ppc/spapr_irq.h
+++ b/include/hw/ppc/spapr_irq.h
@@ -43,6 +43,7 @@ typedef struct sPAPRIrq {
 extern sPAPRIrq spapr_irq_xics;
 extern sPAPRIrq spapr_irq_xics_legacy;

+void spapr_irq_init(sPAPRMachineState *spapr, Error **errp);
 int spapr_irq_claim(sPAPRMachineState *spapr, int irq, bool lsi, Error **errp);
 void spapr_irq_free(sPAPRMachineState *spapr, int irq, int num);
 qemu_irq spapr_qirq(sPAPRMachineState *spapr, int irq);
-- 
2.19.2
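The refactoring in this patch follows a common pattern: work shared by every backend moves into a frontend routine, which then dispatches to the backend's own init hook. Below is a simplified standalone sketch of that pattern, not the QEMU code itself. Machine, IrqBackend, irq_init and demo_backend are hypothetical names, and the msi_slots counter merely stands in for the MSI bitmap set up by spapr_irq_msi_init().

```c
#include <assert.h>
#include <stdbool.h>

typedef struct Machine Machine;

/* Stand-in for sPAPRIrq: per-backend parameters and hooks. */
typedef struct IrqBackend {
    unsigned nr_msis;
    int (*init)(Machine *m);
} IrqBackend;

struct Machine {
    const IrqBackend *irq;
    bool legacy_irq_allocation;
    unsigned msi_slots;    /* stand-in for the MSI allocator state */
    bool backend_ready;
};

/*
 * Frontend routine: do the backend-independent setup once, here,
 * then hand over to whichever backend the machine selected.
 */
static int irq_init(Machine *m)
{
    if (!m->legacy_irq_allocation) {
        m->msi_slots = m->irq->nr_msis;
    }
    return m->irq->init(m);
}

/* A backend now only carries its own specific setup. */
static int demo_backend_init(Machine *m)
{
    m->backend_ready = true;
    return 0;
}

static const IrqBackend demo_backend = {
    .nr_msis = 64,
    .init = demo_backend_init,
};
```

A second backend (XIVE in this series) gets the MSI allocator for free, without duplicating the legacy_irq_allocation check.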
[Qemu-devel] [PULL 15/40] ppc/xive: introduce a XIVE interrupt source model
From: Cédric Le Goater

The first sub-engine of the overall XIVE architecture is the Interrupt Virtualization Source Engine (IVSE). An IVSE can be integrated into other logic, such as a PCI PHB, or into the main interrupt controller to manage IPIs.

Each IVSE instance is associated with an Event State Buffer (ESB) that contains a two-bit state entry for each possible event source. When an event is signaled to the IVSE, by MMIO or some other means, the associated interrupt state bits are fetched from the ESB and modified. Depending on the resulting ESB state, the event is forwarded to the IVRE sub-engine of the controller doing the routing.

Each supported ESB entry is associated with either a single page or an even/odd pair of pages which provide commands to manage the source: to EOI or to turn off the source, for instance.

On a sPAPR machine, the O/S will obtain the page address of the ESB entry associated with a source and its characteristics using the H_INT_GET_SOURCE_INFO hcall. On PowerNV, a similar OPAL call is used.

The xive_source_notify() routine is in charge of forwarding the source event notification to the routing engine. It will be filled in later on.
Signed-off-by: Cédric Le Goater Signed-off-by: David Gibson --- default-configs/ppc64-softmmu.mak | 1 + hw/intc/Makefile.objs | 1 + hw/intc/xive.c| 382 ++ include/hw/ppc/xive.h | 260 4 files changed, 644 insertions(+) create mode 100644 hw/intc/xive.c create mode 100644 include/hw/ppc/xive.h diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak index aec2855750..2d1e7c5c46 100644 --- a/default-configs/ppc64-softmmu.mak +++ b/default-configs/ppc64-softmmu.mak @@ -16,6 +16,7 @@ CONFIG_VIRTIO_VGA=y CONFIG_XICS=$(CONFIG_PSERIES) CONFIG_XICS_SPAPR=$(CONFIG_PSERIES) CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM)) +CONFIG_XIVE=$(CONFIG_PSERIES) CONFIG_MEM_DEVICE=y CONFIG_DIMM=y CONFIG_SPAPR_RNG=y diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs index 0e9963f5ee..72a46ed91c 100644 --- a/hw/intc/Makefile.objs +++ b/hw/intc/Makefile.objs @@ -37,6 +37,7 @@ obj-$(CONFIG_SH4) += sh_intc.o obj-$(CONFIG_XICS) += xics.o obj-$(CONFIG_XICS_SPAPR) += xics_spapr.o obj-$(CONFIG_XICS_KVM) += xics_kvm.o +obj-$(CONFIG_XIVE) += xive.o obj-$(CONFIG_POWERNV) += xics_pnv.o obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o obj-$(CONFIG_S390_FLIC) += s390_flic.o diff --git a/hw/intc/xive.c b/hw/intc/xive.c new file mode 100644 index 00..6389bd8323 --- /dev/null +++ b/hw/intc/xive.c @@ -0,0 +1,382 @@ +/* + * QEMU PowerPC XIVE interrupt controller model + * + * Copyright (c) 2017-2018, IBM Corporation. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. 
+ */ + +#include "qemu/osdep.h" +#include "qemu/log.h" +#include "qapi/error.h" +#include "target/ppc/cpu.h" +#include "sysemu/cpus.h" +#include "sysemu/dma.h" +#include "hw/qdev-properties.h" +#include "monitor/monitor.h" +#include "hw/ppc/xive.h" + +/* + * XIVE ESB helpers + */ + +static uint8_t xive_esb_set(uint8_t *pq, uint8_t value) +{ +uint8_t old_pq = *pq & 0x3; + +*pq &= ~0x3; +*pq |= value & 0x3; + +return old_pq; +} + +static bool xive_esb_trigger(uint8_t *pq) +{ +uint8_t old_pq = *pq & 0x3; + +switch (old_pq) { +case XIVE_ESB_RESET: +xive_esb_set(pq, XIVE_ESB_PENDING); +return true; +case XIVE_ESB_PENDING: +case XIVE_ESB_QUEUED: +xive_esb_set(pq, XIVE_ESB_QUEUED); +return false; +case XIVE_ESB_OFF: +xive_esb_set(pq, XIVE_ESB_OFF); +return false; +default: + g_assert_not_reached(); +} +} + +static bool xive_esb_eoi(uint8_t *pq) +{ +uint8_t old_pq = *pq & 0x3; + +switch (old_pq) { +case XIVE_ESB_RESET: +case XIVE_ESB_PENDING: +xive_esb_set(pq, XIVE_ESB_RESET); +return false; +case XIVE_ESB_QUEUED: +xive_esb_set(pq, XIVE_ESB_PENDING); +return true; +case XIVE_ESB_OFF: +xive_esb_set(pq, XIVE_ESB_OFF); +return false; +default: + g_assert_not_reached(); +} +} + +/* + * XIVE Interrupt Source (or IVSE) + */ + +uint8_t xive_source_esb_get(XiveSource *xsrc, uint32_t srcno) +{ +assert(srcno < xsrc->nr_irqs); + +return xsrc->status[srcno] & 0x3; +} + +uint8_t xive_source_esb_set(XiveSource *xsrc, uint32_t srcno, uint8_t pq) +{ +assert(srcno < xsrc->nr_irqs); + +return xive_esb_set(>status[srcno], pq); +} + +/* + * Returns whether the event notification should be forwarded. + */ +static bool xive_source_esb_trigger(XiveSource *xsrc, uint32_t srcno) +{ +assert(srcno < xsrc->nr_irqs); + +return xive_esb_trigger(>status[srcno]); +} + +/* + * Returns whether the event notification should be forwarded. + */ +static bool xive_source_esb_eoi(XiveSource *xsrc, uint32_t srcno) +{ +
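The two-bit ESB state machine introduced by this patch can be exercised on its own. The sketch below restates the trigger and EOI transitions in standalone C. The numeric encodings (RESET=0, OFF=1, PENDING=2, QUEUED=3, i.e. P is bit 1 and Q is bit 0) are an assumption consistent with the P=0x2 value and the "0b01 (Q=1) ... ints off" comment seen elsewhere in this series. The esb_* names are local to the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed PQ encodings: P is bit 1, Q is bit 0. */
enum {
    ESB_RESET   = 0x0,  /* idle, ready to take an event */
    ESB_OFF     = 0x1,  /* Q only: source masked */
    ESB_PENDING = 0x2,  /* P only: event sent, waiting for EOI */
    ESB_QUEUED  = 0x3,  /* P and Q: another event arrived meanwhile */
};

/* Replace the PQ bits and return the previous PQ value. */
static uint8_t esb_set(uint8_t *pq, uint8_t value)
{
    uint8_t old = *pq & 0x3;

    *pq = (uint8_t)((*pq & ~0x3) | (value & 0x3));
    return old;
}

/* Returns true when the event must be forwarded to the router. */
static bool esb_trigger(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case ESB_RESET:
        esb_set(pq, ESB_PENDING);
        return true;
    case ESB_PENDING:
    case ESB_QUEUED:
        esb_set(pq, ESB_QUEUED);   /* coalesce: remember, do not forward */
        return false;
    default:                       /* ESB_OFF: stay off */
        return false;
    }
}

/* Returns true when a queued event must be forwarded after the EOI. */
static bool esb_eoi(uint8_t *pq)
{
    switch (*pq & 0x3) {
    case ESB_RESET:
    case ESB_PENDING:
        esb_set(pq, ESB_RESET);
        return false;
    case ESB_QUEUED:
        esb_set(pq, ESB_PENDING);  /* replay the coalesced event */
        return true;
    default:                       /* ESB_OFF: stay off */
        return false;
    }
}
```

Two triggers followed by two EOIs walk RESET, PENDING, QUEUED, PENDING, RESET: only the first trigger and the first EOI produce a notification, which is the coalescing behaviour the commit message describes.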
[Qemu-devel] [PULL 40/40] MAINTAINERS: PPC: add a XIVE section
From: Cédric Le Goater

Signed-off-by: Cédric Le Goater
Signed-off-by: David Gibson
---
 MAINTAINERS | 8
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d676c73f88..0ab4676b06 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1011,6 +1011,14 @@ F: tests/libqos/*spapr*
 F: tests/rtas*
 F: tests/libqos/rtas*

+XIVE
+M: David Gibson
+M: Cédric Le Goater
+L: qemu-...@nongnu.org
+S: Supported
+F: hw/*/*xive*
+F: include/hw/*/*xive*
+
 virtex_ml507
 M: Edgar E. Iglesias
 L: qemu-...@nongnu.org
-- 
2.19.2
[Qemu-devel] [PULL 24/40] ppc/xive: add support for the END Event State Buffers
From: Cédric Le Goater

The Event Notification Descriptor (END) XIVE structure also contains
two Event State Buffers providing further coalescing of interrupts,
one for the notification event (ESn) and one for the escalation events
(ESe). A MMIO page is assigned for each to control the EOI through
loads only. Stores are not allowed.

The END ESBs are modeled through an object resembling the 'XiveSource'.
It is stateless as the END state bits are backed into the XiveEND
structure under the XiveRouter and the MMIO accesses follow the same
rules as for the XiveSource ESBs.

END ESBs are not supported by the Linux drivers on either OPAL or
sPAPR. Nevertheless, the model provides a means to study the question
in the future and validates the XIVE model a bit more.

Signed-off-by: Cédric Le Goater
[dwg: Fold in a later fix for field access]
Signed-off-by: David Gibson
---
 hw/intc/xive.c        | 160 +-
 include/hw/ppc/xive.h |  21 ++
 2 files changed, 179 insertions(+), 2 deletions(-)

diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index 9ec841f741..7b2ef7480d 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -613,8 +613,18 @@ static void xive_router_end_notify(XiveRouter *xrtr, uint8_t end_blk,
      * even futher coalescing in the Router
      */
     if (!xive_end_is_notify(&end)) {
-        qemu_log_mask(LOG_UNIMP, "XIVE: !UCOND_NOTIFY not implemented\n");
-        return;
+        uint8_t pq = xive_get_field32(END_W1_ESn, end.w1);
+        bool notify = xive_esb_trigger(&pq);
+
+        if (pq != xive_get_field32(END_W1_ESn, end.w1)) {
+            end.w1 = xive_set_field32(END_W1_ESn, end.w1, pq);
+            xive_router_write_end(xrtr, end_blk, end_idx, &end, 1);
+        }
+
+        /* ESn[Q]=1 : end of notification */
+        if (!notify) {
+            return;
+        }
     }

     /*
@@ -694,6 +704,151 @@ void xive_eas_pic_print_info(XiveEAS *eas, uint32_t lisn, Monitor *mon)
                    (uint32_t) xive_get_field64(EAS_END_DATA, eas->w));
 }

+/*
+ * END ESB MMIO loads
+ */
+static uint64_t xive_end_source_read(void *opaque, hwaddr addr, unsigned size)
+{
+    XiveENDSource *xsrc = XIVE_END_SOURCE(opaque);
+    uint32_t offset = addr & 0xFFF;
+    uint8_t end_blk;
+    uint32_t end_idx;
+    XiveEND end;
+    uint32_t end_esmask;
+    uint8_t pq;
+    uint64_t ret = -1;
+
+    end_blk = xsrc->block_id;
+    end_idx = addr >> (xsrc->esb_shift + 1);
+
+    if (xive_router_get_end(xsrc->xrtr, end_blk, end_idx, &end)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: No END %x/%x\n", end_blk,
+                      end_idx);
+        return -1;
+    }
+
+    if (!xive_end_is_valid(&end)) {
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: END %x/%x is invalid\n",
+                      end_blk, end_idx);
+        return -1;
+    }
+
+    end_esmask = addr_is_even(addr, xsrc->esb_shift) ? END_W1_ESn : END_W1_ESe;
+    pq = xive_get_field32(end_esmask, end.w1);
+
+    switch (offset) {
+    case XIVE_ESB_LOAD_EOI ... XIVE_ESB_LOAD_EOI + 0x7FF:
+        ret = xive_esb_eoi(&pq);
+
+        /* Forward the source event notification for routing ?? */
+        break;
+
+    case XIVE_ESB_GET ... XIVE_ESB_GET + 0x3FF:
+        ret = pq;
+        break;
+
+    case XIVE_ESB_SET_PQ_00 ... XIVE_ESB_SET_PQ_00 + 0x0FF:
+    case XIVE_ESB_SET_PQ_01 ... XIVE_ESB_SET_PQ_01 + 0x0FF:
+    case XIVE_ESB_SET_PQ_10 ... XIVE_ESB_SET_PQ_10 + 0x0FF:
+    case XIVE_ESB_SET_PQ_11 ... XIVE_ESB_SET_PQ_11 + 0x0FF:
+        ret = xive_esb_set(&pq, (offset >> 8) & 0x3);
+        break;
+    default:
+        qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid END ESB load addr %d\n",
+                      offset);
+        return -1;
+    }
+
+    if (pq != xive_get_field32(end_esmask, end.w1)) {
+        end.w1 = xive_set_field32(end_esmask, end.w1, pq);
+        xive_router_write_end(xsrc->xrtr, end_blk, end_idx, &end, 1);
+    }
+
+    return ret;
+}
+
+/*
+ * END ESB MMIO stores are invalid
+ */
+static void xive_end_source_write(void *opaque, hwaddr addr,
+                                  uint64_t value, unsigned size)
+{
+    qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid ESB write addr 0x%"
+                  HWADDR_PRIx"\n", addr);
+}
+
+static const MemoryRegionOps xive_end_source_ops = {
+    .read = xive_end_source_read,
+    .write = xive_end_source_write,
+    .endianness = DEVICE_BIG_ENDIAN,
+    .valid = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+    .impl = {
+        .min_access_size = 8,
+        .max_access_size = 8,
+    },
+};
+
+static void xive_end_source_realize(DeviceState *dev, Error **errp)
+{
+    XiveENDSource *xsrc = XIVE_END_SOURCE(dev);
+    Object *obj;
+    Error *local_err = NULL;
+
+    obj = object_property_get_link(OBJECT(dev), "xive", &local_err);
+    if (!obj) {
+        error_propagate(errp, local_err);
+        error_prepend(errp, "required link 'xive' not found: ");
+        return;
+    }
+
+    xsrc->xrtr = XIVE_ROUTER(obj);
+
+    if
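The case ranges in xive_end_source_read() partition the 4K management page by load offset. As a rough standalone sketch of that decoding, assuming the usual XIVE ESB offsets (LOAD_EOI at 0x000, GET at 0x800, SET_PQ_00 through SET_PQ_11 at 0xC00 through 0xF00), which are not spelled out in the quoted patch: the names ESB_LOAD_EOI, ESB_GET, ESB_SET_PQ_00, EsbOp and esb_decode_load are local to this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed load offsets within a 4K ESB management page. */
#define ESB_LOAD_EOI   0x000u
#define ESB_GET        0x800u
#define ESB_SET_PQ_00  0xC00u

typedef enum { OP_EOI, OP_GET, OP_SET_PQ } EsbOp;

/*
 * Classify a load by its page offset. For OP_SET_PQ, the PQ value to
 * set is encoded in bits 9:8 of the offset and returned in *new_pq.
 */
static EsbOp esb_decode_load(uint32_t addr, uint8_t *new_pq)
{
    uint32_t offset = addr & 0xFFF;

    if (offset < ESB_GET) {
        return OP_EOI;                     /* 0x000 .. 0x7FF */
    }
    if (offset < ESB_SET_PQ_00) {
        return OP_GET;                     /* 0x800 .. 0xBFF */
    }
    *new_pq = (uint8_t)((offset >> 8) & 0x3);  /* 0xC00 .. 0xFFF */
    return OP_SET_PQ;
}
```

Under these assumed offsets the four SET_PQ windows are 0x100 bytes each, so `(offset >> 8) & 0x3` yields 0, 1, 2 and 3 for 0xC00, 0xD00, 0xE00 and 0xF00 respectively, matching the expression used in the patch.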
[Qemu-devel] [PULL 11/40] sam460ex: use g_new(T, n) instead of g_malloc(sizeof(T) * n)
From: Greg Kurz

Because it is a recommended coding practice (see HACKING).

Signed-off-by: Greg Kurz
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: David Gibson
---
 hw/ppc/sam460ex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/sam460ex.c b/hw/ppc/sam460ex.c
index 5aac58f36e..4b051c0950 100644
--- a/hw/ppc/sam460ex.c
+++ b/hw/ppc/sam460ex.c
@@ -430,7 +430,7 @@ static void sam460ex_init(MachineState *machine)
     ppc4xx_plb_init(env);

     /* interrupt controllers */
-    irqs = g_malloc0(sizeof(*irqs) * PPCUIC_OUTPUT_NB);
+    irqs = g_new0(qemu_irq, PPCUIC_OUTPUT_NB);
     irqs[PPCUIC_OUTPUT_INT] = ((qemu_irq *)env->irq_inputs)[PPC40x_INPUT_INT];
     irqs[PPCUIC_OUTPUT_CINT] = ((qemu_irq *)env->irq_inputs)[PPC40x_INPUT_CINT];
     uic[0] = ppcuic_init(env, irqs, 0xc0, 0, 1);
-- 
2.19.2
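As background on why the typed form is preferred, here is a small sketch using only the C standard library. NEW0 is a hypothetical stand-in for GLib's g_new0() and irq_handle a stand-in for qemu_irq. One difference to keep in mind: the real g_new0() aborts on allocation failure, whereas this calloc-based macro returns NULL.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical analogue of g_new0(T, n): the element type is named
 * once, the result is cast to T *, and the n * sizeof(T) product is
 * left to calloc() instead of being computed by hand.
 */
#define NEW0(T, n) ((T *)calloc((n), sizeof(T)))

typedef void *irq_handle;   /* stand-in for qemu_irq */

/* Allocate a zero-initialized array of n interrupt handles. */
static irq_handle *alloc_irqs(size_t n)
{
    return NEW0(irq_handle, n);
}
```

Assigning the result to a pointer of the wrong type draws a compiler diagnostic because the macro result is typed, which a bare `malloc(sizeof(T) * n)` cannot offer.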
[Qemu-devel] [PULL 16/40] ppc/xive: add support for the LSI interrupt sources
From: Cédric Le Goater The 'sent' status of the LSI interrupt source is modeled with the 'P' bit of the ESB and the assertion status of the source is maintained with an extra bit under the main XiveSource object. The type of the source is stored in the same array for practical reasons. Signed-off-by: Cédric Le Goater [dwg: Fix style nit] Signed-off-by: David Gibson --- hw/intc/xive.c| 67 +++ include/hw/ppc/xive.h | 19 +++- 2 files changed, 79 insertions(+), 7 deletions(-) diff --git a/hw/intc/xive.c b/hw/intc/xive.c index 6389bd8323..4998f128e7 100644 --- a/hw/intc/xive.c +++ b/hw/intc/xive.c @@ -89,14 +89,42 @@ uint8_t xive_source_esb_set(XiveSource *xsrc, uint32_t srcno, uint8_t pq) return xive_esb_set(>status[srcno], pq); } +/* + * Returns whether the event notification should be forwarded. + */ +static bool xive_source_lsi_trigger(XiveSource *xsrc, uint32_t srcno) +{ +uint8_t old_pq = xive_source_esb_get(xsrc, srcno); + +xsrc->status[srcno] |= XIVE_STATUS_ASSERTED; + +switch (old_pq) { +case XIVE_ESB_RESET: +xive_source_esb_set(xsrc, srcno, XIVE_ESB_PENDING); +return true; +default: +return false; +} +} + /* * Returns whether the event notification should be forwarded. 
*/ static bool xive_source_esb_trigger(XiveSource *xsrc, uint32_t srcno) { +bool ret; + assert(srcno < xsrc->nr_irqs); -return xive_esb_trigger(>status[srcno]); +ret = xive_esb_trigger(>status[srcno]); + +if (xive_source_irq_is_lsi(xsrc, srcno) && +xive_source_esb_get(xsrc, srcno) == XIVE_ESB_QUEUED) { +qemu_log_mask(LOG_GUEST_ERROR, + "XIVE: queued an event on LSI IRQ %d\n", srcno); +} + +return ret; } /* @@ -104,9 +132,23 @@ static bool xive_source_esb_trigger(XiveSource *xsrc, uint32_t srcno) */ static bool xive_source_esb_eoi(XiveSource *xsrc, uint32_t srcno) { +bool ret; + assert(srcno < xsrc->nr_irqs); -return xive_esb_eoi(>status[srcno]); +ret = xive_esb_eoi(>status[srcno]); + +/* + * LSI sources do not set the Q bit but they can still be + * asserted, in which case we should forward a new event + * notification + */ +if (xive_source_irq_is_lsi(xsrc, srcno) && +xsrc->status[srcno] & XIVE_STATUS_ASSERTED) { +ret = xive_source_lsi_trigger(xsrc, srcno); +} + +return ret; } /* @@ -271,8 +313,16 @@ static void xive_source_set_irq(void *opaque, int srcno, int val) XiveSource *xsrc = XIVE_SOURCE(opaque); bool notify = false; -if (val) { -notify = xive_source_esb_trigger(xsrc, srcno); +if (xive_source_irq_is_lsi(xsrc, srcno)) { +if (val) { +notify = xive_source_lsi_trigger(xsrc, srcno); +} else { +xsrc->status[srcno] &= ~XIVE_STATUS_ASSERTED; +} +} else { +if (val) { +notify = xive_source_esb_trigger(xsrc, srcno); +} } /* Forward the source event notification for routing */ @@ -292,9 +342,11 @@ void xive_source_pic_print_info(XiveSource *xsrc, uint32_t offset, Monitor *mon) continue; } -monitor_printf(mon, " %08x %c%c\n", i + offset, +monitor_printf(mon, " %08x %s %c%c%c\n", i + offset, + xive_source_irq_is_lsi(xsrc, i) ? "LSI" : "MSI", pq & XIVE_ESB_VAL_P ? 'P' : '-', - pq & XIVE_ESB_VAL_Q ? 'Q' : '-'); + pq & XIVE_ESB_VAL_Q ? 'Q' : '-', + xsrc->status[i] & XIVE_STATUS_ASSERTED ? 
'A' : ' '); } } @@ -302,6 +354,8 @@ static void xive_source_reset(void *dev) { XiveSource *xsrc = XIVE_SOURCE(dev); +/* Do not clear the LSI bitmap */ + /* PQs are initialized to 0b01 (Q=1) which corresponds to "ints off" */ memset(xsrc->status, XIVE_ESB_OFF, xsrc->nr_irqs); } @@ -324,6 +378,7 @@ static void xive_source_realize(DeviceState *dev, Error **errp) } xsrc->status = g_malloc0(xsrc->nr_irqs); +xsrc->lsi_map = bitmap_new(xsrc->nr_irqs); memory_region_init_io(>esb_mmio, OBJECT(xsrc), _source_esb_ops, xsrc, "xive.esb", diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h index 7aa2e38012..7cebc32eba 100644 --- a/include/hw/ppc/xive.h +++ b/include/hw/ppc/xive.h @@ -162,8 +162,9 @@ typedef struct XiveSource { /* IRQs */ uint32_tnr_irqs; qemu_irq*qirqs; +unsigned long *lsi_map; -/* PQ bits */ +/* PQ bits and LSI assertion bit */ uint8_t *status; /* ESB memory region */ @@ -219,6 +220,7 @@ static inline hwaddr xive_source_esb_mgmt(XiveSource *xsrc, int srcno) * When doing an EOI, the Q bit will indicate if the interrupt * needs to be re-triggered. */ +#define XIVE_STATUS_ASSERTED 0x4 /* Extra bit for LSI */ #define XIVE_ESB_VAL_P0x2 #define
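The interplay between the P bit and the extra assertion bit is easier to see in isolation. Below is a simplified sketch of the LSI path only, assuming the same status-byte layout as the patch (PQ in bits 1:0 with P=0x2, ASSERTED in bit 2). The lsi_* helpers are illustrative names. Unlike the patch, the EOI helper here does not go through the generic ESB EOI routine. It handles only the RESET and PENDING states an LSI is expected to be in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESB_RESET        0x0
#define ESB_PENDING      0x2   /* P bit: event sent, waiting for EOI */
#define STATUS_ASSERTED  0x4   /* extra bit: the level line is high */

/* Line raised: record the level, notify only if no event is in flight. */
static bool lsi_trigger(uint8_t *status)
{
    uint8_t old_pq = *status & 0x3;

    *status |= STATUS_ASSERTED;
    if (old_pq == ESB_RESET) {
        *status = (uint8_t)((*status & ~0x3) | ESB_PENDING);
        return true;
    }
    return false;
}

/* Line lowered: only the level changes, PQ is left alone. */
static void lsi_lower(uint8_t *status)
{
    *status &= (uint8_t)~STATUS_ASSERTED;
}

/*
 * EOI: clear P, then re-notify if the line is still high. LSIs never
 * set Q, so the replay decision comes from the ASSERTED bit instead.
 */
static bool lsi_eoi(uint8_t *status)
{
    uint8_t pq = *status & 0x3;

    if (pq == ESB_RESET || pq == ESB_PENDING) {
        *status = (uint8_t)(*status & ~0x3);
    }
    if (*status & STATUS_ASSERTED) {
        return lsi_trigger(status);
    }
    return false;
}
```

An EOI issued while the device still holds the line high immediately produces a fresh notification, while an EOI after the line has dropped returns the source to RESET.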
[Qemu-devel] [PULL 28/40] spapr/xive: introduce a XIVE interrupt controller
From: Cédric Le Goater sPAPRXive models the XIVE interrupt controller of the sPAPR machine. It inherits from the XiveRouter and provisions storage for the routing tables : - Event Assignment Structure (EAS) - Event Notification Descriptor (END) The sPAPRXive model incorporates an internal XiveSource for the IPIs and for the interrupts of the virtual devices of the guest. This model is consistent with XIVE architecture which also incorporates an internal IVSE for IPIs and accelerator interrupts in the IVRE sub-engine. The sPAPRXive model exports two memory regions, one for the ESB trigger and management pages used to control the sources and one for the TIMA pages. They are mapped by default at the addresses found on chip 0 of a baremetal system. This is also consistent with the XIVE architecture which defines a Virtualization Controller BAR for the internal IVSE ESB pages and a Thread Managment BAR for the TIMA. Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson [dwg: Fold in field accessor fixes] Signed-off-by: David Gibson --- default-configs/ppc64-softmmu.mak | 1 + hw/intc/Makefile.objs | 1 + hw/intc/spapr_xive.c | 366 ++ include/hw/ppc/spapr_xive.h | 45 4 files changed, 413 insertions(+) create mode 100644 hw/intc/spapr_xive.c create mode 100644 include/hw/ppc/spapr_xive.h diff --git a/default-configs/ppc64-softmmu.mak b/default-configs/ppc64-softmmu.mak index 2d1e7c5c46..7f34ad0528 100644 --- a/default-configs/ppc64-softmmu.mak +++ b/default-configs/ppc64-softmmu.mak @@ -17,6 +17,7 @@ CONFIG_XICS=$(CONFIG_PSERIES) CONFIG_XICS_SPAPR=$(CONFIG_PSERIES) CONFIG_XICS_KVM=$(call land,$(CONFIG_PSERIES),$(CONFIG_KVM)) CONFIG_XIVE=$(CONFIG_PSERIES) +CONFIG_XIVE_SPAPR=$(CONFIG_PSERIES) CONFIG_MEM_DEVICE=y CONFIG_DIMM=y CONFIG_SPAPR_RNG=y diff --git a/hw/intc/Makefile.objs b/hw/intc/Makefile.objs index 72a46ed91c..301a8e972d 100644 --- a/hw/intc/Makefile.objs +++ b/hw/intc/Makefile.objs @@ -38,6 +38,7 @@ obj-$(CONFIG_XICS) += xics.o obj-$(CONFIG_XICS_SPAPR) += 
xics_spapr.o obj-$(CONFIG_XICS_KVM) += xics_kvm.o obj-$(CONFIG_XIVE) += xive.o +obj-$(CONFIG_XIVE_SPAPR) += spapr_xive.o obj-$(CONFIG_POWERNV) += xics_pnv.o obj-$(CONFIG_ALLWINNER_A10_PIC) += allwinner-a10-pic.o obj-$(CONFIG_S390_FLIC) += s390_flic.o diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c new file mode 100644 index 00..5f03adca56 --- /dev/null +++ b/hw/intc/spapr_xive.c @@ -0,0 +1,366 @@ +/* + * QEMU PowerPC sPAPR XIVE interrupt controller model + * + * Copyright (c) 2017-2018, IBM Corporation. + * + * This code is licensed under the GPL version 2 or later. See the + * COPYING file in the top-level directory. + */ + +#include "qemu/osdep.h" +#include "qemu/log.h" +#include "qapi/error.h" +#include "qemu/error-report.h" +#include "target/ppc/cpu.h" +#include "sysemu/cpus.h" +#include "monitor/monitor.h" +#include "hw/ppc/spapr.h" +#include "hw/ppc/spapr_xive.h" +#include "hw/ppc/xive.h" +#include "hw/ppc/xive_regs.h" + +/* + * XIVE Virtualization Controller BAR and Thread Managment BAR that we + * use for the ESB pages and the TIMA pages + */ +#define SPAPR_XIVE_VC_BASE 0x00060100ull +#define SPAPR_XIVE_TM_BASE 0x000603020318ull + +/* + * On sPAPR machines, use a simplified output for the XIVE END + * structure dumping only the information related to the OS EQ. 
 + */ +static void spapr_xive_end_pic_print_info(sPAPRXive *xive, XiveEND *end, + Monitor *mon) +{ +uint32_t qindex = xive_get_field32(END_W1_PAGE_OFF, end->w1); +uint32_t qgen = xive_get_field32(END_W1_GENERATION, end->w1); +uint32_t qsize = xive_get_field32(END_W0_QSIZE, end->w0); +uint32_t qentries = 1 << (qsize + 10); +uint32_t nvt = xive_get_field32(END_W6_NVT_INDEX, end->w6); +uint8_t priority = xive_get_field32(END_W7_F0_PRIORITY, end->w7); + +monitor_printf(mon, "%3d/%d % 6d/%5d ^%d", nvt, + priority, qindex, qentries, qgen); + +xive_end_queue_pic_print_info(end, 6, mon); +monitor_printf(mon, "]"); +} + +void spapr_xive_pic_print_info(sPAPRXive *xive, Monitor *mon) +{ +XiveSource *xsrc = &xive->source; +int i; + +monitor_printf(mon, " LSIN PQEISN CPU/PRIO EQ\n"); + +for (i = 0; i < xive->nr_irqs; i++) { +uint8_t pq = xive_source_esb_get(xsrc, i); +XiveEAS *eas = &xive->eat[i]; + +if (!xive_eas_is_valid(eas)) { +continue; +} + +monitor_printf(mon, " %08x %s %c%c%c %s %08x ", i, + xive_source_irq_is_lsi(xsrc, i) ? "LSI" : "MSI", + pq & XIVE_ESB_VAL_P ? 'P' : '-', + pq & XIVE_ESB_VAL_Q ? 'Q' : '-', + xsrc->status[i] & XIVE_STATUS_ASSERTED ? 'A' : ' ', + xive_eas_is_masked(eas) ? "M" : " ", + (int) xive_get_field64(EAS_END_DATA, eas->w)); + +if
[Qemu-devel] [PULL 09/40] ppc405_uc: use g_new(T, n) instead of g_malloc(sizeof(T) * n)
From: Greg Kurz Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz Reviewed-by: Philippe Mathieu-Daudé Signed-off-by: David Gibson --- hw/ppc/ppc405_uc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ppc/ppc405_uc.c b/hw/ppc/ppc405_uc.c index 5c58415cf1..e1aadf126d 100644 --- a/hw/ppc/ppc405_uc.c +++ b/hw/ppc/ppc405_uc.c @@ -1519,7 +1519,7 @@ CPUPPCState *ppc405cr_init(MemoryRegion *address_space_mem, /* OBP arbitrer */ ppc4xx_opba_init(0xef600600); /* Universal interrupt controller */ -irqs = g_malloc0(sizeof(qemu_irq) * PPCUIC_OUTPUT_NB); +irqs = g_new0(qemu_irq, PPCUIC_OUTPUT_NB); irqs[PPCUIC_OUTPUT_INT] = ((qemu_irq *)env->irq_inputs)[PPC40x_INPUT_INT]; irqs[PPCUIC_OUTPUT_CINT] = @@ -1877,7 +1877,7 @@ CPUPPCState *ppc405ep_init(MemoryRegion *address_space_mem, /* Initialize timers */ ppc_booke_timers_init(cpu, sysclk, 0); /* Universal interrupt controller */ -irqs = g_malloc0(sizeof(qemu_irq) * PPCUIC_OUTPUT_NB); +irqs = g_new0(qemu_irq, PPCUIC_OUTPUT_NB); irqs[PPCUIC_OUTPUT_INT] = ((qemu_irq *)env->irq_inputs)[PPC40x_INPUT_INT]; irqs[PPCUIC_OUTPUT_CINT] = -- 2.19.2
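The reason HACKING prefers g_new(T, n) is that the macro takes a type and a count, so the result is already correctly typed and the size multiplication is checked for overflow. The following is a minimal sketch of that idea without GLib: `XNEW0`, `xcalloc_checked` and `fake_irq_t` are hypothetical names invented for illustration, not part of GLib or QEMU, and unlike g_new0() the sketch returns NULL on failure rather than aborting.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for GLib's g_new0(T, n), built on calloc() so the
 * sketch compiles without GLib. It takes a type and a count and returns a
 * zero-filled array of that type. Unlike g_new0(), it returns NULL on
 * failure instead of aborting.
 */
#define XNEW0(T, n) ((T *)xcalloc_checked((n), sizeof(T)))

static void *xcalloc_checked(size_t n, size_t size)
{
    /* Refuse counts whose total byte size would overflow size_t. */
    if (size != 0 && n > SIZE_MAX / size) {
        return NULL;
    }
    return calloc(n, size);
}

/* Placeholder for QEMU's qemu_irq handle type (an assumption). */
typedef struct fake_irq *fake_irq_t;

/* Allocate a zeroed table of 'nb' IRQ handles, mirroring the patch. */
static fake_irq_t *alloc_irq_table(size_t nb)
{
    assert(nb > 0);
    return XNEW0(fake_irq_t, nb);
}
```

With the open-coded form, a mismatch between the element type and the `sizeof` operand compiles silently; with the typed macro the cast makes such a mismatch a compiler diagnostic.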
[Qemu-devel] [PULL 22/40] spapr: export and rename the xics_max_server_number() routine
From: Cédric Le Goater The XIVE sPAPR IRQ backend will use it to define the number of ENDs of the IC controller. Signed-off-by: Cédric Le Goater Signed-off-by: David Gibson --- hw/ppc/spapr.c | 8 include/hw/ppc/spapr.h | 1 + 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index c1c0e75fcd..fc47a058dd 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -150,7 +150,7 @@ static void pre_2_10_vmstate_unregister_dummy_icp(int i) (void *)(uintptr_t) i); } -static int xics_max_server_number(sPAPRMachineState *spapr) +int spapr_max_server_number(sPAPRMachineState *spapr) { assert(spapr->vsmt); return DIV_ROUND_UP(max_cpus * spapr->vsmt, smp_threads); @@ -1268,7 +1268,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr, _FDT(fdt_setprop_cell(fdt, 0, "#size-cells", 2)); /* /interrupt controller */ -spapr_dt_xics(xics_max_server_number(spapr), fdt, PHANDLE_XICP); +spapr_dt_xics(spapr_max_server_number(spapr), fdt, PHANDLE_XICP); ret = spapr_populate_memory(spapr, fdt); if (ret < 0) { @@ -2467,7 +2467,7 @@ static void spapr_init_cpus(sPAPRMachineState *spapr) if (smc->pre_2_10_has_unused_icps) { int i; -for (i = 0; i < xics_max_server_number(spapr); i++) { +for (i = 0; i < spapr_max_server_number(spapr); i++) { /* Dummy entries get deregistered when real ICPState objects * are registered during CPU core hotplug. */ @@ -2588,7 +2588,7 @@ static void spapr_machine_init(MachineState *machine) /* * VSMT must be set in order to be able to compute VCPU ids, ie to - * call xics_max_server_number() or spapr_vcpu_id(). + * call spapr_max_server_number() or spapr_vcpu_id(). 
 */ spapr_set_vsmt_mode(spapr, &error_fatal); diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index 6279711fe8..198764066d 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -737,6 +737,7 @@ int spapr_hpt_shift_for_ramsize(uint64_t ramsize); void spapr_reallocate_hpt(sPAPRMachineState *spapr, int shift, Error **errp); void spapr_clear_pending_events(sPAPRMachineState *spapr); +int spapr_max_server_number(sPAPRMachineState *spapr); /* CPU and LMB DRC release callbacks. */ void spapr_core_release(DeviceState *dev); -- 2.19.2
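The renamed routine is a single rounded-up division. Below is a standalone sketch of the same arithmetic. It takes the three inputs as plain parameters, whereas the patch reads max_cpus and smp_threads from globals and vsmt from the machine state. The function name `max_server_number` and the values in the usage note are illustrative only.

```c
#include <assert.h>

/* Rounded-up integer division for positive operands. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Standalone version of the computation shown in the patch:
 * DIV_ROUND_UP(max_cpus * vsmt, smp_threads).
 */
static int max_server_number(int max_cpus, int vsmt, int smp_threads)
{
    /* The patch asserts that VSMT is set before this is called. */
    assert(vsmt > 0 && smp_threads > 0);
    return DIV_ROUND_UP(max_cpus * vsmt, smp_threads);
}
```

For example, 8 CPUs with a VSMT of 8 and 2 threads per core give 64 / 2 = 32, while 5 CPUs with a VSMT of 4 and 8 threads per core give 20 / 8 rounded up to 3.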
[Qemu-devel] [PULL 20/40] spapr: initialize VSMT before initializing the IRQ backend
From: Cédric Le Goater We will need to use xics_max_server_number() to create the sPAPRXive object modeling the interrupt controller of the machine which is created before the CPUs. Signed-off-by: Cédric Le Goater Reviewed-by: Greg Kurz [dwg: Fix style nit] Signed-off-by: David Gibson --- hw/ppc/spapr.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 051d080fe5..2b2df6b848 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2464,11 +2464,6 @@ static void spapr_init_cpus(sPAPRMachineState *spapr) boot_cores_nr = possible_cpus->len; } -/* VSMT must be set in order to be able to compute VCPU ids, ie to - * call xics_max_server_number() or spapr_vcpu_id(). - */ -spapr_set_vsmt_mode(spapr, &error_fatal); - if (smc->pre_2_10_has_unused_icps) { int i; @@ -2591,6 +2586,12 @@ static void spapr_machine_init(MachineState *machine) /* Setup a load limit for the ramdisk leaving room for SLOF and FDT */ load_limit = MIN(spapr->rma_size, RTAS_MAX_ADDR) - FW_OVERHEAD; +/* + * VSMT must be set in order to be able to compute VCPU ids, ie to + * call xics_max_server_number() or spapr_vcpu_id(). + */ +spapr_set_vsmt_mode(spapr, &error_fatal); + /* Set up Interrupt Controller before we create the VCPUs */ smc->irq->init(spapr, &error_fatal); -- 2.19.2
[Qemu-devel] [PULL 26/40] ppc/xive: introduce a simplified XIVE presenter
From: Cédric Le Goater

The last sub-engine of the XIVE architecture is the Interrupt Virtualization Presentation Engine (IVPE). On HW, the IVRE and the IVPE share elements (the Power Bus interface (CQ) and the routing table descriptors) and can be combined in the same HW logic. We do the same in QEMU and combine both engines in the XiveRouter for simplicity.

When the IVRE has completed its job of matching an event source with a Notification Virtual Target (NVT) to notify, it forwards the event notification to the IVPE sub-engine. The IVPE scans the thread interrupt contexts of the Notification Virtual Targets (NVT) dispatched on the HW processor threads and, if a match is found, it signals the thread. If not, the IVPE escalates the notification to some other targets and records the notification in a backlog queue.

The IVPE maintains the thread interrupt context state for each of its NVTs not dispatched on HW processor threads in the Notification Virtual Target table (NVTT).

The model currently only supports single NVT notifications.
Signed-off-by: Cédric Le Goater [dwg: Folded in fix for field accessors] Signed-off-by: David Gibson --- hw/intc/xive.c | 190 + include/hw/ppc/xive.h | 14 +++ include/hw/ppc/xive_regs.h | 24 + 3 files changed, 228 insertions(+) diff --git a/hw/intc/xive.c b/hw/intc/xive.c index 06a835c454..1d737346c3 100644 --- a/hw/intc/xive.c +++ b/hw/intc/xive.c @@ -984,6 +984,188 @@ int xive_router_write_end(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx, return xrc->write_end(xrtr, end_blk, end_idx, end, word_number); } +int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx, +XiveNVT *nvt) +{ + XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr); + + return xrc->get_nvt(xrtr, nvt_blk, nvt_idx, nvt); +} + +int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx, +XiveNVT *nvt, uint8_t word_number) +{ + XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr); + + return xrc->write_nvt(xrtr, nvt_blk, nvt_idx, nvt, word_number); +} + +/* + * The thread context register words are in big-endian format. + */ +static int xive_presenter_tctx_match(XiveTCTX *tctx, uint8_t format, + uint8_t nvt_blk, uint32_t nvt_idx, + bool cam_ignore, uint32_t logic_serv) +{ +uint32_t cam = xive_nvt_cam_line(nvt_blk, nvt_idx); +uint32_t qw2w2 = xive_tctx_word2(&tctx->regs[TM_QW2_HV_POOL]); +uint32_t qw1w2 = xive_tctx_word2(&tctx->regs[TM_QW1_OS]); +uint32_t qw0w2 = xive_tctx_word2(&tctx->regs[TM_QW0_USER]); + +/* + * TODO (PowerNV): ignore mode. The low order bits of the NVT + * identifier are ignored in the "CAM" match. 
+ */ + +if (format == 0) { +if (cam_ignore == true) { +/* + * F=0 & i=1: Logical server notification (bits ignored at + * the end of the NVT identifier) + */ +qemu_log_mask(LOG_UNIMP, "XIVE: no support for LS NVT %x/%x\n", + nvt_blk, nvt_idx); + return -1; +} + +/* F=0 & i=0: Specific NVT notification */ + +/* TODO (PowerNV) : PHYS ring */ + +/* HV POOL ring */ +if ((be32_to_cpu(qw2w2) & TM_QW2W2_VP) && +cam == xive_get_field32(TM_QW2W2_POOL_CAM, qw2w2)) { +return TM_QW2_HV_POOL; +} + +/* OS ring */ +if ((be32_to_cpu(qw1w2) & TM_QW1W2_VO) && +cam == xive_get_field32(TM_QW1W2_OS_CAM, qw1w2)) { +return TM_QW1_OS; +} +} else { +/* F=1 : User level Event-Based Branch (EBB) notification */ + +/* USER ring */ +if ((be32_to_cpu(qw1w2) & TM_QW1W2_VO) && + (cam == xive_get_field32(TM_QW1W2_OS_CAM, qw1w2)) && + (be32_to_cpu(qw0w2) & TM_QW0W2_VU) && + (logic_serv == xive_get_field32(TM_QW0W2_LOGIC_SERV, qw0w2))) { +return TM_QW0_USER; +} +} +return -1; +} + +typedef struct XiveTCTXMatch { +XiveTCTX *tctx; +uint8_t ring; +} XiveTCTXMatch; + +static bool xive_presenter_match(XiveRouter *xrtr, uint8_t format, + uint8_t nvt_blk, uint32_t nvt_idx, + bool cam_ignore, uint8_t priority, + uint32_t logic_serv, XiveTCTXMatch *match) +{ +CPUState *cs; + +/* + * TODO (PowerNV): handle chip_id overwrite of block field for + * hardwired CAM compares + */ + +CPU_FOREACH(cs) { +PowerPCCPU *cpu = POWERPC_CPU(cs); +XiveTCTX *tctx = XIVE_TCTX(cpu->intc); +int ring; + +/* + * HW checks that the CPU is enabled in the Physical Thread + * Enable Register (PTER). + */ + +/* + * Check the thread context CAM lines and record matches. We + * will handle
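The specific-NVT case above (F=0, i=0) reduces to comparing one CAM value against the valid rings of each thread context. The sketch below is a simplified standalone model of that scan, under stated assumptions: `ThreadCtx`, `cam_line`, `tctx_match`, `presenter_match` and the ring constants are hypothetical names, the CAM encoding (block number shifted above a 19-bit index) is assumed rather than taken from the quoted diff, and the scan stops at the first match, which may differ from what the truncated patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ring identifiers; names and values are illustrative only. */
enum { RING_NONE = -1, RING_OS = 1, RING_HV_POOL = 2 };

/* Simplified thread context: one valid bit and one CAM value per ring. */
typedef struct ThreadCtx {
    bool pool_valid;
    uint32_t pool_cam;
    bool os_valid;
    uint32_t os_cam;
} ThreadCtx;

/* Assumed CAM encoding: block number in the high bits, index below. */
static uint32_t cam_line(uint8_t nvt_blk, uint32_t nvt_idx)
{
    return ((uint32_t)nvt_blk << 19) | nvt_idx;
}

/* Specific-NVT match: check the HV pool ring first, then the OS ring. */
static int tctx_match(const ThreadCtx *t, uint8_t nvt_blk, uint32_t nvt_idx)
{
    uint32_t cam = cam_line(nvt_blk, nvt_idx);

    if (t->pool_valid && t->pool_cam == cam) {
        return RING_HV_POOL;
    }
    if (t->os_valid && t->os_cam == cam) {
        return RING_OS;
    }
    return RING_NONE;
}

/* Scan all contexts; return the index of the first match, or -1. */
static int presenter_match(const ThreadCtx *ctxs, size_t n,
                           uint8_t nvt_blk, uint32_t nvt_idx, int *ring)
{
    assert(ring != NULL);
    for (size_t i = 0; i < n; i++) {
        int r = tctx_match(&ctxs[i], nvt_blk, nvt_idx);
        if (r != RING_NONE) {
            *ring = r;
            return (int)i;
        }
    }
    return -1;
}
```

The valid-bit check comes before the CAM comparison for the same reason as in the patch: a ring whose context is not dispatched must never match, whatever stale CAM value it holds.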
[Qemu-devel] [PULL 08/40] ppc405_boards: use g_new(T, n) instead of g_malloc(sizeof(T) * n)
From: Greg Kurz Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz Signed-off-by: David Gibson --- hw/ppc/ppc405_boards.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ppc/ppc405_boards.c b/hw/ppc/ppc405_boards.c index 1b0a0a8ba3..f47b15f10e 100644 --- a/hw/ppc/ppc405_boards.c +++ b/hw/ppc/ppc405_boards.c @@ -149,7 +149,7 @@ static void ref405ep_init(MachineState *machine) MemoryRegion *bios; MemoryRegion *sram = g_new(MemoryRegion, 1); ram_addr_t bdloc; -MemoryRegion *ram_memories = g_malloc(2 * sizeof(*ram_memories)); +MemoryRegion *ram_memories = g_new(MemoryRegion, 2); hwaddr ram_bases[2], ram_sizes[2]; target_ulong sram_size; long bios_size; @@ -448,7 +448,7 @@ static void taihu_405ep_init(MachineState *machine) qemu_irq *pic; MemoryRegion *sysmem = get_system_memory(); MemoryRegion *bios; -MemoryRegion *ram_memories = g_malloc(2 * sizeof(*ram_memories)); +MemoryRegion *ram_memories = g_new(MemoryRegion, 2); MemoryRegion *ram = g_malloc0(sizeof(*ram)); hwaddr ram_bases[2], ram_sizes[2]; long bios_size; -- 2.19.2
[Qemu-devel] [PULL 13/40] mac_newworld: simplify IRQ wiring
From: Greg Kurz The OpenPIC have 5 outputs per connected CPU. The machine init code hence needs a bi-dimensional array (smp_cpu lines, 5 columns) to wire up the irqs between the PIC and the CPUs. The current code first allocates an array of smp_cpus pointers to qemu_irq type, then it allocates another array of smp_cpus * 5 qemu_irq and fills the first array with pointers to each line of the second array. This is rather convoluted. Simplify the logic by introducing a structured type that describes all the OpenPIC outputs for a single CPU, ie, fixed size of 5 qemu_irq, and only allocate a smp_cpu sized array of those. This also allows to use g_new(T, n) instead of g_malloc(sizeof(T) * n) as recommended in HACKING. Signed-off-by: Greg Kurz Signed-off-by: David Gibson --- hw/ppc/mac_newworld.c| 30 +- include/hw/ppc/openpic.h | 2 ++ 2 files changed, 15 insertions(+), 17 deletions(-) diff --git a/hw/ppc/mac_newworld.c b/hw/ppc/mac_newworld.c index 7e45afae7c..bb19eaba36 100644 --- a/hw/ppc/mac_newworld.c +++ b/hw/ppc/mac_newworld.c @@ -115,7 +115,7 @@ static void ppc_core99_init(MachineState *machine) PowerPCCPU *cpu = NULL; CPUPPCState *env = NULL; char *filename; -qemu_irq **openpic_irqs; +IrqLines *openpic_irqs; int linux_boot, i, j, k; MemoryRegion *ram = g_new(MemoryRegion, 1), *bios = g_new(MemoryRegion, 1); hwaddr kernel_base, initrd_base, cmdline_base = 0; @@ -248,41 +248,37 @@ static void ppc_core99_init(MachineState *machine) memory_region_add_subregion(get_system_memory(), 0xf800, sysbus_mmio_get_region(s, 0)); -openpic_irqs = g_malloc0(smp_cpus * sizeof(qemu_irq *)); -openpic_irqs[0] = -g_malloc0(smp_cpus * sizeof(qemu_irq) * OPENPIC_OUTPUT_NB); +openpic_irqs = g_new0(IrqLines, smp_cpus); for (i = 0; i < smp_cpus; i++) { /* Mac99 IRQ connection between OpenPIC outputs pins * and PowerPC input pins */ switch (PPC_INPUT(env)) { case PPC_FLAGS_INPUT_6xx: -openpic_irqs[i] = openpic_irqs[0] + (i * OPENPIC_OUTPUT_NB); -openpic_irqs[i][OPENPIC_OUTPUT_INT] = 
+openpic_irqs[i].irq[OPENPIC_OUTPUT_INT] = ((qemu_irq *)env->irq_inputs)[PPC6xx_INPUT_INT]; -openpic_irqs[i][OPENPIC_OUTPUT_CINT] = +openpic_irqs[i].irq[OPENPIC_OUTPUT_CINT] = ((qemu_irq *)env->irq_inputs)[PPC6xx_INPUT_INT]; -openpic_irqs[i][OPENPIC_OUTPUT_MCK] = +openpic_irqs[i].irq[OPENPIC_OUTPUT_MCK] = ((qemu_irq *)env->irq_inputs)[PPC6xx_INPUT_MCP]; /* Not connected ? */ -openpic_irqs[i][OPENPIC_OUTPUT_DEBUG] = NULL; +openpic_irqs[i].irq[OPENPIC_OUTPUT_DEBUG] = NULL; /* Check this */ -openpic_irqs[i][OPENPIC_OUTPUT_RESET] = +openpic_irqs[i].irq[OPENPIC_OUTPUT_RESET] = ((qemu_irq *)env->irq_inputs)[PPC6xx_INPUT_HRESET]; break; #if defined(TARGET_PPC64) case PPC_FLAGS_INPUT_970: -openpic_irqs[i] = openpic_irqs[0] + (i * OPENPIC_OUTPUT_NB); -openpic_irqs[i][OPENPIC_OUTPUT_INT] = +openpic_irqs[i].irq[OPENPIC_OUTPUT_INT] = ((qemu_irq *)env->irq_inputs)[PPC970_INPUT_INT]; -openpic_irqs[i][OPENPIC_OUTPUT_CINT] = +openpic_irqs[i].irq[OPENPIC_OUTPUT_CINT] = ((qemu_irq *)env->irq_inputs)[PPC970_INPUT_INT]; -openpic_irqs[i][OPENPIC_OUTPUT_MCK] = +openpic_irqs[i].irq[OPENPIC_OUTPUT_MCK] = ((qemu_irq *)env->irq_inputs)[PPC970_INPUT_MCP]; /* Not connected ? 
*/ -openpic_irqs[i][OPENPIC_OUTPUT_DEBUG] = NULL; +openpic_irqs[i].irq[OPENPIC_OUTPUT_DEBUG] = NULL; /* Check this */ -openpic_irqs[i][OPENPIC_OUTPUT_RESET] = +openpic_irqs[i].irq[OPENPIC_OUTPUT_RESET] = ((qemu_irq *)env->irq_inputs)[PPC970_INPUT_HRESET]; break; #endif /* defined(TARGET_PPC64) */ @@ -299,7 +295,7 @@ static void ppc_core99_init(MachineState *machine) k = 0; for (i = 0; i < smp_cpus; i++) { for (j = 0; j < OPENPIC_OUTPUT_NB; j++) { -sysbus_connect_irq(s, k++, openpic_irqs[i][j]); +sysbus_connect_irq(s, k++, openpic_irqs[i].irq[j]); } } g_free(openpic_irqs); diff --git a/include/hw/ppc/openpic.h b/include/hw/ppc/openpic.h index 5eb982197d..dad08fe9be 100644 --- a/include/hw/ppc/openpic.h +++ b/include/hw/ppc/openpic.h @@ -20,6 +20,8 @@ enum { OPENPIC_OUTPUT_NB, }; +typedef struct IrqLines { qemu_irq irq[OPENPIC_OUTPUT_NB]; } IrqLines; + #define OPENPIC_MODEL_RAVEN 0 #define OPENPIC_MODEL_FSL_MPIC_20 1 #define OPENPIC_MODEL_FSL_MPIC_42 2 -- 2.19.2
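The simplification described in the commit message can be sketched outside QEMU as follows. This is only an illustration: `irq_t` stands in for qemu_irq, `OUTPUT_NB` for OPENPIC_OUTPUT_NB, calloc() for g_new0(), and `irq_lines_new` and `irq_pin_index` are hypothetical helpers that do not exist in the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define OUTPUT_NB 5            /* outputs per CPU, per the commit message */

typedef void *irq_t;           /* placeholder for qemu_irq (an assumption) */

/* One fixed-size row per CPU replaces the pointer-to-row indirection. */
typedef struct IrqLines {
    irq_t irq[OUTPUT_NB];
} IrqLines;

/* Single zeroed allocation for all CPUs; the caller releases it with free(). */
static IrqLines *irq_lines_new(size_t nr_cpus)
{
    assert(nr_cpus > 0);
    return calloc(nr_cpus, sizeof(IrqLines));
}

/* Linear pin index produced by the nested (cpu, output) connection loop. */
static size_t irq_pin_index(size_t cpu, size_t output)
{
    assert(output < OUTPUT_NB);
    return cpu * OUTPUT_NB + output;
}
```

Because each row has a fixed size, `lines[i].irq[j]` needs no per-row pointer fix-up, and one free() releases everything.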
[Qemu-devel] [PULL 14/40] e500: simplify IRQ wiring
From: Greg Kurz The OpenPIC have 5 outputs per connected CPU. The machine init code hence needs a bi-dimensional array (smp_cpu lines, 5 columns) to wire up the irqs between the PIC and the CPUs. The current code first allocates an array of smp_cpus pointers to qemu_irq type, then it allocates another array of smp_cpus * 5 qemu_irq and fills the first array with pointers to each line of the second array. This is rather convoluted. Simplify the logic by introducing a structured type that describes all the OpenPIC outputs for a single CPU, ie, fixed size of 5 qemu_irq, and only allocate a smp_cpu sized array of those. This also allows to use g_new(T, n) instead of g_malloc(sizeof(T) * n) as recommended in HACKING. Signed-off-by: Greg Kurz Signed-off-by: David Gibson --- hw/ppc/e500.c | 18 -- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/hw/ppc/e500.c b/hw/ppc/e500.c index e6747fce28..b20fea0dfc 100644 --- a/hw/ppc/e500.c +++ b/hw/ppc/e500.c @@ -685,7 +685,7 @@ static void ppce500_cpu_reset(void *opaque) } static DeviceState *ppce500_init_mpic_qemu(PPCE500MachineState *pms, - qemu_irq **irqs) + IrqLines *irqs) { DeviceState *dev; SysBusDevice *s; @@ -705,7 +705,7 @@ static DeviceState *ppce500_init_mpic_qemu(PPCE500MachineState *pms, k = 0; for (i = 0; i < smp_cpus; i++) { for (j = 0; j < OPENPIC_OUTPUT_NB; j++) { -sysbus_connect_irq(s, k++, irqs[i][j]); +sysbus_connect_irq(s, k++, irqs[i].irq[j]); } } @@ -713,7 +713,7 @@ static DeviceState *ppce500_init_mpic_qemu(PPCE500MachineState *pms, } static DeviceState *ppce500_init_mpic_kvm(const PPCE500MachineClass *pmc, - qemu_irq **irqs, Error **errp) + IrqLines *irqs, Error **errp) { Error *err = NULL; DeviceState *dev; @@ -742,7 +742,7 @@ static DeviceState *ppce500_init_mpic_kvm(const PPCE500MachineClass *pmc, static DeviceState *ppce500_init_mpic(PPCE500MachineState *pms, MemoryRegion *ccsr, - qemu_irq **irqs) + IrqLines *irqs) { MachineState *machine = MACHINE(pms); const PPCE500MachineClass *pmc = 
PPCE500_MACHINE_GET_CLASS(pms); @@ -806,15 +806,14 @@ void ppce500_init(MachineState *machine) /* irq num for pin INTA, INTB, INTC and INTD is 1, 2, 3 and * 4 respectively */ unsigned int pci_irq_nrs[PCI_NUM_PINS] = {1, 2, 3, 4}; -qemu_irq **irqs; +IrqLines *irqs; DeviceState *dev, *mpicdev; CPUPPCState *firstenv = NULL; MemoryRegion *ccsr_addr_space; SysBusDevice *s; PPCE500CCSRState *ccsr; -irqs = g_malloc0(smp_cpus * sizeof(qemu_irq *)); -irqs[0] = g_malloc0(smp_cpus * sizeof(qemu_irq) * OPENPIC_OUTPUT_NB); +irqs = g_new0(IrqLines, smp_cpus); for (i = 0; i < smp_cpus; i++) { PowerPCCPU *cpu; CPUState *cs; @@ -834,10 +833,9 @@ void ppce500_init(MachineState *machine) firstenv = env; } -irqs[i] = irqs[0] + (i * OPENPIC_OUTPUT_NB); input = (qemu_irq *)env->irq_inputs; -irqs[i][OPENPIC_OUTPUT_INT] = input[PPCE500_INPUT_INT]; -irqs[i][OPENPIC_OUTPUT_CINT] = input[PPCE500_INPUT_CINT]; +irqs[i].irq[OPENPIC_OUTPUT_INT] = input[PPCE500_INPUT_INT]; +irqs[i].irq[OPENPIC_OUTPUT_CINT] = input[PPCE500_INPUT_CINT]; env->spr_cb[SPR_BOOKE_PIR].default_value = cs->cpu_index = i; env->mpic_iack = pmc->ccsrbar_base + MPC8544_MPIC_REGS_OFFSET + 0xa0; -- 2.19.2
[Qemu-devel] [PULL 25/40] ppc/xive: introduce the XIVE interrupt thread context
From: Cédric Le Goater

Each POWER9 processor chip has a XIVE presenter that can generate four different exceptions to its threads:

- hypervisor exception
- O/S exception
- Event-Based Branch (EBB)
- msgsnd (doorbell)

Each exception has a state independent from the others, called a Thread Interrupt Management context. This context is a set of registers which lets the thread handle priority management and interrupt acknowledgment, among other things. The most important ones are:

- Interrupt Priority Register (PIPR)
- Interrupt Pending Buffer (IPB)
- Current Processor Priority (CPPR)
- Notification Source Register (NSR)

These registers are accessible through a specific MMIO region, called the Thread Interrupt Management Area (TIMA), made of four aligned pages, each exposing a different view of the registers:

- The first page (page address ending in 0b00) gives access to the entire context and is reserved for the ring 0 view for the physical thread context.
- The second (page address ending in 0b01) is for the hypervisor, ring 1 view.
- The third (page address ending in 0b10) is for the operating system, ring 2 view.
- The fourth (page address ending in 0b11) is for user level, ring 3 view.

The thread interrupt context is modeled with a XiveTCTX object containing the values of the different exception registers. The TIMA region is mapped at the same address for each CPU.
Signed-off-by: Cédric Le Goater Reviewed-by: David Gibson Signed-off-by: David Gibson --- hw/intc/xive.c | 424 + include/hw/ppc/xive.h | 44 include/hw/ppc/xive_regs.h | 82 +++ 3 files changed, 550 insertions(+) diff --git a/hw/intc/xive.c b/hw/intc/xive.c index 7b2ef7480d..06a835c454 100644 --- a/hw/intc/xive.c +++ b/hw/intc/xive.c @@ -16,6 +16,429 @@ #include "hw/qdev-properties.h" #include "monitor/monitor.h" #include "hw/ppc/xive.h" +#include "hw/ppc/xive_regs.h" + +/* + * XIVE Thread Interrupt Management context + */ + +static uint64_t xive_tctx_accept(XiveTCTX *tctx, uint8_t ring) +{ +return 0; +} + +static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr) +{ +if (cppr > XIVE_PRIORITY_MAX) { +cppr = 0xff; +} + +tctx->regs[ring + TM_CPPR] = cppr; +} + +/* + * XIVE Thread Interrupt Management Area (TIMA) + */ + +/* + * Define an access map for each page of the TIMA that we will use in + * the memory region ops to filter values when doing loads and stores + * of raw registers values + * + * Registers accessibility bits : + * + *0x0 - no access + *0x1 - write only + *0x2 - read only + *0x3 - read/write + */ + +static const uint8_t xive_tm_hw_view[] = { +/* QW-0 User */ 3, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 0, 0, 0, 0, +/* QW-1 OS */ 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 0, 0, 0, 0, +/* QW-2 POOL */ 0, 0, 3, 3, 0, 0, 0, 0, 3, 3, 3, 3, 0, 0, 0, 0, +/* QW-3 PHYS */ 3, 3, 3, 3, 0, 3, 0, 3, 3, 0, 0, 3, 3, 3, 3, 0, +}; + +static const uint8_t xive_tm_hv_view[] = { +/* QW-0 User */ 3, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 0, 0, 0, 0, +/* QW-1 OS */ 3, 3, 3, 3, 3, 3, 0, 3, 3, 3, 3, 3, 0, 0, 0, 0, +/* QW-2 POOL */ 0, 0, 3, 3, 0, 0, 0, 0, 0, 3, 3, 3, 0, 0, 0, 0, +/* QW-3 PHYS */ 3, 3, 3, 3, 0, 3, 0, 3, 3, 0, 0, 3, 0, 0, 0, 0, +}; + +static const uint8_t xive_tm_os_view[] = { +/* QW-0 User */ 3, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 0, 0, 0, 0, +/* QW-1 OS */ 2, 3, 2, 2, 2, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, +/* QW-2 POOL */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
+/* QW-3 PHYS */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +}; + +static const uint8_t xive_tm_user_view[] = { +/* QW-0 User */ 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/* QW-1 OS */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/* QW-2 POOL */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +/* QW-3 PHYS */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, +}; + +/* + * Overall TIMA access map for the thread interrupt management context + * registers + */ +static const uint8_t *xive_tm_views[] = { +[XIVE_TM_HW_PAGE] = xive_tm_hw_view, +[XIVE_TM_HV_PAGE] = xive_tm_hv_view, +[XIVE_TM_OS_PAGE] = xive_tm_os_view, +[XIVE_TM_USER_PAGE] = xive_tm_user_view, +}; + +/* + * Computes a register access mask for a given offset in the TIMA + */ +static uint64_t xive_tm_mask(hwaddr offset, unsigned size, bool write) +{ +uint8_t page_offset = (offset >> TM_SHIFT) & 0x3; +uint8_t reg_offset = offset & 0x3F; +uint8_t reg_mask = write ? 0x1 : 0x2; +uint64_t mask = 0x0; +int i; + +for (i = 0; i < size; i++) { +if (xive_tm_views[page_offset][reg_offset + i] & reg_mask) { +mask |= (uint64_t) 0xff << (8 * (size - i - 1)); +} +} + +return mask; +} + +static void xive_tm_raw_write(XiveTCTX
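To make the access-map filtering concrete, here is a standalone sketch of the mask computation applied to the OS-page view from the patch. The table values are copied from the diff. The function `tm_mask` takes the view and register offset directly instead of deriving them from a TIMA address. `ACC_WRITE` and `ACC_READ` are names invented here for the 0x1 and 0x2 bits of the legend.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Access bits from the patch's legend: 0x1 = write, 0x2 = read. */
#define ACC_WRITE 0x1
#define ACC_READ  0x2

/* OS-page view copied from the patch: 4 rings of 16 bytes each. */
static const uint8_t os_view[64] = {
    /* QW-0 User */ 3, 0, 0, 0, 0, 0, 0, 0, 3, 3, 3, 3, 0, 0, 0, 0,
    /* QW-1 OS   */ 2, 3, 2, 2, 2, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0,
    /* QW-2 POOL */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
    /* QW-3 PHYS */ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
};

/* Build a byte-granular mask of the bytes the access may touch. */
static uint64_t tm_mask(const uint8_t *view, uint8_t reg_offset,
                        unsigned size, bool write)
{
    uint8_t want = write ? ACC_WRITE : ACC_READ;
    uint64_t mask = 0;

    assert(size >= 1 && size <= 8 && reg_offset + size <= 64);
    for (unsigned i = 0; i < size; i++) {
        if (view[reg_offset + i] & want) {
            /* Byte i of the access is the i-th most significant byte. */
            mask |= (uint64_t)0xff << (8 * (size - i - 1));
        }
    }
    return mask;
}
```

A 4-byte read at offset 0x10 of the OS page yields 0xffffffff because all four bytes carry the read bit, while a 4-byte write at the same offset yields 0x00ff0000: only the byte at offset 1 has the write bit set.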
[Qemu-devel] [PULL 05/40] spapr: drop redundant statement in spapr_populate_drconf_memory()
From: Greg Kurz Signed-off-by: Greg Kurz Signed-off-by: David Gibson Reviewed-by: Laurent Vivier --- hw/ppc/spapr.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index b423db311e..051d080fe5 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -889,8 +889,6 @@ static int spapr_populate_drconf_memory(sPAPRMachineState *spapr, void *fdt) /* ibm,associativity-lookup-arrays */ buf_len = (nr_nodes * 4 + 2) * sizeof(uint32_t); cur_index = int_buf = g_malloc0(buf_len); - -cur_index = int_buf; int_buf[0] = cpu_to_be32(nr_nodes); int_buf[1] = cpu_to_be32(4); /* Number of entries per associativity list */ cur_index += 2; -- 2.19.2
[Qemu-devel] [PULL 10/40] ppc440_bamboo: use g_new(T, n) instead of g_malloc(sizeof(T) * n)
From: Greg Kurz Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Edgar E. Iglesias Signed-off-by: David Gibson --- hw/ppc/ppc440_bamboo.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/hw/ppc/ppc440_bamboo.c b/hw/ppc/ppc440_bamboo.c index f5720f979e..b8aa55d526 100644 --- a/hw/ppc/ppc440_bamboo.c +++ b/hw/ppc/ppc440_bamboo.c @@ -169,8 +169,7 @@ static void bamboo_init(MachineState *machine) unsigned int pci_irq_nrs[4] = { 28, 27, 26, 25 }; MemoryRegion *address_space_mem = get_system_memory(); MemoryRegion *isa = g_new(MemoryRegion, 1); -MemoryRegion *ram_memories -= g_malloc(PPC440EP_SDRAM_NR_BANKS * sizeof(*ram_memories)); +MemoryRegion *ram_memories = g_new(MemoryRegion, PPC440EP_SDRAM_NR_BANKS); hwaddr ram_bases[PPC440EP_SDRAM_NR_BANKS]; hwaddr ram_sizes[PPC440EP_SDRAM_NR_BANKS]; qemu_irq *pic; @@ -200,7 +199,7 @@ static void bamboo_init(MachineState *machine) ppc_dcr_init(env, NULL, NULL); /* interrupt controller */ -irqs = g_malloc0(sizeof(qemu_irq) * PPCUIC_OUTPUT_NB); +irqs = g_new0(qemu_irq, PPCUIC_OUTPUT_NB); irqs[PPCUIC_OUTPUT_INT] = ((qemu_irq *)env->irq_inputs)[PPC40x_INPUT_INT]; irqs[PPCUIC_OUTPUT_CINT] = ((qemu_irq *)env->irq_inputs)[PPC40x_INPUT_CINT]; pic = ppcuic_init(env, irqs, 0x0C0, 0, 1); -- 2.19.2
[Qemu-devel] [PULL 12/40] virtex_ml507: use g_new(T, n) instead of g_malloc(sizeof(T) * n)
From: Greg Kurz Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz Reviewed-by: Philippe Mathieu-Daudé Reviewed-by: Edgar E. Iglesias Signed-off-by: David Gibson --- hw/ppc/virtex_ml507.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/ppc/virtex_ml507.c b/hw/ppc/virtex_ml507.c index ee9b4b4490..5177120574 100644 --- a/hw/ppc/virtex_ml507.c +++ b/hw/ppc/virtex_ml507.c @@ -105,7 +105,7 @@ static PowerPCCPU *ppc440_init_xilinx(ram_addr_t *ram_size, ppc_dcr_init(env, NULL, NULL); /* interrupt controller */ -irqs = g_malloc0(sizeof(qemu_irq) * PPCUIC_OUTPUT_NB); +irqs = g_new0(qemu_irq, PPCUIC_OUTPUT_NB); irqs[PPCUIC_OUTPUT_INT] = ((qemu_irq *)env->irq_inputs)[PPC40x_INPUT_INT]; irqs[PPCUIC_OUTPUT_CINT] = ((qemu_irq *)env->irq_inputs)[PPC40x_INPUT_CINT]; ppcuic_init(env, irqs, 0x0C0, 0, 1); -- 2.19.2
[Qemu-devel] [PULL 07/40] spapr: use g_new(T, n) instead of g_malloc(sizeof(T) * n)
From: Greg Kurz Because it is a recommended coding practice (see HACKING). Signed-off-by: Greg Kurz Signed-off-by: David Gibson --- hw/ppc/spapr_iommu.c | 2 +- hw/ppc/spapr_vio.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c index 1b0880ac9e..b56466f89a 100644 --- a/hw/ppc/spapr_iommu.c +++ b/hw/ppc/spapr_iommu.c @@ -93,7 +93,7 @@ static uint64_t *spapr_tce_alloc_table(uint32_t liobn, if (!table) { *fd = -1; -table = g_malloc0(nb_table * sizeof(uint64_t)); +table = g_new0(uint64_t, nb_table); } trace_spapr_iommu_new_table(liobn, table, *fd); diff --git a/hw/ppc/spapr_vio.c b/hw/ppc/spapr_vio.c index 840d4a3c45..7e8a9ad093 100644 --- a/hw/ppc/spapr_vio.c +++ b/hw/ppc/spapr_vio.c @@ -730,7 +730,7 @@ void spapr_dt_vdevice(VIOsPAPRBus *bus, void *fdt) } /* Copy out into an array of pointers */ -qdevs = g_malloc(sizeof(qdev) * num); +qdevs = g_new(DeviceState *, num); num = 0; QTAILQ_FOREACH(kid, &bus->bus.children, sibling) { qdevs[num++] = kid->child; -- 2.19.2
[Qemu-devel] [PULL 23/40] Changes requirement for "vsubsbs" instruction
From: "Paul A. Clarke" Changes requirement for "vsubsbs" instruction, which has been supported since ISA 2.03. (Please see section 5.9.1.2 of ISA 2.03) Reported-by: Paul A. Clarke Signed-off-by: Paul A. Clarke Signed-off-by: Leonardo Bras Signed-off-by: David Gibson --- target/ppc/translate/vmx-ops.inc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/target/ppc/translate/vmx-ops.inc.c b/target/ppc/translate/vmx-ops.inc.c index 139f80cb24..84e05fb827 100644 --- a/target/ppc/translate/vmx-ops.inc.c +++ b/target/ppc/translate/vmx-ops.inc.c @@ -143,7 +143,7 @@ GEN_VXFORM(vaddsws, 0, 14), GEN_VXFORM_DUAL(vsububs, bcdadd, 0, 24, PPC_ALTIVEC, PPC_NONE), GEN_VXFORM_DUAL(vsubuhs, bcdsub, 0, 25, PPC_ALTIVEC, PPC_NONE), GEN_VXFORM(vsubuws, 0, 26), -GEN_VXFORM_DUAL(vsubsbs, bcdtrunc, 0, 28, PPC_NONE, PPC2_ISA300), +GEN_VXFORM_DUAL(vsubsbs, bcdtrunc, 0, 28, PPC_ALTIVEC, PPC2_ISA300), GEN_VXFORM(vsubshs, 0, 29), GEN_VXFORM_DUAL(vsubsws, xpnd04_2, 0, 30, PPC_ALTIVEC, PPC_NONE), GEN_VXFORM_207(vadduqm, 0, 4), -- 2.19.2
[Qemu-devel] [PULL 04/40] target/ppc: tcg: Implement addex instruction
From: Suraj Jitindar Singh

Implement the addex instruction introduced in ISA V3.00 in qemu tcg.

The add extended using alternate carry bit (addex) instruction performs the same operation as the add extended (adde) instruction, but using the overflow (ov) field in the fixed point exception register (xer) as the carry in and out instead of the carry (ca) field.

The instruction has a Z23-form, not an XO form, as follows:

-------------------------------------------------
|  31  |  RT  |  RA  |  RB  |  CY  |  170  | 0  |
-------------------------------------------------
0      6      11     16     21     23      31   32

However since the only valid form of the instruction defined so far is CY = 0, we can treat this like an XO form instruction.

There is no dot form (addex.) of the instruction and the summary overflow (so) bit in the xer is not modified by this instruction.

For simplicity we reuse the gen_op_arith_add function and add a function argument to specify where the carry in input should come from and the carry out output be stored (note must be the same location).

Signed-off-by: Suraj Jitindar Singh Signed-off-by: David Gibson --- disas/ppc.c| 2 ++ target/ppc/translate.c | 60 +++--- 2 files changed, 35 insertions(+), 27 deletions(-) diff --git a/disas/ppc.c b/disas/ppc.c index 5ab9c35a84..da1140ba2b 100644 --- a/disas/ppc.c +++ b/disas/ppc.c @@ -3734,6 +3734,8 @@ const struct powerpc_opcode powerpc_opcodes[] = { { "addmeo.", XO(31,234,1,1), XORB_MASK, PPCCOM,{ RT, RA } }, { "ameo.", XO(31,234,1,1), XORB_MASK, PWRCOM,{ RT, RA } }, +{ "addex", XO(31,170,0,0), XO_MASK, POWER9, { RT, RA, RB } }, + { "mullw", XO(31,235,0,0), XO_MASK, PPCCOM, { RT, RA, RB } }, { "muls",XO(31,235,0,0), XO_MASK, PWRCOM, { RT, RA, RB } }, { "mullw.", XO(31,235,0,1), XO_MASK, PPCCOM, { RT, RA, RB } }, diff --git a/target/ppc/translate.c b/target/ppc/translate.c index 2b37910248..96894ab9a8 100644 --- a/target/ppc/translate.c +++ b/target/ppc/translate.c @@ -849,7 +849,7 @@ static inline void gen_op_arith_compute_ov(DisasContext *ctx, TCGv arg0, static inline void gen_op_arith_compute_ca32(DisasContext *ctx, TCGv res, TCGv 
arg0, TCGv arg1, - int sub) + TCGv ca32, int sub) { TCGv t0; @@ -864,13 +864,14 @@ static inline void gen_op_arith_compute_ca32(DisasContext *ctx, tcg_gen_xor_tl(t0, arg0, arg1); } tcg_gen_xor_tl(t0, t0, res); -tcg_gen_extract_tl(cpu_ca32, t0, 32, 1); +tcg_gen_extract_tl(ca32, t0, 32, 1); tcg_temp_free(t0); } /* Common add function */ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1, -TCGv arg2, bool add_ca, bool compute_ca, +TCGv arg2, TCGv ca, TCGv ca32, +bool add_ca, bool compute_ca, bool compute_ov, bool compute_rc0) { TCGv t0 = ret; @@ -888,29 +889,29 @@ static inline void gen_op_arith_add(DisasContext *ctx, TCGv ret, TCGv arg1, tcg_gen_xor_tl(t1, arg1, arg2);/* add without carry */ tcg_gen_add_tl(t0, arg1, arg2); if (add_ca) { -tcg_gen_add_tl(t0, t0, cpu_ca); +tcg_gen_add_tl(t0, t0, ca); } -tcg_gen_xor_tl(cpu_ca, t0, t1);/* bits changed w/ carry */ +tcg_gen_xor_tl(ca, t0, t1);/* bits changed w/ carry */ tcg_temp_free(t1); -tcg_gen_extract_tl(cpu_ca, cpu_ca, 32, 1); +tcg_gen_extract_tl(ca, ca, 32, 1); if (is_isa300(ctx)) { -tcg_gen_mov_tl(cpu_ca32, cpu_ca); +tcg_gen_mov_tl(ca32, ca); } } else { TCGv zero = tcg_const_tl(0); if (add_ca) { -tcg_gen_add2_tl(t0, cpu_ca, arg1, zero, cpu_ca, zero); -tcg_gen_add2_tl(t0, cpu_ca, t0, cpu_ca, arg2, zero); +tcg_gen_add2_tl(t0, ca, arg1, zero, ca, zero); +tcg_gen_add2_tl(t0, ca, t0, ca, arg2, zero); } else { -tcg_gen_add2_tl(t0, cpu_ca, arg1, zero, arg2, zero); +tcg_gen_add2_tl(t0, ca, arg1, zero, arg2, zero); } -gen_op_arith_compute_ca32(ctx, t0, arg1, arg2, 0); +gen_op_arith_compute_ca32(ctx, t0, arg1, arg2, ca32, 0); tcg_temp_free(zero); } } else { tcg_gen_add_tl(t0, arg1, arg2); if (add_ca) { -tcg_gen_add_tl(t0, t0, cpu_ca); +tcg_gen_add_tl(t0, t0, ca); } } @@ -927,40 +928,44 @@ static inline
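The design point of this patch is that adde and addex share one add routine and differ only in which XER bit serves as carry in and carry out. The sketch below models that idea on plain 64-bit integers rather than TCG ops. `XerBits`, `add_with_carry`, `adde` and `addex` are illustrative names, and the 32-bit companion bits (ca32/ov32) that the patch also handles are left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal model of the two XER bits involved. */
typedef struct XerBits {
    bool ca;   /* carry: carry in/out for adde */
    bool ov;   /* overflow: reused as carry in/out by addex (CY = 0) */
} XerBits;

/* 64-bit add with carry-in; the carry-out is stored back in *carry. */
static uint64_t add_with_carry(uint64_t a, uint64_t b, bool *carry)
{
    assert(carry != NULL);

    uint64_t partial = a + b;
    bool c1 = partial < a;                  /* carry out of a + b */
    uint64_t sum = partial + (*carry ? 1 : 0);
    bool c2 = sum < partial;                /* carry out of adding carry-in */

    *carry = c1 || c2;
    return sum;
}

/* adde: carry in and out through XER[CA]. */
static uint64_t adde(XerBits *xer, uint64_t ra, uint64_t rb)
{
    return add_with_carry(ra, rb, &xer->ca);
}

/* addex with CY = 0: carry in and out through XER[OV]; CA is untouched. */
static uint64_t addex(XerBits *xer, uint64_t ra, uint64_t rb)
{
    return add_with_carry(ra, rb, &xer->ov);
}
```

Passing the carry location as a parameter mirrors the `ca`/`ca32` arguments the patch adds to gen_op_arith_add, and it enforces the commit message's note that input and output must be the same location.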
[Qemu-devel] [PULL 06/40] target/ppc: use g_new(T, n) instead of g_malloc(sizeof(T) * n)
From: Greg Kurz

Because it is a recommended coding practice (see HACKING).

Signed-off-by: Greg Kurz
Reviewed-by: Philippe Mathieu-Daudé
Signed-off-by: David Gibson
---
 target/ppc/translate_init.inc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 168d0cec28..03f1d34a97 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -9081,13 +9081,13 @@ static void init_ppc_proc(PowerPCCPU *cpu)
             nb_tlb *= 2;
         switch (env->tlb_type) {
         case TLB_6XX:
-            env->tlb.tlb6 = g_malloc0(nb_tlb * sizeof(ppc6xx_tlb_t));
+            env->tlb.tlb6 = g_new0(ppc6xx_tlb_t, nb_tlb);
             break;
         case TLB_EMB:
-            env->tlb.tlbe = g_malloc0(nb_tlb * sizeof(ppcemb_tlb_t));
+            env->tlb.tlbe = g_new0(ppcemb_tlb_t, nb_tlb);
             break;
         case TLB_MAS:
-            env->tlb.tlbm = g_malloc0(nb_tlb * sizeof(ppcmas_tlb_t));
+            env->tlb.tlbm = g_new0(ppcmas_tlb_t, nb_tlb);
             break;
         }
         /* Pre-compute some useful values */
-- 
2.19.2
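The commit message does not spell out why the g_new() family is preferred. As I understand it, the two usual arguments are that the result is typed and that the count-times-size multiplication is checked for overflow. The sketch below shows that idea with plain libc only; `alloc_array0`, `NEW0` and `DemoTlbEntry` are hypothetical names and this is not GLib's implementation (as far as I recall, GLib aborts on overflow, whereas this sketch returns NULL).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: zeroed array allocation that refuses a count whose
 * size in bytes would overflow size_t. Returns NULL on overflow or OOM. */
static void *alloc_array0(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size) {
        return NULL;
    }
    return calloc(count, elem_size);
}

/* Typed wrapper in the spirit of g_new0(T, n): the cast ties the result
 * to T, so assigning it to the wrong pointer type draws a warning. */
#define NEW0(T, n) ((T *)alloc_array0((n), sizeof(T)))

/* Illustrative element type, unrelated to the real TLB structures. */
typedef struct {
    uint32_t tag;
    uint32_t flags;
} DemoTlbEntry;

static void new0_example(void)
{
    DemoTlbEntry *tlb = NEW0(DemoTlbEntry, 64);

    assert(tlb != NULL && tlb[63].tag == 0);
    free(tlb);
    /* This many entries cannot be expressed in bytes, so it is rejected. */
    assert(NEW0(DemoTlbEntry, SIZE_MAX) == NULL);
}
```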
[Qemu-devel] [PULL 03/40] spapr: Fix ibm, max-associativity-domains property number of nodes
From: Serhii Popovych

Laurent Vivier reported an off-by-one error: the maximum number of NUMA nodes provided by qemu-kvm is one less than required according to the description of the "ibm,max-associativity-domains" property in LoPAPR.

It appears that I misread the LoPAPR description of this property, assuming it provides the last valid domain (NUMA node here) instead of the maximum number of domains.

### Before hot-add

(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 0 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 0 MB
node 2 plugged: 0 MB

$ numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus:
node 1 size: 999 MB
node 1 free: 658 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

### Hot-add

(qemu) object_add memory-backend-ram,id=mem0,size=1G
(qemu) device_add pc-dimm,id=dimm1,memdev=mem0,node=2
(qemu) [ 87.704898] pseries-hotplug-mem: Attempting to hot-add 4 ...
[ 87.705128] lpar: Attempting to resize HPT to shift 21 ...

### After hot-add

(qemu) info numa
3 nodes
node 0 cpus: 0
node 0 size: 0 MB
node 0 plugged: 0 MB
node 1 cpus:
node 1 size: 1024 MB
node 1 plugged: 0 MB
node 2 cpus:
node 2 size: 1024 MB
node 2 plugged: 1024 MB

$ numactl -H
available: 2 nodes (0-1)
Still only two nodes (and memory hot-added to node 0 below)
node 0 cpus: 0
node 0 size: 1024 MB
node 0 free: 1021 MB
node 1 cpus:
node 1 size: 999 MB
node 1 free: 658 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

After the fix is applied, numactl(8) reports 3 nodes available and memory plugged into node 2, as expected.

From David Gibson:
------------------
  Qemu makes a distinction between "non NUMA" (nb_numa_nodes == 0) and "NUMA with one node" (nb_numa_nodes == 1). But from a PAPR guest's point of view these are equivalent. I don't want to present two different cases to the guest when we don't need to, so even though the guest can handle it, I'd prefer we put a '1' here for both the nb_numa_nodes == 0 and nb_numa_nodes == 1 case.

This consolidates everything discussed previously on the mailing list.

Fixes: da9f80fbad21 ("spapr: Add ibm,max-associativity-domains property")
Reported-by: Laurent Vivier
Signed-off-by: Serhii Popovych
Signed-off-by: David Gibson
Reviewed-by: Greg Kurz
Reviewed-by: Laurent Vivier
---
 hw/ppc/spapr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 55be0f56cb..b423db311e 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1033,7 +1033,7 @@ static void spapr_dt_rtas(sPAPRMachineState *spapr, void *fdt)
         cpu_to_be32(0),
         cpu_to_be32(0),
         cpu_to_be32(0),
-        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes - 1 : 0),
+        cpu_to_be32(nb_numa_nodes ? nb_numa_nodes : 1),
     };
 
     _FDT(rtas = fdt_add_subnode(fdt, 0, "rtas"));
-- 
2.19.2
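To make the off-by-one concrete, the two expressions from the diff can be pulled out into standalone functions and compared. The function names are invented for this sketch; only the two ternary expressions come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Value advertised before the fix: index of the last valid NUMA node. */
static uint32_t max_domains_before(uint32_t nb_numa_nodes)
{
    return nb_numa_nodes ? nb_numa_nodes - 1 : 0;
}

/* Value advertised after the fix: the number of domains, never below 1. */
static uint32_t max_domains_after(uint32_t nb_numa_nodes)
{
    return nb_numa_nodes ? nb_numa_nodes : 1;
}

static void max_domains_example(void)
{
    /* The 3-node guest from the report: the old value hid node 2. */
    assert(max_domains_before(3) == 2);
    assert(max_domains_after(3) == 3);
    /* "non NUMA" and "NUMA with one node" now look the same to the guest. */
    assert(max_domains_after(0) == max_domains_after(1));
}
```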
[Qemu-devel] [PULL 02/40] target/ppc: Remove silly GETFIELD/SETFIELD/MASK_TO_LSH macros
The (only) obvious use for these macros is constructing and parsing guest visible register fields. But the way they're constructed, they're only valid when used on a *host* long, whose size shouldn't be visible to the guest at all.

They also have no current users, so just get rid of them.

Signed-off-by: David Gibson
---
 target/ppc/cpu.h | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 527181c0f0..d5f99f1fc7 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -78,18 +78,6 @@
                                  PPC_BIT32(bs))
 #define PPC_BITMASK8(bs, be)    ((PPC_BIT8(bs) - PPC_BIT8(be)) | PPC_BIT8(bs))
 
-#if HOST_LONG_BITS == 32
-# define MASK_TO_LSH(m)          (__builtin_ffsll(m) - 1)
-#elif HOST_LONG_BITS == 64
-# define MASK_TO_LSH(m)          (__builtin_ffsl(m) - 1)
-#else
-# error Unknown sizeof long
-#endif
-
-#define GETFIELD(m, v)          (((v) & (m)) >> MASK_TO_LSH(m))
-#define SETFIELD(m, v, val)                             \
-        (((v) & ~(m)) | ((((typeof(v))(val)) << MASK_TO_LSH(m)) & (m)))
-
 /*****************************************************************************/
 /* Exception vectors definitions                                             */
 enum {
-- 
2.19.2
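If someone does need mask-based field access for guest registers later, one way to sidestep the host-long dependency is to work on uint64_t throughout. The helpers below are a hypothetical sketch, not a proposal for the tree: the names are invented, and `__builtin_ctzll` assumes GCC or Clang. QEMU's own extract64()/deposit64() helpers, if I remember right, cover similar ground with explicit start/length arguments.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers working on uint64_t, so the result does not depend
 * on the width of the host's long. */
static unsigned mask_to_shift64(uint64_t mask)
{
    assert(mask != 0);                  /* ctz of 0 is undefined */
    return (unsigned)__builtin_ctzll(mask);
}

/* Extract the field selected by mask, shifted down to bit 0. */
static uint64_t field_get64(uint64_t mask, uint64_t reg)
{
    return (reg & mask) >> mask_to_shift64(mask);
}

/* Return reg with the field selected by mask replaced by val. */
static uint64_t field_set64(uint64_t mask, uint64_t reg, uint64_t val)
{
    return (reg & ~mask) | ((val << mask_to_shift64(mask)) & mask);
}
```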
[Qemu-devel] [PULL 01/40] target/ppc: fix the PPC_BIT definitions
From: Cédric Le Goater

Change the PPC_BIT macro to use ULL instead of UL and the PPC_BIT32 and PPC_BIT8 not to use any suffix. This fixes a compile breakage on windows.

Signed-off-by: Cédric Le Goater
Signed-off-by: David Gibson
---
 target/ppc/cpu.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index ab68abe8a2..527181c0f0 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -70,9 +70,9 @@
 #define PPC_ELF_MACHINE     EM_PPC
 #endif
 
-#define PPC_BIT(bit)            (0x8000000000000000UL >> (bit))
-#define PPC_BIT32(bit)          (0x80000000UL >> (bit))
-#define PPC_BIT8(bit)           (0x80UL >> (bit))
+#define PPC_BIT(bit)            (0x8000000000000000ULL >> (bit))
+#define PPC_BIT32(bit)          (0x80000000 >> (bit))
+#define PPC_BIT8(bit)           (0x80 >> (bit))
 #define PPC_BITMASK(bs, be)     ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))
 #define PPC_BITMASK32(bs, be)   ((PPC_BIT32(bs) - PPC_BIT32(be)) | \
                                  PPC_BIT32(bs))
-- 
2.19.2
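The commit message only says the change fixes a Windows build. As background (my assumption, not something the patch states): long is 32 bits on 64-bit Windows, so a UL suffix does not promise a 64-bit type there, while ULL does on every host. The sketch below restates the macros with what I take to be the full-width constants (top bit set, since IBM numbering puts bit 0 at the most significant end) and checks a few values.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed full-width constants; IBM numbering puts bit 0 at the MSB. */
#define PPC_BIT(bit)    (0x8000000000000000ULL >> (bit))
#define PPC_BIT32(bit)  (0x80000000 >> (bit))
#define PPC_BIT8(bit)   (0x80 >> (bit))
#define PPC_BITMASK(bs, be) ((PPC_BIT(bs) - PPC_BIT(be)) | PPC_BIT(bs))

static void ppc_bit_example(void)
{
    assert(PPC_BIT(0) == UINT64_C(0x8000000000000000));
    assert(PPC_BIT(63) == 1);
    assert(PPC_BIT32(31) == 1);
    assert(PPC_BIT8(0) == 0x80);
    /* Bits 0..3 in IBM numbering are the top nibble of the doubleword. */
    assert(PPC_BITMASK(0, 3) == UINT64_C(0xF000000000000000));
    /* unsigned long long is at least 64 bits on every conforming host. */
    assert(sizeof(PPC_BIT(0)) >= 8);
}
```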
[Qemu-devel] [PULL 00/40] ppc-for-4.0 queue 20181221
The following changes since commit 95de6f4b92efea391a3cbb8651d774a4d3529861: Merge remote-tracking branch 'remotes/armbru/tags/pull-misc-2018-12-20' into staging (2018-12-20 18:54:47 +) are available in the Git repository at: git://github.com/dgibson/qemu.git tags/ppc-for-4.0-20181221 for you to fetch changes up to b62c6e1237fb5ca2563f7e72b66ac0c40ff7a714: MAINTAINERS: PPC: add a XIVE section (2018-12-21 09:40:43 +1100) ppc patch queue 2018-12-21 This pull request supersedes the one from 2018-12-13. This is a revised first ppc pull request for qemu-4.0. Highlights are: * Most of the code for the POWER9 "XIVE" interrupt controller (not complete yet, but we're getting there) * A number of g_new vs. g_malloc cleanups * Some IRQ wiring cleanups * A fix for how we advertise NUMA nodes to the guest for pseries Alexey Kardashevskiy (1): spapr-iommu: Always advertise the maximum possible DMA window size Cédric Le Goater (25): target/ppc: fix the PPC_BIT definitions ppc/xive: introduce a XIVE interrupt source model ppc/xive: add support for the LSI interrupt sources ppc/xive: introduce the XiveNotifier interface ppc/xive: introduce the XiveRouter model ppc/xive: introduce the XIVE Event Notification Descriptors spapr: initialize VSMT before initializing the IRQ backend spapr: introduce a spapr_irq_init() routine spapr: export and rename the xics_max_server_number() routine ppc/xive: add support for the END Event State Buffers ppc/xive: introduce the XIVE interrupt thread context ppc/xive: introduce a simplified XIVE presenter ppc/xive: notify the CPU when the interrupt priority is more privileged spapr/xive: introduce a XIVE interrupt controller spapr/xive: use the VCPU id as a NVT identifier spapr: introduce a new machine IRQ backend for XIVE spapr: add hcalls support for the XIVE exploitation interrupt mode spapr: add device tree support for the XIVE exploitation mode spapr: allocate the interrupt thread context under the CPU core spapr: extend the sPAPR IRQ backend for 
XICS migration spapr: add a 'reset' method to the sPAPR IRQ backend spapr: add an extra OV5 field to the sPAPR IRQ backend spapr: introduce an 'ic-mode' machine option spapr: change default CPU type to POWER9 MAINTAINERS: PPC: add a XIVE section David Gibson (1): target/ppc: Remove silly GETFIELD/SETFIELD/MASK_TO_LSH macros Greg Kurz (10): spapr: drop redundant statement in spapr_populate_drconf_memory() target/ppc: use g_new(T, n) instead of g_malloc(sizeof(T) * n) spapr: use g_new(T, n) instead of g_malloc(sizeof(T) * n) ppc405_boards: use g_new(T, n) instead of g_malloc(sizeof(T) * n) ppc405_uc: use g_new(T, n) instead of g_malloc(sizeof(T) * n) ppc440_bamboo: use g_new(T, n) instead of g_malloc(sizeof(T) * n) sam460ex: use g_new(T, n) instead of g_malloc(sizeof(T) * n) virtex_ml507: use g_new(T, n) instead of g_malloc(sizeof(T) * n) mac_newworld: simplify IRQ wiring e500: simplify IRQ wiring Paul A. Clarke (1): Changes requirement for "vsubsbs" instruction Serhii Popovych (1): spapr: Fix ibm,max-associativity-domains property number of nodes Suraj Jitindar Singh (1): target/ppc: tcg: Implement addex instruction MAINTAINERS|8 + default-configs/ppc64-softmmu.mak |2 + disas/ppc.c|2 + hw/intc/Makefile.objs |2 + hw/intc/spapr_xive.c | 1486 + hw/intc/xics_spapr.c |3 +- hw/intc/xive.c | 1599 hw/ppc/e500.c | 18 +- hw/ppc/mac_newworld.c | 30 +- hw/ppc/ppc405_boards.c |4 +- hw/ppc/ppc405_uc.c |4 +- hw/ppc/ppc440_bamboo.c |5 +- hw/ppc/sam460ex.c |2 +- hw/ppc/spapr.c | 121 ++- hw/ppc/spapr_cpu_core.c|4 +- hw/ppc/spapr_iommu.c |2 +- hw/ppc/spapr_irq.c | 194 - hw/ppc/spapr_rtas_ddw.c| 19 +- hw/ppc/spapr_vio.c |2 +- hw/ppc/virtex_ml507.c |2 +- include/hw/ppc/openpic.h |2 + include/hw/ppc/spapr.h | 25 +- include/hw/ppc/spapr_irq.h | 12 + include/hw/ppc/spapr_xive.h| 52 ++ include/hw/ppc/xics.h |4 +- include/hw/ppc/xive.h | 429 ++ include/hw/ppc/xive_regs.h | 235 ++ target/ppc/cpu.h | 18 +- target/ppc/translate.c | 60 +- target/ppc/translate/vmx-ops.inc.c |2 +-
Re: [Qemu-devel] [PATCH V7 6/6] hostmem-file: add 'sync' option
On 2018-12-20 at 09:06:41 -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 20, 2018 at 01:37:40PM +0800, Yi Zhang wrote:
> > On 2018-12-19 at 22:42:07 -0500, Michael S. Tsirkin wrote:
> > > On Thu, Dec 20, 2018 at 11:03:12AM +0800, Yi Zhang wrote:
> > > > On 2018-12-19 at 10:59:10 -0500, Michael S. Tsirkin wrote:
> > > > > On Wed, Dec 19, 2018 at 05:10:18PM +0800, Yi Zhang wrote:
> > > > > > > > +
> > > > > > > > + - 'sync' option of memory-backend-file is not 'off', and
> > > > > > > > +
> > > > > > > > + - 'share' option of memory-backend-file is 'on'.
> > > > > > > > +
> > > > > > > > + - 'pmem' option of memory-backend-file is 'on'
> > > > > > > > +
> > > > > > >
> > > > > > > Wait isn't this what pmem was supposed to do?
> > > > > > > Doesn't it mean "persistent memory"?
> > > > > > pmem is a option for memory-backend-file, user should know the backend
> > > > > > is in host persistent memory, with this flags on, while there is a host crash
> > > > > > or a power failures.
> > > > > >
> > > > > > [1] Qemu will take necessary operations to guarantee the persistence.
> > > > > > https://patchwork.ozlabs.org/cover/944749/
> > > > > >
> > > > > > [2] Host kernel also take opretions to consistent filesystem metadata.
> > > > > > Add MAP_SYNC flags.
> > > > >
> > > > > OK so I'm a user. Can you educate me please?
> > > > We suppose an administrator should know it, what is the back-end region coming from,
> > > > is it persistent? what is the font-end device is? a volatile dimm or an
> > > > nonvolatile dimm? then they can choice put the pmem=on[off] and
> > > > sync=on[off].
> > > > If he didn't, we encourage that don't set these 2 flags.
> > > >
> > > > > When should MAP_SYNC not
> > > > > be set? Are there any disadvantages (e.g. performance?)?
> > > > Not only the performance, sometimes like the front-end device is an
> > > > volatile ram, we don't wanna set such option although the backend is a
> > > > novolatile memory, if power lose, all of thing should lose in this ram.
> > > >
> > >
> > > I am not sure how does above answer the questions. If I don't know,
> > > neither will the hypothetical administrator. Looks like a better
> > > interface is needed to make the choice on behalf of the user.
> > >
> > By default, we have already ignore the 2 flags, unless the administrator
> > know how to make that use. By-now, seems we don't have a better way to
> > detect what memory-backend-file is, a persistent memory or not.
>
> In that case how about an interface where user tells QEMU "this backend
> is in persistent memory"?

Actually, pmem=on already does this: we can get the backend memory type from file_memory_backend_get_pmem(), which is already used in memory_region_init_ram_from_file(). That is why I reuse RAM_PMEM to identify a region that comes from persistent memory. Please correct me if I have misunderstood something.

> > > > > --
> > > > > MST
Re: [Qemu-devel] [PATCH v8 11/12] spapr: introduce a new sPAPR IRQ backend supporting XIVE and XICS
On Wed, Dec 19, 2018 at 08:15:36PM +0100, Cédric Le Goater wrote:
> [ ... ]
>
> >>> +static qemu_irq spapr_qirq_dual(sPAPRMachineState *spapr, int irq)
> >>> +{
> >>> +    return spapr_irq_current(spapr)->qirq(spapr, irq);
> >>> +}
> >>
> >> This still makes me really nervous - I'd really prefer to have qirqs
> >> independent of the backend, rather than relying on *every* irq using
> >> device never looking up qirqs in advance.
> >
> > I will take a look. This is a large rework I won't have time to address
> > this year. I have removed the dual machine from v9.
> >
> > You would move the qirq array at the machine level ?
>
> I took a look today and did a few changes :
>
>  - move the qirq array at the machine level
>  - introduced a 'set_irq' method to sPAPR IRQ
>  - adapted the 'qirq' method of sPAPR IRQ. We still need to perform some
>    checks and to handle the IRQ number offset.
>
> It falls well in place, a part for the ICS source of the PnvPSI model
> which does not have any qirq anymore. For PSI, I am thinking of moving
> the qirq array under PnvPSI model, like I did for the machine.
>
> Would that be ok ?

That sounds reasonable. I'd been thinking of having a qirq array at the machine level which dispatched to other qirq arrays at the ICS or XiveSource levels, but if you don't need that, that's ok too.

> I think there are a couple more possible cleanups on the different ICS
> models to do if these changes are acceptable.
>
> C.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
[Qemu-devel] travis failures
Hi,

I am trying https://travis-ci.org/aik/qemu/ and it fails every time; I am not sure why. One example: https://travis-ci.org/aik/qemu/jobs/470796318

The errors look like this:

GTESTER check-qtest-unicore32
GTESTER check-qtest-x86_64
Could not access KVM kernel module: No such file or directory
qemu-system-x86_64: failed to initialize KVM: No such file or directory
qemu-system-x86_64: Back to tcg accelerator

Does anyone else see those? How do we fix them? Thanks.

--
Alexey
[Qemu-devel] is the "tcg translation" necessary when the "kvm acceleration" emulation mode enabled?
Hi folks,

I am puzzled about the relationship between TCG, which translates target CPU instructions into host instructions, and the KVM acceleration mode. Consider three emulation scenarios:

1. target CPU: x86_64, host CPU: x86_64, guest OS: ubuntu_desktop_amd64.iso, KVM: enabled.
2. target CPU: x86_64, host CPU: x86_64, guest OS: ubuntu_desktop_amd64.iso, KVM: disabled.
3. target CPU: x86_64, host CPU: armv8, guest OS: ubuntu_desktop_amd64.iso, KVM: enabled.

My questions are:

Scenario 1: Does TCG translation still need to be done here, given that the host and target architectures are the same? Or does KVM run the guest with its original instructions, without any TCG translation?

Scenario 2: The preconditions are the same as in scenario 1, except that KVM is disabled. So in this scenario TCG must be used, giving pure software emulation without acceleration?

Scenario 3: Here the host and target architectures are not the same, so how can the KVM mechanism be used? Must the instructions fed to the KVM module first be translated by TCG?

Thanks for your kind help with my puzzle.
Re: [Qemu-devel] [PATCH for-4.0 v4 0/4] allow to load initrd below 4G for recent kernel
ping

On 12/6/18 10:32, Li Zhijian wrote:

The Linux kernel has long supported initrds of up to 4 GB, but its header is still hard-coded to allow loading the initrd below 2 GB only. Quoting arch/x86/head.S:

# (Header version 0x0203 or later) the highest safe address for the contents
# of an initrd. The current kernel allows up to 4 GB, but leave it at 2 GB to
# avoid possible bootloader bugs.

In order to support an initrd larger than 2 GB, QEMU must allow loading the initrd above the 2 GB address. Luckily, recent kernels introduced a new flag in the Linux header's xloadflags field, XLF_CAN_BE_LOADED_ABOVE_4G, which tells the bootloader about an optional, safe address at which to load the initrd.

Current QEMU/BIOS always loads the initrd below below_4g_mem_size, which is always less than 4G, so simply limiting initrd_max to 4G - 1 is enough when this bit is set.

The default ROMs (SeaBIOS + option ROM (linuxboot_dma)) work as expected with this patchset.

changes:
V4:
- add Reviewed-by tag to 1/4 and 2/4
- use scripts/update-linux-headers.sh to import bootparam.h
- minor fix at commit log
V3:
- rebase code basing on http://patchwork.ozlabs.org/cover/1005990 and https://patchew.org/QEMU/20181122133507.30950-1-peter.mayd...@linaro.org
- add new patch 3/4 to import header bootparam.h (Michael S. Tsirkin)
V2:
- add 2 patches (3/5, 4/5) to fix potential loading issue.

Li Zhijian (4):
  unify len and addr type for memory/address APIs
  refactor load_image_size
  i386: import & use bootparam.h
  i386: allow to load initrd below 4G for recent linux

 exec.c                                       | 47 ++--
 hw/core/loader.c                             | 11 +++
 hw/i386/pc.c                                 | 18 ++-
 include/exec/cpu-all.h                       |  2 +-
 include/exec/cpu-common.h                    |  8 ++---
 include/exec/memory.h                        | 22 ++---
 include/standard-headers/asm-x86/bootparam.h | 34
 scripts/update-linux-headers.sh              |  4 +++
 8 files changed, 92 insertions(+), 54 deletions(-)
 create mode 100644 include/standard-headers/asm-x86/bootparam.h
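The decision described in the cover letter can be sketched as a small helper. This is only an illustration of the reasoning, not the code from the series: `pick_initrd_max` and its parameters are invented names, and the protocol check against 0x20c reflects my understanding that xloadflags first appeared in boot protocol 2.12.

```c
#include <assert.h>
#include <stdint.h>

#define XLF_CAN_BE_LOADED_ABOVE_4G (1 << 1)   /* as in Linux bootparam.h */

/* Hypothetical helper: choose the highest address the initrd may occupy.
 * hdr_initrd_addr_max stands for the limit read from the kernel header. */
static uint64_t pick_initrd_max(uint16_t protocol, uint16_t xloadflags,
                                uint32_t hdr_initrd_addr_max)
{
    if (protocol >= 0x20c && (xloadflags & XLF_CAN_BE_LOADED_ABOVE_4G)) {
        /* The kernel accepts high addresses, but the loader itself only
         * places the initrd below 4G, so 4G - 1 is a sufficient cap. */
        return UINT32_MAX;
    }
    return hdr_initrd_addr_max;   /* typically the conservative 2G limit */
}

static void initrd_max_example(void)
{
    assert(pick_initrd_max(0x20c, XLF_CAN_BE_LOADED_ABOVE_4G, 0x7fffffff)
           == 0xffffffffULL);
    assert(pick_initrd_max(0x20c, 0, 0x7fffffff) == 0x7fffffffULL);
}
```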
[Qemu-devel] [PATCH 11/15] qdev: pass an Object * to qbus_set_hotplug_handler()
From: Michael Roth Certain devices types, like memory/CPU, are now being handled using a hotplug interface provided by a top-level MachineClass. Hotpluggable host bridges are another such device where it makes sense to use a machine-level hotplug handler. However, unlike those devices, host-bridges have a parent bus (the main system bus), and devices with a parent bus use a different mechanism for registering their hotplug handlers: qbus_set_hotplug_handler(). This interface currently expects a handler to be a subclass of DeviceClass, but this is not the case for MachineClass, which derives directly from ObjectClass. Internally, the interface only requires an ObjectClass, so expose that in qbus_set_hotplug_handler(). Cc: Michael S. Tsirkin Cc: Eduardo Habkost Signed-off-by: Michael Roth Signed-off-by: Greg Kurz Reviewed-by: David Gibson --- hw/acpi/piix4.c |2 +- hw/char/virtio-serial-bus.c |2 +- hw/core/bus.c | 11 ++- hw/pci/pcie.c |2 +- hw/pci/shpc.c |2 +- hw/ppc/spapr_pci.c|2 +- hw/s390x/css-bridge.c |2 +- hw/s390x/s390-pci-bus.c |6 +++--- hw/scsi/virtio-scsi.c |2 +- hw/scsi/vmw_pvscsi.c |2 +- hw/usb/dev-smartcard-reader.c |2 +- include/hw/qdev-core.h|3 +-- 12 files changed, 15 insertions(+), 23 deletions(-) diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c index e330f24c71e0..d72aba555c5c 100644 --- a/hw/acpi/piix4.c +++ b/hw/acpi/piix4.c @@ -441,7 +441,7 @@ static void piix4_update_bus_hotplug(PCIBus *pci_bus, void *opaque) /* pci_bus cannot outlive PIIX4PMState, because /machine keeps it alive * and it's not hot-unpluggable */ -qbus_set_hotplug_handler(BUS(pci_bus), DEVICE(s), _abort); +qbus_set_hotplug_handler(BUS(pci_bus), OBJECT(s), _abort); } static void piix4_pm_machine_ready(Notifier *n, void *opaque) diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c index 04e3ebe3526a..e4310c78f2dc 100644 --- a/hw/char/virtio-serial-bus.c +++ b/hw/char/virtio-serial-bus.c @@ -1052,7 +1052,7 @@ static void virtio_serial_device_realize(DeviceState *dev, 
Error **errp) /* Spawn a new virtio-serial bus on which the ports will ride as devices */ qbus_create_inplace(>bus, sizeof(vser->bus), TYPE_VIRTIO_SERIAL_BUS, dev, vdev->bus_name); -qbus_set_hotplug_handler(BUS(>bus), DEVICE(vser), errp); +qbus_set_hotplug_handler(BUS(>bus), OBJECT(vser), errp); vser->bus.vser = vser; QTAILQ_INIT(>ports); diff --git a/hw/core/bus.c b/hw/core/bus.c index 4651f244864c..e09843f6abea 100644 --- a/hw/core/bus.c +++ b/hw/core/bus.c @@ -22,22 +22,15 @@ #include "hw/qdev.h" #include "qapi/error.h" -static void qbus_set_hotplug_handler_internal(BusState *bus, Object *handler, - Error **errp) +void qbus_set_hotplug_handler(BusState *bus, Object *handler, Error **errp) { - object_property_set_link(OBJECT(bus), OBJECT(handler), QDEV_HOTPLUG_HANDLER_PROPERTY, errp); } -void qbus_set_hotplug_handler(BusState *bus, DeviceState *handler, Error **errp) -{ -qbus_set_hotplug_handler_internal(bus, OBJECT(handler), errp); -} - void qbus_set_bus_hotplug_handler(BusState *bus, Error **errp) { -qbus_set_hotplug_handler_internal(bus, OBJECT(bus), errp); +qbus_set_hotplug_handler(bus, OBJECT(bus), errp); } int qbus_walk_children(BusState *bus, diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c index 6c91bd44a0a5..d74b121e8b6e 100644 --- a/hw/pci/pcie.c +++ b/hw/pci/pcie.c @@ -444,7 +444,7 @@ void pcie_cap_slot_init(PCIDevice *dev, uint16_t slot) dev->exp.hpev_notified = false; qbus_set_hotplug_handler(BUS(pci_bridge_get_sec_bus(PCI_BRIDGE(dev))), - DEVICE(dev), NULL); + OBJECT(dev), NULL); } void pcie_cap_slot_reset(PCIDevice *dev) diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c index 96a43d2f709a..377aedeb27be 100644 --- a/hw/pci/shpc.c +++ b/hw/pci/shpc.c @@ -639,7 +639,7 @@ int shpc_init(PCIDevice *d, PCIBus *sec_bus, MemoryRegion *bar, shpc_cap_update_dword(d); memory_region_add_subregion(bar, offset, >mmio); -qbus_set_hotplug_handler(BUS(sec_bus), DEVICE(d), NULL); +qbus_set_hotplug_handler(BUS(sec_bus), OBJECT(d), NULL); d->cap_present |= QEMU_PCI_CAP_SHPC; return 
0; diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index b772b72d6a48..292dd95cbef9 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -1723,7 +1723,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) >memspace, >iospace, PCI_DEVFN(0, 0), PCI_NUM_PINS, TYPE_PCI_BUS); phb->bus = bus; -qbus_set_hotplug_handler(BUS(phb->bus), DEVICE(sphb), NULL); +qbus_set_hotplug_handler(BUS(phb->bus), OBJECT(sphb),
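The gist of the interface change, stripped of QOM, is that a setter typed on the common base accepts handlers of any derived type. The toy model below is not QEMU code: all type and function names are invented, and the "inheritance" is just a base struct embedded as the first member.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object model, not QEMU's QOM: every type embeds Object first. */
typedef struct Object { const char *type_name; } Object;
typedef struct DeviceState { Object parent; int realized; } DeviceState;
typedef struct MachineState { Object parent; int ram_mb; } MachineState;
typedef struct BusState { Object parent; Object *hotplug_handler; } BusState;

/* Upcast helper in the spirit of OBJECT(). */
#define TOY_OBJECT(p) (&(p)->parent)

/* Taking Object * lets both devices and machines act as the handler;
 * a DeviceState * parameter would have excluded the machine. */
static void toy_bus_set_hotplug_handler(BusState *bus, Object *handler)
{
    assert(bus != NULL);
    bus->hotplug_handler = handler;
}
```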
[Qemu-devel] [PATCH 12/15] spapr_pci: provide node start offset via spapr_populate_pci_dt()
From: Michael Roth PHB hotplug re-uses PHB device tree generation code and passes it to a guest via RTAS. Doing this requires knowledge of where exactly in the device tree the node describing the PHB begins. Provide this via a new optional pointer that can be used to store the PHB node's start offset. Signed-off-by: Michael Roth Reviewed-by: David Gibson Signed-off-by: Greg Kurz --- hw/ppc/spapr.c |2 +- hw/ppc/spapr_pci.c |5 - include/hw/pci-host/spapr.h |2 +- 3 files changed, 6 insertions(+), 3 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 4f9d11b14666..5c405a5fafca 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1297,7 +1297,7 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr, QLIST_FOREACH(phb, >phbs, list) { ret = spapr_populate_pci_dt(phb, PHANDLE_XICP, fdt, -spapr->irq->nr_msis); +spapr->irq->nr_msis, NULL); if (ret < 0) { error_report("couldn't setup PCI devices in fdt"); exit(1); diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 292dd95cbef9..5e7b40a8c910 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -2140,7 +2140,7 @@ static void spapr_phb_pci_enumerate(sPAPRPHBState *phb) } int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t xics_phandle, void *fdt, - uint32_t nr_msis) + uint32_t nr_msis, int *node_offset) { int bus_off, i, j, ret; gchar *nodename; @@ -2195,6 +2195,9 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t xics_phandle, void *fdt, nodename = g_strdup_printf("pci@%" PRIx64, phb->buid); _FDT(bus_off = fdt_add_subnode(fdt, 0, nodename)); g_free(nodename); +if (node_offset) { +*node_offset = bus_off; +} /* Write PHB properties */ _FDT(fdt_setprop_string(fdt, bus_off, "device_type", "pci")); diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h index 83d5075a6ef3..6ef8f141ea92 100644 --- a/include/hw/pci-host/spapr.h +++ b/include/hw/pci-host/spapr.h @@ -114,7 +114,7 @@ static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin) } int 
spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t xics_phandle, void *fdt, - uint32_t nr_msis); + uint32_t nr_msis, int *node_offset); void spapr_pci_rtas_init(void);
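The "optional pointer" pattern used here is worth a tiny standalone illustration: existing callers pass NULL and are unaffected, while the hotplug path can ask for the offset. Everything below is a hypothetical stand-in (`toy_add_node`, `toy_populate`); only the NULL check mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a node-creating call that returns the new
 * node's offset on success or a negative error code. */
static int toy_add_node(int next_free_offset)
{
    return next_free_offset;
}

/* Returns 0 on success; reports the node offset only if the caller asks. */
static int toy_populate(int next_free_offset, int *node_offset)
{
    int off = toy_add_node(next_free_offset);

    if (off < 0) {
        return off;
    }
    if (node_offset) {
        *node_offset = off;
    }
    return 0;
}

static void toy_populate_example(void)
{
    int off = -1;

    assert(toy_populate(128, NULL) == 0);    /* boot-time caller */
    assert(toy_populate(128, &off) == 0);    /* hotplug caller */
    assert(off == 128);
}
```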
[Qemu-devel] [PATCH 10/15] spapr_events: add support for phb hotplug events
From: Michael Roth

Extend the existing EPOW event format we use for PCI devices to emit PHB plug/unplug events.

Signed-off-by: Michael Roth
Reviewed-by: David Gibson
Signed-off-by: Greg Kurz
---
 hw/ppc/spapr_events.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/ppc/spapr_events.c b/hw/ppc/spapr_events.c
index 32719a1b72d0..baf30c55710a 100644
--- a/hw/ppc/spapr_events.c
+++ b/hw/ppc/spapr_events.c
@@ -526,6 +526,9 @@ static void spapr_hotplug_req_event(uint8_t hp_id, uint8_t hp_action,
     case SPAPR_DR_CONNECTOR_TYPE_CPU:
         hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_CPU;
         break;
+    case SPAPR_DR_CONNECTOR_TYPE_PHB:
+        hp->hotplug_type = RTAS_LOG_V6_HP_TYPE_PHB;
+        break;
     default:
         /* we shouldn't be signaling hotplug events for resources
          * that don't support them
[Qemu-devel] [PATCH 08/15] spapr: create DR connectors for PHBs
From: Michael Roth Signed-off-by: Michael Roth Reviewed-by: David Gibson Signed-off-by: Greg Kurz --- hw/ppc/spapr.c | 13 + hw/ppc/spapr_drc.c | 17 + include/hw/ppc/spapr_drc.h |8 3 files changed, 38 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index fe3f9829ae6c..280e45037704 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2790,6 +2790,19 @@ static void spapr_machine_init(MachineState *machine) /* We always have at least the nvram device on VIO */ spapr_create_nvram(spapr); +/* + * Setup hotplug / dynamic-reconfiguration connectors. top-level + * connectors (described in root DT node's "ibm,drc-types" property) + * are pre-initialized here. additional child connectors (such as + * connectors for a PHBs PCI slots) are added as needed during their + * parent's realization. + */ +if (smc->dr_phb_enabled) { +for (i = 0; i < SPAPR_MAX_PHBS; i++) { +spapr_dr_connector_new(OBJECT(machine), TYPE_SPAPR_DRC_PHB, i); +} +} + /* Set up PCI */ spapr_pci_rtas_init(); diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c index 2edb7d1e9c8c..189ee681062a 100644 --- a/hw/ppc/spapr_drc.c +++ b/hw/ppc/spapr_drc.c @@ -696,6 +696,15 @@ static void spapr_drc_lmb_class_init(ObjectClass *k, void *data) drck->release = spapr_lmb_release; } +static void spapr_drc_phb_class_init(ObjectClass *k, void *data) +{ +sPAPRDRConnectorClass *drck = SPAPR_DR_CONNECTOR_CLASS(k); + +drck->typeshift = SPAPR_DR_CONNECTOR_TYPE_SHIFT_PHB; +drck->typename = "PHB"; +drck->drc_name_prefix = "PHB "; +} + static const TypeInfo spapr_dr_connector_info = { .name = TYPE_SPAPR_DR_CONNECTOR, .parent= TYPE_DEVICE, @@ -739,6 +748,13 @@ static const TypeInfo spapr_drc_lmb_info = { .class_init= spapr_drc_lmb_class_init, }; +static const TypeInfo spapr_drc_phb_info = { +.name = TYPE_SPAPR_DRC_PHB, +.parent= TYPE_SPAPR_DRC_LOGICAL, +.instance_size = sizeof(sPAPRDRConnector), +.class_init= spapr_drc_phb_class_init, +}; + /* helper functions for external users */ sPAPRDRConnector 
*spapr_drc_by_index(uint32_t index) @@ -1189,6 +1205,7 @@ static void spapr_drc_register_types(void) type_register_static(_drc_cpu_info); type_register_static(_drc_pci_info); type_register_static(_drc_lmb_info); +type_register_static(_drc_phb_info); spapr_rtas_register(RTAS_SET_INDICATOR, "set-indicator", rtas_set_indicator); diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h index f6ff32e7e2f2..56bba36ad4da 100644 --- a/include/hw/ppc/spapr_drc.h +++ b/include/hw/ppc/spapr_drc.h @@ -70,6 +70,14 @@ #define SPAPR_DRC_LMB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \ TYPE_SPAPR_DRC_LMB) +#define TYPE_SPAPR_DRC_PHB "spapr-drc-phb" +#define SPAPR_DRC_PHB_GET_CLASS(obj) \ +OBJECT_GET_CLASS(sPAPRDRConnectorClass, obj, TYPE_SPAPR_DRC_PHB) +#define SPAPR_DRC_PHB_CLASS(klass) \ +OBJECT_CLASS_CHECK(sPAPRDRConnectorClass, klass, TYPE_SPAPR_DRC_PHB) +#define SPAPR_DRC_PHB(obj) OBJECT_CHECK(sPAPRDRConnector, (obj), \ +TYPE_SPAPR_DRC_PHB) + /* * Various hotplug types managed by sPAPRDRConnector *
[Qemu-devel] [PATCH 13/15] spapr_pci: add ibm, my-drc-index property for PHB hotplug
From: Michael Roth

This is needed to denote a boot-time PHB as being hot-pluggable.

Signed-off-by: Michael Roth
Reviewed-by: David Gibson
Signed-off-by: Greg Kurz
---
 hw/ppc/spapr_pci.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 5e7b40a8c910..688cca83ef2f 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -2190,6 +2190,7 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t xics_phandle, void *fdt,
     sPAPRTCETable *tcet;
     PCIBus *bus = PCI_HOST_BRIDGE(phb)->bus;
     sPAPRFDT s_fdt;
+    sPAPRDRConnector *drc;
 
     /* Start populating the FDT */
     nodename = g_strdup_printf("pci@%" PRIx64, phb->buid);
@@ -2256,6 +2257,14 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t xics_phandle, void *fdt,
                  tcet->liobn, tcet->bus_offset,
                  tcet->nb_table << tcet->page_shift);
 
+    drc = spapr_drc_by_id(TYPE_SPAPR_DRC_PHB, phb->index);
+    if (drc) {
+        uint32_t drc_index = cpu_to_be32(spapr_drc_index(drc));
+
+        _FDT(fdt_setprop(fdt, bus_off, "ibm,my-drc-index", &drc_index,
+                         sizeof(drc_index)));
+    }
+
     /* Walk the bridges and program the bus numbers*/
     spapr_phb_pci_enumerate(phb);
     _FDT(fdt_setprop_cell(fdt, bus_off, "qemu,phb-enumerated", 0x1));
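The cpu_to_be32() call matters because device-tree property cells are stored big-endian regardless of the host's byte order. A host-independent way to see what ends up in the property is to write the bytes out explicitly; `put_be32` is a hypothetical helper for this sketch and the index value is made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: store a 32-bit cell in big-endian byte order,
 * which is what device-tree property values use on any host. */
static void put_be32(uint8_t out[4], uint32_t value)
{
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)value;
}

static void put_be32_example(void)
{
    uint8_t cell[4];
    const uint8_t expected[4] = { 0x80, 0x00, 0x00, 0x03 };

    put_be32(cell, 0x80000003u);   /* made-up DRC index, for illustration */
    assert(memcmp(cell, expected, sizeof(cell)) == 0);
}
```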
[Qemu-devel] [PATCH 07/15] spapr_pci: Define SPAPR_MAX_PHBS in hw/pci-host/spapr.h
PHB hotplug will bring more users for it. Let's define it along with the PHB defines from which it is derived for simplicity.

While here fix a misleading comment about manual placement, which was abandoned with 30b3bc5aa9f4.

Signed-off-by: Greg Kurz
---
 hw/ppc/spapr.c              |    2 --
 include/hw/pci-host/spapr.h |    6 ++++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 621006eaa862..fe3f9829ae6c 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3838,8 +3838,6 @@ static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t index,
      * 1TiB 64-bit MMIO windows for each PHB.
      */
     const uint64_t base_buid = 0x800000020000000ULL;
-#define SPAPR_MAX_PHBS ((SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / \
-                        SPAPR_PCI_MEM64_WIN_SIZE - 1)
     int i;
 
     /* Sanity check natural alignments */
diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h
index 9d2ec1a410e8..83d5075a6ef3 100644
--- a/include/hw/pci-host/spapr.h
+++ b/include/hw/pci-host/spapr.h
@@ -94,11 +94,13 @@ struct sPAPRPHBState {
     ((1ULL << 32) - SPAPR_PCI_MEM_WIN_BUS_OFFSET)
 #define SPAPR_PCI_MEM64_WIN_SIZE     0x10000000000ULL /* 1 TiB */
 
-/* Without manual configuration, all PCI outbound windows will be
- * within this range */
+/* All PCI outbound windows will be within this range */
 #define SPAPR_PCI_BASE               (1ULL << 45) /* 32 TiB */
 #define SPAPR_PCI_LIMIT              (1ULL << 46) /* 64 TiB */
 
+#define SPAPR_MAX_PHBS ((SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / \
+                        SPAPR_PCI_MEM64_WIN_SIZE - 1)
+
 #define SPAPR_PCI_2_7_MMIO_WIN_SIZE  0xf80000000
 #define SPAPR_PCI_IO_WIN_SIZE        0x10000
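For anyone wondering what SPAPR_MAX_PHBS actually evaluates to, the arithmetic can be checked in isolation. The window size below is written as 1ULL << 40 on the assumption that the "1 TiB" comment in the header is accurate; the other two values are taken from the patch as shown.

```c
#include <assert.h>
#include <stdint.h>

#define SPAPR_PCI_BASE            (1ULL << 45)   /* 32 TiB */
#define SPAPR_PCI_LIMIT           (1ULL << 46)   /* 64 TiB */
#define SPAPR_PCI_MEM64_WIN_SIZE  (1ULL << 40)   /* assumed: 1 TiB */

#define SPAPR_MAX_PHBS ((SPAPR_PCI_LIMIT - SPAPR_PCI_BASE) / \
                        SPAPR_PCI_MEM64_WIN_SIZE - 1)

static void max_phbs_example(void)
{
    /* 32 TiB of window space / 1 TiB per PHB = 32, and the macro then
     * subtracts one (division binds tighter than the subtraction). */
    assert(SPAPR_MAX_PHBS == 31);
}
```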
[Qemu-devel] [PATCH 03/15] pci: allow cleanup/unregistration of PCI root buses
From: Michael Roth This adds cleanup counterparts to pci_register_root_bus(), pci_root_bus_new(), and pci_bus_irqs(). These cleanup routines are needed in the case of hotpluggable PCIHostBridge implementations. Currently we can rely on the object_unparent()'ing of the PCIHostState recursively unparenting and cleaning up its child buses, but we need explicit calls to also: 1) remove the PCIHostState from pci_host_bridges global list; otherwise, we risk accessing freed memory when we access the list later 2) clean up memory allocated in pci_bus_irqs() Both are handled outside the context of any particular bus or host bridge's init/realize functions, making it difficult to avoid the need for explicit cleanup functions without remodeling how PCIHostBridges are created. So keep it simple and just add them for now. Cc: Michael S. Tsirkin Cc: Paolo Bonzini Signed-off-by: Michael Roth Reviewed-by: David Gibson Signed-off-by: Greg Kurz --- hw/pci/pci.c | 33 + include/hw/pci/pci.h |3 +++ 2 files changed, 36 insertions(+) diff --git a/hw/pci/pci.c b/hw/pci/pci.c index efb5ce196ffb..16354f91206c 100644 --- a/hw/pci/pci.c +++ b/hw/pci/pci.c @@ -333,6 +333,13 @@ static void pci_host_bus_register(DeviceState *host) QLIST_INSERT_HEAD(&pci_host_bridges, host_bridge, next); } +static void pci_host_bus_unregister(DeviceState *host) +{ +PCIHostState *host_bridge = PCI_HOST_BRIDGE(host); + +QLIST_REMOVE(host_bridge, next); +} + PCIBus *pci_device_root_bus(const PCIDevice *d) { PCIBus *bus = pci_get_bus(d); @@ -379,6 +386,11 @@ static void pci_root_bus_init(PCIBus *bus, DeviceState *parent, pci_host_bus_register(parent); } +static void pci_bus_uninit(PCIBus *bus) +{ +pci_host_bus_unregister(BUS(bus)->parent); +} + bool pci_bus_is_express(PCIBus *bus) { return object_dynamic_cast(OBJECT(bus), TYPE_PCIE_BUS); @@ -413,6 +425,12 @@ PCIBus *pci_root_bus_new(DeviceState *parent, const char *name, return bus; } +void pci_root_bus_cleanup(PCIBus *bus) +{ +pci_bus_uninit(bus); 
+object_unparent(OBJECT(bus)); +} + void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq, void *irq_opaque, int nirq) { @@ -423,6 +441,15 @@ void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq, bus->irq_count = g_malloc0(nirq * sizeof(bus->irq_count[0])); } +void pci_bus_irqs_cleanup(PCIBus *bus) +{ +bus->set_irq = NULL; +bus->map_irq = NULL; +bus->irq_opaque = NULL; +bus->nirq = 0; +g_free(bus->irq_count); +} + PCIBus *pci_register_root_bus(DeviceState *parent, const char *name, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq, void *irq_opaque, @@ -439,6 +466,12 @@ PCIBus *pci_register_root_bus(DeviceState *parent, const char *name, return bus; } +void pci_unregister_root_bus(PCIBus *bus) +{ +pci_bus_irqs_cleanup(bus); +pci_root_bus_cleanup(bus); +} + int pci_bus_num(PCIBus *s) { return PCI_BUS_GET_CLASS(s)->bus_num(s); diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h index e6514bba23aa..8998e3be3390 100644 --- a/include/hw/pci/pci.h +++ b/include/hw/pci/pci.h @@ -405,8 +405,10 @@ PCIBus *pci_root_bus_new(DeviceState *parent, const char *name, MemoryRegion *address_space_mem, MemoryRegion *address_space_io, uint8_t devfn_min, const char *typename); +void pci_root_bus_cleanup(PCIBus *bus); void pci_bus_irqs(PCIBus *bus, pci_set_irq_fn set_irq, pci_map_irq_fn map_irq, void *irq_opaque, int nirq); +void pci_bus_irqs_cleanup(PCIBus *bus); int pci_bus_get_irq_level(PCIBus *bus, int irq_num); /* 0 <= pin <= 3 0 = INTA, 1 = INTB, 2 = INTC, 3 = INTD */ int pci_swizzle_map_irq_fn(PCIDevice *pci_dev, int pin); @@ -417,6 +419,7 @@ PCIBus *pci_register_root_bus(DeviceState *parent, const char *name, MemoryRegion *address_space_io, uint8_t devfn_min, int nirq, const char *typename); +void pci_unregister_root_bus(PCIBus *bus); void pci_bus_set_route_irq_fn(PCIBus *, pci_route_irq_fn); PCIINTxRoute pci_device_route_intx_to_irq(PCIDevice *dev, int pin); bool pci_intx_route_changed(PCIINTxRoute *old, PCIINTxRoute *new);
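The commit message's first point, that a host bridge must leave the global list before its memory goes away, can be sketched with a plain singly linked list. This is a simplified stand-in, not the QLIST-based code in hw/pci/pci.c; `HostBridge`, `host_bridge_register()`, `host_bridge_unregister()` and `host_bridge_count()` are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for a PCIHostState linked on a global list. */
typedef struct HostBridge {
    int id;
    struct HostBridge *next;
} HostBridge;

static HostBridge *host_bridges;   /* global list head */

/* Counterpart of the register step: push onto the global list. */
static void host_bridge_register(HostBridge *hb)
{
    hb->next = host_bridges;
    host_bridges = hb;
}

/* Unlink hb from the global list; this must run before hb is freed. */
static void host_bridge_unregister(HostBridge *hb)
{
    HostBridge **pp;

    for (pp = &host_bridges; *pp; pp = &(*pp)->next) {
        if (*pp == hb) {
            *pp = hb->next;
            hb->next = NULL;
            return;
        }
    }
}

/* Walks the list, which would touch freed memory if an entry leaked. */
static size_t host_bridge_count(void)
{
    size_t n = 0;

    for (HostBridge *hb = host_bridges; hb; hb = hb->next) {
        n++;
    }
    return n;
}
```

If a bridge were freed without the unregister call, the later walk in `host_bridge_count()` would dereference a dangling pointer, which is the hazard the patch's pci_host_bus_unregister() is meant to avoid.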
[Qemu-devel] [PATCH 09/15] spapr: populate PHB DRC entries for root DT node
From: Nathan Fontenot

This adds entries to the root OF node to advertise our PHBs as being
DR-capable in accordance with the PAPR specification.

Signed-off-by: Nathan Fontenot
Signed-off-by: Michael Roth
Reviewed-by: David Gibson
Signed-off-by: Greg Kurz
---
 hw/ppc/spapr.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 280e45037704..4f9d11b14666 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1350,6 +1350,14 @@ static void *spapr_build_fdt(sPAPRMachineState *spapr,
         exit(1);
     }

+    if (smc->dr_phb_enabled) {
+        ret = spapr_drc_populate_dt(fdt, 0, NULL, SPAPR_DR_CONNECTOR_TYPE_PHB);
+        if (ret < 0) {
+            error_report("Couldn't set up PHB DR device tree properties");
+            exit(1);
+        }
+    }
+
     return fdt;
 }
[Qemu-devel] [PATCH 04/15] spapr_pci: add proper rollback on PHB realize error path
The current realize code assumes the PHB is coldplugged, i.e., QEMU will terminate if an error is detected, and does not bother to free anything it has already allocated. In order to support PHB hotplug, let's first ensure spapr_phb_realize() doesn't leak anything in case of error. Signed-off-by: Greg Kurz --- hw/ppc/spapr_pci.c | 40 +++- 1 file changed, 35 insertions(+), 5 deletions(-) diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index e59adbe706bb..46d7062dd143 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -1570,6 +1570,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) sPAPRTCETable *tcet; const unsigned windows_supported = sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1; +Object *drcs[PCI_SLOT_MAX * 8]; if (!spapr) { error_setg(errp, TYPE_SPAPR_PCI_HOST_BRIDGE " needs a pseries machine"); @@ -1733,7 +1734,10 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) spapr_irq_claim(spapr, irq, true, &local_err); if (local_err) { error_propagate_prepend(errp, local_err, "can't allocate LSIs: "); -return; +while (--i >= 0) { +spapr_irq_free(spapr, sphb->lsi_table[i].irq, 1); +} +goto fail_del_msiwindow; } sphb->lsi_table[i].irq = irq; @@ -1741,9 +1745,10 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) /* allocate connectors for child PCI devices */ if (sphb->dr_enabled) { -for (i = 0; i < PCI_SLOT_MAX * 8; i++) { -spapr_dr_connector_new(OBJECT(phb), TYPE_SPAPR_DRC_PCI, - (sphb->index << 16) | i); +for (i = 0; i < ARRAY_SIZE(drcs); i++) { +drcs[i] = +OBJECT(spapr_dr_connector_new(OBJECT(phb), TYPE_SPAPR_DRC_PCI, + (sphb->index << 16) | i)); } } @@ -1753,13 +1758,38 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) if (!tcet) { error_setg(errp, "Creating window#%d failed for %s", i, sphb->dtbusname); -return; +while (--i >= 0) { +tcet = spapr_tce_find_by_liobn(sphb->dma_liobn[i]); +assert(tcet); +memory_region_del_subregion(&sphb->iommu_root, +spapr_tce_get_iommu(tcet)); +object_unparent(OBJECT(tcet)); 
+} +goto fail_free_drcs; } memory_region_add_subregion(&sphb->iommu_root, 0, spapr_tce_get_iommu(tcet)); } sphb->msi = g_hash_table_new_full(g_int_hash, g_int_equal, g_free, g_free); +return; + +fail_free_drcs: +if (sphb->dr_enabled) { +for (i = 0; i < ARRAY_SIZE(drcs); i++) { +object_unparent(drcs[i]); +} +} +fail_del_msiwindow: +memory_region_del_subregion(&sphb->iommu_root, &sphb->msiwindow); +address_space_destroy(&sphb->iommu_as); +qbus_set_hotplug_handler(BUS(phb->bus), NULL, &error_abort); +pci_unregister_root_bus(phb->bus); +memory_region_del_subregion(get_system_memory(), &sphb->iowindow); +if (sphb->mem64_win_pciaddr != (hwaddr)-1) { +memory_region_del_subregion(get_system_memory(), &sphb->mem64window); +} +memory_region_del_subregion(get_system_memory(), &sphb->mem32window); } static int spapr_phb_children_reset(Object *child, void *opaque)
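The rollback pattern used in this patch (undo the partially completed loop with `while (--i >= 0)`, then jump to a label that tears down the earlier steps) can be reduced to a toy model. Everything below is invented for illustration: `claim_irq()`, `free_irq()`, `toy_realize()` and the `window_mapped` flag merely stand in for spapr_irq_claim(), spapr_irq_free(), spapr_phb_realize() and the memory-region setup.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_IRQS 4

static bool irq_claimed[NUM_IRQS];
static bool window_mapped;

/* Toy claim: fails on purpose when i == fail_at. */
static int claim_irq(int i, int fail_at)
{
    if (i == fail_at) {
        return -1;
    }
    irq_claimed[i] = true;
    return 0;
}

static void free_irq(int i)
{
    irq_claimed[i] = false;
}

/* fail_at: index of the IRQ claim that should fail, or -1 for success. */
static int toy_realize(int fail_at)
{
    int i;

    window_mapped = true;              /* step 1: set up the window */

    for (i = 0; i < NUM_IRQS; i++) {   /* step 2: claim the IRQs */
        if (claim_irq(i, fail_at) < 0) {
            /* Release only the IRQs claimed so far, newest first. */
            while (--i >= 0) {
                free_irq(i);
            }
            goto fail_unmap_window;
        }
    }
    return 0;

fail_unmap_window:
    window_mapped = false;             /* undo step 1 */
    return -1;
}

static int claimed_count(void)
{
    int n = 0;

    for (int i = 0; i < NUM_IRQS; i++) {
        n += irq_claimed[i] ? 1 : 0;
    }
    return n;
}
```

Ordering the labels in reverse order of setup lets a later failure fall through the earlier cleanup steps, which is how fail_free_drcs falls through to fail_del_msiwindow in the patch.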
[Qemu-devel] [PATCH 02/15] spapr: move spapr_create_phb() to core machine code
This function is only used when creating the default PHB. Let's rename it and move it to the core machine code for clarity. Signed-off-by: Greg Kurz Reviewed-by: Alexey Kardashevskiy Reviewed-by: David Gibson --- hw/ppc/spapr.c | 13 - hw/ppc/spapr_pci.c | 11 --- include/hw/pci-host/spapr.h |2 -- 3 files changed, 12 insertions(+), 14 deletions(-) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 8ea680fcde1e..1f17b5d01f4f 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2551,6 +2551,17 @@ static void spapr_init_cpus(sPAPRMachineState *spapr) } } +static PCIHostState *spapr_create_default_phb(void) +{ +DeviceState *dev; + +dev = qdev_create(NULL, TYPE_SPAPR_PCI_HOST_BRIDGE); +qdev_prop_set_uint32(dev, "index", 0); +qdev_init_nofail(dev); + +return PCI_HOST_BRIDGE(dev); +} + /* pSeries LPAR / sPAPR hardware init */ static void spapr_machine_init(MachineState *machine) { @@ -2782,7 +2793,7 @@ static void spapr_machine_init(MachineState *machine) /* Set up PCI */ spapr_pci_rtas_init(); -phb = spapr_create_phb(spapr, 0); +phb = spapr_create_default_phb(); for (i = 0; i < nb_nics; i++) { NICInfo *nd = _table[i]; diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 2374d55fc112..e59adbe706bb 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -1979,17 +1979,6 @@ static const TypeInfo spapr_phb_info = { } }; -PCIHostState *spapr_create_phb(sPAPRMachineState *spapr, int index) -{ -DeviceState *dev; - -dev = qdev_create(NULL, TYPE_SPAPR_PCI_HOST_BRIDGE); -qdev_prop_set_uint32(dev, "index", index); -qdev_init_nofail(dev); - -return PCI_HOST_BRIDGE(dev); -} - typedef struct sPAPRFDT { void *fdt; int node_off; diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h index 7c66c3872f96..a65cfef16945 100644 --- a/include/hw/pci-host/spapr.h +++ b/include/hw/pci-host/spapr.h @@ -111,8 +111,6 @@ static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, int pin) return spapr_qirq(spapr, phb->lsi_table[pin].irq); } -PCIHostState 
*spapr_create_phb(sPAPRMachineState *spapr, int index); - int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t xics_phandle, void *fdt, uint32_t nr_msis);
[Qemu-devel] [PATCH 06/15] spapr: enable PHB hotplug for default pseries machine type
From: Michael Roth The 'dr_phb_enabled' field of that class can be set as part of machine-specific init code. It will be used to conditionally enable creation of DRC objects and device-tree description to facilitate hotplug of PHBs. Since we can't migrate this state to older machine types, default the option to true and disable it for older machine types. Signed-off-by: Michael Roth Signed-off-by: Greg Kurz --- hw/ppc/spapr.c |2 ++ include/hw/ppc/spapr.h |1 + 2 files changed, 3 insertions(+) diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 1f17b5d01f4f..621006eaa862 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -4011,6 +4011,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) smc->default_caps.caps[SPAPR_CAP_NESTED_KVM_HV] = SPAPR_CAP_OFF; spapr_caps_add_properties(smc, _abort); smc->irq = _irq_xics; +smc->dr_phb_enabled = true; } static const TypeInfo spapr_machine_info = { @@ -4079,6 +4080,7 @@ static void spapr_machine_3_1_class_options(MachineClass *mc) SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_3_1); mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0"); smc->update_dt_enabled = false; +smc->dr_phb_enabled = false; } DEFINE_SPAPR_MACHINE(3_1, "3.1", false); diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h index 36033b89d31a..e96deefa30de 100644 --- a/include/hw/ppc/spapr.h +++ b/include/hw/ppc/spapr.h @@ -104,6 +104,7 @@ struct sPAPRMachineClass { /*< public >*/ bool dr_lmb_enabled; /* enable dynamic-reconfig/hotplug of LMBs */ bool update_dt_enabled;/* enable KVMPPC_H_UPDATE_DT */ +bool dr_phb_enabled; /* enable dynamic-reconfig/hotplug of PHBs */ bool use_ohci_by_default; /* use USB-OHCI instead of XHCI */ bool pre_2_10_has_unused_icps; bool legacy_irq_allocation;
[Qemu-devel] [PATCH 05/15] spapr_pci: add PHB unrealize
From: Michael Roth To support PHB hotplug we need to clean up lingering references, memory, child properties, etc. prior to the PHB object being finalized. Generally this will be called as a result of calling object_unparent() on the PHB object, which in turn would normally be called as the result of an unplug() operation. When the PHB is finalized, child objects will be unparented in turn, and finalized if the PHB was the only reference holder. so we don't bother to explicitly unparent child objects of the PHB (spapr_iommu, spapr_drc, etc). The formula that gives the number of DMA windows is moved to an inline function in the hw/pci-host/spapr.h header because it will have other users. Signed-off-by: Michael Roth Signed-off-by: Greg Kurz --- hw/ppc/spapr_pci.c | 56 +-- include/hw/pci-host/spapr.h |4 +++ 2 files changed, 58 insertions(+), 2 deletions(-) diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c index 46d7062dd143..b772b72d6a48 100644 --- a/hw/ppc/spapr_pci.c +++ b/hw/ppc/spapr_pci.c @@ -1551,6 +1551,57 @@ static void spapr_pci_unplug_request(HotplugHandler *plug_handler, } } +static void spapr_phb_finalizefn(Object *obj) +{ +sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(obj); + +g_free(sphb->dtbusname); +sphb->dtbusname = NULL; +} + +static void spapr_phb_unrealize(DeviceState *dev, Error **errp) +{ +sPAPRMachineState *spapr = SPAPR_MACHINE(qdev_get_machine()); +SysBusDevice *s = SYS_BUS_DEVICE(dev); +PCIHostState *phb = PCI_HOST_BRIDGE(s); +sPAPRPHBState *sphb = SPAPR_PCI_HOST_BRIDGE(phb); +sPAPRTCETable *tcet; +int i; +const unsigned windows_supported = spapr_phb_windows_supported(sphb); + +g_hash_table_unref(sphb->msi); + +/* + * Remove IO/MMIO subregions and aliases, rest should get cleaned + * via PHB's unrealize->object_finalize + */ +for (i = windows_supported - 1; i >= 0; i--) { +tcet = spapr_tce_find_by_liobn(sphb->dma_liobn[i]); +assert(tcet); +memory_region_del_subregion(>iommu_root, +spapr_tce_get_iommu(tcet)); +} + +for (i = PCI_NUM_PINS - 1; 
i >= 0; i--) { +spapr_irq_free(spapr, sphb->lsi_table[i].irq, 1); +} + +QLIST_REMOVE(sphb, list); + +memory_region_del_subregion(>iommu_root, >msiwindow); + +address_space_destroy(>iommu_as); + +qbus_set_hotplug_handler(BUS(phb->bus), NULL, _abort); +pci_unregister_root_bus(phb->bus); + +memory_region_del_subregion(get_system_memory(), >iowindow); +if (sphb->mem64_win_pciaddr != (hwaddr)-1) { +memory_region_del_subregion(get_system_memory(), >mem64window); +} +memory_region_del_subregion(get_system_memory(), >mem32window); +} + static void spapr_phb_realize(DeviceState *dev, Error **errp) { /* We don't use SPAPR_MACHINE() in order to exit gracefully if the user @@ -1568,8 +1619,7 @@ static void spapr_phb_realize(DeviceState *dev, Error **errp) PCIBus *bus; uint64_t msi_window_size = 4096; sPAPRTCETable *tcet; -const unsigned windows_supported = -sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1; +const unsigned windows_supported = spapr_phb_windows_supported(sphb); Object *drcs[PCI_SLOT_MAX * 8]; if (!spapr) { @@ -1988,6 +2038,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data) hc->root_bus_path = spapr_phb_root_bus_path; dc->realize = spapr_phb_realize; +dc->unrealize = spapr_phb_unrealize; dc->props = spapr_phb_properties; dc->reset = spapr_phb_reset; dc->vmsd = _spapr_pci; @@ -2002,6 +2053,7 @@ static const TypeInfo spapr_phb_info = { .name = TYPE_SPAPR_PCI_HOST_BRIDGE, .parent= TYPE_PCI_HOST_BRIDGE, .instance_size = sizeof(sPAPRPHBState), +.instance_finalize = spapr_phb_finalizefn, .class_init= spapr_phb_class_init, .interfaces= (InterfaceInfo[]) { { TYPE_HOTPLUG_HANDLER }, diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h index a65cfef16945..9d2ec1a410e8 100644 --- a/include/hw/pci-host/spapr.h +++ b/include/hw/pci-host/spapr.h @@ -162,4 +162,8 @@ static inline void spapr_phb_vfio_reset(DeviceState *qdev) void spapr_phb_dma_reset(sPAPRPHBState *sphb); +static inline unsigned spapr_phb_windows_supported(sPAPRPHBState 
*sphb) +{ +return sphb->ddw_enabled ? SPAPR_PCI_DMA_MAX_WINDOWS : 1; +} #endif /* PCI_HOST_SPAPR_H */
[Qemu-devel] [PATCH 01/15] ppc/spapr: Receive and store device tree blob from SLOF
From: Alexey Kardashevskiy SLOF receives a device tree and updates it with various properties before switching to the guest kernel and QEMU is not aware of any changes made by SLOF. Since there is no real RTAS (QEMU implements it), it makes sense to pass the SLOF final device tree to QEMU to let it implement RTAS related tasks better, such as PCI host bus adapter hotplug. Specifically, now QEMU can find out the actual XICS phandle (for PHB hotplug) and the RTAS linux,rtas-entry/base properties (for firmware assisted NMI - FWNMI). This stores the initial DT blob in the sPAPR machine and replaces it in the KVMPPC_H_UPDATE_DT (new private hypercall) handler. This adds an @update_dt_enabled machine property to allow backward migration. SLOF already has a hypercall since https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183 This makes use of the new fdt_check_full() helper. In order to allow the configure script to pick the correct DTC version, this adjusts the DTC presence test. Signed-off-by: Alexey Kardashevskiy Reviewed-by: Greg Kurz Signed-off-by: David Gibson Signed-off-by: Greg Kurz --- configure |2 +- hw/ppc/spapr.c | 43 ++- hw/ppc/spapr_hcall.c | 42 ++ hw/ppc/trace-events|3 +++ include/hw/ppc/spapr.h |7 ++- 5 files changed, 94 insertions(+), 3 deletions(-) diff --git a/configure b/configure index 224d3071ac61..baeeabc29f56 100755 --- a/configure +++ b/configure @@ -3916,7 +3916,7 @@ if test "$fdt" != "no" ; then cat > $TMPC << EOF #include #include -int main(void) { fdt_first_subnode(0, 0); return 0; } +int main(void) { fdt_check_full(NULL, 0); return 0; } EOF if compile_prog "" "$fdt_libs" ; then # system DTC is good - use it diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 17ad84396b31..8ea680fcde1e 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -1668,7 +1668,10 @@ static void spapr_machine_reset(void) /* Load the fdt */ qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt)); cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt)); -g_free(fdt); 
+g_free(spapr->fdt_blob); +spapr->fdt_size = fdt_totalsize(fdt); +spapr->fdt_initial_size = spapr->fdt_size; +spapr->fdt_blob = fdt; /* Set up the entry state */ spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, fdt_addr); @@ -1919,6 +1922,39 @@ static const VMStateDescription vmstate_spapr_irq_map = { }, }; +static bool spapr_dtb_needed(void *opaque) +{ +sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(opaque); + +return smc->update_dt_enabled; +} + +static int spapr_dtb_pre_load(void *opaque) +{ +sPAPRMachineState *spapr = (sPAPRMachineState *)opaque; + +g_free(spapr->fdt_blob); +spapr->fdt_blob = NULL; +spapr->fdt_size = 0; + +return 0; +} + +static const VMStateDescription vmstate_spapr_dtb = { +.name = "spapr_dtb", +.version_id = 1, +.minimum_version_id = 1, +.needed = spapr_dtb_needed, +.pre_load = spapr_dtb_pre_load, +.fields = (VMStateField[]) { +VMSTATE_UINT32(fdt_initial_size, sPAPRMachineState), +VMSTATE_UINT32(fdt_size, sPAPRMachineState), +VMSTATE_VBUFFER_ALLOC_UINT32(fdt_blob, sPAPRMachineState, 0, NULL, + fdt_size), +VMSTATE_END_OF_LIST() +}, +}; + static const VMStateDescription vmstate_spapr = { .name = "spapr", .version_id = 3, @@ -1948,6 +1984,7 @@ static const VMStateDescription vmstate_spapr = { _spapr_cap_ibs, _spapr_irq_map, _spapr_cap_nested_kvm_hv, +_spapr_dtb, NULL } }; @@ -3929,6 +3966,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data) hc->unplug = spapr_machine_device_unplug; smc->dr_lmb_enabled = true; +smc->update_dt_enabled = true; mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power9_v2.0"); mc->has_hotpluggable_cpus = true; smc->resize_hpt_default = SPAPR_RESIZE_HPT_ENABLED; @@ -4024,9 +4062,12 @@ DEFINE_SPAPR_MACHINE(4_0, "4.0", true); static void spapr_machine_3_1_class_options(MachineClass *mc) { +sPAPRMachineClass *smc = SPAPR_MACHINE_CLASS(mc); + spapr_machine_4_0_class_options(mc); SET_MACHINE_COMPAT(mc, SPAPR_COMPAT_3_1); mc->default_cpu_type = POWERPC_CPU_TYPE_NAME("power8_v2.0"); 
+smc->update_dt_enabled = false; } DEFINE_SPAPR_MACHINE(3_1, "3.1", false); diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c index ae913d070f50..78fecc8fe906 100644 --- a/hw/ppc/spapr_hcall.c +++ b/hw/ppc/spapr_hcall.c @@ -1717,6 +1717,46 @@ static target_ulong h_get_cpu_characteristics(PowerPCCPU *cpu, args[0] = characteristics; args[1] = behaviour; +return H_SUCCESS; +} + +static target_ulong h_update_dt(PowerPCCPU *cpu, sPAPRMachineState *spapr, +target_ulong opcode, target_ulong *args) +{ +
[Qemu-devel] [PATCH 00/15] spapr: Add support for PHB hotplug
Previous work on PHB hotplug was last posted more than one year ago:

https://lists.gnu.org/archive/html/qemu-devel/2017-07/msg07906.html

Quite a few significant changes happened since then:
- fixed PHB indexes
- fixed IRQ numbers for LSIs
- SLOF capable of updating the FDT in QEMU
- XIVE

First step in this new series is to teach QEMU how to get the FDT from SLOF thanks to the recent patch from Alexey, rebased against David's ppc-for-4.0 branch (SHA1: 11ce774130e7).

Most of the other patches come from the previous version with minor modifications, but I guess even the ones with Reviewed-by tags deserve to be reviewed again in case I've missed something.

Finally, the XIVE and XICS backends are changed to expose the name of the interrupt controller node in the device tree. The machine code can then exploit this to reach out to its phandle property, in case it got changed by SLOF. This is needed to wire up interrupts during hotplug.

This was only lightly tested at the moment. I'll post about that later.

Please comment.
Cheers, -- Greg --- Alexey Kardashevskiy (1): ppc/spapr: Receive and store device tree blob from SLOF Greg Kurz (4): spapr: move spapr_create_phb() to core machine code spapr_pci: add proper rollback on PHB realize error path spapr_pci: Define SPAPR_MAX_PHBS in hw/pci-host/spapr.h spapr: Expose the name of the interrupt controller node Michael Roth (9): pci: allow cleanup/unregistration of PCI buses spapr_pci: add PHB unrealize spapr: enable PHB hotplug for default pseries machine type spapr: create DR connectors for PHBs spapr_events: add support for phb hotplug events qdev: pass an Object * to qbus_set_hotplug_handler() spapr_pci: provide node start offset via spapr_populate_pci_dt() spapr_pci: add ibm, my-drc-index property for PHB hotplug spapr: add hotplug hooks for PHB hotplug Nathan Fontenot (1): spapr: populate PHB DRC entries for root DT node configure |2 hw/acpi/piix4.c |2 hw/char/virtio-serial-bus.c |2 hw/core/bus.c | 11 -- hw/intc/spapr_xive.c |9 +- hw/intc/xics_spapr.c |9 +- hw/pci/pci.c | 33 ++ hw/pci/pcie.c |2 hw/pci/shpc.c |2 hw/ppc/spapr.c| 230 - hw/ppc/spapr_drc.c| 18 +++ hw/ppc/spapr_events.c |3 + hw/ppc/spapr_hcall.c | 42 +++ hw/ppc/spapr_irq.c|3 + hw/ppc/spapr_pci.c| 139 +++-- hw/ppc/trace-events |3 + hw/s390x/css-bridge.c |2 hw/s390x/s390-pci-bus.c |6 + hw/scsi/virtio-scsi.c |2 hw/scsi/vmw_pvscsi.c |2 hw/usb/dev-smartcard-reader.c |2 include/hw/pci-host/spapr.h | 14 ++ include/hw/pci/pci.h |3 + include/hw/ppc/spapr.h|9 +- include/hw/ppc/spapr_drc.h|8 + include/hw/ppc/spapr_irq.h|1 include/hw/ppc/spapr_xive.h |1 include/hw/ppc/xics.h |1 include/hw/qdev-core.h|3 - 29 files changed, 491 insertions(+), 73 deletions(-)
Re: [Qemu-devel] did the qemu can emulate the whole system with the processor that without support the "virtulization and kvm"?
Hi Alex,

My host machine does not have a "/dev/kvm" node, and there is no way to make one available. So I would like to know whether "/dev/kvm" is mandatory for QEMU to emulate a whole system, or whether I can run the emulation without KVM support.

Thanks for your support!

At 2018-12-20 20:31:09, "Alex Bennée" wrote:
>
>tugouxp <13824125...@163.com> writes:
>
>> hi folks:
>> did the qemu can emulate the whole system(such as ubuntu) with
>> the processor that without support the "virtulization and kvm"?
>
>I don't quite follow your question. However if you are asking about the
>cross-architecture emulation (usually called TCG) the features of the
>processor depend on how complete the front-end is.
>
>So you can for example boot an aarch64 system under emulation that can
>itself run KVM guests (because we implement EL0/1/2/3). However the
>system can't currently use VHE extensions because the support for
>that has yet to be added.
>
>What is your use-case?
>
>--
>Alex Bennée
Re: [Qemu-devel] [PATCH] linux-user: Add safe_syscall for riscv64 host
On Thu, Dec 20, 2018 at 12:15 PM Richard Henderson wrote: > > Signed-off-by: Richard Henderson Reviewed-by: Alistair Francis Alistair > --- > > At some point we should make this routine be non-optional for > porting to a new host. > > > r~ > > --- > linux-user/host/riscv64/hostdep.h | 23 +++ > linux-user/host/riscv64/safe-syscall.inc.S | 77 ++ > 2 files changed, 100 insertions(+) > create mode 100644 linux-user/host/riscv64/safe-syscall.inc.S > > diff --git a/linux-user/host/riscv64/hostdep.h > b/linux-user/host/riscv64/hostdep.h > index 28467ba00b..865f0fb9ff 100644 > --- a/linux-user/host/riscv64/hostdep.h > +++ b/linux-user/host/riscv64/hostdep.h > @@ -8,4 +8,27 @@ > #ifndef RISCV64_HOSTDEP_H > #define RISCV64_HOSTDEP_H > > +/* We have a safe-syscall.inc.S */ > +#define HAVE_SAFE_SYSCALL > + > +#ifndef __ASSEMBLER__ > + > +/* These are defined by the safe-syscall.inc.S file */ > +extern char safe_syscall_start[]; > +extern char safe_syscall_end[]; > + > +/* Adjust the signal context to rewind out of safe-syscall if we're in it */ > +static inline void rewind_if_in_safe_syscall(void *puc) > +{ > +ucontext_t *uc = puc; > +unsigned long *pcreg = >uc_mcontext.__gregs[REG_PC]; > + > +if (*pcreg > (uintptr_t)safe_syscall_start > +&& *pcreg < (uintptr_t)safe_syscall_end) { > +*pcreg = (uintptr_t)safe_syscall_start; > +} > +} > + > +#endif /* __ASSEMBLER__ */ > + > #endif > diff --git a/linux-user/host/riscv64/safe-syscall.inc.S > b/linux-user/host/riscv64/safe-syscall.inc.S > new file mode 100644 > index 00..9ca3fbfd1e > --- /dev/null > +++ b/linux-user/host/riscv64/safe-syscall.inc.S > @@ -0,0 +1,77 @@ > +/* > + * safe-syscall.inc.S : host-specific assembly fragment > + * to handle signals occurring at the same time as system calls. > + * This is intended to be included by linux-user/safe-syscall.S > + * > + * Written by Richard Henderson > + * Copyright (C) 2018 Linaro, Inc. > + * > + * This work is licensed under the terms of the GNU GPL, version 2 or later. 
> + * See the COPYING file in the top-level directory. > + */ > + > + .global safe_syscall_base > + .global safe_syscall_start > + .global safe_syscall_end > + .type safe_syscall_base, @function > + .type safe_syscall_start, @function > + .type safe_syscall_end, @function > + > + /* > +* This is the entry point for making a system call. The calling > +* convention here is that of a C varargs function with the > +* first argument an 'int *' to the signal_pending flag, the > +* second one the system call number (as a 'long'), and all further > +* arguments being syscall arguments (also 'long'). > +* We return a long which is the syscall's return value, which > +* may be negative-errno on failure. Conversion to the > +* -1-and-errno-set convention is done by the calling wrapper. > +*/ > +safe_syscall_base: > + .cfi_startproc > + /* > +* The syscall calling convention is nearly the same as C: > +* we enter with a0 == *signal_pending > +* a1 == syscall number > +* a2 ... a7 == syscall arguments > +* and return the result in a0 > +* and the syscall instruction needs > +* a7 == syscall number > +* a0 ... a5 == syscall arguments > +* and returns the result in a0 > +* Shuffle everything around appropriately. > +*/ > + mv t0, a0 /* signal_pending pointer */ > + mv t1, a1 /* syscall number */ > + mv a0, a2 /* syscall arguments */ > + mv a1, a3 > + mv a2, a4 > + mv a3, a5 > + mv a4, a6 > + mv a5, a7 > + mv a7, t1 > + > + /* > +* This next sequence of code works in conjunction with the > +* rewind_if_safe_syscall_function(). If a signal is taken > +* and the interrupted PC is anywhere between 'safe_syscall_start' > +* and 'safe_syscall_end' then we rewind it to 'safe_syscall_start'. > +* The code sequence must therefore be able to cope with this, and > +* the syscall instruction must be the final one in the sequence. 
> +*/ > +safe_syscall_start: > + /* If signal_pending is non-zero, don't do the call */ > + lw t1, 0(t0) > + bnezt1, 0f > + scall > +safe_syscall_end: > + /* code path for having successfully executed the syscall */ > + ret > + > +0: > + /* code path when we didn't execute the syscall */ > + li a0, -TARGET_ERESTARTSYS > + ret > + .cfi_endproc > + > + .size safe_syscall_base, .-safe_syscall_base > -- > 2.17.2 > >
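The window check that rewind_if_in_safe_syscall() performs in the quoted patch can be modelled with plain integers, which makes the boundary cases easier to see. This is only a sketch: `rewind_pc()` is a hypothetical function, and the addresses passed to it are made-up stand-ins for the saved PC and for the safe_syscall_start and safe_syscall_end labels.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the PC check: if the signal interrupted the sequence
 * strictly between the two labels, restart from the first label so that
 * the signal_pending flag is tested again before the syscall is issued.
 */
static uintptr_t rewind_pc(uintptr_t pc, uintptr_t start, uintptr_t end)
{
    if (pc > start && pc < end) {
        return start;   /* syscall not executed yet: redo the check */
    }
    return pc;          /* at or outside the bounds: leave the PC alone */
}
```

A PC equal to the end label is deliberately left untouched: at that point the syscall instruction has already completed, so rewinding would issue it a second time.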
Re: [Qemu-devel] [PATCH v5 09/11] iotests: change qmp_log filters to expect QMP objects only
On 12/20/18 6:21 AM, Vladimir Sementsov-Ogievskiy wrote: > 20.12.2018 5:29, John Snow wrote: >> As laid out in the previous commit's message: >> >> ``` >> Several places in iotests deal with serializing objects into JSON >> strings, but to add pretty-printing it seems desireable to localize >> all of those cases. >> >> log() seems like a good candidate for that centralized behavior. >> log() can already serialize json objects, but when it does so, >> it assumes filters=[] operates on QMP objects, not strings. >> >> qmp_log currently operates by dumping outgoing and incoming QMP >> objects into strings and filtering them assuming that filters=[] >> are string filters. >> ``` >> >> Therefore: >> >> Change qmp_log to treat filters as if they're always qmp object filters, >> then change the logging call to rely on log()'s ability to serialize QMP >> objects, so we're not duplicating that effort. >> >> Add a qmp version of filter_testfiles and adjust the only caller using >> it for qmp_log to use the qmp version. 
>> >> Signed-off-by: John Snow >> Signed-off-by: John Snow >> --- >> tests/qemu-iotests/206| 4 ++-- >> tests/qemu-iotests/iotests.py | 24 +--- >> 2 files changed, 23 insertions(+), 5 deletions(-) >> >> diff --git a/tests/qemu-iotests/206 b/tests/qemu-iotests/206 >> index e92550fa59..5bb738bf23 100755 >> --- a/tests/qemu-iotests/206 >> +++ b/tests/qemu-iotests/206 >> @@ -27,7 +27,7 @@ iotests.verify_image_format(supported_fmts=['qcow2']) >> >> def blockdev_create(vm, options): >> result = vm.qmp_log('blockdev-create', >> -filters=[iotests.filter_testfiles], >> +filters=[iotests.filter_qmp_testfiles], >> job_id='job0', options=options) >> >> if 'return' in result: >> @@ -55,7 +55,7 @@ with iotests.FilePath('t.qcow2') as disk_path, \ >> 'size': 0 }) >> >> vm.qmp_log('blockdev-add', >> - filters=[iotests.filter_testfiles], >> + filters=[iotests.filter_qmp_testfiles], >> driver='file', filename=disk_path, >> node_name='imgfile') >> >> diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py >> index 57fe20db45..dcd0c6f71d 100644 >> --- a/tests/qemu-iotests/iotests.py >> +++ b/tests/qemu-iotests/iotests.py >> @@ -246,10 +246,29 @@ def filter_qmp_event(event): >> event['timestamp']['microseconds'] = 'USECS' >> return event >> >> +def filter_qmp(qmsg, filter_fn): >> +'''Given a string filter, filter a QMP object's values. >> +filter_fn takes a (key, value) pair.''' > > hm, I decided to look into PEP8, which in turn refers to PEP257, > which asks: > - For consistency, always use """triple double quotes""" around docstring > - Unless the entire docstring fits on a line, place the closing quotes on a > line by themselves > > Unfortunately, iotests.py prefers to be in opposition.. And consistency > within the file is more > important. May be we'll fix it one day.. 
> > >> +for key in qmsg: > > and here again we can benefit (or all right-value qmsg[key]) of using > qmsg.items() > >> +if isinstance(qmsg[key], list): >> +qmsg[key] = [filter_qmp(atom, filter_fn) for atom in qmsg[key]] > > hmm, stop. filter_qmp() assumes that its argument is dict. but atom may not > be dict. > Oh, good catch. Will fix. > so, to fit into the concept of fn(key, value) filtering function, we should > do something like > this: > > for i in len(qmsg[key]): >if isinstance(qmsg[key], dict): > qmsg[key][i] = filter_qmp(qmsg[key][i], filter_fn) > > qmsg[key] = filter_fn(key, qmsg[key]) > > --- > or, we may want to apply filter_fn only to lists of non-dicts, and filter > only list of dicts, > assuming that we don't have mixed lists. > > >> +elif isinstance(qmsg[key], dict): >> +qmsg[key] = filter_qmp(qmsg[key], filter_fn) >> +else: >> +qmsg[key] = filter_fn(key, qmsg[key]) > +return qmsg >> + >> def filter_testfiles(msg): >> prefix = os.path.join(test_dir, "%s-" % (os.getpid())) >> return msg.replace(prefix, 'TEST_DIR/PID-') >> >> +def filter_qmp_testfiles(qmsg): >> +def _filter(key, value): >> +if key == 'filename' or key == 'backing-file': >> +return filter_testfiles(value) >> +return value >> +return filter_qmp(qmsg, _filter) >> + >> def filter_generated_node_ids(msg): >> return re.sub("#block[0-9]+", "NODE_NAME", msg) >> >> @@ -465,10 +484,9 @@ class VM(qtest.QEMUQtestMachine): >> ("execute", cmd), >> ("arguments", ordered_kwargs(kwargs)) >> )) >> -logmsg = json.dumps(full_cmd) >> -log(logmsg, filters) >> +log(full_cmd, filters) >> result = self.qmp(cmd, **kwargs) >> -log(json.dumps(result, sort_keys=True), filters) >> +
Re: [Qemu-devel] [PATCH v5 09/11] iotests: change qmp_log filters to expect QMP objects only
On 12/19/18 9:53 PM, Eric Blake wrote: > On 12/19/18 8:29 PM, John Snow wrote: >> As laid out in the previous commit's message: >> >> ``` >> Several places in iotests deal with serializing objects into JSON >> strings, but to add pretty-printing it seems desireable to localize > > s/desireable/desirable/ > >> all of those cases. >> >> log() seems like a good candidate for that centralized behavior. >> log() can already serialize json objects, but when it does so, >> it assumes filters=[] operates on QMP objects, not strings. >> >> qmp_log currently operates by dumping outgoing and incoming QMP >> objects into strings and filtering them assuming that filters=[] >> are string filters. >> ``` >> >> Therefore: >> >> Change qmp_log to treat filters as if they're always qmp object filters, >> then change the logging call to rely on log()'s ability to serialize QMP >> objects, so we're not duplicating that effort. >> >> Add a qmp version of filter_testfiles and adjust the only caller using >> it for qmp_log to use the qmp version. >> >> Signed-off-by: John Snow >> Signed-off-by: John Snow > > Odd double S-o-B differing only by space. I fixed my auto-signer! It has rudely detected my typo and decided that it needed a fresh SOB. > >> --- >> tests/qemu-iotests/206 | 4 ++-- >> tests/qemu-iotests/iotests.py | 24 +--- >> 2 files changed, 23 insertions(+), 5 deletions(-) >> > > Reviewed-by: Eric Blake > Thanks!
Re: [Qemu-devel] [PATCH v6 07/28] compat: replace PC_COMPAT_3_0 & HW_COMPAT_3_0 macros
On Fri, Dec 21, 2018 at 12:08 AM Eduardo Habkost wrote: > > On Thu, Dec 13, 2018 at 01:48:29AM +0400, Marc-André Lureau wrote: > > Use static arrays instead. > > > > Suggested-by: Eduardo Habkost > > Signed-off-by: Marc-André Lureau > > In case you need to respin the series: I suggest squashing > patches 07-19 together. I don't mind, but it would be quite harder to review if it was squashed. Up to the maintainer I would say. thanks
Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.
On Wed, Dec 19, 2018 at 11:40:37AM +0100, Igor Mammedov wrote: > On Wed, 19 Dec 2018 10:57:17 +0800 > Yu Zhang wrote: > > > On Tue, Dec 18, 2018 at 03:55:36PM +0100, Igor Mammedov wrote: > > > On Tue, 18 Dec 2018 17:27:23 +0800 > > > Yu Zhang wrote: > > > > > > > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > > > > On Wed, 12 Dec 2018 21:05:38 +0800 > > > > > Yu Zhang wrote: > > > > > > > > > > > Currently, vIOMMU is using the value of IOVA address width, instead > > > > > > of > > > > > > the host address width(HAW) to calculate the number of reserved > > > > > > bits in > > > > > > data structures such as root entries, context entries, and entries > > > > > > of > > > > > > DMA paging structures etc. > > > > > > > > > > > > However values of IOVA address width and of the HAW may not equal. > > > > > > For > > > > > > example, a 48-bit IOVA can only be mapped to host addresses no > > > > > > wider than > > > > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may > > > > > > result > > > > > > in an invalid IOVA being accepted. > > > > > > > > > > > > To fix this, a new field - haw_bits is introduced in struct > > > > > > IntelIOMMUState, > > > > > > whose value is initialized based on the maximum physical address > > > > > > set to > > > > > > guest CPU. > > > > > > > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > > > > to clarify. > > > > > > > > > > > > Signed-off-by: Yu Zhang > > > > > > Reviewed-by: Peter Xu > > > > > > --- > > > > > [...] 
> > > > > > > > > > > @@ -3100,6 +3104,8 @@ static void > > > > > > vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n) > > > > > > static void vtd_init(IntelIOMMUState *s) > > > > > > { > > > > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > > > > +CPUState *cs = first_cpu; > > > > > > +X86CPU *cpu = X86_CPU(cs); > > > > > > > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > > > > +if (s->aw_bits == VTD_AW_48BIT) { > > > > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > > > > } > > > > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > > > > +s->haw_bits = cpu->phys_bits; > > > > > Is it possible to avoid accessing CPU fields directly or cpu > > > > > altogether > > > > > and set phys_bits when iommu is created? > > > > > > > > Thanks for your comments, Igor. > > > > > > > > Well, I guess you prefer not to query the CPU capabilities while > > > > deciding > > > > the vIOMMU features. But to me, they are not that irrelevant.:) > > > > > > > > Here the hardware address width in vt-d, and the one in > > > > cpuid.MAXPHYSADDR > > > > are referring to the same concept. In VM, both are the maximum guest > > > > physical > > > > address width. If we do not check the CPU field here, we will still > > > > have to > > > > check the CPU field in other places such as build_dmar_q35(), and reset > > > > the > > > > s->haw_bits again. > > > > > > > > Is this explanation convincing enough? :) > > > current build_dmar_q35() doesn't do it, it's all new code in this series > > > that > > > contains not acceptable direct access from one device (iommu) to another > > > (cpu). 
> > > Proper way would be for the owner of iommu to fish limits from somewhere > > > and set > > > values during iommu creation. > > > > Well, current build_dmar_q35() doesn't do it, because it is using the > > incorrect value. :) > > According to the spec, the host address width is the maximum physical > > address width, > > yet current implementation is using the DMA address width. For me, this is > > not only > > wrong, but also unsecure. For this point, I think we all agree this need to > > be fixed. > > > > As to how to fix it - should we query the cpu fields, I still do not > > understand why > > this is not acceptable. :) > > > > I had thought of other approaches before, yet I did not choose: > > > > 1> Introduce a new parameter, say, "x-haw-bits" which is used for iommu to > > limit its > > physical address width(similar to the "x-aw-bits" for IOVA). But what > > should we check > > this parameter or not? What if this parameter is set to sth. different than > > the "phys-bits" > > or not? > > > > 2> Another choice I had thought of is, to query the physical iommu. I > > abandoned this > > idea because my understanding is that vIOMMU is not a passthrued device, it > > is emulated. > > > So Igor, may I ask why you think checking against the cpu fields so not > > acceptable? :) > Because accessing private
Re: [Qemu-devel] [PATCH] linux-user: Add safe_syscall for riscv64 host
On 12/20/18 12:40 PM, Peter Maydell wrote: > On Thu, 20 Dec 2018 at 20:16, Richard Henderson > wrote: >> >> Signed-off-by: Richard Henderson >> --- >> >> At some point we should make this routine be non-optional for >> porting to a new host. > > Yes, I agree -- how many hosts do we still have which are > missing support for it? Ignoring tci, mips (3 abis), ppc32, sparc (2 abis). r~
Re: [Qemu-devel] [PATCH v5 03/11] blockdev: n-ary bitmap merge
On 12/19/18 9:48 PM, Eric Blake wrote:
> On 12/19/18 8:29 PM, John Snow wrote:
>> Especially outside of transactions, it is helpful to provide
>> all-or-nothing semantics for bitmap merges. This facilitates
>> the coalescing of multiple bitmaps into a single target for
>> the "checkpoint" interpretation when assembling bitmaps that
>> represent arbitrary points in time from component bitmaps.
>>
>> This is an incompatible change from the preliminary version
>> of the API.
>
> but that doesn't matter because it was in the x- namespace, and we're
> about to rename it anyway.
>

Yes, just an "FYI".

>>
>> Signed-off-by: John Snow
>> ---
>>  blockdev.c           | 75 ++--
>>  qapi/block-core.json | 22 ++---
>>  2 files changed, 62 insertions(+), 35 deletions(-)
>>
>
>> +static BdrvDirtyBitmap *do_block_dirty_bitmap_merge(const char *node,
>> +                                                    const char *target,
>> +                                                    strList *bitmaps,
>> +                                                    HBitmap **backup,
>> +                                                    Error **errp)
>>  {
>
>> -    bdrv_merge_dirty_bitmap(dst, src, NULL, errp);
>> +    for (lst = bitmaps; lst; lst = lst->next) {
>> +        src = bdrv_find_dirty_bitmap(bs, lst->value);
>> +        if (!src) {
>> +            error_setg(errp, "Dirty bitmap '%s' not found", lst->value);
>> +            dst = NULL;
>> +            goto out;
>> +        }
>> +
>> +        bdrv_merge_dirty_bitmap(anon, src, NULL, &local_err);
>> +        if (local_err) {
>> +            error_propagate(errp, local_err);
>> +            dst = NULL;
>> +            goto out;
>> +        }
>> +    }
>
> Appears to be a silent no-op when given "bitmaps":[] as the source. An
> alternative would be requiring at least one source in the list, but I
> don't see it as worth changing the patch to special-case an empty list
> differently from a no-op.
>
>> @@ -1943,23 +1943,23 @@
>>  ##
>>  # @x-block-dirty-bitmap-merge:
>>  #
>> -# FIXME: Rename @src_name and @dst_name to src-name and dst-name.
>> -#
>> -# Merge @src_name dirty bitmap to @dst_name dirty bitmap. @src_name dirty
>> -# bitmap is unchanged. On error, @dst_name is unchanged.
>> +# Merge dirty bitmaps listed in @bitmaps to the @target dirty bitmap.
>> +# The @bitmaps dirty bitmaps are unchanged.
>> +# On error, @target is unchanged.
>>  #
>>  # Returns: nothing on success
>>  #          If @node is not a valid block device, DeviceNotFound
>> -#          If @dst_name or @src_name is not found, GenericError
>> -#          If bitmaps has different sizes or granularities, GenericError
>> +#          If any bitmap in @bitmaps or @target is not found, GenericError
>> +#          If any of the bitmaps have different sizes or granularities,
>> +#          GenericError
>>  #
>>  # Since: 3.0
>
> Could do s/3.0/4.0/ to match the incompatible change here, but you do it
> in the later patch where your remove the x-.
>
> Reviewed-by: Eric Blake
>

Yeah, I think I'll just leave it this way, so all the version
graduations are in the same patch.

Thank you!
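To illustrate the all-or-nothing semantics discussed in this review, here is a toy model in Python rather than the blockdev.c code. Bitmaps are modeled as sets of dirty cluster indexes held in a dict. `merge_bitmaps` and `BitmapNotFound` are invented names, and the scratch copy only loosely mirrors the role of the anonymous bitmap in the quoted hunk.

```python
class BitmapNotFound(Exception):
    """Raised when a named bitmap does not exist (hypothetical error type)."""


def merge_bitmaps(bitmaps, target, sources):
    """Merge every bitmap named in sources into target, all or nothing."""
    if target not in bitmaps:
        raise BitmapNotFound(target)
    # Work on a scratch copy so a failure partway through leaves the
    # real target untouched.
    scratch = set(bitmaps[target])
    for name in sources:
        if name not in bitmaps:
            raise BitmapNotFound(name)
        scratch |= bitmaps[name]
    # Commit only after every source merged successfully.
    bitmaps[target] = scratch
```

As in the review comment, an empty `sources` list is simply a no-op in this model, and the source bitmaps are never modified.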
Re: [Qemu-devel] [PATCH v3 1/2] intel-iommu: differentiate host address width from IOVA address width.
On Tue, Dec 18, 2018 at 05:27:23PM +0800, Yu Zhang wrote: > On Mon, Dec 17, 2018 at 02:17:40PM +0100, Igor Mammedov wrote: > > On Wed, 12 Dec 2018 21:05:38 +0800 > > Yu Zhang wrote: > > > > > Currently, vIOMMU is using the value of IOVA address width, instead of > > > the host address width(HAW) to calculate the number of reserved bits in > > > data structures such as root entries, context entries, and entries of > > > DMA paging structures etc. > > > > > > However values of IOVA address width and of the HAW may not equal. For > > > example, a 48-bit IOVA can only be mapped to host addresses no wider than > > > 46 bits. Using 48, instead of 46 to calculate the reserved bit may result > > > in an invalid IOVA being accepted. > > > > > > To fix this, a new field - haw_bits is introduced in struct > > > IntelIOMMUState, > > > whose value is initialized based on the maximum physical address set to > > > guest CPU. > > > > > Also, definitions such as VTD_HOST_AW_39/48BIT etc. are renamed > > > to clarify. > > > > > > Signed-off-by: Yu Zhang > > > Reviewed-by: Peter Xu > > > --- > > [...] 
> > > > > @@ -3100,6 +3104,8 @@ static void vtd_iommu_replay(IOMMUMemoryRegion > > > *iommu_mr, IOMMUNotifier *n) > > > static void vtd_init(IntelIOMMUState *s) > > > { > > > X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s); > > > +CPUState *cs = first_cpu; > > > +X86CPU *cpu = X86_CPU(cs); > > > > > > memset(s->csr, 0, DMAR_REG_SIZE); > > > memset(s->wmask, 0, DMAR_REG_SIZE); > > > @@ -3119,23 +3125,24 @@ static void vtd_init(IntelIOMMUState *s) > > > s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND | > > > VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS | > > > VTD_CAP_SAGAW_39bit | VTD_CAP_MGAW(s->aw_bits); > > > -if (s->aw_bits == VTD_HOST_AW_48BIT) { > > > +if (s->aw_bits == VTD_AW_48BIT) { > > > s->cap |= VTD_CAP_SAGAW_48bit; > > > } > > > s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO; > > > +s->haw_bits = cpu->phys_bits; > > Is it possible to avoid accessing CPU fields directly or cpu altogether > > and set phys_bits when iommu is created? > > Thanks for your comments, Igor. > > Well, I guess you prefer not to query the CPU capabilities while deciding > the vIOMMU features. But to me, they are not that irrelevant.:) > > Here the hardware address width in vt-d, and the one in cpuid.MAXPHYSADDR > are referring to the same concept. In VM, both are the maximum guest physical > address width. If we do not check the CPU field here, we will still have to > check the CPU field in other places such as build_dmar_q35(), and reset the > s->haw_bits again. > > Is this explanation convincing enough? :) > > > > > Perhaps Eduardo > > can suggest better approach, since he's more familiar with phys_bits topic > > @Eduardo, any comments? Thanks! Configuring IOMMU phys-bits automatically depending on the configured CPU is OK, but accessing first_cpu directly in iommu code is. 
I suggest delegating this to the machine object, e.g.:

  uint32_t pc_max_phys_bits(PCMachineState *pcms)
  {
      return object_property_get_uint(OBJECT(first_cpu),
                                      "phys-bits", &error_abort);
  }

as the machine itself is responsible for creating the CPU objects, and I
believe there are other places in PC code where we do physical address
calculations that could be affected by the physical address space size.

-- 
Eduardo
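For readers following the 48-bit versus 46-bit argument in this thread, the arithmetic can be sketched as below. This is a simplified model, not the intel_iommu code: it treats every bit at or above the address width as reserved and ignores the other fields of real VT-d entries. The function names are made up for illustration.

```python
def reserved_mask(addr_width_bits, total_bits=64):
    """Mask covering every address bit at or above addr_width_bits."""
    return ((1 << total_bits) - 1) & ~((1 << addr_width_bits) - 1)


def has_reserved_bits(addr, addr_width_bits):
    """True if addr sets a bit that the given address width cannot hold."""
    return (addr & reserved_mask(addr_width_bits)) != 0


# An address that needs 47 bits: too wide for a 46-bit host address width.
addr = 1 << 46
accepted_with_iova_width = not has_reserved_bits(addr, 48)  # check misses it
rejected_with_haw = has_reserved_bits(addr, 46)             # check catches it
```

Both flags come out true here: the check built from the 48-bit IOVA width lets the address through, while the check built from the 46-bit host address width rejects it, which is the mismatch the patch description is about.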
Re: [Qemu-devel] [PATCH v5 11/11] iotests: add iotest 236 for testing bitmap merge
On 12/19/18 10:02 PM, Eric Blake wrote: > On 12/19/18 8:29 PM, John Snow wrote: >> New interface, new smoke test. >> >> Signed-off-by: John Snow >> --- >> tests/qemu-iotests/236 | 161 + >> tests/qemu-iotests/236.out | 351 + >> tests/qemu-iotests/group | 1 + >> 3 files changed, 513 insertions(+) >> create mode 100755 tests/qemu-iotests/236 >> create mode 100644 tests/qemu-iotests/236.out >> > > Reviewed-by: Eric Blake > > (and glad that my insistence on beefing up the test has caught bugs) > Me too. Sorry to have been lazy about it. Your suggestions are always worth following. --js
Re: [Qemu-devel] [PATCH v5 11/11] iotests: add iotest 236 for testing bitmap merge
On 12/20/18 7:12 AM, Vladimir Sementsov-Ogievskiy wrote: > 20.12.2018 5:29, John Snow wrote: >> New interface, new smoke test. >> >> Signed-off-by: John Snow >> --- > > [...] > >> +# A: 7 clusters >> +# B: 4 clusters >> +# C: 6 clusters >> +log(query_bitmaps(vm), indent=2) >> + >> +log('\n--- Submitting Bad Merge ---\n') > > aha, spent some time, trying to understand, what is bad with merge, until > understand that > that is abort. I didn't sleep enough last night, but anyway, 'Aborting Merge > Transaction' > is a bit clearer, I think. > Sure, I'll rephrase it: "Submitting & Aborting Merge Transaction" >> +vm.qmp_log("transaction", indent=2, actions=[ >> +{ "type": "block-dirty-bitmap-add", >> + "data": { "node": "drive0", "name": "bitmapD", >> +"disabled": True, "granularity": granularity }}, >> +{ "type": "block-dirty-bitmap-merge", >> + "data": { "node": "drive0", "target": "bitmapD", >> +"bitmaps": ["bitmapB", "bitmapC"] }}, >> +{ "type": "abort", "data": {}} >> +]) >> +log(query_bitmaps(vm), indent=2) >> + >> +log('\n--- Creating D as a merge of B & C ---\n') >> +# Good hygiene: create a disabled bitmap as a merge target. >> +vm.qmp_log("transaction", indent=2, actions=[ >> +{ "type": "block-dirty-bitmap-add", >> + "data": { "node": "drive0", "name": "bitmapD", >> +"disabled": True, "granularity": granularity }}, >> +{ "type": "block-dirty-bitmap-merge", >> + "data": { "node": "drive0", "target": "bitmapD", >> +"bitmaps": ["bitmapB", "bitmapC"] }} >> +]) >> + >> +# A and D should now both have 7 clusters apiece. >> +# B and C remain unchanged with 4 and 6 respectively. >> +log(query_bitmaps(vm), indent=2) >> + >> +# A and D should be equivalent. >> +# Some formats round the size of the disk, so don't print the checksums. > > Just interested: round 64M? to what? > VPC does weird stuff. If you ask for 64M you get 64M+16K. "round" is maybe a bad adjective here, but VPC really won't give you what you ask for. Loosening the restriction to "generic" was a good idea. 
>> +check_a = vm.qmp('x-debug-block-dirty-bitmap-sha256', >> + node="drive0", name="bitmapA")['return']['sha256'] >> +check_b = vm.qmp('x-debug-block-dirty-bitmap-sha256', >> + node="drive0", name="bitmapD")['return']['sha256'] >> +assert(check_a == check_b) > > hmm, a funny suggestion: s/check_b/check_d/ Oh, yes, that would be better. > >> + >> +log('\n--- Removing bitmaps A, B, C, and D ---\n') > > what about failed transaction with remove command, for a full kit? > Remove isn't transactionable! >> +vm.qmp_log("block-dirty-bitmap-remove", node="drive0", name="bitmapA") >> +vm.qmp_log("block-dirty-bitmap-remove", node="drive0", name="bitmapB") >> +vm.qmp_log("block-dirty-bitmap-remove", node="drive0", name="bitmapC") >> +vm.qmp_log("block-dirty-bitmap-remove", node="drive0", name="bitmapD") >> + >> +log('\n--- Final Query ---\n') >> +log(query_bitmaps(vm), indent=2) >> + >> +log('\n--- Done ---\n') >> +vm.shutdown() > > > with or without any of my suggestions: > Reviewed-by: Vladimir Sementsov-Ogievskiy > Thanks! --js
Re: [Qemu-devel] [PATCH] linux-user: Add safe_syscall for riscv64 host
On Thu, 20 Dec 2018 at 20:16, Richard Henderson wrote: > > Signed-off-by: Richard Henderson > --- > > At some point we should make this routine be non-optional for > porting to a new host. Yes, I agree -- how many hosts do we still have which are missing support for it? thanks -- PMM
Re: [Qemu-devel] [PULL 0/2] Miscellaneous patches for 2018-12-20
On Thu, 20 Dec 2018 at 09:47, Markus Armbruster wrote: > > The following changes since commit b72566a4ffaddbc0c0c1f6f5ee91b42ab13ff429: > > Merge remote-tracking branch > 'remotes/vivier2/tags/trivial-patches-pull-request' into staging (2018-12-19 > 15:31:02 +) > > are available in the Git repository at: > > git://repo.or.cz/qemu/armbru.git tags/pull-misc-2018-12-20 > > for you to fetch changes up to 3a6b016d6487f3492bc1b80b2c3bc25c67aab8e2: > > build: Remake config-host.mak when VERSION changes (2018-12-20 10:31:08 > +0100) > > > Miscellaneous patches for 2018-12-20 > > > Markus Armbruster (2): > Clean up includes > build: Remake config-host.mak when VERSION changes Applied, thanks. -- PMM
[Qemu-devel] [PATCH] linux-user: Add safe_syscall for riscv64 host
Signed-off-by: Richard Henderson
---

At some point we should make this routine be non-optional for
porting to a new host.

r~
---
 linux-user/host/riscv64/hostdep.h          | 23 +++
 linux-user/host/riscv64/safe-syscall.inc.S | 77 ++
 2 files changed, 100 insertions(+)
 create mode 100644 linux-user/host/riscv64/safe-syscall.inc.S

diff --git a/linux-user/host/riscv64/hostdep.h b/linux-user/host/riscv64/hostdep.h
index 28467ba00b..865f0fb9ff 100644
--- a/linux-user/host/riscv64/hostdep.h
+++ b/linux-user/host/riscv64/hostdep.h
@@ -8,4 +8,27 @@
 #ifndef RISCV64_HOSTDEP_H
 #define RISCV64_HOSTDEP_H
 
+/* We have a safe-syscall.inc.S */
+#define HAVE_SAFE_SYSCALL
+
+#ifndef __ASSEMBLER__
+
+/* These are defined by the safe-syscall.inc.S file */
+extern char safe_syscall_start[];
+extern char safe_syscall_end[];
+
+/* Adjust the signal context to rewind out of safe-syscall if we're in it */
+static inline void rewind_if_in_safe_syscall(void *puc)
+{
+    ucontext_t *uc = puc;
+    unsigned long *pcreg = &uc->uc_mcontext.__gregs[REG_PC];
+
+    if (*pcreg > (uintptr_t)safe_syscall_start
+        && *pcreg < (uintptr_t)safe_syscall_end) {
+        *pcreg = (uintptr_t)safe_syscall_start;
+    }
+}
+
+#endif /* __ASSEMBLER__ */
+
 #endif
diff --git a/linux-user/host/riscv64/safe-syscall.inc.S b/linux-user/host/riscv64/safe-syscall.inc.S
new file mode 100644
index 00..9ca3fbfd1e
--- /dev/null
+++ b/linux-user/host/riscv64/safe-syscall.inc.S
@@ -0,0 +1,77 @@
+/*
+ * safe-syscall.inc.S : host-specific assembly fragment
+ * to handle signals occurring at the same time as system calls.
+ * This is intended to be included by linux-user/safe-syscall.S
+ *
+ * Written by Richard Henderson
+ * Copyright (C) 2018 Linaro, Inc.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+        .global safe_syscall_base
+        .global safe_syscall_start
+        .global safe_syscall_end
+        .type   safe_syscall_base, @function
+        .type   safe_syscall_start, @function
+        .type   safe_syscall_end, @function
+
+        /*
+         * This is the entry point for making a system call. The calling
+         * convention here is that of a C varargs function with the
+         * first argument an 'int *' to the signal_pending flag, the
+         * second one the system call number (as a 'long'), and all further
+         * arguments being syscall arguments (also 'long').
+         * We return a long which is the syscall's return value, which
+         * may be negative-errno on failure. Conversion to the
+         * -1-and-errno-set convention is done by the calling wrapper.
+         */
+safe_syscall_base:
+        .cfi_startproc
+        /*
+         * The syscall calling convention is nearly the same as C:
+         * we enter with a0 == *signal_pending
+         *               a1 == syscall number
+         *               a2 ... a7 == syscall arguments
+         *               and return the result in a0
+         * and the syscall instruction needs
+         *               a7 == syscall number
+         *               a0 ... a5 == syscall arguments
+         *               and returns the result in a0
+         * Shuffle everything around appropriately.
+         */
+        mv      t0, a0          /* signal_pending pointer */
+        mv      t1, a1          /* syscall number */
+        mv      a0, a2          /* syscall arguments */
+        mv      a1, a3
+        mv      a2, a4
+        mv      a3, a5
+        mv      a4, a6
+        mv      a5, a7
+        mv      a7, t1
+
+        /*
+         * This next sequence of code works in conjunction with the
+         * rewind_if_safe_syscall_function(). If a signal is taken
+         * and the interrupted PC is anywhere between 'safe_syscall_start'
+         * and 'safe_syscall_end' then we rewind it to 'safe_syscall_start'.
+         * The code sequence must therefore be able to cope with this, and
+         * the syscall instruction must be the final one in the sequence.
+         */
+safe_syscall_start:
+        /* If signal_pending is non-zero, don't do the call */
+        lw      t1, 0(t0)
+        bnez    t1, 0f
+        scall
+safe_syscall_end:
+        /* code path for having successfully executed the syscall */
+        ret
+
+0:
+        /* code path when we didn't execute the syscall */
+        li      a0, -TARGET_ERESTARTSYS
+        ret
+        .cfi_endproc
+
+        .size   safe_syscall_base, .-safe_syscall_base
-- 
2.17.2
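The rewind rule in the hostdep.h hunk of this patch can be modeled in a few lines. This is only an illustration of the comparison logic: the addresses are arbitrary placeholders, not real symbol values, and the Python function merely borrows the C helper's name.

```python
def rewind_if_in_safe_syscall(pc, start, end):
    """Return the PC a signal handler should resume at.

    Mirrors the strict comparisons in the patch: only a PC strictly
    between start and end is rewound to start.
    """
    if start < pc < end:
        return start
    return pc


# Placeholder addresses for safe_syscall_start / safe_syscall_end.
START, END = 0x1000, 0x100C

inside = rewind_if_in_safe_syscall(0x1004, START, END)    # rewound to START
at_start = rewind_if_in_safe_syscall(START, START, END)   # nothing run yet
at_end = rewind_if_in_safe_syscall(END, START, END)       # syscall completed
```

A PC equal to `START` needs no rewind, and a PC equal to `END` means the syscall instruction has already executed, so its result is kept. That is why the comment in the assembly requires the syscall instruction to be the last one in the sequence.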
Re: [Qemu-devel] [PATCH v6 07/28] compat: replace PC_COMPAT_3_0 & HW_COMPAT_3_0 macros
On Thu, Dec 13, 2018 at 01:48:29AM +0400, Marc-André Lureau wrote: > Use static arrays instead. > > Suggested-by: Eduardo Habkost > Signed-off-by: Marc-André Lureau In case you need to respin the series: I suggest squashing patches 07-19 together. -- Eduardo
[Qemu-devel] [Bug 1809304] [NEW] qemu-img convert is freezing for some DMG files.
Public bug reported: Recently, I created a file using hdiutil from MacOS (using Zlib compression): $ hdiutil create -volname MyVolName -srcfolder /path/to/my/vol/ -ov -format UDZO myvolname.dmg But, when I try to convert this volume using qemu-img convert, this command is freezing. I'm using the upstream version to test it. It is freezing inside the binary search method to retrieve the chunk. But, I still don't know why. I'm attaching the file as an example. It can be mounted using MacOS or other Linux apps like hfsleuth and darling-dmg. ** Affects: qemu Importance: Undecided Status: New ** Tags: dmg qemu-img ** Attachment added: "Firefox-Zlib.dmg" https://bugs.launchpad.net/bugs/1809304/+attachment/5223852/+files/Firefox-Zlib.dmg -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1809304 Title: qemu-img convert is freezing for some DMG files. Status in QEMU: New Bug description: Recently, I created a file using hdiutil from MacOS (using Zlib compression): $ hdiutil create -volname MyVolName -srcfolder /path/to/my/vol/ -ov -format UDZO myvolname.dmg But, when I try to convert this volume using qemu-img convert, this command is freezing. I'm using the upstream version to test it. It is freezing inside the binary search method to retrieve the chunk. But, I still don't know why. I'm attaching the file as an example. It can be mounted using MacOS or other Linux apps like hfsleuth and darling-dmg. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1809304/+subscriptions
[Qemu-devel] [Bug 1737883] Re: Cannot boot FreeBSD on versatilepb machine
** Tags removed: qemu ** Tags added: arm -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/1737883 Title: Cannot boot FreeBSD on versatilepb machine Status in QEMU: Incomplete Bug description: I know some years ago it was possible to boot FreeBSD in QEMU versatilepb machine https://kernelnomicon.org/?p=229 (you can download image and kernel using web.archive.org) Now when I try to do that I get only black screen with no output even in QEMU console. I also added -global versatile_pci.broken-irq-mapping=1, but this seem to have no effect. To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/1737883/+subscriptions
Re: [Qemu-devel] [PATCH v2 00/23] Add RISC-V TCG backend support
On Thu, 20 Dec 2018 09:20:05 PST (-0800), alistai...@gmail.com wrote: On Wed, Dec 19, 2018 at 10:07 PM Richard Henderson wrote: On 12/19/18 11:16 AM, Alistair Francis wrote: > This patch set adds RISC-V backend support to QEMU. This is based on > Michael Clark's original work with extra work on top. > > This has been somewhat tested and can run other architecture softmmu > code. It seems that any complex OS will eventually hang, but we can > run the BIOS and OS startup code for a number of different operating > systems. > > I haven't tested linux user support at all yet. I think Michael had that > working reliably though and hopefully my changes haven't broken it. > > There are still some todos in the code (there are missing instructions > and byte swapping) but these should assert instead of generating invalid > code. Queued to tcg-next, with the extrh fix. Thanks Richard! Sounds good to me. I'm still attempting to collect the RISC-V patches to get a PR out, a few things came up but I should have time now. This was the biggest patch set, so it should be a lot easier now. Thanks for picking this up! Some of those todos are no longer todos, since e.g. bswap is now optional. Those asserts should never fire (as a good assert should do, I suppose). The missing instructions are only for riscv32, which afaik is just now making its way to glibc. So a chroot complete enough to build qemu is a ways away. I'm ok with leaving that incomplete for now. We've decided to delay the rv32i glibc port until after the next glibc release, which is targeted for the beginning of February. glibc should freeze at the end of the year, at which point we're going to do a rv32i glibc prerelease and try to build a proper userspace with the theory being that we'll shake out ABI bugs that way.
Re: [Qemu-devel] [PATCH v2 00/23] Add RISC-V TCG backend support
On Thu, Dec 20, 2018 at 10:45 AM Palmer Dabbelt wrote: > > On Thu, 20 Dec 2018 09:20:05 PST (-0800), alistai...@gmail.com wrote: > > On Wed, Dec 19, 2018 at 10:07 PM Richard Henderson > > wrote: > >> > >> On 12/19/18 11:16 AM, Alistair Francis wrote: > >> > This patch set adds RISC-V backend support to QEMU. This is based on > >> > Michael Clark's original work with extra work on top. > >> > > >> > This has been somewhat tested and can run other architecture softmmu > >> > code. It seems that any complex OS will eventually hang, but we can > >> > run the BIOS and OS startup code for a number of different operating > >> > systems. > >> > > >> > I haven't tested linux user support at all yet. I think Michael had that > >> > working reliably though and hopefully my changes haven't broken it. > >> > > >> > There are still some todos in the code (there are missing instructions > >> > and byte swapping) but these should assert instead of generating invalid > >> > code. > >> > >> Queued to tcg-next, with the extrh fix. > > > > Thanks Richard! > > Sounds good to me. I'm still attempting to collect the RISC-V patches to get > a > PR out, a few things came up but I should have time now. This was the biggest > patch set, so it should be a lot easier now. > > Thanks for picking this up! > > >> Some of those todos are no longer todos, since e.g. bswap is now optional. > >> Those asserts should never fire (as a good assert should do, I suppose). > >> > >> The missing instructions are only for riscv32, which afaik is just now > >> making > >> its way to glibc. So a chroot complete enough to build qemu is a ways > >> away. > >> I'm ok with leaving that incomplete for now. > > We've decided to delay the rv32i glibc port until after the next glibc > release, > which is targeted for the beginning of February. 
glibc should freeze at the > end of the year, at which point we're going to do a rv32i glibc prerelease and > try to build a proper userspace with the theory being that we'll shake out ABI > bugs that way. Yocto/OE has full support for building 32-bit userspaces with the latest 32-bit glibc patchset so that is probably a good place to start testing. It even runs QEMU! Alistair
[Qemu-devel] [PULL v3 42/44] q35: set split kernel irqchip as default
From: Peter Xu Starting from QEMU 4.0, let's specify "split" as the default value for kernel-irqchip. So for QEMU>=4.0 we'll have: allowed=Y,required=N,split=Y for QEMU<=3.1 we'll have: allowed=Y,required=N,split=N (omitting all the "kernel_irqchip_" prefix) Note that this will let the default q35 machine type to depend on Linux version 4.4 or newer because that's where split irqchip is introduced in kernel. But it's fine since we're boosting supported Linux version for QEMU 4.0 to around Linux 4.5. For more information please refer to the discussion on AMD's RDTSCP: https://lore.kernel.org/lkml/20181210181328.ga...@zn.tnic/ Signed-off-by: Peter Xu Reviewed-by: Eduardo Habkost Acked-by: Paolo Bonzini Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin --- include/hw/boards.h | 1 + hw/core/machine.c | 2 ++ hw/i386/pc_q35.c| 2 ++ 3 files changed, 5 insertions(+) diff --git a/include/hw/boards.h b/include/hw/boards.h index f82f28468b..362384815e 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -195,6 +195,7 @@ struct MachineClass { const char *hw_version; ram_addr_t default_ram_size; const char *default_cpu_type; +bool default_kernel_irqchip_split; bool option_rom_has_mr; bool rom_file_has_mr; int minimum_page_bits; diff --git a/hw/core/machine.c b/hw/core/machine.c index c51423b647..4439ea663f 100644 --- a/hw/core/machine.c +++ b/hw/core/machine.c @@ -653,8 +653,10 @@ static void machine_class_base_init(ObjectClass *oc, void *data) static void machine_initfn(Object *obj) { MachineState *ms = MACHINE(obj); +MachineClass *mc = MACHINE_GET_CLASS(obj); ms->kernel_irqchip_allowed = true; +ms->kernel_irqchip_split = mc->default_kernel_irqchip_split; ms->kvm_shadow_mem = -1; ms->dump_guest_core = true; ms->mem_merge = true; diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c index 8836d21485..0a3f3f18e4 100644 --- a/hw/i386/pc_q35.c +++ b/hw/i386/pc_q35.c @@ -304,6 +304,7 @@ static void pc_q35_machine_options(MachineClass *m) m->units_per_default_bus 
= 1; m->default_machine_opts = "firmware=bios-256k.bin"; m->default_display = "std"; +m->default_kernel_irqchip_split = true; m->no_floppy = 1; machine_class_allow_dynamic_sysbus_dev(m, TYPE_AMD_IOMMU_DEVICE); machine_class_allow_dynamic_sysbus_dev(m, TYPE_INTEL_IOMMU_DEVICE); @@ -323,6 +324,7 @@ DEFINE_Q35_MACHINE(v4_0, "pc-q35-4.0", NULL, static void pc_q35_3_1_machine_options(MachineClass *m) { pc_q35_4_0_machine_options(m); +m->default_kernel_irqchip_split = false; m->alias = NULL; SET_MACHINE_COMPAT(m, PC_COMPAT_3_1); } -- MST
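The pattern in this patch, a class-level default that the instance init copies and that an older versioned machine type overrides, can be sketched roughly as follows. The class names are invented for illustration and stand in for the 4.0 and 3.1 q35 machine types; this is not how QOM classes are actually declared.

```python
class Q35Machine40:
    """Stand-in for the pc-q35-4.0 machine class (hypothetical name)."""
    default_kernel_irqchip_split = True

    def __init__(self):
        # Mirrors machine_initfn(): copy the class default to the instance.
        self.kernel_irqchip_allowed = True
        self.kernel_irqchip_split = self.default_kernel_irqchip_split


class Q35Machine31(Q35Machine40):
    """Stand-in for pc-q35-3.1: inherits 4.0 options, restores old default."""
    default_kernel_irqchip_split = False


new_machine = Q35Machine40()
old_machine = Q35Machine31()
```

Here `new_machine.kernel_irqchip_split` is true and `old_machine.kernel_irqchip_split` is false, matching the split=Y for QEMU>=4.0 and split=N for QEMU<=3.1 summary in the commit message.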
[Qemu-devel] [Bug 1781463] Re: qemu don't start *.abs firmware files
** Changed in: qemu
       Status: New => Opinion

--
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1781463

Title:
  qemu don't start *.abs firmware files

Status in QEMU:
  Opinion

Bug description:
  Hello Devs,

  I'm reporting this issue because I'm using Win64 QEMU but I can't
  start a *.abs firmware image. Normally this firmware is based on the
  Linux kernel, and this type of firmware is made for STB receivers.
  That is all the information I can provide to get support.

  Files extracted by ( binwalk -e )

  Terminal output:
  # binwalk -e AMIKO_HD8150_2.4.43_emu.abs

  DECIMAL    HEXADECIMAL  DESCRIPTION
  196736     0x30080      LZMA compressed data, properties: 0x6C, dictionary size: 8388608 bytes, uncompressed size: 11883876 bytes
  3866752    0x3B0080     LZMA compressed data, properties: 0x6C, dictionary size: 8388608 bytes, uncompressed size: 3255512 bytes
  5636224    0x560080     LZMA compressed data, properties: 0x6C, dictionary size: 8388608 bytes, uncompressed size: 87904 bytes

  Files extracted with ALI TOOLS or Ali FirmwareDecriptor.

  Windows files output:
  Software used: Ali Main Code Decrypter 8.9
  Files unpacked:
    bootloader
    MemCfg
    maincode(AV)
    seecode
    default_lang
    cipluskey
    countryband
    logo_user
    logo_menu
    logo_radio
    logo_boot
    patch
    defaultdb(PRC)
    userdb(64+64)

  Terminal OUTPUT:
  # hexdump -C part of file

  00b51a30  00 00 00 00 4c 69 62 63  6f 72 65 20 76 65 72 73  |....Libcore vers|
  00b51a40  69 6f 6e 20 31 33 2e 31  36 2e 30 40 53 44 4b 34  |ion 13.16.0@SDK4|
  00b51a50  2e 30 66 61 2e 31 33 2e  31 36 5f 32 30 31 36 31  |.0fa.13.16_20161|
  00b51a60  30 31 39 28 67 63 63 20  76 65 72 73 69 6f 6e 20  |019(gcc version |
  00b51a70  33 2e 34 2e 34 20 6d 69  70 73 73 64 65 2d 36 2e  |3.4.4 mipssde-6.|
  00b51a80  30 36 2e 30 31 2d 32 30  30 37 30 34 32 30 29 28  |06.01-20070420)(|
  00b51a90  41 64 6d 69 6e 69 73 74  72 61 74 6f 72 40 20 46  |Administrator@ F|
  00b51aa0  72 69 2c 20 4a 75 6c 20  32 38 2c 20 32 30 31 37  |ri, Jul 28, 2017|
  00b51ab0  20 31 32 3a 35 33 3a 32  38 20 41 4d 29 0a 00 00  | 12:53:28 AM)...|
  00b51ac0  44 4d 58 5f 53 33 36 30  31 5f 30 00 00 a1 03 18  |DMX_S3601_0.....|

  When I use readelf it says the file isn't an ELF file, so I can't run
  it like a kernel (Bootloader, Maincode, etc.). This is the cmd output
  when I use QEMU Win64 (I don't want to use Linux to emulate this
  *.abs firmware, so please help me with the Win64 version of QEMU).

  CMD OUTPUT:
  C:\Program Files\qemu>qemu-system-mips.exe -machine mips -cpu mips32r6-generic -drive file=C:\30080.bin,index=0,media=disk,format=raw
  qemu-system-mips.exe: warning: could not load MIPS bios 'mips_bios.bin'

  I also tried a lot of different qemu-system... binaries and a lot of
  different configs like -machine -cpu -kernel -driver root= -PFLASH
  etc., and nothing happened.

  How can I reproduce this issue? Reply:
  Download the *.abs firmware from amikoreceiver.com (only *.abs) and
  download AliDekompressor from http://www.satedu.cba.pl/

  Direct tools:
  FirmwareDecrypter_v8.9.zip : http://www.satedu.cba.pl/index.php?action=downloadfile=FirmwareDecrypter_v8.9.zip=Test%20Folder;
  Ali__tools_Console_v4.0__CRC_FIXER.rar : http://www.satedu.cba.pl/index.php?action=downloadfile=Ali__tools_Console_v4.0__CRC_FIXER.rar=Test%20Folder;

  If QEMU can explain how I can fix this issue, that would be highly
  helpful.

  With my best regards,
  David Martins Screamfox

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1781463/+subscriptions
[Qemu-devel] [Bug 1809291] Re: ARM PL181 (mmc for Security Digital Card) not working in Ubuntu 18.10 (CMD 2, 3 timeout). The SDC driver worked fine in Ubuntu 18.04 and earlier versions but not in Ubun
'Hi, from this report your setup is unclear to me'.

Hi, I am not using the Linux kernel. The t.bin image is a program built
from .s and .c files using gcc-arm-none-eabi for ARM. The sdimage is
just a regular 1MB file, which is passed with -sd sdimage as a virtual
SDC card for qemu-system-arm under Ubuntu 18.10.

The t.bin code calls sdc_init() to initialize the PL181 MMC card, which
is

int do_command(int cmd, int arg, int resp)
{
  *(u32 *)(base + ARGUMENT) = (u32)arg;
  *(u32 *)(base + COMMAND)  = 0x400 | (resp<<6) | cmd;
  delay();
}

int sdc_init()
{
  u32 RCA = (u32)0x4567;             // QEMU's hard-coded RCA
  base    = (u32)0x10005000;         // PL180 base address
  printf("sdc_init : ");
  *(u32 *)(base + POWER) = (u32)0xBF; // power on
  *(u32 *)(base + CLOCK) = (u32)0xC6; // default CLK

  // send init command sequence
  do_command(0,  0,   MMC_RSP_NONE); // idle state
  do_command(55, 0,   MMC_RSP_R1);   // ready state
  do_command(41, 1,   MMC_RSP_R3);   // argument must not be zero
  do_command(2,  0,   MMC_RSP_R2);   // ask card CID
  do_command(3,  RCA, MMC_RSP_R1);   // assign RCA
  do_command(7,  RCA, MMC_RSP_R1);   // transfer state: must use RCA
  do_command(16, 512, MMC_RSP_R1);   // set data block length

  // set interrupt MASK0 registers bits = RxAvail|TxEmpty
  *(u32 *)(base + MASK0) = (1<<21)|(1<<18); //0x0024;
  printf("done\n");
}

After each command, read the MMC status to check for errors.
Commands 2, 3, 7, 16 all failed due to timeout, indicating the MMC card
does not respond. But the PL181 does generate interrupts for read/write
sector commands.

As stated before, the SAME driver code worked fine for all earlier
versions of Ubuntu: 15 to 18.04

--
You received this bug notification because you are a member of
qemu-devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1809291

Title:
  ARM PL181 (mmc for Security Digital Card) not working in Ubuntu 18.10
  (CMD 2,3 timeout). The SDC driver worked fine in Ubuntu 18.04 and
  earlier versions but not in Ubuntu 18.10

Status in QEMU:
  New

Bug description:
  ARM PL181 MMC card no longer working in qemu-system-arm in Ubuntu
  18.10. The MMC driver code worked fine in Ubuntu 15.10 to 18.04.

  The command to run qemu-system-arm is

    qemu-system-arm -M versatilepb -m 256M -sd sdimage -kernel t.bin -serial mon:stdio

  During SDC initialization, SDC commands 2, 3, 9, 13, 7, 16 all time
  out, which causes subsequent read/write commands 17/24 to fail also.

  Tried both ARM versatilepb and realview-pb-a8, realview-pbx-a9 boards:
  all the same.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1809291/+subscriptions
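The report says the driver reads the MMC status after each command and sees timeouts. As a rough illustration of that check, here is a pure-C decoder for the command-related status bits. It is a sketch under assumptions, not the reporter's code: the TOY_* names are made up, and the bit positions follow the PL180/PL181 status register layout as commonly documented (CmdCrcFail bit 0, CmdTimeOut bit 2, CmdRespEnd bit 6, CmdSent bit 7), which should be checked against the PrimeCell TRM before relying on them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed status bit positions; verify against the PL180/PL181 TRM. */
#define TOY_STAT_CMDCRCFAIL (1u << 0)
#define TOY_STAT_CMDTIMEOUT (1u << 2)
#define TOY_STAT_CMDRESPEND (1u << 6)
#define TOY_STAT_CMDSENT    (1u << 7)

typedef enum {
    TOY_CMD_PENDING,   /* no completion or error flag set yet */
    TOY_CMD_OK,
    TOY_CMD_TIMEOUT,
    TOY_CMD_CRC_FAIL
} ToyCmdResult;

/*
 * Classify one snapshot of the status register after a command was issued.
 * Commands that expect a response complete on CmdRespEnd; commands without
 * a response (such as CMD0) complete on CmdSent.
 */
static ToyCmdResult toy_classify_status(uint32_t status, bool expects_response)
{
    if (status & TOY_STAT_CMDTIMEOUT) {
        return TOY_CMD_TIMEOUT;
    }
    if (status & TOY_STAT_CMDCRCFAIL) {
        return TOY_CMD_CRC_FAIL;
    }
    if (expects_response) {
        return (status & TOY_STAT_CMDRESPEND) ? TOY_CMD_OK : TOY_CMD_PENDING;
    }
    return (status & TOY_STAT_CMDSENT) ? TOY_CMD_OK : TOY_CMD_PENDING;
}
```

A driver could poll the status register and feed each value to toy_classify_status() instead of a fixed delay(), which would make it visible exactly which command in the init sequence first reports a timeout.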
[Qemu-devel] [PULL v3 44/44] x86-iommu: turn on IR by default if proper
From: Peter Xu

When the user didn't specify "intremap" for the IOMMU device, we turn
it on by default if it is supported.

This will turn IR on for the default Q35 platform as long as the IOMMU
device is specified on new kernels.

Signed-off-by: Peter Xu
Acked-by: Paolo Bonzini
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 hw/i386/x86-iommu.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/i386/x86-iommu.c b/hw/i386/x86-iommu.c
index 61ee0f1eaa..d1534c1ae0 100644
--- a/hw/i386/x86-iommu.c
+++ b/hw/i386/x86-iommu.c
@@ -112,6 +112,7 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
     PCMachineState *pcms =
         PC_MACHINE(object_dynamic_cast(OBJECT(ms), TYPE_PC_MACHINE));
     QLIST_INIT(&x86_iommu->iec_notifiers);
+    bool irq_all_kernel = kvm_irqchip_in_kernel() && !kvm_irqchip_is_split();
 
     if (!pcms || !pcms->bus) {
         error_setg(errp, "Machine-type '%s' not supported by IOMMU",
@@ -121,12 +122,12 @@ static void x86_iommu_realize(DeviceState *dev, Error **errp)
 
     /* If the user didn't specify IR, choose a default value for it */
     if (x86_iommu->intr_supported == ON_OFF_AUTO_AUTO) {
-        x86_iommu->intr_supported = ON_OFF_AUTO_OFF;
+        x86_iommu->intr_supported = irq_all_kernel ?
+            ON_OFF_AUTO_OFF : ON_OFF_AUTO_ON;
     }
 
     /* Both Intel and AMD IOMMU IR only support "kernel-irqchip={off|split}" */
-    if (x86_iommu_ir_supported(x86_iommu) && kvm_irqchip_in_kernel() &&
-        !kvm_irqchip_is_split()) {
+    if (x86_iommu_ir_supported(x86_iommu) && irq_all_kernel) {
         error_setg(errp, "Interrupt Remapping cannot work with "
                          "kernel-irqchip=on, please use 'split|off'.");
         return;
--
MST
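The decision made in x86_iommu_realize() after this patch can be written as a small truth function. The sketch below is standalone and hypothetical (the Toy* enums and toy_resolve_intremap() do not exist in QEMU); it assumes that "irqchip fully in kernel" corresponds to kernel-irqchip=on and that both off and split count as not fully in kernel, as the condition in the diff suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirrors of the tri-state property and the irqchip modes. */
typedef enum { TOY_ON_OFF_AUTO_AUTO, TOY_ON_OFF_AUTO_ON, TOY_ON_OFF_AUTO_OFF } ToyOnOffAuto;
typedef enum { TOY_IRQCHIP_OFF, TOY_IRQCHIP_SPLIT, TOY_IRQCHIP_ON } ToyIrqchip;
typedef enum { TOY_IR_ENABLED, TOY_IR_DISABLED, TOY_IR_REJECTED } ToyIrOutcome;

/*
 * Resolve the "intremap" setting:
 *  - AUTO becomes OFF when the irqchip is fully in the kernel, ON otherwise;
 *  - an explicit ON combined with a full in-kernel irqchip is rejected.
 */
static ToyIrOutcome toy_resolve_intremap(ToyOnOffAuto requested, ToyIrqchip irqchip)
{
    bool irq_all_kernel = (irqchip == TOY_IRQCHIP_ON);

    if (requested == TOY_ON_OFF_AUTO_AUTO) {
        requested = irq_all_kernel ? TOY_ON_OFF_AUTO_OFF : TOY_ON_OFF_AUTO_ON;
    }
    if (requested == TOY_ON_OFF_AUTO_ON && irq_all_kernel) {
        return TOY_IR_REJECTED;
    }
    return requested == TOY_ON_OFF_AUTO_ON ? TOY_IR_ENABLED : TOY_IR_DISABLED;
}
```

Note that with AUTO the rejection branch can never trigger, because AUTO already resolves to OFF whenever the irqchip is fully in the kernel; only an explicit "intremap=on" can hit the error.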
[Qemu-devel] [PULL v3 38/44] pci: Reuse pci-bridge hotplug handler handlers for pcie-pci-bridge
From: David Hildenbrand

These functions are essentially the same, we only have to use
object_get_typename() for reporting errors. So let's share the
implementation of hotplug handler callbacks.

Suggested-by: Igor Mammedov
Reviewed-by: Igor Mammedov
Signed-off-by: David Hildenbrand
Reviewed-by: David Gibson
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 include/hw/pci/pci_bridge.h     |  4 ++++
 hw/pci-bridge/pci_bridge_dev.c  | 12 ++++++------
 hw/pci-bridge/pcie_pci_bridge.c | 30 ++----------------------------
 3 files changed, 12 insertions(+), 34 deletions(-)

diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
index cdff7edfd1..6e37c7551a 100644
--- a/include/hw/pci/pci_bridge.h
+++ b/include/hw/pci/pci_bridge.h
@@ -99,6 +99,10 @@ void pci_bridge_reset(DeviceState *qdev);
 void pci_bridge_initfn(PCIDevice *pci_dev, const char *typename);
 void pci_bridge_exitfn(PCIDevice *pci_dev);
 
+void pci_bridge_dev_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+                            Error **errp);
+void pci_bridge_dev_unplug_request_cb(HotplugHandler *hotplug_dev,
+                                      DeviceState *dev, Error **errp);
 
 /*
  * before qdev initialization(qdev_init()), this function sets bus_name and
diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index e1df9a52ac..fa0be13ac4 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -206,27 +206,27 @@ static const VMStateDescription pci_bridge_dev_vmstate = {
     }
 };
 
-static void pci_bridge_dev_plug_cb(HotplugHandler *hotplug_dev,
-                                   DeviceState *dev, Error **errp)
+void pci_bridge_dev_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+                            Error **errp)
 {
     PCIDevice *pci_hotplug_dev = PCI_DEVICE(hotplug_dev);
 
     if (!shpc_present(pci_hotplug_dev)) {
         error_setg(errp, "standard hotplug controller has been disabled for "
-                   "this %s", TYPE_PCI_BRIDGE_DEV);
+                   "this %s", object_get_typename(OBJECT(hotplug_dev)));
         return;
     }
     shpc_device_plug_cb(hotplug_dev, dev, errp);
 }
 
-static void pci_bridge_dev_unplug_request_cb(HotplugHandler *hotplug_dev,
-                                             DeviceState *dev, Error **errp)
+void pci_bridge_dev_unplug_request_cb(HotplugHandler *hotplug_dev,
+                                      DeviceState *dev, Error **errp)
 {
     PCIDevice *pci_hotplug_dev = PCI_DEVICE(hotplug_dev);
 
     if (!shpc_present(pci_hotplug_dev)) {
         error_setg(errp, "standard hotplug controller has been disabled for "
-                   "this %s", TYPE_PCI_BRIDGE_DEV);
+                   "this %s", object_get_typename(OBJECT(hotplug_dev)));
         return;
     }
     shpc_device_unplug_request_cb(hotplug_dev, dev, errp);
diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index c634353b06..0ffea680d5 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -137,32 +137,6 @@ static const VMStateDescription pcie_pci_bridge_dev_vmstate = {
     }
 };
 
-static void pcie_pci_bridge_plug_cb(HotplugHandler *hotplug_dev,
-                                    DeviceState *dev, Error **errp)
-{
-    PCIDevice *pci_hotplug_dev = PCI_DEVICE(hotplug_dev);
-
-    if (!shpc_present(pci_hotplug_dev)) {
-        error_setg(errp, "standard hotplug controller has been disabled for "
-                   "this %s", TYPE_PCIE_PCI_BRIDGE_DEV);
-        return;
-    }
-    shpc_device_plug_cb(hotplug_dev, dev, errp);
-}
-
-static void pcie_pci_bridge_unplug_request_cb(HotplugHandler *hotplug_dev,
-                                              DeviceState *dev, Error **errp)
-{
-    PCIDevice *pci_hotplug_dev = PCI_DEVICE(hotplug_dev);
-
-    if (!shpc_present(pci_hotplug_dev)) {
-        error_setg(errp, "standard hotplug controller has been disabled for "
-                   "this %s", TYPE_PCIE_PCI_BRIDGE_DEV);
-        return;
-    }
-    shpc_device_unplug_request_cb(hotplug_dev, dev, errp);
-}
-
 static void pcie_pci_bridge_class_init(ObjectClass *klass, void *data)
 {
     PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
@@ -179,8 +153,8 @@ static void pcie_pci_bridge_class_init(ObjectClass *klass, void *data)
     dc->props = pcie_pci_bridge_dev_properties;
     dc->reset = &pcie_pci_bridge_reset;
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
-    hc->plug = pcie_pci_bridge_plug_cb;
-    hc->unplug_request = pcie_pci_bridge_unplug_request_cb;
+    hc->plug = pci_bridge_dev_plug_cb;
+    hc->unplug_request = pci_bridge_dev_unplug_request_cb;
 }
 
 static const TypeInfo pcie_pci_bridge_info = {
--
MST
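The reason one callback can now serve both bridge types is that the error message asks the object for its own type name instead of hard-coding a TYPE_* constant. A minimal standalone sketch of that idea follows; ToyBridge, the three sample instances and toy_bridge_plug() are hypothetical, and the type-name strings are illustrative assumptions rather than values read from QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a bridge object that knows its own type name. */
typedef struct ToyBridge {
    const char *type_name;  /* what object_get_typename() would report */
    bool shpc_present;
} ToyBridge;

/* Illustrative instances; the name strings are assumptions for the demo. */
static const ToyBridge toy_pci_bridge       = { "pci-bridge", false };
static const ToyBridge toy_pcie_pci_bridge  = { "pcie-pci-bridge", false };
static const ToyBridge toy_bridge_with_shpc = { "pci-bridge", true };

/*
 * Shared plug callback: returns NULL on success, otherwise an error message
 * that names the concrete type of the bridge it was called on.
 * The message lives in a static buffer, so this sketch is not reentrant.
 */
static const char *toy_bridge_plug(const ToyBridge *bridge)
{
    static char err[128];

    if (!bridge->shpc_present) {
        snprintf(err, sizeof(err),
                 "standard hotplug controller has been disabled for this %s",
                 bridge->type_name);
        return err;
    }
    return NULL;
}
```

The same function produces a message ending in "pci-bridge" for one instance and "pcie-pci-bridge" for the other, which is all the two removed pcie_pci_bridge_*_cb() copies differed in.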
Re: [Qemu-devel] [PULL v3 00/35] Misc patches for 2018-12-18
On 20/12/18 18:41, Peter Maydell wrote:
>> This seemed to work on most of my test hosts but something
>> weird happened here: hyperlong repetitive command line and
>> looks like make got an "fwrite(): Resource temporarily unavailable"
>> halfway through writing it out??
>>
>> This was on my x86-64 Linux Ubuntu system, clang build
>> which I configure with '--cc=clang' '--cxx=clang++' '--enable-gtk'
>> '--extra-cflags=-fsanitize=undefined -fno-sanitize=shift-base -Werror';
>> the error is during the 'make -C build/clang check V=1' phase.
>
> Checking my logfiles I think that the previous pull request
> apply attempt also failed this way, but I didn't notice
> because of the other failures on other hosts. It may be
> the combination of the huge command line and the way my
> test setup uses GNU parallel and ssh to capture the make
> output into a logfile.

Ok, I got it even better than the old harness. Now each test is printed
individually and the V=1 command lines are very easily cut-and-pasted:

  TEST    check-qtest-x86_64: tests/endianness-test
  TEST    check-qtest-x86_64: tests/fdc-test
  TEST    check-qtest-x86_64: tests/ide-test
  TEST    check-qtest-x86_64: tests/ahci-test

With V=1:

export MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img; tests/endianness-test -m=quick -k --tap | ./scripts/tap-driver.pl --test-name="endianness-test"
PASS 1 endianness-test /x86_64/endianness/pc
PASS 2 endianness-test /x86_64/endianness/split/pc
PASS 3 endianness-test /x86_64/endianness/combine/pc
export MALLOC_PERTURB_=${MALLOC_PERTURB_:-$(( ${RANDOM:-0} % 255 + 1))} QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 QTEST_QEMU_IMG=qemu-img; tests/fdc-test -m=quick -k --tap | ./scripts/tap-driver.pl --test-name="fdc-test"
PASS 1 fdc-test /x86_64/fdc/cmos
PASS 2 fdc-test /x86_64/fdc/no_media_on_start
PASS 3 fdc-test /x86_64/fdc/read_without_media
...

(as long as you don't use -k).

Thanks for the test.

Paolo
[Qemu-devel] [PULL v3 39/44] pci/shpc: perform unplug via the hotplug handler
From: David Hildenbrand

Introduce and use the "unplug" callback.

This is a preparation for multi-stage hotplug handlers, whereby the bus
hotplug handler is overwritten by the machine hotplug handler. This handler
will then pass control to the bus hotplug handler. So to get this running
cleanly, we also have to make sure to go via the hotplug handler chain when
actually unplugging a device after an unplug request. Lookup the hotplug
handler and call "unplug".

Reviewed-by: David Gibson
Signed-off-by: David Hildenbrand
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 include/hw/pci/pci_bridge.h     |  2 ++
 include/hw/pci/shpc.h           |  2 ++
 hw/pci-bridge/pci_bridge_dev.c  | 10 ++++++++++
 hw/pci-bridge/pcie_pci_bridge.c |  1 +
 hw/pci/shpc.c                   | 11 ++++++++++-
 5 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/hw/pci/pci_bridge.h b/include/hw/pci/pci_bridge.h
index 6e37c7551a..ba488818d2 100644
--- a/include/hw/pci/pci_bridge.h
+++ b/include/hw/pci/pci_bridge.h
@@ -101,6 +101,8 @@ void pci_bridge_exitfn(PCIDevice *pci_dev);
 
 void pci_bridge_dev_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
                             Error **errp);
+void pci_bridge_dev_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+                              Error **errp);
 void pci_bridge_dev_unplug_request_cb(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp);
 
diff --git a/include/hw/pci/shpc.h b/include/hw/pci/shpc.h
index 71293aca58..18f6ec1cd5 100644
--- a/include/hw/pci/shpc.h
+++ b/include/hw/pci/shpc.h
@@ -47,6 +47,8 @@ void shpc_cap_write_config(PCIDevice *d, uint32_t addr, uint32_t val, int len);
 
 void shpc_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
                          Error **errp);
+void shpc_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+                           Error **errp);
 void shpc_device_unplug_request_cb(HotplugHandler *hotplug_dev,
                                    DeviceState *dev, Error **errp);
 
diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index fa0be13ac4..ff6b8323da 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -219,6 +219,15 @@ void pci_bridge_dev_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
     shpc_device_plug_cb(hotplug_dev, dev, errp);
 }
 
+void pci_bridge_dev_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+                              Error **errp)
+{
+    PCIDevice *pci_hotplug_dev = PCI_DEVICE(hotplug_dev);
+
+    g_assert(shpc_present(pci_hotplug_dev));
+    shpc_device_unplug_cb(hotplug_dev, dev, errp);
+}
+
 void pci_bridge_dev_unplug_request_cb(HotplugHandler *hotplug_dev,
                                       DeviceState *dev, Error **errp)
 {
@@ -251,6 +260,7 @@ static void pci_bridge_dev_class_init(ObjectClass *klass, void *data)
     dc->vmsd = &pci_bridge_dev_vmstate;
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     hc->plug = pci_bridge_dev_plug_cb;
+    hc->unplug = pci_bridge_dev_unplug_cb;
     hc->unplug_request = pci_bridge_dev_unplug_request_cb;
 }
 
diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index 0ffea680d5..d491b40d04 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -154,6 +154,7 @@ static void pcie_pci_bridge_class_init(ObjectClass *klass, void *data)
     dc->reset = &pcie_pci_bridge_reset;
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     hc->plug = pci_bridge_dev_plug_cb;
+    hc->unplug = pci_bridge_dev_unplug_cb;
     hc->unplug_request = pci_bridge_dev_unplug_request_cb;
 }
 
diff --git a/hw/pci/shpc.c b/hw/pci/shpc.c
index 7851bfa8ad..45053b39b9 100644
--- a/hw/pci/shpc.c
+++ b/hw/pci/shpc.c
@@ -238,6 +238,7 @@ static void shpc_invalid_command(SHPCDevice *shpc)
 
 static void shpc_free_devices_in_slot(SHPCDevice *shpc, int slot)
 {
+    HotplugHandler *hotplug_ctrl;
     int devfn;
     int pci_slot = SHPC_IDX_TO_PCI(slot);
     for (devfn = PCI_DEVFN(pci_slot, 0);
@@ -245,7 +246,9 @@ static void shpc_free_devices_in_slot(SHPCDevice *shpc, int slot)
          ++devfn) {
         PCIDevice *affected_dev = shpc->sec_bus->devices[devfn];
         if (affected_dev) {
-            object_unparent(OBJECT(affected_dev));
+            hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(affected_dev));
+            hotplug_handler_unplug(hotplug_ctrl, DEVICE(affected_dev),
+                                   &error_abort);
         }
     }
 }
@@ -540,6 +543,12 @@ void shpc_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
     shpc_interrupt_update(pci_hotplug_dev);
 }
 
+void shpc_device_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+                           Error **errp)
+{
+    object_unparent(OBJECT(dev));
+}
+
 void shpc_device_unplug_request_cb(HotplugHandler *hotplug_dev,
[Qemu-devel] [PULL v3 34/44] pci/pcihp: perform check for bus capability in pre_plug handler
From: David Hildenbrand

Perform the check in the pre_plug handler. In addition, we need the
capability only if the device is actually hotplugged (and not created
during machine initialization). This is a preparation for coldplugging
pci devices via that hotplug handler.

Reviewed-by: Igor Mammedov
Signed-off-by: David Hildenbrand
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 include/hw/acpi/pcihp.h |  2 ++
 hw/acpi/pcihp.c         | 21 +++++++++++++++------
 hw/acpi/piix4.c         | 16 ++++++++++++++--
 3 files changed, 31 insertions(+), 8 deletions(-)

diff --git a/include/hw/acpi/pcihp.h b/include/hw/acpi/pcihp.h
index 8a65f99fc8..ce31625850 100644
--- a/include/hw/acpi/pcihp.h
+++ b/include/hw/acpi/pcihp.h
@@ -56,6 +56,8 @@ typedef struct AcpiPciHpState {
 
 void acpi_pcihp_init(Object *owner, AcpiPciHpState *, PCIBus *root,
                      MemoryRegion *address_space_io, bool bridges_enabled);
+void acpi_pcihp_device_pre_plug_cb(HotplugHandler *hotplug_dev,
+                                   DeviceState *dev, Error **errp);
 void acpi_pcihp_device_plug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
                                DeviceState *dev, Error **errp);
 void acpi_pcihp_device_unplug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 80d42e12ff..5e7cef173c 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -217,17 +217,24 @@ void acpi_pcihp_reset(AcpiPciHpState *s)
     acpi_pcihp_update(s);
 }
 
+void acpi_pcihp_device_pre_plug_cb(HotplugHandler *hotplug_dev,
+                                   DeviceState *dev, Error **errp)
+{
+    /* Only hotplugged devices need the hotplug capability. */
+    if (dev->hotplugged &&
+        acpi_pcihp_get_bsel(pci_get_bus(PCI_DEVICE(dev))) < 0) {
+        error_setg(errp, "Unsupported bus. Bus doesn't have property '"
+                   ACPI_PCIHP_PROP_BSEL "' set");
+        return;
+    }
+}
+
 void acpi_pcihp_device_plug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
                                DeviceState *dev, Error **errp)
 {
     PCIDevice *pdev = PCI_DEVICE(dev);
     int slot = PCI_SLOT(pdev->devfn);
-    int bsel = acpi_pcihp_get_bsel(pci_get_bus(pdev));
-    if (bsel < 0) {
-        error_setg(errp, "Unsupported bus. Bus doesn't have property '"
-                   ACPI_PCIHP_PROP_BSEL "' set");
-        return;
-    }
+    int bsel;
 
     /* Don't send event when device is enabled during qemu machine creation:
      * it is present on boot, no hotplug event is necessary. We do send an
@@ -236,6 +243,8 @@ void acpi_pcihp_device_plug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
         return;
     }
 
+    bsel = acpi_pcihp_get_bsel(pci_get_bus(pdev));
+    g_assert(bsel >= 0);
     s->acpi_pcihp_pci_status[bsel].up |= (1U << slot);
     acpi_send_event(DEVICE(hotplug_dev), ACPI_PCI_HOTPLUG_STATUS);
 }
diff --git a/hw/acpi/piix4.c b/hw/acpi/piix4.c
index 2f4dd03b83..f68e62d36c 100644
--- a/hw/acpi/piix4.c
+++ b/hw/acpi/piix4.c
@@ -371,6 +371,18 @@ static void piix4_pm_powerdown_req(Notifier *n, void *opaque)
     acpi_pm1_evt_power_down(&s->ar);
 }
 
+static void piix4_device_pre_plug_cb(HotplugHandler *hotplug_dev,
+                                    DeviceState *dev, Error **errp)
+{
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        acpi_pcihp_device_pre_plug_cb(hotplug_dev, dev, errp);
+    } else if (!object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
+               !object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
+        error_setg(errp, "acpi: device pre plug request for not supported"
+                   " device type: %s", object_get_typename(OBJECT(dev)));
+    }
+}
+
 static void piix4_device_plug_cb(HotplugHandler *hotplug_dev,
                                  DeviceState *dev, Error **errp)
 {
@@ -393,8 +405,7 @@ static void piix4_device_plug_cb(HotplugHandler *hotplug_dev,
             acpi_cpu_plug_cb(hotplug_dev, &s->cpuhp_state, dev, errp);
         }
     } else {
-        error_setg(errp, "acpi: device plug request for not supported device"
-                   " type: %s", object_get_typename(OBJECT(dev)));
+        g_assert_not_reached();
     }
 }
 
@@ -703,6 +714,7 @@ static void piix4_pm_class_init(ObjectClass *klass, void *data)
      */
     dc->user_creatable = false;
     dc->hotpluggable = false;
+    hc->pre_plug = piix4_device_pre_plug_cb;
     hc->plug = piix4_device_plug_cb;
     hc->unplug_request = piix4_device_unplug_request_cb;
     hc->unplug = piix4_device_unplug_cb;
--
MST
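The split this patch introduces (a pre_plug phase that may fail without side effects, followed by a plug phase that asserts instead of reporting errors) can be illustrated with a toy model. Everything below is hypothetical and standalone: ToyBus, ToyDev and the toy_* functions only imitate the roles of acpi_pcihp_device_pre_plug_cb() and acpi_pcihp_device_plug_cb(), with a negative bsel standing for "bus has no bsel property" and a single bitmask standing for the per-bus status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyBus { int bsel; } ToyBus;  /* bsel < 0: property not set */
typedef struct ToyDev { ToyBus bus; bool hotplugged; int slot; } ToyDev;

/* Phase 1: validation only. Returns NULL if the device may be plugged. */
static const char *toy_pre_plug(const ToyDev *dev)
{
    /* Only hotplugged devices need the hotplug capability. */
    if (dev->hotplugged && dev->bus.bsel < 0) {
        return "Unsupported bus: bsel property not set";
    }
    return NULL;
}

/* Phase 2: must not fail; anything invalid was filtered out in phase 1. */
static unsigned toy_plug(const ToyDev *dev, unsigned up_mask)
{
    if (!dev->hotplugged) {
        return up_mask;               /* coldplugged: no hotplug event needed */
    }
    assert(dev->bus.bsel >= 0);       /* guaranteed by toy_pre_plug() */
    return up_mask | (1u << dev->slot);
}

/* Driver: returns -1 if pre_plug rejected the device, else the new mask. */
static int toy_add_device(int bsel, bool hotplugged, int slot)
{
    ToyDev dev = { { bsel }, hotplugged, slot };

    if (toy_pre_plug(&dev) != NULL) {
        return -1;
    }
    return (int)toy_plug(&dev, 0u);
}
```

A coldplugged device on a bus without bsel passes both phases and leaves the mask untouched, which is the case the commit message calls preparation for coldplugging via the same handler.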
Re: [Qemu-devel] [PATCH v2 00/23] Add RISC-V TCG backend support
On Thu, 20 Dec 2018 11:04:41 PST (-0800), alistai...@gmail.com wrote:
On Thu, Dec 20, 2018 at 10:45 AM Palmer Dabbelt wrote:

On Thu, 20 Dec 2018 09:20:05 PST (-0800), alistai...@gmail.com wrote:
> On Wed, Dec 19, 2018 at 10:07 PM Richard Henderson
> wrote:
>>
>> On 12/19/18 11:16 AM, Alistair Francis wrote:
>> > This patch set adds RISC-V backend support to QEMU. This is based on
>> > Michael Clark's original work with extra work on top.
>> >
>> > This has been somewhat tested and can run other architecture softmmu
>> > code. It seems that any complex OS will eventually hang, but we can
>> > run the BIOS and OS startup code for a number of different operating
>> > systems.
>> >
>> > I haven't tested linux user support at all yet. I think Michael had that
>> > working reliably though and hopefully my changes haven't broken it.
>> >
>> > There are still some todos in the code (there are missing instructions
>> > and byte swapping) but these should assert instead of generating invalid
>> > code.
>>
>> Queued to tcg-next, with the extrh fix.
>
> Thanks Richard!

Sounds good to me. I'm still attempting to collect the RISC-V patches to
get a PR out, a few things came up but I should have time now. This was
the biggest patch set, so it should be a lot easier now.

Thanks for picking this up!

>> Some of those todos are no longer todos, since e.g. bswap is now optional.
>> Those asserts should never fire (as a good assert should do, I suppose).
>>
>> The missing instructions are only for riscv32, which afaik is just now making
>> its way to glibc. So a chroot complete enough to build qemu is a ways away.
>> I'm ok with leaving that incomplete for now.

We've decided to delay the rv32i glibc port until after the next glibc
release, which is targeted for the beginning of February. glibc should
freeze at the end of the year, at which point we're going to do a rv32i
glibc prerelease and try to build a proper userspace with the theory
being that we'll shake out ABI bugs that way.

Yocto/OE has full support for building 32-bit userspaces with the latest
32-bit glibc patchset so that is probably a good place to start testing.
It even runs QEMU!

That's my plan. The issue is less on the distro side and more on the "go
through the whole rv32i glibc ABI to make sure it's sane" side.
[Qemu-devel] [PULL v3 30/44] hw/i386: Remove deprecated machines pc-0.10 and pc-0.11
From: Thomas Huth

They've been deprecated for two releases and nobody complained that they
are still required anymore, so it's time to remove these now.

And while we're at it, mark the other remaining old 0.x machine types
as deprecated (since they can not properly be used for live-migration
anyway).

Signed-off-by: Thomas Huth
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
Reviewed-by: Eduardo Habkost
---
 hw/i386/pc_piix.c     | 70 ++-
 tests/cpu-plug-test.c |  4 +---
 qemu-deprecated.texi  |  2 +-
 3 files changed, 4 insertions(+), 72 deletions(-)

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index e000c7511a..7f1cb527b5 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -368,7 +368,7 @@ static void pc_compat_1_2(MachineState *machine)
     x86_cpu_change_kvm_default("kvm-pv-eoi", NULL);
 }
 
-/* PC compat function for pc-0.10 to pc-0.13 */
+/* PC compat function for pc-0.12 and pc-0.13 */
 static void pc_compat_0_13(MachineState *machine)
 {
     pc_compat_1_2(machine);
@@ -834,6 +834,7 @@ static void pc_i440fx_0_15_machine_options(MachineClass *m)
 {
     pc_i440fx_1_0_machine_options(m);
     m->hw_version = "0.15";
+    m->deprecation_reason = "use a newer machine type instead";
     SET_MACHINE_COMPAT(m, PC_COMPAT_0_15);
 }
 
@@ -951,73 +952,6 @@ static void pc_i440fx_0_12_machine_options(MachineClass *m)
 DEFINE_I440FX_MACHINE(v0_12, "pc-0.12", pc_compat_0_13,
                       pc_i440fx_0_12_machine_options);
 
-
-#define PC_COMPAT_0_11 \
-        PC_CPU_MODEL_IDS("0.11") \
-        {\
-            .driver   = "virtio-blk-pci",\
-            .property = "vectors",\
-            .value    = stringify(0),\
-        },{\
-            .driver   = TYPE_PCI_DEVICE,\
-            .property = "rombar",\
-            .value    = stringify(0),\
-        },{\
-            .driver   = "ide-drive",\
-            .property = "ver",\
-            .value    = "0.11",\
-        },{\
-            .driver   = "scsi-disk",\
-            .property = "ver",\
-            .value    = "0.11",\
-        },
-
-static void pc_i440fx_0_11_machine_options(MachineClass *m)
-{
-    pc_i440fx_0_12_machine_options(m);
-    m->hw_version = "0.11";
-    m->deprecation_reason = "use a newer machine type instead";
-    SET_MACHINE_COMPAT(m, PC_COMPAT_0_11);
-}
-
-DEFINE_I440FX_MACHINE(v0_11, "pc-0.11", pc_compat_0_13,
-                      pc_i440fx_0_11_machine_options);
-
-
-#define PC_COMPAT_0_10 \
-        PC_CPU_MODEL_IDS("0.10") \
-        {\
-            .driver   = "virtio-blk-pci",\
-            .property = "class",\
-            .value    = stringify(PCI_CLASS_STORAGE_OTHER),\
-        },{\
-            .driver   = "virtio-serial-pci",\
-            .property = "class",\
-            .value    = stringify(PCI_CLASS_DISPLAY_OTHER),\
-        },{\
-            .driver   = "virtio-net-pci",\
-            .property = "vectors",\
-            .value    = stringify(0),\
-        },{\
-            .driver   = "ide-drive",\
-            .property = "ver",\
-            .value    = "0.10",\
-        },{\
-            .driver   = "scsi-disk",\
-            .property = "ver",\
-            .value    = "0.10",\
-        },
-
-static void pc_i440fx_0_10_machine_options(MachineClass *m)
-{
-    pc_i440fx_0_11_machine_options(m);
-    m->hw_version = "0.10";
-    SET_MACHINE_COMPAT(m, PC_COMPAT_0_10);
-}
-
-DEFINE_I440FX_MACHINE(v0_10, "pc-0.10", pc_compat_0_13,
-                      pc_i440fx_0_10_machine_options);
-
 typedef struct {
     uint16_t gpu_device_id;
     uint16_t pch_device_id;
diff --git a/tests/cpu-plug-test.c b/tests/cpu-plug-test.c
index f4a677d238..668f00144e 100644
--- a/tests/cpu-plug-test.c
+++ b/tests/cpu-plug-test.c
@@ -157,9 +157,7 @@ static void add_pc_test_case(const char *mname)
         (strcmp(mname, "pc-0.15") == 0) ||
         (strcmp(mname, "pc-0.14") == 0) ||
         (strcmp(mname, "pc-0.13") == 0) ||
-        (strcmp(mname, "pc-0.12") == 0) ||
-        (strcmp(mname, "pc-0.11") == 0) ||
-        (strcmp(mname, "pc-0.10") == 0)) {
+        (strcmp(mname, "pc-0.12") == 0)) {
         path = g_strdup_printf("cpu-plug/%s/init/%ux%ux%u&maxcpus=%u",
                                mname, data->sockets, data->cores,
                                data->threads, data->maxcpus);
diff --git a/qemu-deprecated.texi b/qemu-deprecated.texi
index e362d37225..c3735b698e 100644
--- a/qemu-deprecated.texi
+++ b/qemu-deprecated.texi
@@ -134,7 +134,7 @@ their usecases.
 
 @section System emulator machines
 
-@subsection pc-0.10 and pc-0.11 (since 3.0)
+@subsection pc-0.12, pc-0.13, pc-0.14 and pc-0.15 (since 4.0)
 
 These machine types are very old and likely can not be used for live migration
 from old QEMU versions anymore. A newer machine type should be used instead.
--
MST
[Qemu-devel] [PULL v3 41/44] pci: Adjust PCI config limit based on bus topology
From: Alex Williamson

A conventional PCI bus does not support config space accesses above
the standard 256 byte configuration space. PCIe-to-PCI bridges are
not permitted to forward transactions if the extended register address
field is non-zero and must handle it as an unsupported request (PCIe
bridge spec rev 1.0, 4.1.3, 4.1.4). Therefore, we should not support
extended config space if there is a conventional bus anywhere on the
path to a device.

Signed-off-by: Alex Williamson
Reviewed-by: Marcel Apfelbaum
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 hw/pci/pci_host.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/hw/pci/pci_host.c b/hw/pci/pci_host.c
index 5eaa935cb5..5f5345dbac 100644
--- a/hw/pci/pci_host.c
+++ b/hw/pci/pci_host.c
@@ -20,6 +20,7 @@
 
 #include "qemu/osdep.h"
 #include "hw/pci/pci.h"
+#include "hw/pci/pci_bridge.h"
 #include "hw/pci/pci_host.h"
 #include "hw/pci/pci_bus.h"
 #include "trace.h"
@@ -50,9 +51,29 @@ static inline PCIDevice *pci_dev_find_by_addr(PCIBus *bus, uint32_t addr)
     return pci_find_device(bus, bus_num, devfn);
 }
 
+static void pci_adjust_config_limit(PCIBus *bus, uint32_t *limit)
+{
+    if (*limit > PCI_CONFIG_SPACE_SIZE) {
+        if (!pci_bus_is_express(bus)) {
+            *limit = PCI_CONFIG_SPACE_SIZE;
+            return;
+        }
+
+        if (!pci_bus_is_root(bus)) {
+            PCIDevice *bridge = pci_bridge_get_device(bus);
+            pci_adjust_config_limit(pci_get_bus(bridge), limit);
+        }
+    }
+}
+
 void pci_host_config_write_common(PCIDevice *pci_dev, uint32_t addr,
                                   uint32_t limit, uint32_t val, uint32_t len)
 {
+    pci_adjust_config_limit(pci_get_bus(pci_dev), &limit);
+    if (limit <= addr) {
+        return;
+    }
+
     assert(len <= 4);
     /* non-zero functions are only exposed when function 0 is present,
      * allowing direct removal of unexposed functions.
@@ -71,6 +92,11 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr,
 {
     uint32_t ret;
 
+    pci_adjust_config_limit(pci_get_bus(pci_dev), &limit);
+    if (limit <= addr) {
+        return ~0x0;
+    }
+
     assert(len <= 4);
     /* non-zero functions are only exposed when function 0 is present,
      * allowing direct removal of unexposed functions.
--
MST
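The walk toward the root bus that pci_adjust_config_limit() performs can be tried out on a toy topology. The following standalone sketch is not QEMU code: ToyPciBus, the TOY_* constants, the three sample buses and toy_config_read() are hypothetical, and a plain parent pointer replaces the pci_bridge_get_device()/pci_get_bus() pair. The sizes 256 and 4096 are the conventional and extended config space sizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PCI_CONFIG_SPACE_SIZE  256u
#define TOY_PCIE_CONFIG_SPACE_SIZE 4096u

/* Hypothetical bus node; parent == NULL marks the root bus. */
typedef struct ToyPciBus {
    bool is_express;
    const struct ToyPciBus *parent;
} ToyPciBus;

/* Sample topology: root (PCIe) -> conventional bus -> PCIe bus below it. */
static const ToyPciBus toy_root  = { true,  NULL };
static const ToyPciBus toy_conv  = { false, &toy_root };
static const ToyPciBus toy_below = { true,  &toy_conv };

/* Clamp *limit to 256 bytes if any bus on the path to the root is conventional. */
static void toy_adjust_config_limit(const ToyPciBus *bus, uint32_t *limit)
{
    if (*limit > TOY_PCI_CONFIG_SPACE_SIZE) {
        if (!bus->is_express) {
            *limit = TOY_PCI_CONFIG_SPACE_SIZE;
            return;
        }
        if (bus->parent != NULL) {
            toy_adjust_config_limit(bus->parent, limit);
        }
    }
}

/* Reads beyond the adjusted limit return all ones; otherwise the stored value. */
static uint32_t toy_config_read(const ToyPciBus *bus, uint32_t addr,
                                uint32_t limit, uint32_t stored)
{
    toy_adjust_config_limit(bus, &limit);
    if (limit <= addr) {
        return ~(uint32_t)0;
    }
    return stored;
}
```

In this model an access at offset 0x100 succeeds on toy_root but reads as all ones on toy_conv and on toy_below, since the latter sits behind a conventional segment even though it is itself marked express.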
[Qemu-devel] [PULL v3 37/44] pci/pcie: perform unplug via the hotplug handler
From: David Hildenbrand

Introduce and use the "unplug" callback.

This is a preparation for multi-stage hotplug handlers, whereby the bus
hotplug handler is overwritten by the machine hotplug handler. This handler
will then pass control to the bus hotplug handler. So to get this running
cleanly, we also have to make sure to go via the hotplug handler chain when
actually unplugging a device after an unplug request. Lookup the hotplug
handler and call "unplug".

Reviewed-by: David Gibson
Reviewed-by: Igor Mammedov
Signed-off-by: David Hildenbrand
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 include/hw/pci/pcie.h |  2 ++
 hw/pci/pcie.c         | 10 +-
 hw/pci/pcie_port.c    |  1 +
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index d51ca23f07..cd318646a2 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -134,6 +134,8 @@ void pcie_ats_init(PCIDevice *dev, uint16_t offset);
 
 void pcie_cap_slot_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
                            Error **errp);
+void pcie_cap_slot_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+                             Error **errp);
 void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
                                      DeviceState *dev, Error **errp);
 #endif /* QEMU_PCIE_H */
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index ec3b1145f3..2d3d8a047b 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -442,11 +442,19 @@ void pcie_cap_slot_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
     }
 }
 
-static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
+void pcie_cap_slot_unplug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
+                             Error **errp)
 {
     object_unparent(OBJECT(dev));
 }
 
+static void pcie_unplug_device(PCIBus *bus, PCIDevice *dev, void *opaque)
+{
+    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(DEVICE(dev));
+
+    hotplug_handler_unplug(hotplug_ctrl, DEVICE(dev), &error_abort);
+}
+
 void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
                                      DeviceState *dev, Error **errp)
 {
diff --git a/hw/pci/pcie_port.c b/hw/pci/pcie_port.c
index 73e81e5847..bc07abc31b 100644
--- a/hw/pci/pcie_port.c
+++ b/hw/pci/pcie_port.c
@@ -155,6 +155,7 @@ static void pcie_slot_class_init(ObjectClass *oc, void *data)
 
     dc->props = pcie_slot_props;
     hc->plug = pcie_cap_slot_plug_cb;
+    hc->unplug = pcie_cap_slot_unplug_cb;
     hc->unplug_request = pcie_cap_slot_unplug_request_cb;
 }
 
-- 
MST
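
The "hotplug handler chain" described in the commit message can be modelled in a few lines of plain C. This is only a sketch under stated assumptions: Handler, Device, device_unplug() and the other names below are hypothetical stand-ins, not QEMU's HotplugHandler API. It shows a machine-level handler that is looked up first and then passes control to the bus-level handler, which performs the actual removal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; these are NOT QEMU types. */
typedef struct Device Device;
typedef struct Handler Handler;

struct Handler {
    const char *name;
    Handler *next;                          /* handler to delegate to, or NULL */
    void (*unplug)(Handler *h, Device *d);  /* the "unplug" callback */
};

struct Device {
    const char *id;
    Handler *handler;   /* first handler in the chain for this device */
    int plugged;        /* 1 while the device is attached */
    int unplug_calls;   /* number of handlers that saw the unplug */
};

/* Bus-level handler: performs the actual removal. */
static void bus_unplug(Handler *h, Device *d)
{
    d->unplug_calls++;
    d->plugged = 0;
    printf("%s: removed %s\n", h->name, d->id);
}

/* Machine-level handler: does its own work, then passes control to the bus. */
static void machine_unplug(Handler *h, Device *d)
{
    d->unplug_calls++;
    printf("%s: forwarding unplug of %s\n", h->name, d->id);
    if (h->next && h->next->unplug) {
        h->next->unplug(h->next, d);
    }
}

/* Same pattern as the patch: look up the handler, then call "unplug". */
static void device_unplug(Device *d)
{
    Handler *h = d->handler;

    assert(h && h->unplug);
    h->unplug(h, d);
}

/*
 * Builds a two-stage chain (machine -> bus), unplugs one device and returns
 * the number of handlers invoked, or -1 if the device is still plugged.
 */
int demo_chain(void)
{
    Handler bus = { "bus", NULL, bus_unplug };
    Handler machine = { "machine", &bus, machine_unplug };
    Device nic = { "nic0", &machine, 1, 0 };

    device_unplug(&nic);
    return nic.plugged ? -1 : nic.unplug_calls;
}
```

Calling demo_chain() walks both stages. If the removal path called the bus routine directly instead of going through device_unplug(), the machine-level handler would never see the unplug, which is the situation the patch avoids.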
[Qemu-devel] [PULL v3 28/44] hw: acpi: Export and share the ARM RSDP build
From: Samuel Ortiz

Now that build_rsdp() supports building both legacy and current RSDP
tables, we can move it to a generic folder (hw/acpi) and have the i386
ACPI code reuse it in order to reduce code duplication.

Signed-off-by: Samuel Ortiz
Reviewed-by: Igor Mammedov
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
Reviewed-by: Andrew Jones
---
 include/hw/acpi/aml-build.h |  2 ++
 hw/acpi/aml-build.c         | 68 +
 hw/arm/virt-acpi-build.c    | 65 ---
 hw/i386/acpi-build.c        | 49 +++---
 4 files changed, 89 insertions(+), 95 deletions(-)

diff --git a/include/hw/acpi/aml-build.h b/include/hw/acpi/aml-build.h
index 6c36903c0a..1a563ad756 100644
--- a/include/hw/acpi/aml-build.h
+++ b/include/hw/acpi/aml-build.h
@@ -388,6 +388,8 @@ void acpi_add_table(GArray *table_offsets, GArray *table_data);
 void acpi_build_tables_init(AcpiBuildTables *tables);
 void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre);
 void
+build_rsdp(GArray *tbl, BIOSLinker *linker, AcpiRsdpData *rsdp_data);
+void
 build_rsdt(GArray *table_data, BIOSLinker *linker, GArray *table_offsets,
            const char *oem_id, const char *oem_table_id);
 void
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 1e43cd736d..555c24f21d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1589,6 +1589,74 @@ void acpi_build_tables_cleanup(AcpiBuildTables *tables, bool mfre)
     g_array_free(tables->vmgenid, mfre);
 }
 
+/*
+ * ACPI spec 5.2.5.3 Root System Description Pointer (RSDP).
+ * (Revision 1.0 or later)
+ */
+void
+build_rsdp(GArray *tbl, BIOSLinker *linker, AcpiRsdpData *rsdp_data)
+{
+    int tbl_off = tbl->len; /* Table offset in the RSDP file */
+
+    switch (rsdp_data->revision) {
+    case 0:
+        /* With ACPI 1.0, we must have an RSDT pointer */
+        g_assert(rsdp_data->rsdt_tbl_offset);
+        break;
+    case 2:
+        /* With ACPI 2.0+, we must have an XSDT pointer */
+        g_assert(rsdp_data->xsdt_tbl_offset);
+        break;
+    default:
+        /* Only revisions 0 (ACPI 1.0) and 2 (ACPI 2.0+) are valid for RSDP */
+        g_assert_not_reached();
+    }
+
+    bios_linker_loader_alloc(linker, ACPI_BUILD_RSDP_FILE, tbl, 16,
+                             true /* fseg memory */);
+
+    g_array_append_vals(tbl, "RSD PTR ", 8); /* Signature */
+    build_append_int_noprefix(tbl, 0, 1); /* Checksum */
+    g_array_append_vals(tbl, rsdp_data->oem_id, 6); /* OEMID */
+    build_append_int_noprefix(tbl, rsdp_data->revision, 1); /* Revision */
+    build_append_int_noprefix(tbl, 0, 4); /* RsdtAddress */
+    if (rsdp_data->rsdt_tbl_offset) {
+        /* RSDT address to be filled by guest linker */
+        bios_linker_loader_add_pointer(linker, ACPI_BUILD_RSDP_FILE,
+                                       tbl_off + 16, 4,
+                                       ACPI_BUILD_TABLE_FILE,
+                                       *rsdp_data->rsdt_tbl_offset);
+    }
+
+    /* Checksum to be filled by guest linker */
+    bios_linker_loader_add_checksum(linker, ACPI_BUILD_RSDP_FILE,
+                                    tbl_off, 20, /* ACPI rev 1.0 RSDP size */
+                                    8);
+
+    if (rsdp_data->revision == 0) {
+        /* ACPI 1.0 RSDP, we're done */
+        return;
+    }
+
+    build_append_int_noprefix(tbl, 36, 4); /* Length */
+
+    /* XSDT address to be filled by guest linker */
+    build_append_int_noprefix(tbl, 0, 8); /* XsdtAddress */
+    /* We already validated our xsdt pointer */
+    bios_linker_loader_add_pointer(linker, ACPI_BUILD_RSDP_FILE,
+                                   tbl_off + 24, 8,
+                                   ACPI_BUILD_TABLE_FILE,
+                                   *rsdp_data->xsdt_tbl_offset);
+
+    build_append_int_noprefix(tbl, 0, 1); /* Extended Checksum */
+    build_append_int_noprefix(tbl, 0, 3); /* Reserved */
+
+    /* Extended checksum to be filled by Guest linker */
+    bios_linker_loader_add_checksum(linker, ACPI_BUILD_RSDP_FILE,
+                                    tbl_off, 36, /* ACPI rev 2.0 RSDP size */
+                                    32);
+}
+
 /* Build rsdt table */
 void
 build_rsdt(GArray *table_data, BIOSLinker *linker, GArray *table_offsets,
diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 05f6654371..95fad6f0ce 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -366,71 +366,6 @@ static void acpi_dsdt_add_power_button(Aml *scope)
     aml_append(scope, dev);
 }
 
-/* RSDP */
-static void
-build_rsdp(GArray *tbl, BIOSLinker *linker, AcpiRsdpData *rsdp_data)
-{
-    int tbl_off = tbl->len; /* Table offset in the RSDP file */
-
-    switch (rsdp_data->revision) {
-    case 0:
-        /* With ACPI 1.0, we must have an RSDT pointer */
-        g_assert(rsdp_data->rsdt_tbl_offset);
-        break;
-    case 2:
-        /* With ACPI 2.0+, we
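
To make the byte layout that build_rsdp() emits easier to follow, here is a standalone sketch of a revision 2 RSDP. The field offsets and sizes (checksum at offset 8 over the first 20 bytes, extended checksum at offset 32 over all 36 bytes) are taken from the patch above. Everything else is an assumption made for illustration: the helper names are hypothetical, the addresses and OEM ID are placeholders, and the checksums are computed in place, whereas in QEMU the addresses and checksums are filled in later by the guest linker.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    RSDP_V1_LEN = 20,            /* ACPI rev 1.0 RSDP size */
    RSDP_V2_LEN = 36,            /* ACPI rev 2.0 RSDP size */
    RSDP_CHECKSUM_OFF = 8,       /* Checksum field */
    RSDP_EXT_CHECKSUM_OFF = 32   /* Extended Checksum field */
};

/* Sum of all bytes in buf[0..len), modulo 256. */
uint8_t acpi_byte_sum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}

/* Zero the checksum byte, then store the value that makes the range sum to 0. */
void acpi_fix_checksum(uint8_t *buf, size_t len, size_t checksum_off)
{
    buf[checksum_off] = 0;
    buf[checksum_off] = (uint8_t)(0x100 - acpi_byte_sum(buf, len));
}

/* Store the low n bytes of v in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

/* Fill a revision 2 RSDP; rsdt_addr and xsdt_addr are placeholder values. */
void build_rsdp_v2(uint8_t *out, const char *oem_id,
                   uint32_t rsdt_addr, uint64_t xsdt_addr)
{
    memset(out, 0, RSDP_V2_LEN);
    memcpy(out, "RSD PTR ", 8);           /* Signature,   offset 0  */
    memcpy(out + 9, oem_id, 6);           /* OEMID,       offset 9  */
    out[15] = 2;                          /* Revision,    offset 15 */
    put_le(out + 16, rsdt_addr, 4);       /* RsdtAddress, offset 16 */
    put_le(out + 20, RSDP_V2_LEN, 4);     /* Length,      offset 20 */
    put_le(out + 24, xsdt_addr, 8);       /* XsdtAddress, offset 24 */

    /* The extended checksum covers the first one, so it is fixed last. */
    acpi_fix_checksum(out, RSDP_V1_LEN, RSDP_CHECKSUM_OFF);
    acpi_fix_checksum(out, RSDP_V2_LEN, RSDP_EXT_CHECKSUM_OFF);
}

/* Returns 1 if the signature and both checksums are consistent. */
int rsdp_v2_is_valid(const uint8_t *rsdp)
{
    return memcmp(rsdp, "RSD PTR ", 8) == 0 &&
           acpi_byte_sum(rsdp, RSDP_V1_LEN) == 0 &&
           acpi_byte_sum(rsdp, RSDP_V2_LEN) == 0;
}

/* Builds a table with made-up values, then corrupts one byte of it. */
int demo_rsdp(void)
{
    uint8_t rsdp[RSDP_V2_LEN];
    int ok;

    build_rsdp_v2(rsdp, "DEMO  ", 0x1000, 0x2000);
    ok = rsdp_v2_is_valid(rsdp) && rsdp[20] == RSDP_V2_LEN;
    rsdp[16] ^= 0xff;                     /* corrupt RsdtAddress */
    return ok && !rsdp_v2_is_valid(rsdp);
}
```

The ordering of the two acpi_fix_checksum() calls matters for the same reason the patch registers the 20-byte checksum before the 36-byte one: the extended range includes the first checksum byte.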
[Qemu-devel] [PULL v3 40/44] spapr_pci: perform unplug via the hotplug handler
From: David Hildenbrand

Introduce and use the "unplug" callback.

This is a preparation for multi-stage hotplug handlers, whereby the bus
hotplug handler is overwritten by the machine hotplug handler. This handler
will then pass control to the bus hotplug handler. So to get this running
cleanly, we also have to make sure to go via the hotplug handler chain when
actually unplugging a device after an unplug request. Lookup the hotplug
handler and call "unplug".

Reviewed-by: Greg Kurz
Reviewed-by: Igor Mammedov
Acked-by: David Gibson
Signed-off-by: David Hildenbrand
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 hw/ppc/spapr_pci.c | 33 +
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c
index 2374d55fc1..bfb02ee96b 100644
--- a/hw/ppc/spapr_pci.c
+++ b/hw/ppc/spapr_pci.c
@@ -1370,18 +1370,9 @@ static int spapr_create_pci_child_dt(sPAPRPHBState *phb, PCIDevice *dev,
 /* Callback to be called during DRC release. */
 void spapr_phb_remove_pci_device_cb(DeviceState *dev)
 {
-    /* some version guests do not wait for completion of a device
-     * cleanup (generally done asynchronously by the kernel) before
-     * signaling to QEMU that the device is safe, but instead sleep
-     * for some 'safe' period of time. unfortunately on a busy host
-     * this sleep isn't guaranteed to be long enough, resulting in
-     * bad things like IRQ lines being left asserted during final
-     * device removal. to deal with this we call reset just prior
-     * to finalizing the device, which will put the device back into
-     * an 'idle' state, as the device cleanup code expects.
-     */
-    pci_device_reset(PCI_DEVICE(dev));
-    object_unparent(OBJECT(dev));
+    HotplugHandler *hotplug_ctrl = qdev_get_hotplug_handler(dev);
+
+    hotplug_handler_unplug(hotplug_ctrl, dev, &error_abort);
 }
 
 static sPAPRDRConnector *spapr_phb_get_pci_func_drc(sPAPRPHBState *phb,
@@ -1490,6 +1481,23 @@ out:
     }
 }
 
+static void spapr_pci_unplug(HotplugHandler *plug_handler,
+                             DeviceState *plugged_dev, Error **errp)
+{
+    /* some version guests do not wait for completion of a device
+     * cleanup (generally done asynchronously by the kernel) before
+     * signaling to QEMU that the device is safe, but instead sleep
+     * for some 'safe' period of time. unfortunately on a busy host
+     * this sleep isn't guaranteed to be long enough, resulting in
+     * bad things like IRQ lines being left asserted during final
+     * device removal. to deal with this we call reset just prior
+     * to finalizing the device, which will put the device back into
+     * an 'idle' state, as the device cleanup code expects.
+     */
+    pci_device_reset(PCI_DEVICE(plugged_dev));
+    object_unparent(OBJECT(plugged_dev));
+}
+
 static void spapr_pci_unplug_request(HotplugHandler *plug_handler,
                                      DeviceState *plugged_dev, Error **errp)
 {
@@ -1965,6 +1973,7 @@ static void spapr_phb_class_init(ObjectClass *klass, void *data)
     dc->user_creatable = true;
     set_bit(DEVICE_CATEGORY_BRIDGE, dc->categories);
     hp->plug = spapr_pci_plug;
+    hp->unplug = spapr_pci_unplug;
     hp->unplug_request = spapr_pci_unplug_request;
 }
 
-- 
MST
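
The comment moved into spapr_pci_unplug() explains why the device is reset just before it is finalized. A toy model of that ordering, with hypothetical names throughout (ToyPciDevice and the toy_* functions are not QEMU code, and the "finalize refuses a non-idle device" rule is an assumption chosen to make the failure visible):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model for illustration; this is NOT QEMU's PCIDevice. */
typedef struct {
    bool irq_asserted;   /* state of the device's IRQ line */
    bool attached;       /* still part of the machine */
} ToyPciDevice;

/* Put the device back into an idle state: drop any asserted IRQ line. */
void toy_device_reset(ToyPciDevice *d)
{
    d->irq_asserted = false;
}

/* Final removal. In this model the cleanup code expects an idle device. */
bool toy_device_finalize(ToyPciDevice *d)
{
    if (d->irq_asserted) {
        return false;    /* would leave an IRQ line asserted */
    }
    d->attached = false;
    return true;
}

/* Unplug callback modelled on the patch: reset first, then finalize. */
bool toy_unplug(ToyPciDevice *d)
{
    toy_device_reset(d);
    return toy_device_finalize(d);
}

/* A guest that signalled "safe" too early: the IRQ is still asserted. */
bool demo_unplug_with_reset(void)
{
    ToyPciDevice d = { .irq_asserted = true, .attached = true };

    return toy_unplug(&d) && !d.attached && !d.irq_asserted;
}

/* Without the reset, finalizing the same device fails in this model. */
bool demo_finalize_without_reset_fails(void)
{
    ToyPciDevice d = { .irq_asserted = true, .attached = true };

    return !toy_device_finalize(&d) && d.attached;
}
```

Keeping the reset inside the unplug callback, as the patch does, means any caller that goes through the handler chain gets the reset-then-remove ordering without having to repeat it.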