Re: [Xen-devel] [PATCH v0 RFC 0/2] xl/libxl support for PVUSB
On Mon, Nov 10, 2014 at 01:37:44AM -0700, Chun Yan Liu wrote: Is there any progress on this work? I didn't see new version after this. Anyone knows the status? I believe Olaf and Juergen were looking at this for Xen 4.6? CC-ing them. Thanks, Chunyan On 8/11/2014 at 04:23 AM, in message 1407702234-22309-1-git-send-email-caobosi...@gmail.com, Bo Cao caobosi...@gmail.com wrote: Finally I have a workable version xl/libxl support for PVUSB. Most of its commands work property now, but there are still some probelm to be solved. Please take a loot and give me some advices. == What have been implemented ? == I have implemented libxl functions for PVUSB in libxl_usb.c. It mainly consists of two part: usbctrl_add/remove/list and usb_add/remove/list in which usbctrl denote usb controller in which usd device can be plugged in. I don't use ao_dev in libxl_deivce_usbctrl_add since we don't need to execute hotplug script for usbctrl and without ao_dev, adding default usbctrl for usb device would be easier. For the cammands to manipulate usb device such as xl usb-attach and xl usb-detach, this patch now only support to specify usb devices by their interface in sysfs. Using this interface, we can read usb device information through sysfs and bind/unbind usb device. (The support for mapping the lsusb bus:addr to the sysfs usb interface will come later). == What needs to do next ? == There are two main problems to be solved. 1. PVUSB Options in VM Guest's Configuration File The interface in VM Guest's configuration file to add usb device is: usb=[interface=1-1]. But the problem is now is that after the default usbctrl is added, the state of usbctrl is 2, e,g, XenbusStateInitWait, waiting for xen-usbfront to connect. The xen-usbfront in VM Guest isn't loaded. Therefore, sysfs_intf_write will report error. Does anyone have any clue how to solve this? 2. sysfs_intf_write In the process of xl usb-attach domid intf=1-1, after writing 1-1 to Xenstore entry, we need to bind the controller of this usb device to usbback driver so that it can be used by VM Guest. For exampele, for usb device 1-1, it's controller interface maybe 1-1:1.0, and we write this value to /sys/bus/usb/driver/usbback/bind. But for some devices, they have two controllers, for example 1-1:1.0 and 1-1:1.1. I think this means it has two functions, such as usbhid and usb-storage. So in this case, we bind the two controller to usbback? There maybe some errors or bugs in the codes. Feel free to tell me. Cheers, - Simon --- CC: George Dunlap george.dun...@eu.citrix.com CC: Ian Jackson ian.jack...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Pasi Kärkkäinen pa...@iki.fi CC: Lars Kurth lars.ku...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PCI passthrough (pci-attach) to HVM guests bug (BAR64 addresses are bogus)
On Mon, Nov 10, 2014 at 04:32:48PM -0500, Konrad Rzeszutek Wilk wrote: On Mon, Nov 10, 2014 at 01:07:20PM -0500, Konrad Rzeszutek Wilk wrote: On Mon, Nov 10, 2014 at 05:42:32PM +, David Vrabel wrote: On 10/11/14 17:32, Konrad Rzeszutek Wilk wrote: Hey, With Xen 4.5 (today's staging), when I boot a guest and then do pci-attach the BARs values are corrupt. I can reproduce this with Xen 4.4, Xen 4.3 and Xen 4.1. A bit digging in and I realized that: (XEN) memory_map:add: dom1 gfn=f4000 mfn=d8000 nr=4000 [64M] (XEN) AMD-Vi: update_paging_mode Try to access pdev_list without aquiring pcidevs_lock. (XEN) memory_map:add: dom1 gfn=f8000 mfn=fc000 nr=2000 [32M] (XEN) ioport_map:add: dom1 gport=1000 mport=c000 nr=80 (XEN) AMD-Vi: Disable: device id = 0x500, domain = 0, paging mode = 3 (XEN) AMD-Vi: Setup I/O page table: device id = 0x500, type = 0x1, root table = 0x228b02000, domain = 1, paging mode = 3 The sizes are my own editing. This means QEMU is putting the devices in the MMIO region - and doing it succesfully. But then: [ 152.572965] pci :00:04.0: BAR 1: no space for [mem size 0x0800 64bit pref] [ 152.518320] pci :00:04.0: reg 0x14: [mem 0x-0x07ff 64bit pref] .. The guest computes the right size for them, but reads the wrong BAR value that was set by QEMU and also created in the hypervisor. Perhaps this is Linux kernel being on fritz. Will try another kernel. I figured this out. When we pass in the device at bootup, the hvmloader does: (d4) pci dev 05:0 bar 14 size 00800: 0e00c (d4) pci dev 05:0 bar 1c size 00400: 0e80c (d4) pci dev 05:0 bar 10 size 00200: 0ec00 (d4) pci dev 05:0 bar 24 size 00080: 0c201 That is - it finds the size, and then it sets the BARs to fit within the MMIO region. QEMU is not involved in this. When we PCI insert an device, the BARs are not set at all - and hence the Linux kernel is the one that tries to set the BARs in. The reason it cannot fit the device in the MMIO region is due to the _CRS only having certain ranges (even thought the MMIO region can cover 2GB). See: Without any devices (and me doing PCI insertion after that): # dmesg | grep bus resource [0.366000] pci_bus :00: root bus resource [bus 00-ff] [0.366000] pci_bus :00: root bus resource [io 0x-0x0cf7] [0.366000] pci_bus :00: root bus resource [io 0x0d00-0x] [0.366000] pci_bus :00: root bus resource [mem 0x000a-0x000b] [0.366000] pci_bus :00: root bus resource [mem 0xf000-0xfbff] With the device (my GPU card) inserted so that hvmloader can enumerate it: dmesg | grep 'resource' [0.455006] pci_bus :00: root bus resource [bus 00-ff] [0.459006] pci_bus :00: root bus resource [io 0x-0x0cf7] [0.462006] pci_bus :00: root bus resource [io 0x0d00-0x] [0.466006] pci_bus :00: root bus resource [mem 0x000a-0x000b] [0.469006] pci_bus :00: root bus resource [mem 0xe000-0xfbff] I chatted with Bjorn and Rafeal on IRC about how PCI insertion works on baremetal and it sounds like Thunderbolt device insertion is an interesting case. The SMM sets the BAR regions to fit within the MMIO (which is advertised by the _CRS) and it then pokes the OS to enumerate the BARs. The OS is free to use what the firmware has set or renumber it. The end result is that since the SMM 'fits' the BAR inside the pre-set _CRS window it all works. We do not do that. The two ways I could think of making this work are: - QEMU tracks BAR enumeration. When a new device is inserted it would set the BAR to fit within the E820 HOLE region. If it can't (because the MMIO is too small) it puts it at the end of the memory. Naturally the 'end of the memory' part would require adding _CRS to cover end of GPFN to never never land. And also the _CRS region for the MMIO under 4GB would have to be expanded so QEMU can jam things in there. - Or add in dsdt.asl another _CRS region controlled by the hvmloader. This one would start at the end of GPFN + delta of maxmem - mem and continue to never never land. The hvmloader would just write the the values in the BIOS OperationRegion (0xFC00) and let the AML code take care of parsing it and constructing the #9 _CRS region. This will allow kernels who are picky about BARs not being in _CRS region to deal with cards that are hot-plugged past BIOS boot. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-xen-4.5] tools/hotplug: use configure --sysconfdir result
On Wed, Nov 12, 2014 at 11:12:06AM +, Ian Campbell wrote: You forgot to add the release manager... I've done that for you. In 1413279117.1497.25.ca...@citrix.com I said: Acked-by: Ian Campbell ian.campb...@citrix.com Is this a bug fix or a feature? What are the risks? IsLKonrad OK with it? Back again to that question. What happens if we do not take that in now but delay to Xen 4.6? Will systemd still correctly work? It looks like it will and this is just an improvement that makes the code be more streamlined. It does not fix a bug (at least that is what I see from reading), I believe this should be deferred to Xen 4.6. On Wed, 2014-11-12 at 12:06 +0100, Olaf Hering wrote: Ping? ... instead of hardcoding values and guess where they config files may be. Also use the result of --with-sysconfig-leaf-dir. Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Stefano Stabellini stefano.stabell...@eu.citrix.com Cc: Wei Liu wei.l...@citrix.com --- tools/hotplug/Linux/init.d/xencommons.in | 6 +- tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in| 3 +-- tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in | 3 +-- tools/hotplug/Linux/systemd/xenconsoled.service.in| 3 +-- tools/hotplug/Linux/systemd/xenstored.service.in | 3 +-- tools/hotplug/Linux/xendomains.in | 6 +- 6 files changed, 6 insertions(+), 18 deletions(-) diff --git a/tools/hotplug/Linux/init.d/xencommons.in b/tools/hotplug/Linux/init.d/xencommons.in index d53a1f3..a1095c2 100644 --- a/tools/hotplug/Linux/init.d/xencommons.in +++ b/tools/hotplug/Linux/init.d/xencommons.in @@ -23,11 +23,7 @@ BACKEND_MODULES=@LINUX_BACKEND_MODULES@ . @XEN_SCRIPT_DIR@/hotplugpath.sh -if [ -d /etc/sysconfig ]; then - xencommons_config=/etc/sysconfig -else - xencommons_config=/etc/default -fi +xencommons_config=@CONFIG_DIR@/@CONFIG_LEAF_DIR@ test -f $xencommons_config/xencommons . $xencommons_config/xencommons diff --git a/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in b/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in index 44dfce8..1e930ed 100644 --- a/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in +++ b/tools/hotplug/Linux/systemd/var-lib-xenstored.mount.in @@ -5,8 +5,7 @@ RefuseManualStop=true [Mount] Environment=XENSTORED_MOUNT_CTX=none -EnvironmentFile=-/etc/sysconfig/xenstored -EnvironmentFile=-/etc/default/xenstored +EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xenstored What=xenstore Where=@XEN_LIB_STORED@ Type=tmpfs diff --git a/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in b/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in index d3470fc..2282923 100644 --- a/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in +++ b/tools/hotplug/Linux/systemd/xen-qemu-dom0-disk-backend.service.in @@ -8,8 +8,7 @@ ConditionVirtualization=xen [Service] Type=simple -EnvironmentFile=-/etc/default/xenstored -EnvironmentFile=-/etc/sysconfig/xenstored +EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xenstored PIDFile=@XEN_RUN_DIR@/qemu-dom0.pid ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities ExecStartPre=/bin/mkdir -p @XEN_RUN_DIR@ diff --git a/tools/hotplug/Linux/systemd/xenconsoled.service.in b/tools/hotplug/Linux/systemd/xenconsoled.service.in index 7ca0264..377f131 100644 --- a/tools/hotplug/Linux/systemd/xenconsoled.service.in +++ b/tools/hotplug/Linux/systemd/xenconsoled.service.in @@ -9,8 +9,7 @@ Type=simple Environment=XENCONSOLED_ARGS= Environment=XENCONSOLED_LOG=none Environment=XENCONSOLED_LOG_DIR=@XEN_LOG_DIR@/console -EnvironmentFile=-/etc/default/xenconsoled -EnvironmentFile=-/etc/sysconfig/xenconsoled +EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xenconsoled PIDFile=@XEN_RUN_DIR@/xenconsoled.pid ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities ExecStartPre=/bin/mkdir -p ${XENCONSOLED_LOG_DIR} diff --git a/tools/hotplug/Linux/systemd/xenstored.service.in b/tools/hotplug/Linux/systemd/xenstored.service.in index 013e69e..f85b37d 100644 --- a/tools/hotplug/Linux/systemd/xenstored.service.in +++ b/tools/hotplug/Linux/systemd/xenstored.service.in @@ -11,8 +11,7 @@ Type=notify Environment=XENSTORED_ARGS= Environment=XENSTORED_ROOTDIR=@XEN_LIB_STORED@ Environment=XENSTORED=@XENSTORED@ -EnvironmentFile=-/etc/default/xencommons -EnvironmentFile=-/etc/sysconfig/xencommons +EnvironmentFile=-@CONFIG_DIR@/@CONFIG_LEAF_DIR@/xencommons ExecStartPre=/bin/grep -q control_d /proc/xen/capabilities ExecStartPre=-/bin/rm -f
Re: [Xen-devel] [PATCH for-4.5] libxl: add missing action in DEFINE_DEVICE_ADD
On Wed, Nov 12, 2014 at 10:56:24AM +, Ian Campbell wrote: On Wed, 2014-11-12 at 10:39 +, Wei Liu wrote: ... otherwise when device add operation fails, the error message looks like libxl: error: libxl.c:1897:device_addrm_aocomplete: unable to (null) device, which is not very helpful. Thanks. Signed-off-by: Wei Liu wei.l...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- tools/libxl/libxl.c |1 + 1 file changed, 1 insertion(+) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index f7961f6..de23fec 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -4164,6 +4164,7 @@ DEFINE_DEVICE_REMOVE(vtpm, destroy, 1) \ GCNEW(aodev); \ libxl__prepare_ao_device(ao, aodev);\ +aodev-action = LIBXL__DEVICE_ACTION_ADD; \ aodev-callback = device_addrm_aocomplete; \ aodev-update_json = true; \ libxl__device_##type##_add(egc, domid, type, aodev);\ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] Adjust number of domains in cpupools when destroying domain
On Wed, Nov 12, 2014 at 11:11:15AM +, George Dunlap wrote: On Wed, Nov 12, 2014 at 11:10 AM, Juergen Gross jgr...@suse.com wrote: Commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a (move domain to cpupool0 before destroying it) introduced an error in the accounting of cpupools regarding the number of domains. The number of domains is nor adjusted when a domain is moved to cpupool0 in kill_domain(). Correct this by introducing a cpupool function doing the move instead of open coding it by calling sched_move_domain(). Signed-off-by: Juergen Gross jgr...@suse.com Tested-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com Acked-by: George Dunlap george.dun...@eu.citrix.com Excellent! Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] Adjust number of domains in cpupools when destroying domain
On Wed, Nov 12, 2014 at 12:10:02PM +0100, Juergen Gross wrote: Commit bac6334b51d9bcfe57ecf4a4cb5288348fcf044a (move domain to cpupool0 before destroying it) introduced an error in the accounting of cpupools regarding the number of domains. The number of domains is nor adjusted when a domain is moved to cpupool0 in kill_domain(). s/nor/not/ Correct this by introducing a cpupool function doing the move instead of open coding it by calling sched_move_domain(). Signed-off-by: Juergen Gross jgr...@suse.com Tested-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- xen/common/cpupool.c| 47 +-- xen/common/domain.c | 2 +- xen/include/xen/sched.h | 1 + 3 files changed, 35 insertions(+), 15 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 73249d3..a758a8b 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -225,6 +225,35 @@ static int cpupool_destroy(struct cpupool *c) } /* + * Move domain to another cpupool + */ +static int cpupool_move_domain_locked(struct domain *d, struct cpupool *c) +{ +int ret; + +d-cpupool-n_dom--; +ret = sched_move_domain(d, c); +if ( ret ) +d-cpupool-n_dom++; +else +c-n_dom++; + +return ret; +} \n ? +int cpupool_move_domain(struct domain *d, struct cpupool *c) +{ +int ret; + +spin_lock(cpupool_lock); + +ret = cpupool_move_domain_locked(d, c); + +spin_unlock(cpupool_lock); + +return ret; +} + +/* * assign a specific cpu to a cpupool * cpupool_lock must be held */ @@ -338,14 +367,9 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu) ret = -EBUSY; break; } -c-n_dom--; -ret = sched_move_domain(d, cpupool0); +ret = cpupool_move_domain_locked(d, cpupool0); if ( ret ) -{ -c-n_dom++; break; -} -cpupool0-n_dom++; } rcu_read_unlock(domlist_read_lock); if ( ret ) @@ -613,16 +637,11 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op) d-domain_id, op-cpupool_id); ret = -ENOENT; spin_lock(cpupool_lock); + c = cpupool_find_by_id(op-cpupool_id); if ( (c != NULL) cpumask_weight(c-cpu_valid) ) -{ -d-cpupool-n_dom--; -ret = sched_move_domain(d, c); -if ( ret ) -d-cpupool-n_dom++; -else -c-n_dom++; -} +ret = cpupool_move_domain_locked(d, c); + spin_unlock(cpupool_lock); cpupool_dprintk(cpupool move_domain(dom=%d)-pool=%d ret %d\n, d-domain_id, op-cpupool_id, ret); diff --git a/xen/common/domain.c b/xen/common/domain.c index a3f51ec..4a62c1d 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -621,7 +621,7 @@ int domain_kill(struct domain *d) rc = -EAGAIN; break; } -if ( sched_move_domain(d, cpupool0) ) +if ( cpupool_move_domain(d, cpupool0) ) return -EAGAIN; for_each_vcpu ( d, v ) unmap_vcpu_info(v); diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index c5157e6..46fc6e3 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -871,6 +871,7 @@ struct cpupool *cpupool_get_by_id(int poolid); void cpupool_put(struct cpupool *pool); int cpupool_add_domain(struct domain *d, int poolid); void cpupool_rm_domain(struct domain *d); +int cpupool_move_domain(struct domain *d, struct cpupool *c); int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op); void schedule_dump(struct cpupool *c); extern void dump_runq(unsigned char key); -- 2.1.2 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [TestDay] Xen-4.5.0 RC1 bug: with xen_platform_pci=0 option, the guest with VT-d device fails to boot up and qemu-xen
Basic root-causing log: -- [root@vt-hsw1 carl]# xl cr xlexample.hvm Parsing config from xlexample.hvm libxl: error: libxl_qmp.c:287:qmp_handle_error_response: received an error message from QMP server: Unsupported bus. Bus doesn't have property 'acpi-pcihp-bsel' set libxl: error: libxl_create.c:1385:domcreate_attach_pci: libxl_device_pci_add failed: -3 I can reproduce this. If you use device_model_version = 'qemu-xen-traditional' the problem go aways and you can boot the guest without Xen PCI platform driver. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-xen-4.5] tools/hotplug: use configure --sysconfdir result
On Wed, Nov 12, 2014 at 06:06:57PM +0100, Olaf Hering wrote: On Wed, Nov 12, Konrad Rzeszutek Wilk wrote: What happens if we do not take that in now but delay to Xen 4.6? I will be very unhappy... I mean, what exactly is the concern here?! I need to know what the risk is if this does not go in. We are at RC2 and I really want to make the amount of patches that go in be a trickle. Will systemd still correctly work? It looks like it will and this is just an improvement that makes the code be more streamlined. It does not fix a bug (at least that is what I see from reading), I believe this should be deferred to Xen 4.6. It does fix a bug. If the whole thing was configured with --prefix=X --sysconfdir=Y parts of it will still refer to hardcoded paths entirely unrelated to what was just installed with 'make install'. How often does that happen? Do the two paths that are specified in the code cover 99% of the use-cases? Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V3 2/8] xen: Delay remapping memory of pv-domain
On Tue, Nov 11, 2014 at 06:43:40AM +0100, Juergen Gross wrote: Early in the boot process the memory layout of a pv-domain is changed to match the E820 map (either the host one for Dom0 or the Xen one) regarding placement of RAM and PCI holes. This requires removing memory pages initially located at positions not suitable for RAM and adding them later at higher addresses where no restrictions apply. To be able to operate on the hypervisor supported p2m list until a virtual mapped linear p2m list can be constructed, remapping must be delayed until virtual memory management is initialized, as the initial p2m list can't be extended unlimited at physical memory initialization time due to it's fixed structure. A further advantage is the reduction in complexity and code volume as we don't have to be careful regarding memory restrictions during p2m updates. Signed-off-by: Juergen Gross jgr...@suse.com Reviewed-by: David Vrabel david.vra...@citrix.com --- arch/x86/include/asm/xen/page.h | 1 - arch/x86/xen/mmu.c | 4 + arch/x86/xen/p2m.c | 149 arch/x86/xen/setup.c| 385 +++- arch/x86/xen/xen-ops.h | 1 + 5 files changed, 223 insertions(+), 317 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 6c16451..b475297 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -44,7 +44,6 @@ extern unsigned long machine_to_phys_nr; extern unsigned long get_phys_to_machine(unsigned long pfn); extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn); -extern bool __init early_set_phys_to_machine(unsigned long pfn, unsigned long mfn); extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn); extern unsigned long set_phys_range_identity(unsigned long pfn_s, unsigned long pfn_e); diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index a8a1a3d..d3e492b 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1223,6 +1223,10 @@ static void __init xen_pagetable_init(void) /* Allocate and initialize top and mid mfn levels for p2m structure */ xen_build_mfn_list_list(); + /* Remap memory freed because of conflicts with E820 map */ s/becasue of/due to + if (!xen_feature(XENFEAT_auto_translated_physmap)) + xen_remap_memory(); + xen_setup_shared_info(); xen_post_allocator_init(); } diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index fa75842..f67f8cf 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -164,6 +164,7 @@ #include linux/sched.h #include linux/seq_file.h #include linux/bootmem.h +#include linux/slab.h #include asm/cache.h #include asm/setup.h @@ -204,6 +205,8 @@ RESERVE_BRK(p2m_mid, PAGE_SIZE * (MAX_DOMAIN_PAGES / (P2M_PER_PAGE * P2M_MID_PER */ RESERVE_BRK(p2m_identity_remap, PAGE_SIZE * 2 * 3 * MAX_REMAP_RANGES); +static int use_brk = 1; + static inline unsigned p2m_top_index(unsigned long pfn) { BUG_ON(pfn = MAX_P2M_PFN); @@ -268,6 +271,22 @@ static void p2m_init(unsigned long *p2m) p2m[i] = INVALID_P2M_ENTRY; } +static void * __ref alloc_p2m_page(void) +{ + if (unlikely(use_brk)) + return extend_brk(PAGE_SIZE, PAGE_SIZE); + + if (unlikely(!slab_is_available())) + return alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE); + + return (void *)__get_free_page(GFP_KERNEL | __GFP_REPEAT); +} + +static void free_p2m_page(void *p) +{ + free_page((unsigned long)p); +} + /* * Build the parallel p2m_top_mfn and p2m_mid_mfn structures * @@ -287,13 +306,13 @@ void __ref xen_build_mfn_list_list(void) /* Pre-initialize p2m_top_mfn to be completely missing */ if (p2m_top_mfn == NULL) { - p2m_mid_missing_mfn = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE); + p2m_mid_missing_mfn = alloc_p2m_page(); p2m_mid_mfn_init(p2m_mid_missing_mfn, p2m_missing); - p2m_top_mfn_p = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE); + p2m_top_mfn_p = alloc_p2m_page(); p2m_top_mfn_p_init(p2m_top_mfn_p); - p2m_top_mfn = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE); + p2m_top_mfn = alloc_p2m_page(); p2m_top_mfn_init(p2m_top_mfn); } else { /* Reinitialise, mfn's all change after migration */ @@ -327,7 +346,7 @@ void __ref xen_build_mfn_list_list(void) * missing parts of the mfn tree after * runtime. */ - mid_mfn_p = alloc_bootmem_align(PAGE_SIZE, PAGE_SIZE); + mid_mfn_p = alloc_p2m_page(); p2m_mid_mfn_init(mid_mfn_p, p2m_missing);
Re: [Xen-devel] [PATCH V3 4/8] xen: Delay invalidating extra memory
@@ -376,12 +374,14 @@ void __init xen_build_dynamic_phys_to_machine(void) unsigned long max_pfn; unsigned long pfn; - if (xen_feature(XENFEAT_auto_translated_physmap)) + if (xen_feature(XENFEAT_auto_translated_physmap)) Spurious change. .. snip.. diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 0e5f9b6..8d5985b 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -75,7 +75,6 @@ static unsigned long xen_remap_mfn __initdata = INVALID_P2M_ENTRY; static void __init xen_add_extra_mem(u64 start, u64 size) { - unsigned long pfn; int i; for (i = 0; i XEN_EXTRA_MEM_MAX_REGIONS; i++) { @@ -95,17 +94,74 @@ static void __init xen_add_extra_mem(u64 start, u64 size) printk(KERN_WARNING Warning: not enough extra memory regions\n); memblock_reserve(start, size); +} - xen_max_p2m_pfn = PFN_DOWN(start + size); - for (pfn = PFN_DOWN(start); pfn xen_max_p2m_pfn; pfn++) { - unsigned long mfn = pfn_to_mfn(pfn); +static void __init xen_del_extra_mem(u64 start, u64 size) +{ + int i; + u64 start_r, size_r; - if (WARN_ONCE(mfn == pfn, Trying to over-write 1-1 mapping (pfn: %lx)\n, pfn)) - continue; - WARN_ONCE(mfn != INVALID_P2M_ENTRY, Trying to remove %lx which has %lx mfn!\n, - pfn, mfn); + for (i = 0; i XEN_EXTRA_MEM_MAX_REGIONS; i++) { + start_r = xen_extra_mem[i].start; + size_r = xen_extra_mem[i].size; + + /* Start of region. */ + if (start_r == start) { + BUG_ON(size size_r); + xen_extra_mem[i].start += size; + xen_extra_mem[i].size -= size; + break; + } + /* End of region. */ + if (start_r + size_r == start + size) { + BUG_ON(size size_r); + xen_extra_mem[i].size -= size; + break; + } + /* Mid of region. */ + if (start start_r start start_r + size_r) { + BUG_ON(start + size start_r + size_r); + xen_extra_mem[i].size = start - start_r; + xen_add_extra_mem(start + size, start_r + size_r - + (start + size)); Which ends up calling 'memblock_reserve' for an region it already has reserved. Should we call memblock_free(start_r, size_r - size) before calling this? Or is that not neccessary as memblock_* is pretty smart about this sort of thing? + break; + } + } + memblock_free(start, size); +} + ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V3 6/8] xen: Hide get_phys_to_machine() to be able to tune common path
On Tue, Nov 11, 2014 at 06:43:44AM +0100, Juergen Gross wrote: Today get_phys_to_machine() is always called when the mfn for a pfn is to be obtained. Add a wrapper __pfn_to_mfn() as inline function to be able to avoid calling get_phys_to_machine() when possible as s/when/where/ soon as the switch to a linear mapped p2m list has been done. But your inline function still calls get_phys_to_machine? Signed-off-by: Juergen Gross jgr...@suse.com --- arch/x86/include/asm/xen/page.h | 27 +-- arch/x86/xen/mmu.c | 2 +- arch/x86/xen/p2m.c | 6 +++--- 3 files changed, 25 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 28fa795..07d8a7b 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -59,6 +59,22 @@ extern int clear_foreign_p2m_mapping(struct gnttab_unmap_grant_ref *unmap_ops, struct page **pages, unsigned int count); extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn); +/* + * When to use pfn_to_mfn(), __pfn_to_mfn() or get_phys_to_machine(): + * - pfn_to_mfn() returns either INVALID_P2M_ENTRY or the mfn. In case of an + * identity entry the identity indicator will be cleared. Why don't you say : In case of identity PFN the same PFN is returned. But you did miss that also the FOREIGN_FRAME_BIT is cleared. + * - __pfn_to_mfn() returns the found entry of the p2m table. A possibly set s/of the/in the/ + * identity indicator will be still set. __pfn_to_mfn() is encapsulating .. be still set if the PFN is an identity one. + * get_phys_to_machine() and might skip that function if possible to speed + * up the common path. How is is skipping that function? The patch below does no such thing? + * - get_phys_to_machine() is basically the same as __pfn_to_mfn(), but + * without any short cuts for the common fast path. Right. Perhpas we should call it 'slow_p2m' instead of the 'get_phys_to_machine'. + */ +static inline unsigned long __pfn_to_mfn(unsigned long pfn) +{ + return get_phys_to_machine(pfn); +} + ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [TestDay] VMX test report for Xen 4.5.0-rc1
On Wed, Nov 12, 2014 at 12:06:20PM -0400, Konrad Rzeszutek Wilk wrote: On Wed, Nov 12, 2014 at 06:58:49AM +, Hu, Robert wrote: Hi All, This is a bug summary for Xen 4.5-rc1 on Intel Server platforms. Yeey! Thank you for doing those tests. Test environment: Xen: Xen 4.5-rc1 Dom0: Linux kernel 3.17.0 Hardware: Intel IVT-EX, Haswell-EP, BDW Client, HSW-EX, IVT-EX, HSW-UP New bugs(9): 4. Not all PFs are available if assign multi VT-d devices to Wndows guest VM http://bugzilla-archived.xenproject.org/bugzilla/show_bug.cgi?id=1896 Please post the bug as an email on xen-devel so we can start the discussion here. 5. Dom0 failed to resume from S3 power state Interstingly enough it works for me on my AMD box. I will shortly try it out with the Intel laptop (X230) (been using Xen 4.4 + 3.17.0). I put it on laptop (Lenovo X230) and used 'pm-suspend' and as well just closing the lid. In both cases it worked. Does that work for you as well? Thanks. Please post the bug as an email on xen-devel so we can start the discussion here. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] xl: correct test condition on libxl_domain_info
On November 12, 2014 8:01:21 PM EST, Chao Peng chao.p.p...@linux.intel.com wrote: On Wed, Nov 12, 2014 at 10:22:07AM -0500, Konrad Rzeszutek Wilk wrote: On Wed, Nov 12, 2014 at 11:10:52AM +, Ian Campbell wrote: CCing the folks who signed-of-by is on the original patch On Wed, 2014-11-12 at 11:05 +, Wei Liu wrote: The `if' statement considered return value 0 from libxl_domain_info an error, while 0 actually means success. Signed-off-by: Wei Liu wei.l...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- This is a bug fix for PSR feature. This feature was added recently and it's not an regression. However, it would be good to have it working correctly since the beginning, and the fix is straightforward, which should be of very low risk. Ack. I concur and I think it should go in Xen 4.5, but I would like their input first. Yeah, I'd like it to go in. Thanks Wei for your quick fix. Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Chao --- tools/libxl/xl_cmdimpl.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 3c9f146..9afef3f 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -7908,7 +7908,7 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid) /* Each domain */ if (domid != INVALID_DOMID) { libxl_dominfo dominfo; -if (!libxl_domain_info(ctx, dominfo, domid)) { +if (libxl_domain_info(ctx, dominfo, domid)) { fprintf(stderr, Failed to get domain info for %d\n, domid); return -1; } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] XenServer results from the Test Day for RC2
On Thu, Nov 13, 2014 at 04:02:47PM +, Andrew Cooper wrote: Hi, While attempting to test Konrads new interrupt injection logic, I got blocked behind yet another Pygrub bug caused by c/s d1b93ea This is the first time I have looked into the issue, but a cursory inspection of the first hunk shows that it cannot possibly be correct as self._default is either an integer or string. I have a possible workaround which I am testing, but cursory review of the patch would have shown that it cannot work as intended. Notice that self._default is now either a string or an integer, defaulting to an integer, and the top level code is updated to require a string. As a result, any bootloader configuration which doesn't explicitly set a default, or still drives this logic with integers (ExtLinux or LiLO) will die with an AttributeError Thank you for testing it. In case you don't get to it - and are preempted by other stuff - please do give me a braindump (and preliminary patch if you have one) so I can dig into it. Thank you! ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] xen: remove Xencomm
On Thu, Nov 13, 2014 at 04:18:58PM +, Jan Beulich wrote: On 13.11.14 at 17:03, t...@xen.org wrote: Being a feature that has only been used by ia64 and/or ppc it doesn't seem like we need to keep it any longer in the tree. So remove it. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Signed-off-by: Tim Deegan t...@xen.org Acked-by: Jan Beulich jbeul...@suse.com Are folks OK if this is deferred to Xen 4.6? Thank you! ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] kexec-ed kernel possibly needing low memory
On Fri, Nov 14, 2014 at 11:02:54AM +, David Vrabel wrote: On 14/11/14 10:00, Jan Beulich wrote: David, we're being approached with the situation where a disk driver in the kexec-ed kernel needs memory below 4G in order to perform DMA (e.g. for the swiotlb to be set up). Linux not so long ago invented a two area approach, which doesn't fit with the current single KEXEC_RANGE_MA_CRASH area obtainable via KEXEC_CMD_kexec_get_range. I see multiple options - do no change at all; the user can deal with this by explicitly specifying an area below 4G via crashkernel= This is what we do. - add KEXEC_RANGE_MA_CRASH_LOW If you choose this option, it would be preferable to support N (which might be 2) arbitrary crash regions rather than specifying that one region is always low memory. I would suggest combining this with a way to specify the bounds of each region. e.g., crashkernel=32M@4G,128M We could also implement the 'autoplacement' feature that Red Hat has put in their kexec implementation (not sure if it was upstreamed). That would nicely deal with this by always putting it under 4GB. I wouldn't go this route unless you actually need a large crash region that would use up too much low memory otherwise. - when not asked for a specific address, always allocate the (single) area below 4G if there is enough space I don't think the default location should change. A user might have specified a large crash region that might use up most of low memory. - provide a means to request allocating the (single) area below 4G (or perhaps more generically below a certain boundary) without requiring an exact address to be specified This sounds ok. Do you have any preference here, or do you see other viable alternatives? My preference would be option 1 since it already works, then option 4. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC] EFI: allow retry of ExitBootServices() call
On Fri, Nov 14, 2014 at 12:37:30PM +, Jan Beulich wrote: The specification is kind of vague under what conditions ExitBootServices() may legitimately fail, requiring the OS loader to retry: If MapKey value is incorrect, ExitBootServices() returns EFI_INVALID_PARAMETER and GetMemoryMap() with ExitBootServices() must be called again. Firmware implementation may choose to do a partial shutdown of the boot services during the first call to ExitBootServices(). EFI OS loader should not make calls to any boot service function other then GetMemoryMap() after the first call to ExitBootServices(). While our code guarantees the map key to be valid, there are systems where a firmware internal notification sent while processing ExitBootServices() reportedly results in changes to the memory map. s/reportedly/in fact/ In that case, make a best effort second try: Avoid any boot service calls other than the two named above, with the possible exception of error paths. Those aren't a problem, since if we end up needing to retry, we're hosed when something goes wrong as much as if we didn't make the retry attempt. For x86, a minimal adjustment to efi_arch_process_memory_map() is needed for it to cope with potentially being called a second time. Wow. Talk about timing. We saw this and were going to see doing something similar. For arm64, while efi_process_memory_map_bootinfo() is easy to verify that it can safely be called more than once without violating spec constraints, it's not so obvious for fdt_add_uefi_nodes(), hence a step by step approach: - deletion of memory nodes and memory reserve map entries: the 2nd pass shouldn't find any as the 1st one deleted them all, - a chosen node should be found as it got added in the 1st pass, - the various linux,uefi-* nodes all got added during the 1st pass and hence only their contents may get updated. Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/efi/efi-boot.h +++ b/xen/arch/x86/efi/efi-boot.h @@ -140,7 +140,7 @@ static void __init efi_arch_process_memo /* Populate E820 table and check trampoline area availability. */ e = e820map - 1; -for ( i = 0; i map_size; i += desc_size ) +for ( e820nr = i = 0; i map_size; i += desc_size ) { EFI_MEMORY_DESCRIPTOR *desc = map + i; u64 len = desc-NumberOfPages EFI_PAGE_SHIFT; --- a/xen/common/efi/boot.c +++ b/xen/common/efi/boot.c @@ -703,7 +703,7 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SY EFI_GRAPHICS_OUTPUT_PROTOCOL *gop = NULL; EFI_GRAPHICS_OUTPUT_MODE_INFORMATION *mode_info; union string section = { NULL }, name; -bool_t base_video = 0; +bool_t base_video = 0, retry; char *option_str; bool_t use_cfg_file; @@ -1051,17 +1051,23 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SY if ( !efi_memmap ) blexit(LUnable to allocate memory for EFI memory map); -status = efi_bs-GetMemoryMap(efi_memmap_size, efi_memmap, map_key, - efi_mdesc_size, mdesc_ver); -if ( EFI_ERROR(status) ) -PrintErrMesg(LCannot obtain memory map, status); +for ( retry = 0; ; retry = 1 ) +{ +status = efi_bs-GetMemoryMap(efi_memmap_size, efi_memmap, map_key, + efi_mdesc_size, mdesc_ver); +if ( EFI_ERROR(status) ) +PrintErrMesg(LCannot obtain memory map, status); -efi_arch_process_memory_map(SystemTable, efi_memmap, efi_memmap_size, -efi_mdesc_size, mdesc_ver); +efi_arch_process_memory_map(SystemTable, efi_memmap, efi_memmap_size, +efi_mdesc_size, mdesc_ver); -efi_arch_pre_exit_boot(); +efi_arch_pre_exit_boot(); + +status = efi_bs-ExitBootServices(ImageHandle, map_key); +if ( status != EFI_INVALID_PARAMETER || retry ) +break; +} Any reason for just doing the loop at max twice? Could we iterate more than those (say forever?) with an printk at suitable intervals to notify the user? -status = efi_bs-ExitBootServices(ImageHandle, map_key); if ( EFI_ERROR(status) ) PrintErrMesg(LCannot exit boot services, status); ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v10 for-xen-4.5 2/2] dpci: Replace tasklet with an softirq
On Fri, Nov 14, 2014 at 03:13:42PM +, Jan Beulich wrote: On 12.11.14 at 03:23, konrad.w...@oracle.com wrote: +static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) +{ +struct domain *d = pirq_dpci-dom; + +ASSERT(spin_is_locked(d-event_lock)); + +switch ( cmpxchg(pirq_dpci-state, 1 STATE_SCHED, 0) ) +{ +case (1 STATE_SCHED): +/* + * We are going to try to de-schedule the softirq before it goes in + * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'. + */ +put_domain(d); +/* fallthrough. */ Considering Sander's report, the only suspicious place I find is this one: When the STATE_SCHED flag is set, pirq_dpci is on some CPU's list. What guarantees it to get removed from that list before getting inserted on another one? None. The moment that STATE_SCHED is cleared, 'raise_softirq_for' is free to manipulate the list. We could: - Add another bit, say STATE_ZOMBIE - which pt_pirq_softirq_reset could set, and dpci_softirq - if it sees it - would clear. Said bit would stop 'raise_softirq_for' from trying to do anything. diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index efc66dc..8e9141e 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -50,20 +50,25 @@ static DEFINE_PER_CPU(struct list_head, dpci_list); enum { STATE_SCHED, -STATE_RUN +STATE_RUN, +STATE_ZOMBIE }; /* * This can be called multiple times, but the softirq is only raised once. - * That is until the STATE_SCHED state has been cleared. The state can be - * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before - * the softirq had a chance to run). + * That is until the STATE_SCHED and STATE_ZOMBIE state has been cleared. The + * STATE_SCHED and STATE_ZOMBIE state can be cleared by the 'dpci_softirq' + * (when it has executed 'hvm_dirq_assist'). The STATE_SCHED can be cleared + * by 'pt_pirq_softirq_reset' (which will try to clear the state before the + * softirq had a chance to run). */ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) { unsigned long flags; +if ( test_bit(STATE_ZOMBIE, pirq_dpci-state) ) +return; + if ( test_and_set_bit(STATE_SCHED, pirq_dpci-state) ) return; @@ -85,7 +90,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) */ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) { -if ( pirq_dpci-state ((1 STATE_RUN) | (1 STATE_SCHED)) ) +if ( pirq_dpci-state ((1 STATE_RUN) | (1 STATE_SCHED) | (1 STATE_ZOMBIE) ) ) return 1; /* @@ -109,7 +114,7 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) ASSERT(spin_is_locked(d-event_lock)); -switch ( cmpxchg(pirq_dpci-state, 1 STATE_SCHED, 0) ) +switch ( cmpxchg(pirq_dpci-state, 1 STATE_SCHED, 1 STATE_ZOMBIE ) ) { case (1 STATE_SCHED): /* @@ -120,6 +125,7 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) /* fallthrough. */ case (1 STATE_RUN): case (1 STATE_RUN) | (1 STATE_SCHED): +case (1 STATE_RUN) | (1 STATE_SCHED) | (1 STATE_ZOMBIE): /* * The reason it is OK to reset 'dom' when STATE_RUN bit is set is due * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom' @@ -779,6 +785,7 @@ unlock: static void dpci_softirq(void) { unsigned int cpu = smp_processor_id(); +unsigned int reset = 0; LIST_HEAD(our_list); local_irq_disable(); @@ -805,7 +812,15 @@ static void dpci_softirq(void) hvm_dirq_assist(d, pirq_dpci); put_domain(d); } +else +reset = 1; + clear_bit(STATE_RUN, pirq_dpci-state); +if ( reset ) +{ +clear_bit(STATE_ZOMBIE, pirq_dpci-state); +reset = 0; +} } } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V3 2/8] xen: Delay remapping memory of pv-domain
On Fri, Nov 14, 2014 at 05:53:19AM +0100, Juergen Gross wrote: On 11/13/2014 08:56 PM, Konrad Rzeszutek Wilk wrote: + mfn_save = virt_to_mfn(buf); + + while (xen_remap_mfn != INVALID_P2M_ENTRY) { So the 'list' is constructed by going forward - that is from low-numbered PFNs to higher numbered ones. But the 'xen_remap_mfn' is going the other way - from the highest PFN to the lowest PFN. Won't that mean we will restore the chunks of memory in the wrong order? That is we will still restore them in chunks size, but the chunks will be in descending order instead of ascending? No, the information where to put each chunk is contained in the chunk data. I can add a comment explaining this. Right, the MFNs in a chunks are going to be restored in the right order. I was thinking that the chunks (so a set of MFNs) will be restored in the opposite order that they are written to. And oddly enough the chunks are done in 512-3 = 509 MFNs at once? More don't fit on a single page due to the other info needed. So: yes. But you could use two pages - one for the structure and the other for the list of MFNs. That would fix the problem of having only 509 MFNs being contingous per chunk when restoring. Anyhow the point I had that I am worried is that we do not restore the MFNs in the same order. We do it in chunk size which is OK (so the 509 MFNs at once)- but the order we traverse the restoration process is the opposite of the save process. Say we have 4MB of contingous MFNs, so two (err, three) chunks. The first one we iterate is from 0-509, the second is 510-1018, the last is 1019-1023. When we restore (remap) we start with the last 'chunk' so we end up restoring them: 1019-1023, 510-1018, 0-509 order. If we go with using two pages - one for the structure and one for the list of PFNs, we could expand the structure to have an 'next' and 'prev' MFN. When you then traverse in 'xen_remap_memory' you could do: mfn = xen_remap_mfn; while (mfn != INVALID_P2M_ENTRY) { xen_remap_mfn = mfn; set_pte_mfn(buf, mfn, PAGE_KERNEL); mfn = xen_remap_buf.next_area_mfn; } And then you can start from this updated xen_remap_mfn which will start with the first chunk that has been set. Thought at this point it does not matter whether we have a seperate page for the MFNs as the restoration/remap process will put them in the save order that they were saved. + /* Map the remap information */ + set_pte_mfn(buf, xen_remap_mfn, PAGE_KERNEL); + + BUG_ON(xen_remap_mfn != xen_remap_buf.mfns[0]); + + free = 0; + pfn = xen_remap_buf.target_pfn; + for (i = 0; i xen_remap_buf.size; i++) { + mfn = xen_remap_buf.mfns[i]; + if (!released xen_update_mem_tables(pfn, mfn)) { + remapped++; If we fail 'xen_update_mem_tables' we will on the next chunk (so i+1) keep on freeing pages instead of trying to remap. Is that intentional? Could we try to remap? Hmm, I'm not sure this is worth the effort. What could lead to failure here? I suspect we could even just BUG() on failure. What do you think? I was hoping that this question would lead to making this loop a bit simpler as you would have to spread some of the code in the loop into functions. And keep 'remmaped' and 'released' reset every loop. However, if it makes the code more complex - then please forget my question. Using BUG() instead would make the code less complex. Do you really think xen_update_mem_tables() would ever fail in a sane system? - set_phys_to_machine() would fail only on a memory shortage. Just going on without adding more memory wouldn't lead to a healthy system, I think. - The hypervisor calls would fail only in case of parameter errors. This should never happen, so dying seems to be the correct reaction. David, what do you think? Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq
On Fri, Nov 14, 2014 at 05:59:23PM +0100, Sander Eikelenboom wrote: Friday, November 14, 2014, 4:43:58 PM, you wrote: On 14.11.14 at 16:20, li...@eikelenboom.it wrote: If it still helps i could try Andrews suggestion and try out with only commit aeeea485 .. Yes, even if it's pretty certain it's the second of the commits, verifying this would be helpful (or if the assumption is wrong, the pattern it's dying with would change and hence perhaps provide further clues). Jan Ok with a revert of f6dd295 .. it survived cooking and eating a nice bowl of pasta without a panic. So it would probably be indeed that specific commit. Thank you for confirmation. Could you give some specifics on the guests? As in what kind of devices you are giving it - and more interestingly - what type of interrupt mechanism do they use (/proc/interrupts along with 'dmesg' | grep name of driver should give some ideas). I wouldn't worry about the PV case as that bypasses the dpci codebase - just on the HVM side. Thanks again! ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 v2 0/4] libxl: CODING_STYLE
On Fri, Nov 14, 2014 at 06:52:37PM +, Ian Jackson wrote: The structural considerations, error handling patterns, and so on, in libxl have remained undocumented. This has been a problem during recent code submissions and reviews. In this series I have attempted to document current best practice. The first patch contains much new material and the others are minor and consequential fixes. *1/4 libxl: CODING_STYLE: Much new material 2/4 libxl: CODING_STYLE: Deprecate `error' for out blocks 3/4 libxl: CODING_STYLE: Mention function out parameters *4/4 libxl: CODING_STYLE: Discuss existing style problems * = Modified patches, to take into account review comments from Ian Campbell. I would like to suggest that this ought to go into 4.5 because: - it would be useful documentation to assist with any nontrivial bugfixes that need to be made to libxl - there isn't any risk of it actually breaking anything, because it's just doc changes. Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Thanks, Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PCI and VGA Passthrough regressions on Xen 4.4.1 vs 4.3.2
On November 15, 2014 6:59:51 AM EST, Zir Blazer zir_bla...@hotmail.com wrote: I'm happy that you managed to reproduce the sound issue. It means that it will be possible to track it down and fix it. So, if I understanded properly: It does NOT happen on your custom Xen 4.4 build, but the one that Fedora got is affected by it. I did an 'custom' build (the only custom thing is that it is stock and has debug=y) and with Xen 4.4 from that build I can hear the sound issue. From my testing - Xen 4.5 - good. Xen 4.4 - not good. Hadn't yet done 4.3. Which is excellent as it removes variables. Could it be that your unstable build got any recent patch that fixes it that Fedora doesn't have should it be using the stable source? Should be the same with Arch Linux. Have your tried with the workaround of not using xen-pciback for the Sound Card and instead using xl pci-assignable-add? It should work, too. The kernel I am using as dom0 has no sound drivers so the pre-init workaround is not in effect. Depending on the list of patches - it might be easier to bisect based on 4.5 instead of Xen 4.3 or 4.4. Anyhow have to figure out how to make this automated to do the week long bisection. For my Radeon 5770, I use Catalyst 12.1 Drivers. The reason is that they're the latest which includes OpenCL support out of the box in Windows XP. It got removed from later versions, and not so much later, AMD branched out old GPU into another set of Drivers. If I recall correctly, the 12.1 are still unified, so they should work for your 4xxx. Cause AMD website has become a pain in the ass to browse if you're looking for older Driver versions archives, you can use the TechPowerUp mirror: www.techpowerup.com/downloads/2096/amd-catalyst-12-1-software-suite-winxp-32-bit/mirrors Download with confidence, they're the same ones that I'm using and TechPowerUp is a reputable Hardware reviewer website. Otherwise, you can download the latest legacy 14.4 from AMD website: support.amd.com/en-us/download/desktop/legacy?product=legacy2os=Windows%20XP%20-%20Professional/HomeRenderOnServer=true Ah! Thank you for the link! I will stay tuned on this. Date: Fri, 14 Nov 2014 17:12:19 -0500 From: konrad.w...@oracle.com To: zir_bla...@hotmail.com CC: xen-devel@lists.xen.org Subject: Re: [Xen-devel] PCI and VGA Passthrough regressions on Xen 4.4.1 vs 4.3.2 On Fri, Nov 14, 2014 at 04:08:14PM -0500, Konrad Rzeszutek Wilk wrote: On Mon, Oct 06, 2014 at 03:55:36AM -0300, Zir Blazer wrote: There is a regression in both PCI and VGA Passthrough in Xen 4.4.1 when compared to 4.3.2. Due to the added complexity of VGA Passthrough, I was suggested to focus instead in my PCI Passthrough issue, which involves the integrated Sound Card of my Motherboard. While it passes to the DomU and works, it produces robotic, distorted and lagged noise everytime it reproduces sound. The Video Card is an absolute no-go. On the previous Xen version, everything else being equal, both works properly.I have been trying to gather as much information as possible of my issue according to Xen Wiki articles http://wiki.xen.org/wiki/Debugging_Xen and http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen so this E-Mail comes attached with tons of logs. HARDWARE Processor: Intel Xeon E3-1245V3 (Haswell) - http://ark.intel.com/products/75462/Intel-Xeon-Processor-E3-1245-v3-8M-Cache-3_40-GHz Motherboard: Supermicro X10SAT (Chipset C226, BIOS R2.0) - http://www.supermicro.com/products/motherboard/Xeon/C220/X10SAT.cfm Sound Card: Integrated Realtek ALC1150 So I tried this on my box which has similar specs: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz [ 0.00] DMI: Supermicro X10SAE/X10SAE, BIOS 2.00 04/21/2014 with todays' xen-unstable and with a simple guest config: builder='hvm' memory = 2048 name = WinXP vcpus=1 vif = [ 'type=ioemu, mac=00:0F:4B:00:00:61,bridge=switch' ] disk=[ 'phy:/dev/G/WinXP,hda,w'] vnc=1 videoram=8 vnclisten=0.0.0.0 vncpasswd='' stdvga=0 usb=1 usbdevice='tablet' qemu_model_version=traditional device_model_version = 'qemu-xen-traditional' pci=[01:00.0,01:00.1,00:1b.0] # lspci -s 01:00.* 01:00.0 VGA compatible controller: ATI Technologies Inc RV770 [Radeon HD 4870] 01:00.1 Audio device: ATI Technologies Inc HD48x0 audio # lspci -s 00:1b.0 00:1b.0 Audio device: Intel Corporation Device 8c20 (rev 04) This is using Windows XP 32 (don't have 64bit laying around) and when using the Realtek HD Sound Effects and the '3D Audio Demo' it plays nicely on my speakers. Aha! But if I use Xen 4.4 from Fedora 20 I get an horrible sound. The plot thickens! The other interesting thing is that my unstable build is with 'debug=y' while the Xen 4.4 from Fedora is not. The VGA is a different story as it seems there are no Video drivers for XP for it available at all! # ___ Xen-devel mailing list
Re: [Xen-devel] [PATCH RFC] EFI: allow retry of ExitBootServices() call
On Mon, Nov 17, 2014 at 01:35:17PM +, Jan Beulich wrote: On 14.11.14 at 16:32, konrad.w...@oracle.com wrote: On Fri, Nov 14, 2014 at 12:37:30PM +, Jan Beulich wrote: The specification is kind of vague under what conditions ExitBootServices() may legitimately fail, requiring the OS loader to retry: If MapKey value is incorrect, ExitBootServices() returns EFI_INVALID_PARAMETER and GetMemoryMap() with ExitBootServices() must be called again. Firmware implementation may choose to do a partial shutdown of the boot services during the first call to ExitBootServices(). EFI OS loader should not make calls to any boot service function other then GetMemoryMap() after the first call to ExitBootServices(). While our code guarantees the map key to be valid, there are systems where a firmware internal notification sent while processing ExitBootServices() reportedly results in changes to the memory map. s/reportedly/in fact/ In that case, make a best effort second try: Avoid any boot service calls other than the two named above, with the possible exception of error paths. Those aren't a problem, since if we end up needing to retry, we're hosed when something goes wrong as much as if we didn't make the retry attempt. For x86, a minimal adjustment to efi_arch_process_memory_map() is needed for it to cope with potentially being called a second time. Wow. Talk about timing. We saw this and were going to see doing something similar. So what are your thoughts then regarding this patch going into 4.5? Definitly should go in - and I would even say backport to earlier versions of Xen. And for the LIST_POISON* override one? Yes as well please. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq
Without #define DIFF_LIST 1: 1) The guest still crashes (xl-dmesg-not-defined.txt) AHA! --MARK-- 0: 8305085ffd28 [p:83054ef27e88, n:83054ef27e88] 1: 8305085ffd28 [p:0200200200200200, n:0100100100100100] The same pirq_dpci structure is put twice on the list. That will surely make it unhappy. The reason the #define DIFF_LIST 1 would work is that it has code to deal with that odd scenario. But .. more importantly - the code should not allow you to put _two_ of the same 'pirq_dpci' structures on the list. CPU00: d16 OK-softirq 186msec ago, state:1, 3257 count, [prev:82d0802e7e88, next:82d0802e7e88] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT PIRQ:87 d16 OK-raise 236msec ago, state:1, 3257 count, [prev:0200200200200200, next:0100100100100100] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT PIRQ:87 CPU01: d16 OK-softirq 232msec ago, state:1, 2978 count, [prev:83054ef57e70, next:83054ef57e70] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT PIRQ:87 d16 OK-raise 281msec ago, state:1, 2978 count, [prev:0200200200200200, next:0100100100100100] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT PIRQ:87 CPU02: d16 OK-softirq 373msec ago, state:1, 2378 count, [prev:83054ef47e70, next:83054ef47e70] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT PIRQ:87 d16 OK-raise 423msec ago, state:1, 2378 count, [prev:0200200200200200, next:0100100100100100] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT PIRQ:87 CPU03: d16 OK-softirq 867msec ago, state:1, 2744 count, [prev:83054ef37e70, next:83054ef37e70] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT PIRQ:87 d16 OK-raise 916msec ago, state:1, 2744 count, [prev:0200200200200200, next:0100100100100100] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT PIRQ:87 CPU04: d16 OK-softirq 497msec ago, state:1, 76910 count, [prev:83054ef2e3e0, next:83054ef2e3e0] 8305085ffd28MACH_PCI_SHIFT MAPPED_SHIFT GUEST_PCI_SHIFT PIRQ:0 d16 OK-raise 489msec ago, state:1, 76916 count, [prev:83054ef2e3e0, next:83054ef2e3e0] 8305085ffd28MACH_PCI_SHIFT MAPPED_SHIFT GUEST_PCI_SHIFT PIRQ:0 d16 ERR-poison 600msec ago, state:0, 1 count, [prev:0200200200200200, next:0100100100100100] 8305085ffd28MACH_PCI_SHIFT MAPPED_SHIFT GUEST_PCI_SHIFT PIRQ:0 CPU05: d16 OK-softirq 852msec ago, state:1, 2207 count, [prev:83054ef1fe70, next:83054ef1fe70] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT PIRQ:87 d16 OK-raise 902msec ago, state:1, 2207 count, [prev:0200200200200200, next:0100100100100100] 8305094a39a8NULL MAPPED_SHIFT GUEST_MSI_SHIFT PIRQ:87 domain_crash called from io.c:964 Domain 16 reported crashed by domain 7 on cpu#4: With #define DIFF_LIST 1: 2) Nor the guest or the host crashed (let it run for about an hour and 15 minutes), but the USB XHCI driver bailed out quickly after guest boot, so there were no MSI-X interrupts anymore. (xl-dmesg-defined-nousb.txt, dmesg-guest-defined-nousb.txt) Hm, that is odd. Can you tell me how you get your SeaBIOS to add those extra statements? I don't seem to see them with my SeaBIOS (I've an XHCI controlled hooked up to the guest too). Maybe I am missing an CONFIG in the SeaBIOS build. 3) On another boot the USB XHCI didn't bail out, after a while the host crashes. (serial.log) (XEN) [2014-11-18 14:53:37.364] RIP:e008:[82d08014a4de] hvm_do_IRQ_dpci+0xf4/0x131 which resolves to: # addr2line -e xen-syms 82d08014a4de /usr/src/new/xen-unstable-vanilla/xen/include/xen/list.h:67 which is: static inline void __list_add(struct list_head *new, struct list_head *prev, struct list_head *next) { next-prev = new; new-next = next; new-prev = prev; Here -prev-next = new; } Looking at the stack: (XEN) [2014-11-18 14:53:37.306] 0: 8303faaf25a8 [p:83054ef2e3e0, n:83054ef2e3e0] We detected that the list was poison and were printing it, while an: [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] CPU:4 RIP:e008:[82d08014a4de] hvm_do_IRQ_dpci+0xf4/0x131 RFLAGS: 00010082 CONTEXT: hypervisor rax: 83054ef2e3e0 rbx: 8303faaf25a8 rcx: 8303faaf2610 rdx: 0200200200200200 rsi: 0006 rdi: 6ef3f123 rbp: 83054ef27be8 rsp: 83054ef27bd8 r8: 8303faaf2010 r9: 002f r10: 0132 r11: 0003 r12: 8305125d6000 r13: r14: 8303faaf2580 r15: 002f cr0: 8005003b cr4: 06f0 cr3: 0005448c cr2: ff600400 ds: 002b es: 002b fs: gs: ss: e010 cs: e008 Xen stack trace from rsp=83054ef27bd8: 83054ef27be8 8303faaf26c0 83054ef27c78 82d080172060 0020 83054ef27cf6 0046 83054ef27c70
Re: [Xen-devel] [for-xen-4.5 PATCH] Ignore non-zero data in unused xsave area.
On Tue, Nov 18, 2014 at 10:26:31AM -0500, Don Koch wrote: If we restore an xsave area from an older xen that has a larger size than the xcr0 bit call for, it is possible to have non-zero data in the unused area if an xsave has ever been done that used that area (e.g. during a context switch). Since the vcpu's xsave area is never zeroed after the initial allocation, that data is still there. Since we are told that said area was not written to during the save or migration, there is no need to restore it. Signed-off-by: Don Koch dk...@verizon.com --- Turns out the assertion that the unused xsave area is zero is wrong. Unfortunately, that leaves the following as the only way I can think of to work around it (and is no worse than xsave/xrestore during context switches). Alternate suggestions welcome. This is Xen 4.5 material I presume. xen/arch/x86/hvm/hvm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 8f49b44..b2c0bc4 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -2044,7 +2044,7 @@ static int hvm_load_cpu_xsave_states(struct domain *d, hvm_domain_context_t *h) printk(XENLOG_G_WARNING HVM%d.%u restore mismatch: xsave length %#x %#x (non-zero data at %#x)\n, d-domain_id, vcpuid, desc-length, size, i); -return -EOPNOTSUPP; +break; } } printk(XENLOG_G_WARNING -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxl: Document device parameter of libxl_device_type_add functions
On Tue, Nov 18, 2014 at 05:44:20PM +, Ian Jackson wrote: Euan Harris writes ([PATCH] libxl: Document device parameter of libxl_device_type_add functions): The device parameter of libxl_device_type_add is an in/out parameter. Unspecified fields are filled in with appropriate values for the created device when the function returns. Document this behaviour. Thanks. Acked-by: Ian Jackson ian.jack...@eu.citrix.com Konrad, this should go into 4.5 IMO because it's a doc comment describing existing behaviour which we want everyone to rely on. Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Ian. Signed-off-by: Euan Harris euan.har...@citrix.com --- tools/libxl/libxl.h | 4 1 file changed, 4 insertions(+) diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h index c3451bd..41d6e8d 100644 --- a/tools/libxl/libxl.h +++ b/tools/libxl/libxl.h @@ -1109,6 +1109,10 @@ void libxl_vtpminfo_list_free(libxl_vtpminfo *, int nr_vtpms); * domain to connect to the device and therefore cannot block on the * guest. * + * device is an in/out parameter: fields left unspecified when the + * structure is passed in are filled in with appropriate values for + * the device created. + * * libxl_device_type_remove(ctx, domid, device): * * Removes the given device from the specified domain by performing -- 1.9.3 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq
On Tue, Nov 18, 2014 at 05:20:44PM +, Jan Beulich wrote: On 18.11.14 at 18:03, li...@eikelenboom.it wrote: Tuesday, November 18, 2014, 5:16:50 PM, you wrote: 1) test_and_[set|clear]_bit sometimes return unexpected values. [But this might be invalid as the addition of the 8303faaf25a8 might be correct - as the second dpci the softirq is processing could be the MSI one] Would there be an easy way to stress test this function separately in some debugging function to see if it indeed is returning unexpected values ? I don't think there's a reasonable chance of these functions to return unexpected values - they're being used elsewhere, and you'd have had other problems in the past if they didn't behave as expected. Interestingly most of the 'test_and_[set|clear]_bit operate on 'unsigned long' while we do 'unsigned int' here. But the 'test_and'' are all btXl so double-word safe. The fact that Sander is able to get 'test_and_clear_bit(STATE_SCHED)' to return zero - when in fact it should return a positive value - implies that some actor is messing with the 'state' outside our raise/softirq dialogue. pt_irq_guest_eoi does pirq_dpci-state = 0 unconditionally! Lets see if this debug + potential patch helps. This should be applied on top of the other patch you had. Just in case I am attaching all four to this email. From 8093140e374fceb9121ccd07726fb3898b212bfb Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Tue, 18 Nov 2014 15:08:15 -0500 Subject: [PATCH 4/5] DEBUG4: Add an 'stamp' and potential fix. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/drivers/passthrough/io.c | 57 +++- xen/include/xen/hvm/irq.h| 1 + 2 files changed, 41 insertions(+), 17 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index b786bd2..8a8fc62 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -26,6 +26,8 @@ #include asm/hvm/iommu.h #include asm/hvm/support.h #include xen/hvm/irq.h +#include xen/console.h + static DEFINE_PER_CPU(struct list_head, dpci_list); @@ -61,21 +63,29 @@ struct _debug_f { struct list_head list; unsigned int state; struct hvm_pirq_dpci *dpci; +unsigned long stamp; }; struct __debug { struct _debug_f ok; struct _debug_f poison; struct _debug_f raise; +struct _debug_f busy_raise; struct _debug_f reset; struct _debug_f zombie_softirq; struct _debug_f zombie_raise; +struct _debug_f timeout; +struct _debug_f clear; }; static DEFINE_PER_CPU(struct __debug, _d); void _record(struct _debug_f *d, struct hvm_pirq_dpci *pirq_dpci) { + +if (pirq_dpci-pirq) +return; + if (pirq_dpci-dom) d-domid = pirq_dpci-dom-domain_id; else @@ -86,6 +96,7 @@ void _record(struct _debug_f *d, struct hvm_pirq_dpci *pirq_dpci) d-list.prev = pirq_dpci-softirq_list.prev; d-state = pirq_dpci-state; d-dpci = pirq_dpci; +d-stamp = pirq_dpci-stamp++; } enum { @@ -95,6 +106,9 @@ enum { OK_SOFTIRQ, OK_RAISE, OK_RESET, +OK_TIMEOUT, +OK_BUSY, +OK_CLEAR, }; static void dump_record(struct _debug_f *d, unsigned int type) @@ -106,7 +120,11 @@ static void dump_record(struct _debug_f *d, unsigned int type) [OK_SOFTIRQ] = OK-softirq, [OK_RAISE] = OK-raise , [OK_RESET] = OK-reset , +[OK_TIMEOUT] = OK-timeout, +[OK_BUSY]= OK-busy , +[OK_CLEAR] = OK-clear , }; +#if 0 #define LONG(x) [_HVM_IRQ_DPCI_##x] = #x static const char *const names_flag[] = { LONG(MACH_PCI_SHIFT), @@ -117,6 +135,7 @@ static void dump_record(struct _debug_f *d, unsigned int type) }; #undef LONG unsigned int i; +#endif s_time_t now; if ( d-domid == 0 ) @@ -126,20 +145,21 @@ static void dump_record(struct _debug_f *d, unsigned int type) BUG(); now = NOW(); -printk(d%d %s %lumsec ago, state:%x, %ld count, [prev:%p, next:%p] %p, +printk(d%d %s %lumsec ago, state:%x, %ld count, [prev:%p, next:%p] %p %lx, d-domid, names[type], (unsigned long)((now - d-last) / MILLISECS(1)), -d-state, d-count, d-list.prev, d-list.next, d-dpci); +d-state, d-count, d-list.prev, d-list.next, d-dpci, d-stamp); if ( d-dpci ) { struct hvm_pirq_dpci *pirq_dpci = d-dpci; - +#if 0 for ( i = 0; i = _HVM_IRQ_DPCI_GUEST_MSI_SHIFT; i++ ) if ( pirq_dpci-flags (1 i) ) printk(%s , names_flag[i]); printk( PIRQ:%d, pirq_dpci-pirq); +#endif if (pirq_dpci-line) printk( LINE: %d, pirq_dpci-line); } @@ -159,7 +179,10 @@ static void dump_debug(unsigned char key) printk(CPU%02d: \n ,cpu); dump_record(d-ok, OK_SOFTIRQ); dump_record(d-raise, OK_RAISE
Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq
Uhmm i thought i had these switched off (due to problems earlier and then forgot about them .. however looking at the earlier reports these lines were also in those reports). The xen-syms and these last runs are all with a prestine xen tree cloned today (staging branch), so the qemu-xen and seabios defined with that were also freshly cloned and had a new default seabios config. (just to rule out anything stale in my tree) If you don't see those messages .. perhaps your seabios and qemu trees (and at least the seabios config) are not the most recent (they don't get updated automatically when you just do a git pull on the main tree) ? In /tools/firmware/seabios-dir/.config i have: CONFIG_USB=y CONFIG_USB_UHCI=y CONFIG_USB_OHCI=y CONFIG_USB_EHCI=y CONFIG_USB_XHCI=y CONFIG_USB_MSC=y CONFIG_USB_UAS=y CONFIG_USB_HUB=y CONFIG_USB_KEYBOARD=y CONFIG_USB_MOUSE=y I seem to have the same thing. Perhaps it is my XHCI controller being wonky. And this is all just from a: - git clone git://xenbits.xen.org/xen.git -b staging - make clean ./configure make -j6 make -j6 install Aye. .. snip.. 1) test_and_[set|clear]_bit sometimes return unexpected values. [But this might be invalid as the addition of the 8303faaf25a8 might be correct - as the second dpci the softirq is processing could be the MSI one] Would there be an easy way to stress test this function separately in some debugging function to see if it indeed is returning unexpected values ? Sadly no. But you got me looking in the right direction when you mentioned 'timeout'. 2) INIT_LIST_HEAD operations on the same CPU are not honored. Just curious, have you also tested the patches on AMD hardware ? Yes. To reproduce this the first thing I did was to get an AMD box. When i look at the combination of (2) and (3), It seems it could be an interaction between the two passed through devices and/or different IRQ types. Could be - as in it is causing this issue to show up faster than expected. Or it is the one that triggers more than one dpci happening at the same time. Well that didn't seem to be it (see separate amendment i mailed previously) Right, the current theory I've is that the interrupts are not being Acked within 8 milisecond and we reset the 'state' - and at the same time we get an interrupt and schedule it - while we are still processing the same interrupt. This would explain why the 'test_and_clear_bit' got the wrong value. In regards to the list poison - following this thread of logic - with the 'state = 0' set we open the floodgates for any CPU to put the same 'struct hvm_pirq_dpci' on its list. We do reset the 'state' on _every_ GSI that is mapped to a guest - so we also reset the 'state' for the MSI one (XHCI). Anyhow in your case: CPUX: CPUY: pt_irq_time_out: state = 0; [out of timer coder, theraise_softirq pirq_dpci is on the dpci_list] [adds the pirq_dpci as state == 0] softirq_dpcisoftirq_dpci: list_del [entries poison] list_del = BOOM Is what I believe is happening. The INTX device - once I put a load on it - does not trigger any pt_irq_time_out, so that would explain why I cannot hit this. But I believe your card hits these hiccups. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq
On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote: Tuesday, November 18, 2014, 9:56:33 PM, you wrote: Uhmm i thought i had these switched off (due to problems earlier and then forgot about them .. however looking at the earlier reports these lines were also in those reports). The xen-syms and these last runs are all with a prestine xen tree cloned today (staging branch), so the qemu-xen and seabios defined with that were also freshly cloned and had a new default seabios config. (just to rule out anything stale in my tree) If you don't see those messages .. perhaps your seabios and qemu trees (and at least the seabios config) are not the most recent (they don't get updated automatically when you just do a git pull on the main tree) ? In /tools/firmware/seabios-dir/.config i have: CONFIG_USB=y CONFIG_USB_UHCI=y CONFIG_USB_OHCI=y CONFIG_USB_EHCI=y CONFIG_USB_XHCI=y CONFIG_USB_MSC=y CONFIG_USB_UAS=y CONFIG_USB_HUB=y CONFIG_USB_KEYBOARD=y CONFIG_USB_MOUSE=y I seem to have the same thing. Perhaps it is my XHCI controller being wonky. And this is all just from a: - git clone git://xenbits.xen.org/xen.git -b staging - make clean ./configure make -j6 make -j6 install Aye. .. snip.. 1) test_and_[set|clear]_bit sometimes return unexpected values. [But this might be invalid as the addition of the 8303faaf25a8 might be correct - as the second dpci the softirq is processing could be the MSI one] Would there be an easy way to stress test this function separately in some debugging function to see if it indeed is returning unexpected values ? Sadly no. But you got me looking in the right direction when you mentioned 'timeout'. 2) INIT_LIST_HEAD operations on the same CPU are not honored. Just curious, have you also tested the patches on AMD hardware ? Yes. To reproduce this the first thing I did was to get an AMD box. When i look at the combination of (2) and (3), It seems it could be an interaction between the two passed through devices and/or different IRQ types. Could be - as in it is causing this issue to show up faster than expected. Or it is the one that triggers more than one dpci happening at the same time. Well that didn't seem to be it (see separate amendment i mailed previously) Right, the current theory I've is that the interrupts are not being Acked within 8 milisecond and we reset the 'state' - and at the same time we get an interrupt and schedule it - while we are still processing the same interrupt. This would explain why the 'test_and_clear_bit' got the wrong value. In regards to the list poison - following this thread of logic - with the 'state = 0' set we open the floodgates for any CPU to put the same 'struct hvm_pirq_dpci' on its list. We do reset the 'state' on _every_ GSI that is mapped to a guest - so we also reset the 'state' for the MSI one (XHCI). Anyhow in your case: CPUX: CPUY: pt_irq_time_out: state = 0; [out of timer coder, theraise_softirq pirq_dpci is on the dpci_list] [adds the pirq_dpci as state == 0] softirq_dpcisoftirq_dpci: list_del [entries poison] list_del = BOOM Is what I believe is happening. The INTX device - once I put a load on it - does not trigger any pt_irq_time_out, so that would explain why I cannot hit this. But I believe your card hits these hiccups. Hi Konrad, I just tested you 5 patches and as a result i still got an(other) host crash: (complete serial log attached) (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) [2014-11-18 21:55:41.591] CPU:0 (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08012c7e7]CPU:2 (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08014a461] hvm_do_IRQ_dpci+0xbd/0x13c (XEN) [2014-11-18 21:55:41.591] RFLAGS: 00010006 _spin_unlock+0x1f/0x30CONTEXT: hypervisor Duh! Here is another patch on top of the five you have (attached and inline). From 18008650f10199e2b83f74394f634a97086e34b8 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Tue, 18 Nov 2014 20:48:43 -0500 Subject: [PATCH] debug: Whether we want to clear the 'dom' field or not when changing the 'state' bit. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/drivers/passthrough/io.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index aad5607..24e2bd6 100644 --- a/xen/drivers/passthrough
Re: [Xen-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration
On November 19, 2014 5:52:58 AM EST, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: ping? On Tue, 18 Nov 2014, Stefano Stabellini wrote: Konrad, I think we should have this fix in Xen 4.5. Should I go ahead and backport it? Go for it. Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com) On Mon, 17 Nov 2014, Don Slutz wrote: The other callers to blk_set_enable_write_cache() in this file already check for s-blk == NULL. Signed-off-by: Don Slutz dsl...@verizon.com --- I think this is a bugfix that should be back ported to stable releases. I also think this should be done in xen's copy of QEMU for 4.5 with back port(s) to active stable releases. Note: In 2.1 and earlier the routine is bdrv_set_enable_write_cache(); variable is s-bs. hw/ide/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 00e21cf..d4af5e2 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int version_id) { IDEState *s = opaque; -if (s-identify_set) { +if (s-blk s-identify_set) { blk_set_enable_write_cache(s-blk, !!(s-identify_data[85] (1 5))); } return 0; -- 1.8.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Strangeness in generated xen-command-line.html
On November 19, 2014 6:05:33 AM EST, Andrew Cooper andrew.coop...@citrix.com wrote: On 19/11/14 11:04, Ian Campbell wrote: On Wed, 2014-11-19 at 10:52 +, Andrew Cooper wrote: On 19/11/14 10:46, Ian Campbell wrote: On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote: I've not been able to find a workaround... This works for me... 8--- From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Wed, 19 Nov 2014 10:42:18 + Subject: [PATCH] docs: workaround markdown parser error in xen-command-line.markdown Some versions of markdown (specifically the one in Debian Wheezy, currently used to generate http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be confused by nested lists in the middle of multi-paragraph parent list entries as seen in the com1,com2 entry. The effect is that the Default section of all following entries are replace by some sort of hash or checksum (at least, a string of 32 random seeming hex digits). Workaround this issue by making the decriptions of the DPS options a nested list, moving the existing nested list describing the options for S into a third level list. This seems to avoid the issue, and is arguably better formatting in its own right (at least its not a regression IMHO) Signed-off-by: Ian Campbell ian.campb...@citrix.com I had just identified a different way, but this way is slightly better. If you take out all the blank lines visible in the context below, the resulting HTML will be correctly formatted and rather neater (i.e. without sporadic blank lines). Agreed. 8-- From 53398a9729d391f1fb7b6f753a0032b1f3604d4d Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Wed, 19 Nov 2014 10:42:18 + Subject: [PATCH] docs: workaround markdown parser error in xen-command-line.markdown Some versions of markdown (specifically the one in Debian Wheezy, currently used to generate http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be confused by nested lists in the middle of multi-paragraph parent list entries as seen in the com1,com2 entry. The effect is that the Default section of all following entries are replace by some sort of hash or checksum (at least, a string of 32 random seeming hex digits). Workaround this issue by making the decriptions of the DPS options a nested list, moving the existing nested list describing the options for S into a third level list. This seems to avoid the issue, and is arguably better formatting in its own right (at least its not a regression IMHO) Signed-off-by: Ian Campbell ian.campb...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com) In case you were thinking of putting in 4.5 --- v2: Less blank lines == nicer output. --- docs/misc/xen-command-line.markdown | 21 - 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 0830e5f..b7eaeea 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -247,19 +247,14 @@ Both option `com1` and `com2` follow the same format. * Optionally, a clock speed measured in hz can be specified. * `DPS` represents the number of data bits, the parity, and the number of stop bits. - - `D` is an integer between 5 and 8 for the number of data bits. - - `P` is a single character representing the type of parity: - - * `n` No - * `o` Odd - * `e` Even - * `m` Mark - * `s` Space - - `S` is an integer 1 or 2 for the number of stop bits. - + * `D` is an integer between 5 and 8 for the number of data bits. + * `P` is a single character representing the type of parity: + * `n` No + * `o` Odd + * `e` Even + * `m` Mark + * `s` Space + * `S` is an integer 1 or 2 for the number of stop bits. * `io-base` is an integer which specifies the IO base port for UART registers. * `irq` is the IRQ number to use, or `0` to use the UART in poll ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq
On Wed, Nov 19, 2014 at 12:16:44PM +0100, Sander Eikelenboom wrote: Wednesday, November 19, 2014, 2:55:41 AM, you wrote: On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote: Tuesday, November 18, 2014, 9:56:33 PM, you wrote: Uhmm i thought i had these switched off (due to problems earlier and then forgot about them .. however looking at the earlier reports these lines were also in those reports). The xen-syms and these last runs are all with a prestine xen tree cloned today (staging branch), so the qemu-xen and seabios defined with that were also freshly cloned and had a new default seabios config. (just to rule out anything stale in my tree) If you don't see those messages .. perhaps your seabios and qemu trees (and at least the seabios config) are not the most recent (they don't get updated automatically when you just do a git pull on the main tree) ? In /tools/firmware/seabios-dir/.config i have: CONFIG_USB=y CONFIG_USB_UHCI=y CONFIG_USB_OHCI=y CONFIG_USB_EHCI=y CONFIG_USB_XHCI=y CONFIG_USB_MSC=y CONFIG_USB_UAS=y CONFIG_USB_HUB=y CONFIG_USB_KEYBOARD=y CONFIG_USB_MOUSE=y I seem to have the same thing. Perhaps it is my XHCI controller being wonky. And this is all just from a: - git clone git://xenbits.xen.org/xen.git -b staging - make clean ./configure make -j6 make -j6 install Aye. .. snip.. 1) test_and_[set|clear]_bit sometimes return unexpected values. [But this might be invalid as the addition of the 8303faaf25a8 might be correct - as the second dpci the softirq is processing could be the MSI one] Would there be an easy way to stress test this function separately in some debugging function to see if it indeed is returning unexpected values ? Sadly no. But you got me looking in the right direction when you mentioned 'timeout'. 2) INIT_LIST_HEAD operations on the same CPU are not honored. Just curious, have you also tested the patches on AMD hardware ? Yes. To reproduce this the first thing I did was to get an AMD box. When i look at the combination of (2) and (3), It seems it could be an interaction between the two passed through devices and/or different IRQ types. Could be - as in it is causing this issue to show up faster than expected. Or it is the one that triggers more than one dpci happening at the same time. Well that didn't seem to be it (see separate amendment i mailed previously) Right, the current theory I've is that the interrupts are not being Acked within 8 milisecond and we reset the 'state' - and at the same time we get an interrupt and schedule it - while we are still processing the same interrupt. This would explain why the 'test_and_clear_bit' got the wrong value. In regards to the list poison - following this thread of logic - with the 'state = 0' set we open the floodgates for any CPU to put the same 'struct hvm_pirq_dpci' on its list. We do reset the 'state' on _every_ GSI that is mapped to a guest - so we also reset the 'state' for the MSI one (XHCI). Anyhow in your case: CPUX: CPUY: pt_irq_time_out: state = 0; [out of timer coder, theraise_softirq pirq_dpci is on the dpci_list] [adds the pirq_dpci as state == 0] softirq_dpcisoftirq_dpci: list_del [entries poison] list_del = BOOM Is what I believe is happening. The INTX device - once I put a load on it - does not trigger any pt_irq_time_out, so that would explain why I cannot hit this. But I believe your card hits these hiccups. Hi Konrad, I just tested you 5 patches and as a result i still got an(other) host crash: (complete serial log attached) (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) [2014-11-18 21:55:41.591] CPU:0 (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08012c7e7]CPU:2 (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08014a461] hvm_do_IRQ_dpci+0xbd/0x13c (XEN) [2014-11-18 21:55:41.591] RFLAGS: 00010006 _spin_unlock+0x1f/0x30CONTEXT: hypervisor Duh! Here is another patch on top of the five you have (attached and inline). Hi Konrad, Happy to report it has been running with this additional patch for 2 hours now without any problems. I think you nailed it :-) Could you also do an 'xl debug-keys k' and send that please? More than happy to test the definitive patch as well.
Re: [Xen-devel] [PATCH v10 for-xen-4.5 2/2] dpci: Replace tasklet with an softirq
On Fri, Nov 14, 2014 at 11:11:46AM -0500, Konrad Rzeszutek Wilk wrote: On Fri, Nov 14, 2014 at 03:13:42PM +, Jan Beulich wrote: On 12.11.14 at 03:23, konrad.w...@oracle.com wrote: +static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) +{ +struct domain *d = pirq_dpci-dom; + +ASSERT(spin_is_locked(d-event_lock)); + +switch ( cmpxchg(pirq_dpci-state, 1 STATE_SCHED, 0) ) +{ +case (1 STATE_SCHED): +/* + * We are going to try to de-schedule the softirq before it goes in + * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'. + */ +put_domain(d); +/* fallthrough. */ Considering Sander's report, the only suspicious place I find is this one: When the STATE_SCHED flag is set, pirq_dpci is on some CPU's list. What guarantees it to get removed from that list before getting inserted on another one? None. The moment that STATE_SCHED is cleared, 'raise_softirq_for' is free to manipulate the list. I was too quick to say this. A bit more inspection shows that while 'raise_softirq_for' is free to manipulate the list - it won't be called. The reason is that the pt_pirq_softirq_reset is called _after_ the IRQ action handler are removed for this IRQ. That means we will not receive any interrupts for it and call 'raise_softirq_for'. At least until 'pt_irq_create_bind' is called. And said function has a check for this too: 42 * A crude 'while' loop with us dropping the spinlock and giving 243 * the softirq_dpci a chance to run. 244 * We MUST check for this condition as the softirq could be scheduled 245 * and hasn't run yet. Note that this code replaced tasklet_kill which 246 * would have spun forever and would do the same thing (wait to flush out 247 * outstanding hvm_dirq_assist calls. 248 */ 249 if ( pt_pirq_softirq_active(pirq_dpci) ) Hence the patch below is not needed. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
If we pass in INTx type devices to a guest on an over-subscribed machine - and in an over-worked guest - we can cause the pirq_dpci-softirq_list to become corrupted. The reason for this is that the 'pt_irq_guest_eoi' ends up setting the 'state' to zero value. However the 'state' value (STATE_SCHED, STATE_RUN) is used to communicate between 'raise_softirq_for' and 'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the teardown path for simplicity for right now. The 'pt_irq_guest_eoi' was not adhering to the proper dialogue and was not using locked cmpxchg or test_bit operations and ended setting 'state' set to zero. That meant 'raise_softirq_for' was free to schedule it while the 'struct hvm_pirq_dpci'' was still on an per-cpu list. The end result was list_del being called twice and the second call corrupting the per-cpu list. For this to occur one of the CPUs must be in the idle loop executing softirqs and the interrupt handler in the guest must not respond to the pending interrupt within 8ms, and we must receive another interrupt for this device on another CPU. CPU0: CPU1: timer_softirq_action \- pt_irq_time_out state = 0;do_IRQ [out of timer code, theraise_softirq pirq_dpci is on the CPU0 dpci_list] [adds the pirq_dpci to CPU1 dpci_list as state == 0] softirq_dpci:softirq_dpci: list_del [list entries are poisoned] list_del = BOOM The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was pt_pirq_softirq_reset) to use cmpxchg. We also expand said function to reset the '-dom' only on the teardown paths - but not on the timeouts. Reported-by: Sander Eikelenboom li...@eikelenboom.it Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/drivers/passthrough/io.c | 27 +-- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index efc66dc..2039d31 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -57,7 +57,7 @@ enum { * This can be called multiple times, but the softirq is only raised once. * That is until the STATE_SCHED state has been cleared. The state can be * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before + * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before * the softirq had a chance to run). */ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) @@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) } /* - * Reset the pirq_dpci-dom parameter to NULL. + * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set, + * reset pirq_dpci-dom parameter to NULL (used for teardown). * * This function checks the different states to make sure it can do it * at the right time. If it unschedules the 'hvm_dirq_assist' from running * it also refcounts (which is what the softirq would have done) properly. */ -static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) +static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci, + unsigned int clear) { struct domain *d = pirq_dpci-dom; @@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom' * in local variable before it sets STATE_RUN - and therefore will not * dereference '-dom' which would crash. + * + * However, if this is called from 'pt_irq_time_out' we do not want to + * clear the '-dom' as we can re-use the 'pirq_dpci' after that and + * need '-dom'. */ -pirq_dpci-dom = NULL; +if ( clear ) +pirq_dpci-dom = NULL; break; } } @@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci, if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT, pirq_dpci-flags) ) { -pirq_dpci-state = 0; +pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */); pirq_dpci-pending = 0; pirq_guest_eoi(dpci_pirq(pirq_dpci)); } @@ -285,7 +292,7 @@ int pt_irq_create_bind( * to be scheduled but we must deal with the one that may be * in the queue. */ -pt_pirq_softirq_reset(pirq_dpci); +pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */); } } if ( unlikely(rc) ) @@ -536,9 +543,9 @@ int
Re: [Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry
On Wed, Nov 19, 2014 at 05:44:49PM +, Stefano Stabellini wrote: UIE being set can cause maintenance interrupts to occur when Xen writes to one or more LR registers. The effect is a busy loop around the interrupt handler in Xen (http://marc.info/?l=xen-develm=141597517132682): everything gets stuck. Konrad, this fixes an actual bug, at least on OMAP5. It should have no bad side effects on any other platforms as far as I can tell. It should go in 4.5. Have you checked (aka ran the tests) on the other platforms? Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Tested-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com ^^^ 'Reported-and-Tested-by' diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 70d10d6..df140b9 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -527,8 +529,6 @@ void gic_inject(void) if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 1); -else -gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); } static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V3 2/8] xen: Delay remapping memory of pv-domain
On Fri, Nov 14, 2014 at 06:14:06PM +0100, Juergen Gross wrote: On 11/14/2014 05:47 PM, Konrad Rzeszutek Wilk wrote: On Fri, Nov 14, 2014 at 05:53:19AM +0100, Juergen Gross wrote: On 11/13/2014 08:56 PM, Konrad Rzeszutek Wilk wrote: + mfn_save = virt_to_mfn(buf); + + while (xen_remap_mfn != INVALID_P2M_ENTRY) { So the 'list' is constructed by going forward - that is from low-numbered PFNs to higher numbered ones. But the 'xen_remap_mfn' is going the other way - from the highest PFN to the lowest PFN. Won't that mean we will restore the chunks of memory in the wrong order? That is we will still restore them in chunks size, but the chunks will be in descending order instead of ascending? No, the information where to put each chunk is contained in the chunk data. I can add a comment explaining this. Right, the MFNs in a chunks are going to be restored in the right order. I was thinking that the chunks (so a set of MFNs) will be restored in the opposite order that they are written to. And oddly enough the chunks are done in 512-3 = 509 MFNs at once? More don't fit on a single page due to the other info needed. So: yes. But you could use two pages - one for the structure and the other for the list of MFNs. That would fix the problem of having only 509 MFNs being contingous per chunk when restoring. That's no problem (see below). Anyhow the point I had that I am worried is that we do not restore the MFNs in the same order. We do it in chunk size which is OK (so the 509 MFNs at once)- but the order we traverse the restoration process is the opposite of the save process. Say we have 4MB of contingous MFNs, so two (err, three) chunks. The first one we iterate is from 0-509, the second is 510-1018, the last is 1019-1023. When we restore (remap) we start with the last 'chunk' so we end up restoring them: 1019-1023, 510-1018, 0-509 order. No. When building up the chunks we save in each chunk where to put it on remap. So in your example 0-509 should be mapped at dest+0, 510-1018 at dest+510, and 1019-1023 at dest+1019. When remapping we map 1019-1023 to dest+1019, 510-1018 at dest+510 and last 0-509 at dest+0. So we do the mapping in reverse order, but to the correct pfns. Excellent! Could a condensed version of that explanation be put in the code ? Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V3 7/8] xen: switch to linear virtual mapped sparse p2m list
On Tue, Nov 11, 2014 at 06:43:45AM +0100, Juergen Gross wrote: At start of the day the Xen hypervisor presents a contiguous mfn list to a pv-domain. In order to support sparse memory this mfn list is accessed via a three level p2m tree built early in the boot process. Whenever the system needs the mfn associated with a pfn this tree is used to find the mfn. Instead of using a software walked tree for accessing a specific mfn list entry this patch is creating a virtual address area for the entire possible mfn list including memory holes. The holes are covered by mapping a pre-defined page consisting only of invalid mfn entries. Access to a mfn entry is possible by just using the virtual base address of the mfn list and the pfn as index into that list. This speeds up the (hot) path of determining the mfn of a pfn. Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0 showed following improvements: Elapsed time: 32:50 - 32:35 System: 18:07 - 17:47 User:104:00 - 103:30 Tested on 64 bit dom0 and 32 bit domU. Signed-off-by: Juergen Gross jgr...@suse.com --- arch/x86/include/asm/xen/page.h | 14 +- arch/x86/xen/mmu.c | 32 +- arch/x86/xen/p2m.c | 732 +--- arch/x86/xen/xen-ops.h | 2 +- 4 files changed, 342 insertions(+), 438 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 07d8a7b..4a227ec 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -72,7 +72,19 @@ extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn) */ static inline unsigned long __pfn_to_mfn(unsigned long pfn) { - return get_phys_to_machine(pfn); + unsigned long mfn; + + if (pfn xen_p2m_size) + mfn = xen_p2m_addr[pfn]; + else if (unlikely(pfn xen_max_p2m_pfn)) + return get_phys_to_machine(pfn); + else + return IDENTITY_FRAME(pfn); + + if (unlikely(mfn == INVALID_P2M_ENTRY)) + return get_phys_to_machine(pfn); + + return mfn; } static inline unsigned long pfn_to_mfn(unsigned long pfn) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 31ca515..0b43c45 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1158,20 +1158,16 @@ static void __init xen_cleanhighmap(unsigned long vaddr, * instead of somewhere later and be confusing. */ xen_mc_flush(); } -static void __init xen_pagetable_p2m_copy(void) + +static void __init xen_pagetable_p2m_free(void) { unsigned long size; unsigned long addr; - unsigned long new_mfn_list; - - if (xen_feature(XENFEAT_auto_translated_physmap)) - return; size = PAGE_ALIGN(xen_start_info-nr_pages * sizeof(unsigned long)); - new_mfn_list = xen_revector_p2m_tree(); /* No memory or already called. */ - if (!new_mfn_list || new_mfn_list == xen_start_info-mfn_list) + if ((unsigned long)xen_p2m_addr == xen_start_info-mfn_list) return; /* using __ka address and sticking INVALID_P2M_ENTRY! */ @@ -1189,8 +1185,6 @@ static void __init xen_pagetable_p2m_copy(void) size = PAGE_ALIGN(xen_start_info-nr_pages * sizeof(unsigned long)); memblock_free(__pa(xen_start_info-mfn_list), size); - /* And revector! Bye bye old array */ - xen_start_info-mfn_list = new_mfn_list; /* At this stage, cleanup_highmap has already cleaned __ka space * from _brk_limit way up to the max_pfn_mapped (which is the end of @@ -1214,12 +1208,26 @@ static void __init xen_pagetable_p2m_copy(void) } #endif -static void __init xen_pagetable_init(void) +static void __init xen_pagetable_p2m_setup(void) { - paging_init(); + if (xen_feature(XENFEAT_auto_translated_physmap)) + return; + + xen_vmalloc_p2m_tree(); + #ifdef CONFIG_X86_64 - xen_pagetable_p2m_copy(); + xen_pagetable_p2m_free(); #endif + /* And revector! Bye bye old array */ + xen_start_info-mfn_list = (unsigned long)xen_p2m_addr; +} + +static void __init xen_pagetable_init(void) +{ + paging_init(); + + xen_pagetable_p2m_setup(); + /* Allocate and initialize top and mid mfn levels for p2m structure */ xen_build_mfn_list_list(); diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index 328875a..7df446d 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -3,21 +3,22 @@ * guests themselves, but it must also access and update the p2m array * during suspend/resume when all the pages are reallocated. * - * The p2m table is logically a flat array, but we implement it as a - * three-level tree to allow the address space to be sparse. + * The logical flat p2m table is mapped to a linear kernel memory area. + * For accesses by Xen a three-level tree linked via
Re: [Xen-devel] [PATCH V3 0/8] xen: Switch to virtual mapped linear p2m list
On Tue, Nov 11, 2014 at 06:43:38AM +0100, Juergen Gross wrote: Paravirtualized kernels running on Xen use a three level tree for translation of guest specific physical addresses to machine global addresses. This p2m tree is used for construction of page table entries, so the p2m tree walk is performance critical. By using a linear virtual mapped p2m list accesses to p2m elements can be sped up while even simplifying code. To achieve this goal some p2m related initializations have to be performed later in the boot process, as the final p2m list can be set up only after basic memory management functions are available. Hey Juergen, I finially finished looking at the patchset. Had some comments, some questions that I hope can make it in the patch so that in six months or so when somebody looks at the code they can understand the subtle pieces. Looking forward to the v4! (Thought keep in mind that next week is Thanksgiving week so won't be able to look much after Wednesday) arch/x86/include/asm/pgtable_types.h |1 + arch/x86/include/asm/xen/page.h | 49 +- arch/x86/mm/pageattr.c | 20 + arch/x86/xen/mmu.c | 38 +- arch/x86/xen/p2m.c | 1315 ++ arch/x86/xen/setup.c | 460 ++-- arch/x86/xen/xen-ops.h |6 +- 7 files changed, 854 insertions(+), 1035 deletions(-) And best of - we are deleting more code! -- 2.1.2 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug
On Mon, Nov 17, 2014 at 12:10:34PM +, Wei Liu wrote: The existence check is to make sure a device is not added to a guest multiple times. PCI device backend path has different rules from vif, disk etc. For example: /local/domain/0/backend/pci/9/0/dev-1/:03:10.1 /local/domain/0/backend/pci/9/0/key-1/:03:10.1 /local/domain/0/backend/pci/9/0/dev-2/:03:10.2 /local/domain/0/backend/pci/9/0/key-2/:03:10.2 The devid for PCI devices is hardcoded 0. libxl__device_exists only checks up to /local/.../9/0 so it always returns true even the device is assignable. Remove invocation of libxl__device_exists. We're sure at this point that the PCI device is assignable (hence no xenstore entry or JSON entry). The check is done before hand. For HVM guest it's done by calling xc_test_assign_device and for PV guest it's done by calling pciback_dev_is_assigned. Reported-by: Li, Liang Z liang.z...@intel.com Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Konrad Wilk konrad.w...@oracle.com --- This patch fixes a regression in 4.5. Ouch! That needs then to be fixed. Is the version you would want to commit? I did test it - and it looked to do the right thing - thought the xen-pciback is stuck in the 7 state. However that is a seperate issue that I believe is due to Xen pciback not your patches. The risk is that I misunderstood semantics of xc_test_assign_device and pciback_dev_is_assigned and end up adding several entries to JSON config template. But if the assignable tests are incorrect I think we have a bigger problem to worry about than duplicated entries in JSON template. It would be good for someone to have PCI hotplug setup to run a quick test. I think Liang confirmed (indrectly) that xc_test_assign_device worked well for him so I think there's won't be multiple JSON template entries for HVM guests. However PV side still remains to be tested. --- tools/libxl/libxl_pci.c |8 1 file changed, 8 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 9f40100..316643c 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -175,14 +175,6 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d rc = libxl__xs_transaction_start(gc, t); if (rc) goto out; -rc = libxl__device_exists(gc, t, device); -if (rc 0) goto out; -if (rc == 1) { -LOG(ERROR, device already exists in xenstore); -rc = ERROR_DEVICE_EXISTS; -goto out; -} - rc = libxl__set_domain_configuration(gc, domid, d_config); if (rc) goto out; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v9 12/13] swiotlb-xen: pass dev_addr to xen_dma_unmap_page and xen_dma_sync_single_for_cpu
On Wed, Nov 12, 2014 at 11:40:53AM +, Stefano Stabellini wrote: xen_dma_unmap_page and xen_dma_sync_single_for_cpu take a dma_addr_t handle as argument, not a physical address. Ouch. Should this also go on stable tree? Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Reviewed-by: Catalin Marinas catalin.mari...@arm.com --- drivers/xen/swiotlb-xen.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 3725ee4..498b654 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -449,7 +449,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr, BUG_ON(dir == DMA_NONE); - xen_dma_unmap_page(hwdev, paddr, size, dir, attrs); + xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs); /* NOTE: We use dev_addr here, not paddr! */ if (is_xen_swiotlb_buffer(dev_addr)) { @@ -497,14 +497,14 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr, BUG_ON(dir == DMA_NONE); if (target == SYNC_FOR_CPU) - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir); + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir); /* NOTE: We use dev_addr here, not paddr! */ if (is_xen_swiotlb_buffer(dev_addr)) swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target); if (target == SYNC_FOR_DEVICE) - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir); + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir); if (dir != DMA_FROM_DEVICE) return; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] set pv guest default video_memkb to 0
On Tue, Nov 18, 2014 at 03:57:08PM -0500, Zhigang Wang wrote: Before this patch, pv guest video_memkb is -1, which is an invalid value. And it will cause the xenstore 'memory/targe' calculation wrong: memory/target = info-target_memkb - info-video_memkb CC-ing the maintainers. Is this an regression as compared to Xen 4.4 or is this also in Xen 4.4? Thanks. Signed-off-by: Zhigang Wang zhigang.x.w...@oracle.com --- tools/libxl/libxl_create.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index b1ff5ae..1198225 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -357,6 +357,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, break; case LIBXL_DOMAIN_TYPE_PV: libxl_defbool_setdefault(b_info-u.pv.e820_host, false); +if (b_info-video_memkb == LIBXL_MEMKB_DEFAULT) +b_info-video_memkb = 0; if (b_info-shadow_memkb == LIBXL_MEMKB_DEFAULT) b_info-shadow_memkb = 0; if (b_info-u.pv.slack_memkb == LIBXL_MEMKB_DEFAULT) -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] docs/commandline: Fix formatting issues
On Wed, Nov 19, 2014 at 11:22:18AM +, Ian Campbell wrote: On Wed, 2014-11-19 at 11:17 +, Andrew Cooper wrote: In both of these cases, markdown was interpreting the text as regular text, and reflowing it as a regular paragraph, leading to a single line as output. Reformat them as code blocks inside blockquote blocks, which causes them to take their precise whitespace layout. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- Konrad: this is a documentation fix, so requesting a 4.5 ack please. FWIW IMHO documentation fixes in general should have a very low bar to cross until very late in the release cycle... I concur, I updated the release criteria doc so that it will be expediated in the future. --- docs/misc/xen-command-line.markdown | 38 +-- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index f054d4b..e3a5a15 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -475,13 +475,13 @@ defaults of 1 and unlimited respectively are used instead. For example, with `dom0_max_vcpus=4-8`: - Number of - PCPUs | Dom0 VCPUs - 2| 4 - 4| 4 - 6| 6 - 8| 8 - 10| 8 +Number of + PCPUs | Dom0 VCPUs + 2| 4 + 4| 4 + 6| 6 + 8| 8 + 10| 8 ### dom0\_mem `= List of ( min:size | max:size | size )` @@ -684,18 +684,18 @@ supported only when compiled with XSM\_ENABLE=y on x86. The specified value is a bit mask with the individual bits having the following meaning: -Bit 0 - debug level 0 (unused at present) -Bit 1 - debug level 1 (Control Register logging) -Bit 2 - debug level 2 (VMX logging of MSR restores when context switching) -Bit 3 - debug level 3 (unused at present) -Bit 4 - I/O operation logging -Bit 5 - vMMU logging -Bit 6 - vLAPIC general logging -Bit 7 - vLAPIC timer logging -Bit 8 - vLAPIC interrupt logging -Bit 9 - vIOAPIC logging -Bit 10 - hypercall logging -Bit 11 - MSR operation logging + Bit 0 - debug level 0 (unused at present) + Bit 1 - debug level 1 (Control Register logging) + Bit 2 - debug level 2 (VMX logging of MSR restores when context switching) + Bit 3 - debug level 3 (unused at present) + Bit 4 - I/O operation logging + Bit 5 - vMMU logging + Bit 6 - vLAPIC general logging + Bit 7 - vLAPIC timer logging + Bit 8 - vLAPIC interrupt logging + Bit 9 - vIOAPIC logging + Bit 10 - hypercall logging + Bit 11 - MSR operation logging Recognized in debug builds of the hypervisor only. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug
On Wed, Nov 19, 2014 at 09:21:23PM +, Wei Liu wrote: On Wed, Nov 19, 2014 at 04:01:54PM -0500, Konrad Rzeszutek Wilk wrote: On Mon, Nov 17, 2014 at 12:10:34PM +, Wei Liu wrote: The existence check is to make sure a device is not added to a guest multiple times. PCI device backend path has different rules from vif, disk etc. For example: /local/domain/0/backend/pci/9/0/dev-1/:03:10.1 /local/domain/0/backend/pci/9/0/key-1/:03:10.1 /local/domain/0/backend/pci/9/0/dev-2/:03:10.2 /local/domain/0/backend/pci/9/0/key-2/:03:10.2 The devid for PCI devices is hardcoded 0. libxl__device_exists only checks up to /local/.../9/0 so it always returns true even the device is assignable. Remove invocation of libxl__device_exists. We're sure at this point that the PCI device is assignable (hence no xenstore entry or JSON entry). The check is done before hand. For HVM guest it's done by calling xc_test_assign_device and for PV guest it's done by calling pciback_dev_is_assigned. Reported-by: Li, Liang Z liang.z...@intel.com Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Konrad Wilk konrad.w...@oracle.com --- This patch fixes a regression in 4.5. Ouch! That needs then to be fixed. Is the version you would want to commit? I did test it - and it Yes. Then Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com looked to do the right thing - thought the xen-pciback is stuck in the 7 state. However that is a seperate issue that I believe is due to Xen pciback not your patches. Thanks for testing. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v0 RFC 0/2] xl/libxl support for PVUSB
On Sun, Nov 16, 2014 at 10:36:28AM +0800, Simon Cao wrote: Hi, I was working on the work. But I was busing preparing some job interviews in the last three months, sorry for this long delay. I will update my progress in a few days. OK, I put your name for this to be in Xen 4.6. Thanks! Thanks! Bo Cao On Mon, Nov 10, 2014 at 4:37 PM, Chun Yan Liu cy...@suse.com wrote: Is there any progress on this work? I didn't see new version after this. Anyone knows the status? Thanks, Chunyan On 8/11/2014 at 04:23 AM, in message 1407702234-22309-1-git-send-email-caobosi...@gmail.com, Bo Cao caobosi...@gmail.com wrote: Finally I have a workable version xl/libxl support for PVUSB. Most of its commands work property now, but there are still some probelm to be solved. Please take a loot and give me some advices. == What have been implemented ? == I have implemented libxl functions for PVUSB in libxl_usb.c. It mainly consists of two part: usbctrl_add/remove/list and usb_add/remove/list in which usbctrl denote usb controller in which usd device can be plugged in. I don't use ao_dev in libxl_deivce_usbctrl_add since we don't need to execute hotplug script for usbctrl and without ao_dev, adding default usbctrl for usb device would be easier. For the cammands to manipulate usb device such as xl usb-attach and xl usb-detach, this patch now only support to specify usb devices by their interface in sysfs. Using this interface, we can read usb device information through sysfs and bind/unbind usb device. (The support for mapping the lsusb bus:addr to the sysfs usb interface will come later). == What needs to do next ? == There are two main problems to be solved. 1. PVUSB Options in VM Guest's Configuration File The interface in VM Guest's configuration file to add usb device is: usb=[interface=1-1]. But the problem is now is that after the default usbctrl is added, the state of usbctrl is 2, e,g, XenbusStateInitWait, waiting for xen-usbfront to connect. The xen-usbfront in VM Guest isn't loaded. Therefore, sysfs_intf_write will report error. Does anyone have any clue how to solve this? 2. sysfs_intf_write In the process of xl usb-attach domid intf=1-1, after writing 1-1 to Xenstore entry, we need to bind the controller of this usb device to usbback driver so that it can be used by VM Guest. For exampele, for usb device 1-1, it's controller interface maybe 1-1:1.0, and we write this value to /sys/bus/usb/driver/usbback/bind. But for some devices, they have two controllers, for example 1-1:1.0 and 1-1:1.1. I think this means it has two functions, such as usbhid and usb-storage. So in this case, we bind the two controller to usbback? There maybe some errors or bugs in the codes. Feel free to tell me. Cheers, - Simon --- CC: George Dunlap george.dun...@eu.citrix.com CC: Ian Jackson ian.jack...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Pasi Kärkkäinen pa...@iki.fi CC: Lars Kurth lars.ku...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.
On Wed, Nov 19, 2014 at 08:17:35PM +0100, Sander Eikelenboom wrote: Wednesday, November 19, 2014, 8:01:31 PM, you wrote: On Wed, Nov 19, 2014 at 07:54:39PM +0100, Sander Eikelenboom wrote: Wednesday, November 19, 2014, 6:31:39 PM, you wrote: Hey, This patch should fix the issue that Sander had seen. The full details are in the patch itself. Sander, if you could - please test origin/staging with this patch to make sure it does fix the issue. xen/drivers/passthrough/io.c | 27 +-- Konrad Rzeszutek Wilk (1): dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked. 1 file changed, 17 insertions(+), 10 deletions(-) Hi Konrad, Hmm just tested with a freshly cloned tree .. unfortunately it blew up again. (i must admit i also re-enabled stuff i had disabled in debugging like, cpuidle, cpufreq). Argh. Could you also try the first patch the STATE_ZOMBIE one? Building now .. (Attached and inline) Sander mentioned to me over IRC that with the STATE_ZOMBIE patch things work peachy for him. The patch in combination with the previous adds two extra paths: 1) in raise_softirq, we do delay scheduling of dcpi_pirq until STATE_ZOMBIE is cleared. 2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the STATE_ZOMBIE. Lets follow the case without the zombie patch and with the zombie patch: w/o zombie: timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 0. pirq_dpci is still on dpci_list. dpci_sofitrq while (!list_emptry(our_list)) list_del, but has not yet done 'entry-next = LIST_POISON1;' [interrupt happens] raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list. [interrupt is done, back to dpci_softirq] finishes the entry-next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. dpci_softirq while (!list_emtpry) list_del, but -next already has LIST_POISON1 and we blow up. w/ zombie: timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to STATE_ZOMBIE. pirq_dpci is still on dpci_list. dpci_sofitrq while (!list_emptry(our_list)) list_del, but has not yet done 'entry-next = LIST_POISON1;' [interrupt happens] raise_softirq checks state, it is STATE_ZOMBIE so returns. [interrupt is done, back to dpci_softirq] finishes the entry-next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. So it seems that the STATE_ZOMBIE is needed, but for a different reason that Jan initially thought of: From c89a97f695fda245f5fcb16ddb36d3df7f6f28b9 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Fri, 14 Nov 2014 12:15:26 -0500 Subject: [PATCH] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq. When we want to cancel an outstanding 'struct hvm_pirq_dpci' we perform and cmpxch on the state to set it to zero. That is OK on the teardown paths as it is guarnateed that the do_IRQ action handler has been removed. Hence no more interrupts can be scheduled. But with the introduction of dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked. we now utilize the pt_pirq_softirq_cancel when we want to cancel outstanding operations. However once we cancel them the do_IRQ is free to schedule them back in - even if said 'struct hvm_pirq_dpci' is still on the dpci_list. The code base before this patch could follow this race: \-timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to 0. pirq_dpci is still on dpci_list. \- dpci_sofitrq while (!list_emptry(our_list)) list_del, but has not yet done 'entry-next = LIST_POISON1;' [interrupt happens] raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list. [interrupt is done, back to dpci_softirq] finishes the entry-next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. \- dpci_softirq while (!list_emtpry) list_del, but -next already has LIST_POISON1 and we blow up. This patch in combination adds two extra paths: 1) in raise_softirq, we do delay scheduling of dcpi_pirq until STATE_ZOMBIE is cleared. 2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the STATE_ZOMBIE. Using the example above the code-paths would be now: \- timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchg the state to STATE_ZOMBIE. pirq_dpci is still on dpci_list. \- dpci_sofitrq while (!list_emptry(our_list
[Xen-devel] [for xen-4.5 PATCH v2] Fix list corruption in dpci_softirq.
Hey, Attached are two patches that fix the dpci_softirq list corruption that Sander was observing. xen/drivers/passthrough/io.c | 55 +++- 1 file changed, 39 insertions(+), 16 deletions(-) Konrad Rzeszutek Wilk (2): dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked. dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
If we pass in INTx type devices to a guest on an over-subscribed machine - and in an over-worked guest - we can cause the pirq_dpci-softirq_list to become corrupted. The reason for this is that the 'pt_irq_guest_eoi' ends up setting the 'state' to zero value. However the 'state' value (STATE_SCHED, STATE_RUN) is used to communicate between 'raise_softirq_for' and 'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the teardown path for simplicity for right now. The 'pt_irq_guest_eoi' was not adhering to the proper dialogue and was not using locked cmpxchg or test_bit operations and ended setting 'state' set to zero. That meant 'raise_softirq_for' was free to schedule it while the 'struct hvm_pirq_dpci'' was still on an per-cpu list. The end result was list_del being called twice and the second call corrupting the per-cpu list. For this to occur one of the CPUs must be in the idle loop executing softirqs and the interrupt handler in the guest must not respond to the pending interrupt within 8ms, and we must receive another interrupt for this device on another CPU. CPU0: CPU1: timer_softirq_action \- pt_irq_time_out state = 0;do_IRQ [out of timer code, theraise_softirq pirq_dpci is on the CPU0 dpci_list] [adds the pirq_dpci to CPU1 dpci_list as state == 0] softirq_dpci:softirq_dpci: list_del [list entries are poisoned] list_del = BOOM The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was pt_pirq_softirq_reset) to use cmpxchg. We also expand said function to reset the '-dom' only on the teardown paths - but not on the timeouts. Reported-and-Tested-by: Sander Eikelenboom li...@eikelenboom.it Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/drivers/passthrough/io.c | 27 +-- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index efc66dc..2039d31 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -57,7 +57,7 @@ enum { * This can be called multiple times, but the softirq is only raised once. * That is until the STATE_SCHED state has been cleared. The state can be * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before + * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before * the softirq had a chance to run). */ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) @@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) } /* - * Reset the pirq_dpci-dom parameter to NULL. + * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set, + * reset pirq_dpci-dom parameter to NULL (used for teardown). * * This function checks the different states to make sure it can do it * at the right time. If it unschedules the 'hvm_dirq_assist' from running * it also refcounts (which is what the softirq would have done) properly. */ -static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) +static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci, + unsigned int clear) { struct domain *d = pirq_dpci-dom; @@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom' * in local variable before it sets STATE_RUN - and therefore will not * dereference '-dom' which would crash. + * + * However, if this is called from 'pt_irq_time_out' we do not want to + * clear the '-dom' as we can re-use the 'pirq_dpci' after that and + * need '-dom'. */ -pirq_dpci-dom = NULL; +if ( clear ) +pirq_dpci-dom = NULL; break; } } @@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci, if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT, pirq_dpci-flags) ) { -pirq_dpci-state = 0; +pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */); pirq_dpci-pending = 0; pirq_guest_eoi(dpci_pirq(pirq_dpci)); } @@ -285,7 +292,7 @@ int pt_irq_create_bind( * to be scheduled but we must deal with the one that may be * in the queue. */ -pt_pirq_softirq_reset(pirq_dpci); +pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */); } } if ( unlikely(rc) ) @@ -536,9
Re: [Xen-devel] [PATCH v2 for 4.5] xl: Return proper error codes for block-attach and block-detach
On Thu, Nov 20, 2014 at 03:47:04PM +, Ian Campbell wrote: On Mon, 2014-11-17 at 12:36 +, George Dunlap wrote: On 11/14/2014 11:12 AM, Ian Campbell wrote: On Thu, 2014-11-13 at 19:04 +, George Dunlap wrote: Return proper error codes on failure so that scripts can tell whether the command completed properly or not. Also while we're at it, normalize init and dispose of libxl_device_disk structures. This means calling init and dispose in the actual functions where they are declared. This in turn means changing parse_disk_config_multistring() to not call libxl_device_disk_init. And that in turn means changes to callers of parse_disk_config(). It also means removing libxl_device_disk_init() from libxl.c:libxl_vdev_to_device_disk(). This is only called from xl_cmdimpl.c. Signed-off-by: George Dunlap george.dun...@eu.citrix.com --- CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@citrix.com CC: Wei Liu wei.l...@citrix.com CC: Konrad Wilk konrad.w...@oracle.com Release justification: This is a bug fix. It's a fairly minor one, but it's also a very simple one. v2: - Restructure functions to make sure init and dispose are properly called. Sadly this bit has somewhat reduced the truth of the second half of your release justification, since the patch is a fair bit more subtle though. Although IMHO it hasn't invalidated the case for taking the patch for 4.5 (modulo comments below). Well, I think we can take the hacky short-term fix I posted before, which is simple, and do a proper fix after the 4.6 dev window opens up; or we can do a more complete fix now. Specifically the hacky short-term fix is 1415813493-25330-1-git-send-email-george.dun...@eu.citrix.com ? I could live with that, perhaps with the commit log explaining that a little. Konrad? hands George the band-aid tape Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Or, if the valgrind thing is really important, I could use the change you suggested, and do return rc ? 1 : 0; but I really think that's going in the wrong direction... -George --- tools/libxl/libxl.c | 2 -- tools/libxl/xl_cmdimpl.c | 28 +--- 2 files changed, 21 insertions(+), 9 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index f7961f6..d9c4ce7 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -2611,8 +2611,6 @@ int libxl_vdev_to_device_disk(libxl_ctx *ctx, uint32_t domid, if (devid 0) return ERROR_INVAL; -libxl_device_disk_init(disk); _init functions are idempotent, so this is harmless, I think. Existing callers (including things which aren't xl) might be relying on it. Outside of a freeze period that might be acceptable (those callers are buggy) but since you are asking for a freeze exception I think this may be going a step too far in cleaning things up. In terms of other callers you've missed tools/ocaml/libs/xl/xenlight_stubs.c (which appears buggy to me) in tree, plus whatever use e.g. libvirt makes of it. diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 3c9f146..25af715 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -485,8 +485,6 @@ static void parse_disk_config_multistring(XLU_Config **config, { int e; -libxl_device_disk_init(disk); Likewise here, although to a lesser extent since this is xl not libxl. @@ -6378,13 +6382,17 @@ int main_blockattach(int argc, char **argv) printf(disk: %s\n, json); free(json); if (ferror(stdout) || fflush(stdout)) { perror(stdout); exit(-1); } -return 0; You should set rc = 0 here rather than initing it at declaration to catch error paths which don't set the result. (we aren't very consistent about this in the code today so I'm only mentioning it because other changes seem to be needed, if that turns out not to be the case and there's no need for v3 then this shouldn't block acceptance) +goto out; I'm not 100% convinced by the use of the goto out error handling style for a success path, but it's probably better than duplicating the exit path or adding !dryrun checks to all the following operations. Ian, since you recently posted updated coding style for things around this, what do you think? Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
On Thu, Nov 20, 2014 at 11:55:43AM +0100, Jan Beulich wrote: On 19.11.14 at 23:21, konrad.w...@oracle.com wrote: Leaving aside the question of whether this is the right approach, in case it is a couple of comments: @@ -85,7 +91,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) */ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) { -if ( pirq_dpci-state ((1 STATE_RUN) | (1 STATE_SCHED)) ) +if ( pirq_dpci-state ((1 STATE_RUN) | (1 STATE_SCHED) | (1 STATE_ZOMBIE) ) ) This is becoming unwieldy - perhaps better just if ( pirq_dpci-state )? @@ -122,6 +128,7 @@ static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci, /* fallthrough. */ case (1 STATE_RUN): case (1 STATE_RUN) | (1 STATE_SCHED): +case (1 STATE_RUN) | (1 STATE_SCHED) | (1 STATE_ZOMBIE): How would we get into this state? Afaict STATE_ZOMBIE can't be set at the same time as STATE_SCHED. @@ -786,6 +793,7 @@ unlock: static void dpci_softirq(void) { unsigned int cpu = smp_processor_id(); +unsigned int reset = 0; bool_t (and would better be inside the loop, avoiding the pointless re-init on the last iteration). But taking it as a whole - aren't we now at risk of losing an interrupt instance the guest expects, due to raise_softirq_for() bailing on zombie entries, and dpci_softirq() being the only place where the zombie state gets cleared? Ah crud. So a simple fix could be to seperate the 'state' to only deal with the raise_softirq and softirq_dpci. And then add a new (old) 'masked' to deal between hvm_dirq_assist, pt_irq_guest_eoi and hvm_do_IRQ_dpci. From 94a98e20a8ab721a58788919f92e3524a6c6e25c Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Thu, 20 Nov 2014 14:28:11 -0500 Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process. commit f6dd295381f4b6a66acddacf46bca8940586c8d8 dpci: replace tasklet with softirq used the 'masked' as an two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate between 'raise_softirq_for' and 'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be re-scheduled. However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper dialogue and was not using locked cmpxchg or test_bit operations and ended setting 'state' set to zero. That meant 'raise_softirq_for' was free to schedule it while the 'struct hvm_pirq_dpci'' was still on an per-cpu list causing an list corruption. The code would trigger the following path causing list corruption: \-timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0. pirq_dpci is still on dpci_list. \- dpci_sofitrq while (!list_emptry(our_list)) list_del, but has not yet done 'entry-next = LIST_POISON1;' [interrupt happens] raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list. [interrupt is done, back to dpci_softirq] finishes the entry-next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. \- dpci_softirq while (!list_emtpry) list_del, but -next already has LIST_POISON1 and we blow up. An alternative solution was proposed (adding STATE_ZOMBIE and making pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above issue but had an fatal bug. It would miss interrupts that are to be scheduled! This patch brings back the 'masked' boolean which is used as an communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and 'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime 'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it. The 'state' is left as a seperate mechanism to provide an mechanism between 'raise_sofitrq' and 'softirq_dpci' to communicate the state of the 'struct hvm_dirq_pirq'. Reported-by: Sander Eikelenboom li...@eikelenboom.it Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- xen/drivers/passthrough/io.c | 5 +++-- xen/include/xen/hvm/irq.h| 1 + 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index efc66dc..4d8c02f 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -142,7 +142,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci, if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT, pirq_dpci-flags) ) { -pirq_dpci-state = 0; +pirq_dpci-masked = 0; pirq_dpci-pending = 0; pirq_guest_eoi(dpci_pirq(pirq_dpci)); } @@ -610,6 +610,7 @@ int hvm_do_IRQ_dpci(struct domain *d, struct pirq *pirq) !(pirq_dpci-flags HVM_IRQ_DPCI_MAPPED) ) return 0; +pirq_dpci-masked = 1; raise_softirq_for
Re: [Xen-devel] [PATCH v2 for-4.5] xen/arm: clear UIE on hypervisor entry
On Thu, Nov 20, 2014 at 04:47:42PM +, Ian Campbell wrote: On Thu, 2014-11-20 at 10:53 +, Stefano Stabellini wrote: UIE being set can cause maintenance interrupts to occur when Xen writes to one or more LR registers. The effect is a busy loop around the interrupt handler in Xen (http://marc.info/?l=xen-develm=141597517132682): everything gets stuck. I think it would be useful to explain somewhere why/how a spurious interrupt can occur, if not here then in the comment you've already added or in another one in gic_clear_lrs where you clear UIE. The bit about clearing the LRs on entry causing UIE to become deasserted before we get as far as reading the IAR is a bit subtle. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Reported-and-Tested-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com CC: konrad.w...@oracle.com With the expanded commentary: Acked-by: Ian Campbell ian.campb...@citrix.com --- Konrad, this fixes an actual bug, at least on OMAP5. It should have no bad side effects on any other platforms as far as I can tell. It should go in 4.5. As long as the testing that Julian has asked for does not demonstrate any issues with this patch I am OK with it going in Xen 4.5. Changes in v2: - add an in-code comment about maintenance_interrupt not being called. diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 70d10d6..c6c11d3 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -527,8 +529,6 @@ void gic_inject(void) if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 1); -else -gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); } static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) @@ -598,6 +598,10 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r * Receiving the interrupt is going to cause gic_inject to be called * on return to guest that is going to clear the old LRs and inject * new interrupts. + * + * Do not add code here: maintenance interrupts caused by setting + * GICH_HCR_UIE, might read as spurious interrupts (1023). As a + * consequence sometimes this handler might not be called. */ } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 v2] libxc: don't leak buffer containing the uncompressed PV kernel
On Thu, Nov 20, 2014 at 03:48:47PM +, Ian Campbell wrote: The libxc xc_dom_* infrastructure uses a very simple malloc memory pool which is freed by xc_dom_release. However the various xc_try_*_decode routines (other than the gzip one) just use plain malloc/realloc and therefore the buffer ends up leaked. The memory pool currently supports mmap'd buffers as well as a directly allocated buffers, however the try decode routines make use of realloc and do not fit well into this model. Introduce a concept of an external memory block to the memory pool and provide an interface to register such memory. The mmap_ptr and mmap_len fields of the memblock tracking struct lose their mmap_ prefix since they are now also used for external memory blocks. We are only seeing this now because the gzip decoder doesn't leak and it's only relatively recently that kernels in the wild have switched to better compression. This is https://bugs.debian.org/767295 Reported by: Gedalya geda...@gedalya.net Gedelya, Could you also test this patch to make sure it does fix the reported issue please? Signed-off-by: Ian Campbell ian.campb...@citrix.com --- v2: Correct handling in xc_try_lzo1x_decode. This is a bug fix and should go into 4.5. I have a sneaking suspicion this is going to need to backport a very long way... --- tools/libxc/include/xc_dom.h| 10 -- tools/libxc/xc_dom_bzimageloader.c | 20 tools/libxc/xc_dom_core.c | 61 +++ tools/libxc/xc_dom_decompress_lz4.c |5 +++ 4 files changed, 80 insertions(+), 16 deletions(-) diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index 6ae6a9f..07d7224 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -33,8 +33,13 @@ struct xc_dom_seg { struct xc_dom_mem { struct xc_dom_mem *next; -void *mmap_ptr; -size_t mmap_len; +void *ptr; +enum { +XC_DOM_MEM_TYPE_MALLOC_INTERNAL, +XC_DOM_MEM_TYPE_MALLOC_EXTERNAL, +XC_DOM_MEM_TYPE_MMAP, +} type; +size_t len; unsigned char memory[0]; }; @@ -298,6 +303,7 @@ void xc_dom_log_memory_footprint(struct xc_dom_image *dom); /* --- simple memory pool -- */ void *xc_dom_malloc(struct xc_dom_image *dom, size_t size); +int xc_dom_register_external(struct xc_dom_image *dom, void *ptr, size_t size); void *xc_dom_malloc_page_aligned(struct xc_dom_image *dom, size_t size); void *xc_dom_malloc_filemap(struct xc_dom_image *dom, const char *filename, size_t * size, diff --git a/tools/libxc/xc_dom_bzimageloader.c b/tools/libxc/xc_dom_bzimageloader.c index 2225699..964ebdc 100644 --- a/tools/libxc/xc_dom_bzimageloader.c +++ b/tools/libxc/xc_dom_bzimageloader.c @@ -161,6 +161,13 @@ static int xc_try_bzip2_decode( total = (((uint64_t)stream.total_out_hi32) 32) | stream.total_out_lo32; +if ( xc_dom_register_external(dom, out_buf, total) ) +{ +DOMPRINTF(BZIP2: Error registering stream output); +free(out_buf); +goto bzip2_cleanup; +} + DOMPRINTF(%s: BZIP2 decompress OK, 0x%zx - 0x%lx, __FUNCTION__, *size, (long unsigned int) total); @@ -305,6 +312,13 @@ static int _xc_try_lzma_decode( } } +if ( xc_dom_register_external(dom, out_buf, stream-total_out) ) +{ +DOMPRINTF(%s: Error registering stream output, what); +free(out_buf); +goto lzma_cleanup; +} + DOMPRINTF(%s: %s decompress OK, 0x%zx - 0x%zx, __FUNCTION__, what, *size, (size_t)stream-total_out); @@ -464,7 +478,13 @@ static int xc_try_lzo1x_decode( dst_len = lzo_read_32(cur); if ( !dst_len ) +{ +msg = Error registering stream output; +if ( xc_dom_register_external(dom, out_buf, out_len) ) +break; + return 0; +} if ( dst_len LZOP_MAX_BLOCK_SIZE ) { diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index baa62a1..ecbf981 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -132,6 +132,7 @@ void *xc_dom_malloc(struct xc_dom_image *dom, size_t size) return NULL; } memset(block, 0, sizeof(*block) + size); +block-type = XC_DOM_MEM_TYPE_MALLOC_INTERNAL; block-next = dom-memblocks; dom-memblocks = block; dom-alloc_malloc += sizeof(*block) + size; @@ -151,23 +152,45 @@ void *xc_dom_malloc_page_aligned(struct xc_dom_image *dom, size_t size) return NULL; } memset(block, 0, sizeof(*block)); -block-mmap_len = size; -block-mmap_ptr = mmap(NULL, block-mmap_len, - PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, - -1, 0);
Re: [Xen-devel] [PATCH v2 for 4.5] scripts/get_maintainer.pl: Correctly CC the maintainers
On Thu, Nov 20, 2014 at 05:36:03PM +, Julien Grall wrote: The current script is setting $email_remove_duplicates to 1 by default, on complex patch (see [1]), this will result to ommitting randomly some maintainers. One could see that as feature - the emails about bugs or patches to review don't land in your inbox! This is because, the script will: 1) Get the list of maintainers of the file (incidentally all the maintainers in THE REST role are added). If the email address already exists in the global list, skip it. = The role will be lost 2) Filter the list to remove the entry with THE REST role So if a maintainers is marked with THE REST role on the first file and actually be an x86 maintainers on the script, the script will only retain the THE REST role. During the filtering step, this maintainers will therefore be dropped. This patch fixes this by setting $email_remove_duplicates to 0 by default. The new behavior of the script will be: 1) Append the list of maintainers for every file 2) Filter the list to remove the entry with THE REST role 3) Remove duplicated email address Example: Patch: https://patches.linaro.org/41083/ Before the patch: Daniel De Graaf dgde...@tycho.nsa.gov Ian Jackson ian.jack...@eu.citrix.com Stefano Stabellini stefano.stabell...@eu.citrix.com Ian Campbell ian.campb...@citrix.com Wei Liu wei.l...@citrix.com George Dunlap george.dun...@eu.citrix.com xen-devel@lists.xen.org After the patch: Daniel De Graaf dgde...@tycho.nsa.gov Ian Jackson ian.jack...@eu.citrix.com Stefano Stabellini stefano.stabell...@eu.citrix.com Ian Campbell ian.campb...@citrix.com Wei Liu wei.l...@citrix.com Stefano Stabellini stefano.stabell...@citrix.com Tim Deegan t...@xen.org Keir Fraser k...@xen.org Jan Beulich jbeul...@suse.com George Dunlap george.dun...@eu.citrix.com xen-devel@lists.xen.org [1] http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00060.html Signed-off-by: Julien Grall julien.gr...@linaro.org CC: Don Slutz dsl...@verizon.com --- Changes in v2: - Rework the commit message to explain the problem and the solution more clearly I would like to see this patch in Xen 4.5 and backported to Xen 4.4 (first time the script has been introduced). Developpers using this script won't ommitted to cc some maintainers, and it will avoid maintainers complaining about miss CC. The only drawbacks I can see is if the maintainers is referenced twice in the file MAINTAINERS with different email, the script won't notice it's duplicated and list 2 times. Though, for this one it could be fixed by modifying the MAINTAINERS file. Is it worth for Xen 4.5? For know, it seems to only happen with Stefano. I am OK with this going in Xen 4.5. Thank you. --- scripts/get_maintainer.pl |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/get_maintainer.pl b/scripts/get_maintainer.pl index df920e2..cc445cd 100755 --- a/scripts/get_maintainer.pl +++ b/scripts/get_maintainer.pl @@ -35,7 +35,7 @@ my $email_git_min_percent = 5; my $email_git_since = 1-year-ago; my $email_hg_since = -365; my $interactive = 0; -my $email_remove_duplicates = 1; +my $email_remove_duplicates = 0; my $email_use_mailmap = 1; my $email_drop_the_rest_supporter_if_supporter_found = 1; my $output_multiline = 1; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v9 12/13] swiotlb-xen: pass dev_addr to xen_dma_unmap_page and xen_dma_sync_single_for_cpu
On Thu, Nov 20, 2014 at 10:47:51AM +, Stefano Stabellini wrote: On Thu, 20 Nov 2014, Stefano Stabellini wrote: On Wed, 19 Nov 2014, Konrad Rzeszutek Wilk wrote: On Wed, Nov 12, 2014 at 11:40:53AM +, Stefano Stabellini wrote: xen_dma_unmap_page and xen_dma_sync_single_for_cpu take a dma_addr_t handle as argument, not a physical address. Ouch. Should this also go on stable tree? Good idea Also can I take that as an Acked-by for this patch? Yes. There is one last bit of common and x86 changes in this series: patch #7 adds a paramter to xen_dma_map_page, even though the x86 implementation is empty: http://marc.info/?l=xen-develm=141579253829808w=2 is that OK for you? Yes. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Reviewed-by: Catalin Marinas catalin.mari...@arm.com --- drivers/xen/swiotlb-xen.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 3725ee4..498b654 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -449,7 +449,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr, BUG_ON(dir == DMA_NONE); - xen_dma_unmap_page(hwdev, paddr, size, dir, attrs); + xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs); /* NOTE: We use dev_addr here, not paddr! */ if (is_xen_swiotlb_buffer(dev_addr)) { @@ -497,14 +497,14 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr, BUG_ON(dir == DMA_NONE); if (target == SYNC_FOR_CPU) - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir); + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir); /* NOTE: We use dev_addr here, not paddr! */ if (is_xen_swiotlb_buffer(dev_addr)) swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target); if (target == SYNC_FOR_DEVICE) - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir); + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir); if (dir != DMA_FROM_DEVICE) return; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
On Fri, Nov 21, 2014 at 01:51:24PM +0100, Sander Eikelenboom wrote: Friday, November 21, 2014, 12:50:16 PM, you wrote: On November 21, 2014 2:51:33 AM EST, Jan Beulich jbeul...@suse.com wrote: On 20.11.14 at 20:51, konrad.w...@oracle.com wrote: @@ -669,7 +670,7 @@ static void hvm_dirq_assist(struct domain *d, struct hvm_pirq_dpci *pirq_dpci) ASSERT(d-arch.hvm_domain.irq.dpci); spin_lock(d-event_lock); -if ( pirq_dpci-state ) +if ( test_and_clear_bool(pirq_dpci-masked) ) { struct pirq *pirq = dpci_pirq(pirq_dpci); const struct dev_intx_gsi_link *digl; So this now guards solely against the timeout enforced EOI? Why do you no longer need to guard against cancellation (i.e. why isn't this looking at both -state and -masked)? The previous state check was superfluous as the dpci_softirq would check for the valid STATE_ before calling hvm_dirq_assist (and deal with cancellation). I actually had an cleanup patch that would have removed the 'if (pirq_dpci-state) ' and move the code for Xen 4.6. Anyhow waiting to see if Sander was able to test with this patch. Jan Hi Konrad / Jan, I have tested it for 3 hours now, no host crash so far (even after applying some extra stress to the guest). Yeey! Thank you for being so flexible and willing to test these patches out! -- Sander ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] WARNings in guest during xl save/restore
On Fri, Nov 21, 2014 at 11:08:37AM +0100, Juergen Gross wrote: Hi, during tests of my linear p2m list patches I stumbled over some WARNs issued during xl save and xl restore of a pv-domU with unpatched linux 3.18-rc5: Boris had an patch for this I htink.. during save I saw multiple entries like: [ 176.900393] WARNING: CPU: 0 PID: 9 at arch/x86/xen/enlighten.c:968 clear_local_APIC+0xa5/0x2b0() [ 176.900393] Modules linked in: cfg80211 rfkill nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc evdev x86_pkg_temp_thermal thermal_sys snd_pcm coretemp snd_timer crc32_pclmul aesni_intel snd xts soundcore aes_i586 lrw gf128mul ablk_helper pcspkr cryptd fuse autofs4 ext4 crc16 mbcache jbd2 crc32c_intel [ 176.900393] CPU: 0 PID: 9 Comm: migration/0 Tainted: GW 3.18.0-rc5 #30 [ 176.900393] 0009 c14c40b2 c1054b10 c1599538 0009 c158bdc2 [ 176.900393] 03c8 c103c925 c103c925 03c8 0002 c15d25eb e8867e64 [ 176.900393] c1054bd9 0009 c103c925 c103cb54 0002 [ 176.900393] Call Trace: [ 176.900393] [c14c40b2] ? dump_stack+0x3e/0x4e [ 176.900393] [c1054b10] ? warn_slowpath_common+0x90/0xc0 [ 176.900393] [c103c925] ? clear_local_APIC+0xa5/0x2b0 [ 176.900393] [c103c925] ? clear_local_APIC+0xa5/0x2b0 [ 176.900393] [c1054bd9] ? warn_slowpath_null+0x19/0x20 [ 176.900393] [c103c925] ? clear_local_APIC+0xa5/0x2b0 [ 176.900393] [c103cb54] ? disable_local_APIC+0x24/0x90 [ 176.900393] [c103ccde] ? lapic_suspend+0x11e/0x170 [ 176.900393] [c1360ff9] ? syscore_suspend+0x79/0x220 [ 176.900393] [c107e782] ? set_next_entity+0x62/0x80 [ 176.900393] [c13165cd] ? xen_suspend+0x2d/0x110 [ 176.900393] [c10045df] ? xen_mc_flush+0x13f/0x170 [ 176.900393] [c10d5619] ? multi_cpu_stop+0xa9/0xd0 [ 176.900393] [c10d5570] ? cpu_stop_should_run+0x50/0x50 [ 176.900393] [c10d5771] ? cpu_stopper_thread+0x71/0x100 [ 176.900393] [c1074214] ? finish_task_switch+0x34/0xd0 [ 176.900393] [c14c513d] ? __schedule+0x23d/0x7f0 [ 176.900393] [c10897f4] ? __wake_up_common+0x44/0x70 [ 176.900393] [c14c8962] ? _raw_spin_lock_irqsave+0x12/0x60 [ 176.900393] [c1071f22] ? smpboot_thread_fn+0xd2/0x170 [ 176.900393] [c1071e50] ? SyS_setgroups+0x110/0x110 [ 176.900393] [c106e801] ? kthread+0xa1/0xc0 [ 176.900393] [c14c8ea1] ? ret_from_kernel_thread+0x21/0x30 [ 176.900393] [c106e760] ? kthread_create_on_node+0x120/0x120 [ 176.900393] ---[ end trace b38596d5cfdcde8d ]--- and during restore: [ 176.900393] WARNING: CPU: 0 PID: 9 at arch/x86/xen/enlighten.c:968 lapic_resume+0xc6/0x270() [ 176.900393] Modules linked in: cfg80211 rfkill nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc evdev x86_pkg_temp_thermal thermal_sys snd_pcm coretemp snd_timer crc32_pclmul aesni_intel snd xts soundcore aes_i586 lrw gf128mul ablk_helper pcspkr cryptd fuse autofs4 ext4 crc16 mbcache jbd2 crc32c_intel [ 176.900393] CPU: 0 PID: 9 Comm: migration/0 Tainted: GW 3.18.0-rc5 #30 [ 176.900393] 0009 c14c40b2 c1054b10 c1599538 0009 c158bdc2 [ 176.900393] 03c8 c103c1e6 c103c1e6 03c8 c1030020 0002 001b [ 176.900393] c1054bd9 0009 c103c1e6 c16432c0 0108cdfe c15d25dc [ 176.900393] Call Trace: [ 176.900393] [c14c40b2] ? dump_stack+0x3e/0x4e [ 176.900393] [c1054b10] ? warn_slowpath_common+0x90/0xc0 [ 176.900393] [c103c1e6] ? lapic_resume+0xc6/0x270 [ 176.900393] [c103c1e6] ? lapic_resume+0xc6/0x270 [ 176.900393] [c1030020] ? mcheck_cpu_init+0x170/0x4f0 [ 176.900393] [c1054bd9] ? warn_slowpath_null+0x19/0x20 [ 176.900393] [c103c1e6] ? lapic_resume+0xc6/0x270 [ 176.900393] [c1360e66] ? syscore_resume+0x46/0x160 [ 176.900393] [c1009012] ? xen_timer_resume+0x42/0x60 [ 176.900393] [c131661c] ? xen_suspend+0x7c/0x110 [ 176.900393] [c10d5619] ? multi_cpu_stop+0xa9/0xd0 [ 176.900393] [c10d5570] ? cpu_stop_should_run+0x50/0x50 [ 176.900393] [c10d5771] ? cpu_stopper_thread+0x71/0x100 [ 176.900393] [c1074214] ? finish_task_switch+0x34/0xd0 [ 176.900393] [c14c513d] ? __schedule+0x23d/0x7f0 [ 176.900393] [c10897f4] ? __wake_up_common+0x44/0x70 [ 176.900393] [c14c8962] ? _raw_spin_lock_irqsave+0x12/0x60 [ 176.900393] [c1071f22] ? smpboot_thread_fn+0xd2/0x170 [ 176.900393] [c1071e50] ? SyS_setgroups+0x110/0x110 [ 176.900393] [c106e801] ? kthread+0xa1/0xc0 [ 176.900393] [c14c8ea1] ? ret_from_kernel_thread+0x21/0x30 [ 176.900393] [c106e760] ? kthread_create_on_node+0x120/0x120 [ 176.900393] ---[ end trace b38596d5cfdcde93 ]--- While this seems not to be critical (the system is running after the restore) I assume disabling/enabling a local APIC on a pv-domain isn't something we want to happen... Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org
Re: [Xen-devel] [for-xen-4.5 PATCH v2 2/2] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq.
' continues on with setting '-flag=0' and unlocks the lock. 'hvm_dirq_assist' grabs the lock and continues one (and state is STATE_RUN). Since 'flag = 0' and 'digl_list' is empty, it thunders through the 'hvm_dirq_assist' not doing anything. Well, it ends up calling the dreaded 'set_timer' - which will hit another assert as it has never been initialized. Adding in 'mapping = 0' on the error paths for MSI takes care of that. That will take care of that - and I just rolled 'masked =0' in the pt_pirq_softirq_reset function. The legacy interrupt does not have this window, so it cannot do it. is being acquired right before the line quoted above, _that_ is No. The event lock is acquired in 'hvm_dirq_assist' which would be checking the 'mapping', not 'state'. spin_lock(d-event_lock); if ( test_and_clear_bool(pirq_dpci-masked) ) what needs closely looking at (and why I was asking about the deletion of the [at the first glance unnecessary] checking of -state in hvm_dirq_assist()). From 90d00db0949a8e796d7f812134753a54b2fe3d51 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Thu, 20 Nov 2014 14:28:11 -0500 Subject: [PATCH] dpci: Add 'masked' as an gate for hvm_dirq_assist to process. commit f6dd295381f4b6a66acddacf46bca8940586c8d8 dpci: replace tasklet with softirq used the 'masked' as an two-bit state mechanism (STATE_SCHED, STATE_RUN) to communicate between 'raise_softirq_for' and 'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be re-scheduled. However it ignored the 'pt_irq_guest_eoi' was not adhering to the proper dialogue and was not using locked cmpxchg or test_bit operations and ended setting 'state' set to zero. That meant 'raise_softirq_for' was free to schedule it while the 'struct hvm_pirq_dpci'' was still on an per-cpu list causing an list corruption. The code would trigger the following path causing list corruption: \-timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel sets state to 0. pirq_dpci is still on dpci_list. \- dpci_sofitrq while (!list_emptry(our_list)) list_del, but has not yet done 'entry-next = LIST_POISON1;' [interrupt happens] raise_softirq checks state which is zero. Adds pirq_dpci to the dpci_list. [interrupt is done, back to dpci_softirq] finishes the entry-next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. \- dpci_softirq while (!list_emtpry) list_del, but -next already has LIST_POISON1 and we blow up. An alternative solution was proposed (adding STATE_ZOMBIE and making pt_irq_time_out use the cmpxchg protocol on 'state'), which fixed the above issue but had an fatal bug. It would miss interrupts that are to be scheduled! This patch brings back the 'masked' boolean which is used as an communication channel between 'hvm_do_IRQ_dpci', 'hvm_dirq_assist' and 'pt_irq_guest_eoi'. When we have an interrupt we set 'masked'. Anytime 'hvm_dirq_assist' or 'pt_irq_guest_eoi' executes - it clears it. The 'state' is left as a seperate mechanism to provide an mechanism between 'raise_sofitrq' and 'softirq_dpci' to communicate the state of the 'struct hvm_dirq_pirq'. However since we have now two seperate machines we have to deal with an cancellations and outstanding interrupt being serviced: 'pt_irq_destroy_bind' is called while an 'hvm_dirq_assist' is just about to service. The 'pt_irq_destroy_bind' takes the lock first and kills the timer - and the moment it releases the spinlock, 'hvm_dirq_assist' thunders in and calls 'set_timer' hitting an ASSERT. By clearing the 'masked' in the 'pt_irq_destroy_bind' we take care of that scenario by inhibiting 'hvm_dirq_assist' to call the 'set_timer'. In the 'pt_irq_create_bind' - in the error cases we could be seeing an softirq scheduled right away and being serviced (though stuck at the spinlock). The 'pt_irq_create_bind' fails in 'pt_pirq_softirq_reset' to change the 'state' (as the state is in 'STATE_RUN', not 'STATE_SCHED'). 'pt_irq_create_bind' continues on with setting '-flag=0' and unlocks the lock. 'hvm_dirq_assist' grabs the lock and continues one. Since 'flag = 0' and 'digl_list' is empty, it thunders through the 'hvm_dirq_assist' not doing anything until it hits 'set_timer' which is undefined for MSI. Adding in 'masked=0' for the MSI case fixes that. The legacy interrupt one does not need it as there is no chance of do_IRQ being called at that point. Reported-by: Sander Eikelenboom li...@eikelenboom.it Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com v2: Deal with 'masked' in pt_irq_destroy_bind. v3: Deal with 'masked' in pt_irq_create_bind. --- xen/drivers/passthrough/io.c | 12 ++-- xen/include/xen/hvm/irq.h| 1 + 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/xen
Re: [Xen-devel] [PATCH v9 10/13] xen/arm/arm64: introduce xen_arch_need_swiotlb
On Fri, Nov 21, 2014 at 04:31:31PM +, Stefano Stabellini wrote: On Wed, 12 Nov 2014, Stefano Stabellini wrote: Introduce an arch specific function to find out whether a particular dma mapping operation needs to bounce on the swiotlb buffer. On ARM and ARM64, if the page involved is a foreign page and the device is not coherent, we need to bounce because at unmap time we cannot execute any required cache maintenance operations (we don't know how to find the pfn from the mfn). No change of behaviour for x86. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Reviewed-by: David Vrabel david.vra...@citrix.com Reviewed-by: Catalin Marinas catalin.mari...@arm.com Acked-by: Ian Campbell ian.campb...@citrix.com Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com I am thinking of asking a backport of this patch to 3.16+ The catch is that is_device_dma_coherent is not available on older kernels, so I'll have to change the arm implementation of xen_arch_need_swiotlb to: +bool xen_arch_need_swiotlb(struct device *dev, +unsigned long pfn, +unsigned long mfn) +{ + return pfn != mfn; +} + It is going to make things slower but it is going to fix the issue with cache flushing buffers for non-coherent devices. Konrad, are you OK with that? Looks fine. Changes in v6: - fix ts. Changes in v5: - fix indentation. --- arch/arm/include/asm/xen/page.h |4 arch/arm/xen/mm.c |7 +++ arch/x86/include/asm/xen/page.h |7 +++ drivers/xen/swiotlb-xen.c |5 - 4 files changed, 22 insertions(+), 1 deletion(-) diff --git a/arch/arm/include/asm/xen/page.h b/arch/arm/include/asm/xen/page.h index 135c24a..68c739b 100644 --- a/arch/arm/include/asm/xen/page.h +++ b/arch/arm/include/asm/xen/page.h @@ -107,4 +107,8 @@ static inline bool set_phys_to_machine(unsigned long pfn, unsigned long mfn) #define xen_remap(cookie, size) ioremap_cache((cookie), (size)) #define xen_unmap(cookie) iounmap((cookie)) +bool xen_arch_need_swiotlb(struct device *dev, + unsigned long pfn, + unsigned long mfn); + #endif /* _ASM_ARM_XEN_PAGE_H */ diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c index ab700e1..28ebf3e 100644 --- a/arch/arm/xen/mm.c +++ b/arch/arm/xen/mm.c @@ -100,6 +100,13 @@ void __xen_dma_sync_single_for_device(struct device *hwdev, __xen_dma_page_cpu_to_dev(hwdev, handle, size, dir); } +bool xen_arch_need_swiotlb(struct device *dev, + unsigned long pfn, + unsigned long mfn) +{ + return ((pfn != mfn) !is_device_dma_coherent(dev)); +} + int xen_create_contiguous_region(phys_addr_t pstart, unsigned int order, unsigned int address_bits, dma_addr_t *dma_handle) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index c949923..f58ef6c 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -236,4 +236,11 @@ void make_lowmem_page_readwrite(void *vaddr); #define xen_remap(cookie, size) ioremap((cookie), (size)); #define xen_unmap(cookie) iounmap((cookie)) +static inline bool xen_arch_need_swiotlb(struct device *dev, +unsigned long pfn, +unsigned long mfn) +{ + return false; +} + #endif /* _ASM_X86_XEN_PAGE_H */ diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index ad2c5eb..3725ee4 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -399,7 +399,9 @@ dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page, * buffering it. */ if (dma_capable(dev, dev_addr, size) - !range_straddles_page_boundary(phys, size) !swiotlb_force) { + !range_straddles_page_boundary(phys, size) + !xen_arch_need_swiotlb(dev, PFN_DOWN(phys), PFN_DOWN(dev_addr)) + !swiotlb_force) { /* we are not interested in the dma_addr returned by * xen_dma_map_page, only in the potential cache flushes executed * by the function. */ @@ -557,6 +559,7 @@ xen_swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, dma_addr_t dev_addr = xen_phys_to_bus(paddr); if (swiotlb_force || + xen_arch_need_swiotlb(hwdev, PFN_DOWN(paddr), PFN_DOWN(dev_addr)) || !dma_capable(hwdev, dev_addr, sg-length) || range_straddles_page_boundary(paddr, sg-length)) { phys_addr_t map = swiotlb_tbl_map_single(hwdev, -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel
Re: [Xen-devel] [PATCH for-4.5 RFC] pygrub: Fix regression from c/s d1b93ea, attempt 2
On Fri, Nov 21, 2014 at 01:32:13PM +, Andrew Cooper wrote: On 20/11/14 16:45, Boris Ostrovsky wrote: On 11/20/2014 11:15 AM, Ian Campbell wrote: On Thu, 2014-11-20 at 16:08 +, Andrew Cooper wrote: On 20/11/14 16:00, Ian Campbell wrote: On Mon, 2014-11-17 at 15:19 +, Andrew Cooper wrote: c/s d1b93ea causes substantial functional regressions in pygrub's ability to parse bootloader configuration files. c/s d1b93ea itself changed an an interface which previously used exclusively integers, to using strings in the case of a grub configuration with explicit default set, along with changing the code calling the interface to require a string. The default value for default remained as an integer. As a result, any Extlinux or Lilo configuration (which drives this interface exclusively with integers), or Grub configuration which doesn't explicitly declare a default will die with an AttributeError when attempting to call self.cf.default.isdigit() where default is an integer. Sadly, this AttributeError gets swallowed by the blanket ignore in the loop which searches partitions for valid bootloader configurations, causing the issue to be reported as Unable to find partition containing kernel This patch attempts to fix the issue by altering all parts of this interface to use strings, as opposed to integers. Would it be less invasive at this point in the release to have the place where this stuff is used do isinstance(s, str) and isinstance(s, int)? It would be BaseString not str, but I am fairly sure the classes have altered through Py2's history. I would not be any more confident with that as a solution as trying to correctly to start with. Actually isinstance(s, basestring) is what the webpage I was looking at said, but I cut-n-pasted the wrong thing. But regardless could it not do something like: if !isinstance(foo.cf.default, int): I think it would be the other way around, e.g. (not tested): diff --git a/tools/pygrub/src/pygrub b/tools/pygrub/src/pygrub index aa7e562..7250f45 100644 --- a/tools/pygrub/src/pygrub +++ b/tools/pygrub/src/pygrub @@ -457,7 +457,9 @@ class Grub: self.cf.parse(buf) def image_index(self): -if self.cf.default.isdigit(): +if isinstance(self.cf.default, int) +sel = self.cf.default +elif if self.cf.default.isdigit(): sel = int(self.cf.default) else: # We don't fully support submenus. Look for the leaf value in but yes, this looks less intrusive (assuming this the only place where we'd hit this error). That does look plausibly like it would fix the issue. However, I can't help but feeing that this is hacking around a broken patch in the first place. I cannot think of a reason you would ever feel that way! surreptitiously checks the roll of Xen 4.5 band-aid We know that the existing patches work fine for a host of Fedora families (15-21) (thought I need to double check that the testing framework that we have is using pygrub and not pvgrub), SLESs and RHEL5s, and OL6s. The ones that I am worried about are the ExtLinux and such which I didn't realize would use 'pygrub'. Thought I just remembered a bug with OL7 grub entries - that is if you go in the interactive menu things broke down. Boris, do you remember if that was fixed or just 'deferred'? ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/2] small swiotlb-xen fixes
On Fri, Nov 21, 2014 at 05:00:05PM +, Stefano Stabellini wrote: Hi all, I have a couple of small straightforward fixes for swiotlb-xen for 3.19. They look fine to me. Thanks. Cheers, Stefano Stefano Stabellini (2): swiotlb-xen: call xen_dma_sync_single_for_device when appropriate swiotlb-xen: pass dev_addr to swiotlb_tbl_unmap_single drivers/xen/swiotlb-xen.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] libxl: do not load roms for any NICs except the first to avoid wasting memory
On Fri, Nov 21, 2014 at 05:11:09PM +, Stefano Stabellini wrote: The rom is used for pxebooting. We don't need to allow pxebooting from more than one network card. Loading a romfile for every NIC wastes Why not? Why can't we PXE boot from each network card? memory and as a matter of fact breaks configurations with more than 4 NICs as QEMU fails to allocate memory on behalf of the guest. What if you have four different type of NICs? Say 1 rlt8193, 1 e1000, one eepro, and ne2k ? Don't you want to load the ROM for each one? With this fix, it is possible to assign more than 4 rtl8139 NICs to the guest. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 3e191c3..f907ca9 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -674,9 +674,10 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, LIBXL_NIC_TYPE_VIF_IOEMU); flexarray_append(dm_args, -device); flexarray_append(dm_args, - libxl__sprintf(gc, %s,id=nic%d,netdev=net%d,mac=%s, + libxl__sprintf(gc, %s,id=nic%d,netdev=net%d,mac=%s%s, nics[i].model, nics[i].devid, -nics[i].devid, smac)); +nics[i].devid, smac, +i ? ,romfile=\\ : )); flexarray_append(dm_args, -netdev); flexarray_append(dm_args, GCSPRINTF( type=tap,id=net%d,ifname=%s, ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5-rc2 post update (RC2 was out 2014-Nov-11th)
On Fri, Nov 21, 2014 at 12:54:08PM -0500, Don Slutz wrote: On 11/21/14 12:42, konrad.w...@oracle.com wrote: Feature patchsets that did not make it in by today have been put on the deferred list. If you think your feature should make it by Xen 4.5-RC3 please make your case. Xen 4.5-rc2 was out on Monday (11th). There are three known issues, if I have missed one (or more) please respond. (see 'Known Issues' below) which are to be fixed by RC3 - if: - The maintainers are fine with it, - The risk is minimal to common code paths. Details for the test-day are at http://wiki.xen.org/wiki/Xen_4.5_RC2_test_instructions In terms of bugs, we have: #11 qxl hypervisor support #13 Re: [Xen-devel] man page example: xm block-attach #18 xl improve support for migration over non-sshlike tunnels #19 xl migrate transport improvements #22 xl does not support specifying virtual function for passthrough device #23 Remove arbitrary LIBXL_MAXMEM_CONSTANT from libxl, see what breaks #24 xl missing support for encrypted VNC #27 Re: [Xen-devel] xend vs xl with pci=['bdf'] wherein the 'bdf' are not owned by pciback or pcistub will still launch. #28 support PCI hole resize in qemu-xen [Isn't this fixed by Don's 'mmio_hole' patches?] I consider mmio_hole a work around for this. The is more that can be done here. Yeah, I had a discussion with George about this and it seems that the proper fix to this is quite involved (perhaps make it a XEn 4.6 feature/fix?). I need somehow to update the bug system to mention how to use 'mmio_hole' for this. Thank you! ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 2/2] xl: print out partial configuration in long mode of list command
On Fri, Nov 21, 2014 at 01:04:42PM -0500, Zhigang Wang wrote: On 11/14/2014 05:22 AM, Ian Campbell wrote: On Fri, 2014-11-14 at 10:09 +, Wei Liu wrote: On Fri, Nov 14, 2014 at 09:59:00AM +, Ian Campbell wrote: On Wed, 2014-11-12 at 17:04 +, Wei Liu wrote: Unconditionally print out the partial configuration. Can you provide an example of what such a configuration looks like? { domid: 2, config: { c_info: { name: s0-raw-vnuma, uuid: a8bed4ac-a0fe-4166-8eac-feeb007a2110 }, b_info: { sched_params: { }, type.invalid: { } } } } Great, thanks. I propose to insert this into the commit message as I check it in (once I check who's acked it etc) Libxl still complains because it tries to read some nodes that don't exist, so xl will just print it out on stderr. This is the same behaviour as before though. I think that's tolerable for 4.5 at least. I also think this is OK for 4.5. Konrad: can pickup this for next rc? Please see http://lists.xen.org/archives/html/xen-devel/2014-11/msg01238.html Thanks, Zhigang ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] libxl: do not load roms for any NICs except the first to avoid wasting memory
On Fri, Nov 21, 2014 at 06:48:53PM +, Stefano Stabellini wrote: On Fri, 21 Nov 2014, Konrad Rzeszutek Wilk wrote: On Fri, Nov 21, 2014 at 05:11:09PM +, Stefano Stabellini wrote: The rom is used for pxebooting. We don't need to allow pxebooting from more than one network card. Loading a romfile for every NIC wastes Why not? Why can't we PXE boot from each network card? I take it back: you are right, at the moment it is actually possible to pxe boot from the fourth nic for example. Maybe we could just load the first four romfiles and skip the others (four is the limit before QEMU fails to allocate any more memory). The limit is in the count of ROMs or the memory consumption? What if you also do PCI passthrough? Does that figure in this calculation? TBH not all the emulated nics need a romfile but I wouldn't want to go down at that level of details beause they become QEMU implementation details. I wouldn't want to tie libxl to a certain version of QEMU in any way. A better way would be handling memory allocation errors in QEMU but QEMU doesn't do that at the moment and the change there would be certainly more invasive that a couple of lines in libxl. memory and as a matter of fact breaks configurations with more than 4 NICs as QEMU fails to allocate memory on behalf of the guest. What if you have four different type of NICs? Say 1 rlt8193, 1 e1000, one eepro, and ne2k ? Don't you want to load the ROM for each one? The rom cannot be reused in QEMU among multiple instances of the same nic, so having four different types of nics or just one type doesn't change anything. With this fix, it is possible to assign more than 4 rtl8139 NICs to the guest. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c index 3e191c3..f907ca9 100644 --- a/tools/libxl/libxl_dm.c +++ b/tools/libxl/libxl_dm.c @@ -674,9 +674,10 @@ static char ** libxl__build_device_model_args_new(libxl__gc *gc, LIBXL_NIC_TYPE_VIF_IOEMU); flexarray_append(dm_args, -device); flexarray_append(dm_args, - libxl__sprintf(gc, %s,id=nic%d,netdev=net%d,mac=%s, + libxl__sprintf(gc, %s,id=nic%d,netdev=net%d,mac=%s%s, nics[i].model, nics[i].devid, -nics[i].devid, smac)); +nics[i].devid, smac, +i ? ,romfile=\\ : )); flexarray_append(dm_args, -netdev); flexarray_append(dm_args, GCSPRINTF( type=tap,id=net%d,ifname=%s, ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 00/19] Virtual NUMA for PV and HVM
On Fri, Nov 21, 2014 at 03:06:42PM +, Wei Liu wrote: Hi all This patch series implements virtual NUMA support for both PV and HVM guest. That is, admin can configure via libxl what virtual NUMA topology the guest sees. This is the stage 1 (basic vNUMA support) and part of stage 2 (vNUMA-ware ballooning, hypervisor side) described in my previous email to xen-devel [0]. This series is broken into several parts: 1. xen patches: vNUMA debug output and vNUMA-aware memory hypercall support. 2. libxc/libxl support for PV vNUMA. 3. libxc/libxl support for HVM vNUMA. 4. xl vNUMA configuration documentation and parser. I think one significant difference from Elena's work is that this patch series makes use of multiple vmemranges should there be a memory hole, instead of shrinking ram. This matches the behaviour of real hardware. Are some of the patches then borrowed from Elena? If so, she should be credited in the patches? The vNUMA auto placement algorithm is missing at the moment and Dario is working on it. This series can be found at: git://xenbits.xen.org/people/liuw/xen.git wip.vnuma-v1 With this series, the following configuration can be used to enabled virtual NUMA support, and it works for both PV and HVM guests. memory = 6000 vnuma_memory = [3000, 3000] vnuma_vcpu_map = [0, 1] vnuma_pnode_map = [0, 0] vnuma_vdistances = [10, 30] # optional dmesg output for HVM guest: [0.00] ACPI: SRAT fc009ff0 000C8 (v01Xen HVM HVML ) [0.00] ACPI: SLIT fc00a0c0 00030 (v01Xen HVM HVML ) ...snip... [0.00] SRAT: PXM 0 - APIC 0x00 - Node 0 [0.00] SRAT: PXM 1 - APIC 0x02 - Node 1 [0.00] SRAT: Node 0 PXM 0 [mem 0x-0xbb7f] [0.00] SRAT: Node 1 PXM 1 [mem 0xbb80-0xefff] [0.00] SRAT: Node 1 PXM 1 [mem 0x1-0x186ff] [0.00] NUMA: Initialized distance table, cnt=2 [0.00] NUMA: Node 1 [mem 0xbb80-0xefff] + [mem 0x1-0x1867f] - [mem 0xbb80-0x1867f] [0.00] Initmem setup node 0 [mem 0x-0xbb7f] [0.00] NODE_DATA [mem 0xbb7fc000-0xbb7f] [0.00] Initmem setup node 1 [mem 0xbb80-0x1867f] [0.00] NODE_DATA [mem 0x1867f7000-0x1867fafff] [0.00] [ea00-ea00029f] PMD - [8800b860-8800baff] on node 0 [0.00] [ea0002a0-ea00055f] PMD - [88018300-8801859f] on node 1 ...snip... [0.00] Early memory node ranges [0.00] node 0: [mem 0x1000-0x0009efff] [0.00] node 0: [mem 0x0010-0xbb7f] [0.00] node 1: [mem 0xbb80-0xefffefff] [0.00] node 1: [mem 0x1-0x1867f] numactl output for HVM guest: available: 2 nodes (0-1) node 0 cpus: 0 node 0 size: 2999 MB node 0 free: 2546 MB node 1 cpus: 1 node 1 size: 2991 MB node 1 free: 2144 MB node distances: node 0 1 0: 10 30 1: 30 10 dmesg output for PV guest: [0.00] NUMA: Initialized distance table, cnt=2 [0.00] NUMA: Node 1 [mem 0xbb80-0xce68efff] + [mem 0x1-0x1a8970fff] - [mem 0xbb80-0x1a8970fff] [0.00] NODE_DATA(0) allocated [mem 0xbb7fc000-0xbb7f] [0.00] NODE_DATA(1) allocated [mem 0x1a8969000-0x1a896cfff] numactl output for PV guest: available: 2 nodes (0-1) node 0 cpus: 0 node 0 size: 2944 MB node 0 free: 2917 MB node 1 cpus: 1 node 1 size: 2934 MB node 1 free: 2904 MB node distances: node 0 1 0: 10 30 1: 30 10 And for a HVM guest on a real NUMA-capable machine: (XEN) Memory location of each domain: (XEN) Domain 0 (total: 262144): (XEN) Node 0: 245758 (XEN) Node 1: 16386 (XEN) Domain 2 (total: 2097226): (XEN) Node 0: 1046335 (XEN) Node 1: 1050891 (XEN) 2 vnodes, 4 vcpus (XEN)vnode 0 - pnode 0 (XEN) 3840 MB: 0x0 - 0xf000 (XEN) 256 MB: 0x1 - 0x11000 (XEN) vcpus: 0 1 (XEN)vnode 1 - pnode 1 (XEN) 4096 MB: 0x11000 - 0x21000 (XEN) vcpus: 2 3 Wei. [0] 2014173606.gc21...@zion.uk.xensource.com Wei Liu (19): xen: dump vNUMA information with debug key u xen: make two memory hypercalls vNUMA-aware libxc: allocate memory with vNUMA information for PV guest libxl: add emacs local variables in libxl_{x86,arm}.c libxl: introduce vNUMA types libxl: add vmemrange to libxl__domain_build_state libxl: introduce libxl__vnuma_config_check libxl: x86: factor out e820_host_sanitize libxl: functions to build vmemranges for PV guest libxl: build, check and pass vNUMA info to Xen for PV guest hvmloader: add new fields for vNUMA information hvmloader: construct SRAT hvmloader: construct SLIT hvmloader: disallow memory relocation when vNUMA is
[Xen-devel] [PATCH v4] Fixes for PCI backend for 3.19
Hey, The last time I posted these patches: https://lkml.org/lkml/2014/7/8/533 was so long ago that I don't even remember most comments. The only one that stuck in mind was David's recommendation to add a new PCI API call and use that. See patch #6 and #7. The original posting (v3) had an extra patch that would do slot and bus reset using do_flr SysFS attribute. I will revisit that once I am done with this patchset. I also seem to have had in my a bunch of 'SoB' from David - which makes no sense - unless I pulled from his tree. Anyhow wherewere I saw them I removed them. Please take a look and comment. If I missed some comment from months ago hopefully the new version has clarified them. drivers/pci/pci.c | 5 +-- drivers/xen/xen-pciback/passthrough.c | 14 ++-- drivers/xen/xen-pciback/pci_stub.c| 60 ++- drivers/xen/xen-pciback/pciback.h | 7 ++-- drivers/xen/xen-pciback/vpci.c| 14 ++-- drivers/xen/xen-pciback/xenbus.c | 4 +-- include/linux/device.h| 5 +++ include/linux/pci.h | 2 ++ 8 files changed, 76 insertions(+), 35 deletions(-) Konrad Rzeszutek Wilk (7): xen/pciback: Don't deadlock when unbinding. driver core: Provide an wrapper around the mutex to do lockdep warnings xen/pciback: Include the domain id if removing the device whilst still in use xen/pciback: Print out the domain owning the device. xen/pciback: Remove tons of dereferences PCI: Expose pci_load_saved_state for public consumption. xen/pciback: Restore configuration space when detaching from a guest. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 6/7] PCI: Expose pci_load_saved_state for public consumption.
We have the pci_load_and_free_saved_state, and pci_store_saved_state but are missing the functionality to just load the state multiple times in the PCI device without having to free/save the state. This patch makes it possible to use this function. CC: Bjorn Helgaas bhelg...@google.com Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- drivers/pci/pci.c | 5 +++-- include/linux/pci.h | 2 ++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 625a4ac..f00a9d6 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1142,8 +1142,8 @@ EXPORT_SYMBOL_GPL(pci_store_saved_state); * @dev: PCI device that we're dealing with * @state: Saved state returned from pci_store_saved_state() */ -static int pci_load_saved_state(struct pci_dev *dev, - struct pci_saved_state *state) +int pci_load_saved_state(struct pci_dev *dev, +struct pci_saved_state *state) { struct pci_cap_saved_data *cap; @@ -1171,6 +1171,7 @@ static int pci_load_saved_state(struct pci_dev *dev, dev-state_saved = true; return 0; } +EXPORT_SYMBOL_GPL(pci_load_saved_state); /** * pci_load_and_free_saved_state - Reload the save state pointed to by state, diff --git a/include/linux/pci.h b/include/linux/pci.h index 5be8db4..08088cb1 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1003,6 +1003,8 @@ void __iomem __must_check *pci_platform_rom(struct pci_dev *pdev, size_t *size); int pci_save_state(struct pci_dev *dev); void pci_restore_state(struct pci_dev *dev); struct pci_saved_state *pci_store_saved_state(struct pci_dev *dev); +int pci_load_saved_state(struct pci_dev *dev, +struct pci_saved_state *state); int pci_load_and_free_saved_state(struct pci_dev *dev, struct pci_saved_state **state); struct pci_cap_saved_state *pci_find_saved_cap(struct pci_dev *dev, char cap); -- 1.9.3 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 2/7] driver core: Provide an wrapper around the mutex to do lockdep warnings
Instead of open-coding it in drivers that want to double check that their functions are indeed holding the device lock. Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Suggested-by: David Vrabel david.vra...@citrix.com Acked-by: Greg Kroah-Hartman gre...@linuxfoundation.org --- drivers/xen/xen-pciback/pci_stub.c | 2 +- include/linux/device.h | 5 + 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 9cbe1a3..8b77089 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -278,7 +278,7 @@ void pcistub_put_pci_dev(struct pci_dev *dev) /* Cleanup our device * (so it's ready for the next domain) */ - lockdep_assert_held(dev-dev.mutex); + device_lock_assert(dev-dev); __pci_reset_function_locked(dev); pci_restore_state(dev); diff --git a/include/linux/device.h b/include/linux/device.h index ce1f2160..41d6a75 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -911,6 +911,11 @@ static inline void device_unlock(struct device *dev) mutex_unlock(dev-mutex); } +static inline void device_lock_assert(struct device *dev) +{ + lockdep_assert_held(dev-mutex); +} + void driver_init(void); /* -- 1.9.3 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 5/7] xen/pciback: Remove tons of dereferences
A little cleanup. No functional difference. Reviewed-by: Boris Ostrovsky boris.ostrov...@oracle.com Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- drivers/xen/xen-pciback/pci_stub.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index e5ff09d..843a2ba 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -631,10 +631,12 @@ static pci_ers_result_t common_process(struct pcistub_device *psdev, { pci_ers_result_t res = result; struct xen_pcie_aer_op *aer_op; + struct xen_pcibk_device *pdev = psdev-pdev; + struct xen_pci_sharedinfo *sh_info = pdev-sh_info; int ret; /*with PV AER drivers*/ - aer_op = (psdev-pdev-sh_info-aer_op); + aer_op = (sh_info-aer_op); aer_op-cmd = aer_cmd ; /*useful for error_detected callback*/ aer_op-err = state; @@ -655,36 +657,36 @@ static pci_ers_result_t common_process(struct pcistub_device *psdev, * this flag to judge whether we need to check pci-front give aer * service ack signal */ - set_bit(_PCIB_op_pending, (unsigned long *)psdev-pdev-flags); + set_bit(_PCIB_op_pending, (unsigned long *)pdev-flags); /*It is possible that a pcifront conf_read_write ops request invokes * the callback which cause the spurious execution of wake_up. * Yet it is harmless and better than a spinlock here */ set_bit(_XEN_PCIB_active, - (unsigned long *)psdev-pdev-sh_info-flags); + (unsigned long *)sh_info-flags); wmb(); - notify_remote_via_irq(psdev-pdev-evtchn_irq); + notify_remote_via_irq(pdev-evtchn_irq); ret = wait_event_timeout(xen_pcibk_aer_wait_queue, !(test_bit(_XEN_PCIB_active, (unsigned long *) -psdev-pdev-sh_info-flags)), 300*HZ); +sh_info-flags)), 300*HZ); if (!ret) { if (test_bit(_XEN_PCIB_active, - (unsigned long *)psdev-pdev-sh_info-flags)) { + (unsigned long *)sh_info-flags)) { dev_err(psdev-dev-dev, pcifront aer process not responding!\n); clear_bit(_XEN_PCIB_active, - (unsigned long *)psdev-pdev-sh_info-flags); + (unsigned long *)sh_info-flags); aer_op-err = PCI_ERS_RESULT_NONE; return res; } } - clear_bit(_PCIB_op_pending, (unsigned long *)psdev-pdev-flags); + clear_bit(_PCIB_op_pending, (unsigned long *)pdev-flags); if (test_bit(_XEN_PCIF_active, - (unsigned long *)psdev-pdev-sh_info-flags)) { + (unsigned long *)sh_info-flags)) { dev_dbg(psdev-dev-dev, schedule pci_conf service in DRV_NAME \n); xen_pcibk_test_and_schedule_op(psdev-pdev); -- 1.9.3 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v4 1/7] xen/pciback: Don't deadlock when unbinding.
As commit 0a9fd0152929db372ff61b0d6c280fdd34ae8bdb 'xen/pciback: Document the entry points for 'pcistub_put_pci_dev'' explained there are four entry points in this function. Two of them are when the user fiddles in the SysFS to unbind a device which might be in use by a guest or not. Both 'unbind' states will cause a deadlock as the the PCI lock has already been taken, which then pci_device_reset tries to take. We can simplify this by requiring that all callers of pcistub_put_pci_dev MUST hold the device lock. And then we can just call the lockless version of pci_device_reset. To make it even simpler we will modify xen_pcibk_release_pci_dev to quality whether it should take a lock or not - as it ends up calling xen_pcibk_release_pci_dev and needs to hold the lock. Reviewed-by: Boris Ostrovsky boris.ostrov...@oracle.com Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- drivers/xen/xen-pciback/passthrough.c | 14 +++--- drivers/xen/xen-pciback/pci_stub.c| 12 ++-- drivers/xen/xen-pciback/pciback.h | 7 --- drivers/xen/xen-pciback/vpci.c| 14 +++--- drivers/xen/xen-pciback/xenbus.c | 2 +- 5 files changed, 33 insertions(+), 16 deletions(-) diff --git a/drivers/xen/xen-pciback/passthrough.c b/drivers/xen/xen-pciback/passthrough.c index 828dddc..f16a30e 100644 --- a/drivers/xen/xen-pciback/passthrough.c +++ b/drivers/xen/xen-pciback/passthrough.c @@ -69,7 +69,7 @@ static int __xen_pcibk_add_pci_dev(struct xen_pcibk_device *pdev, } static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev, - struct pci_dev *dev) + struct pci_dev *dev, bool lock) { struct passthrough_dev_data *dev_data = pdev-pci_dev_data; struct pci_dev_entry *dev_entry, *t; @@ -87,8 +87,13 @@ static void __xen_pcibk_release_pci_dev(struct xen_pcibk_device *pdev, mutex_unlock(dev_data-lock); - if (found_dev) + if (found_dev) { + if (lock) + device_lock(found_dev-dev); pcistub_put_pci_dev(found_dev); + if (lock) + device_unlock(found_dev-dev); + } } static int __xen_pcibk_init_devices(struct xen_pcibk_device *pdev) @@ -156,8 +161,11 @@ static void __xen_pcibk_release_devices(struct xen_pcibk_device *pdev) struct pci_dev_entry *dev_entry, *t; list_for_each_entry_safe(dev_entry, t, dev_data-dev_list, list) { + struct pci_dev *dev = dev_entry-dev; list_del(dev_entry-list); - pcistub_put_pci_dev(dev_entry-dev); + device_lock(dev-dev); + pcistub_put_pci_dev(dev); + device_unlock(dev-dev); kfree(dev_entry); } diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 017069a..9cbe1a3 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -250,6 +250,8 @@ struct pci_dev *pcistub_get_pci_dev(struct xen_pcibk_device *pdev, * - 'echo BDF unbind' with a guest still using it. See pcistub_remove * * As such we have to be careful. + * + * To make this easier, the caller has to hold the device lock. */ void pcistub_put_pci_dev(struct pci_dev *dev) { @@ -276,11 +278,8 @@ void pcistub_put_pci_dev(struct pci_dev *dev) /* Cleanup our device * (so it's ready for the next domain) */ - - /* This is OK - we are running from workqueue context -* and want to inhibit the user from fiddling with 'reset' -*/ - pci_reset_function(dev); + lockdep_assert_held(dev-dev.mutex); + __pci_reset_function_locked(dev); pci_restore_state(dev); /* This disables the device. */ @@ -567,7 +566,8 @@ static void pcistub_remove(struct pci_dev *dev) /* N.B. This ends up calling pcistub_put_pci_dev which ends up * doing the FLR. */ xen_pcibk_release_pci_dev(found_psdev-pdev, - found_psdev-dev); + found_psdev-dev, + false /* caller holds the lock. */); } spin_lock_irqsave(pcistub_devices_lock, flags); diff --git a/drivers/xen/xen-pciback/pciback.h b/drivers/xen/xen-pciback/pciback.h index f72af87..58e38d5 100644 --- a/drivers/xen/xen-pciback/pciback.h +++ b/drivers/xen/xen-pciback/pciback.h @@ -99,7 +99,8 @@ struct xen_pcibk_backend { unsigned int *domain, unsigned int *bus, unsigned int *devfn); int (*publish)(struct xen_pcibk_device *pdev, publish_pci_root_cb cb); - void (*release)(struct xen_pcibk_device *pdev, struct pci_dev *dev); + void (*release)(struct xen_pcibk_device *pdev, struct pci_dev
[Xen-devel] [PATCH v4 7/7] xen/pciback: Restore configuration space when detaching from a guest.
The commit xen/pciback: Don't deadlock when unbinding. was using the version of pci_reset_function which would lock the device lock. That is no good as we can dead-lock. As such we swapped to using the lock-less version and requiring that the callers of 'pcistub_put_pci_dev' take the device lock. And as such this bug got exposed. Using the lock-less version is OK, except that we tried to use 'pci_restore_state' after the lock-less version of __pci_reset_function_locked - which won't work as 'state_saved' is set to false. Said 'state_saved' is a toggle boolean that is to be used by the sequence of a) pci_save_state/pci_restore_state or b) pci_load_and_free_saved_state/pci_restore_state. We don't want to use a) as the guest might have messed up the PCI configuration space and we want it to revert to the state when the PCI device was binded to us. Therefore we pick b) to restore the configuration space. We restore from our 'golden' version of PCI configuration space, when an: - Device is unbinded from pciback - Device is detached from a guest. Reported-by: Sander Eikelenboom li...@eikelenboom.it Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- drivers/xen/xen-pciback/pci_stub.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 843a2ba..eb8b58e 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -105,7 +105,7 @@ static void pcistub_device_release(struct kref *kref) */ __pci_reset_function_locked(dev); if (pci_load_and_free_saved_state(dev, dev_data-pci_saved_state)) - dev_dbg(dev-dev, Could not reload PCI state\n); + dev_info(dev-dev, Could not reload PCI state\n); else pci_restore_state(dev); @@ -257,6 +257,8 @@ void pcistub_put_pci_dev(struct pci_dev *dev) { struct pcistub_device *psdev, *found_psdev = NULL; unsigned long flags; + struct xen_pcibk_dev_data *dev_data; + int ret; spin_lock_irqsave(pcistub_devices_lock, flags); @@ -279,9 +281,19 @@ void pcistub_put_pci_dev(struct pci_dev *dev) * (so it's ready for the next domain) */ device_lock_assert(dev-dev); - __pci_reset_function_locked(dev); - pci_restore_state(dev); - + dev_data = pci_get_drvdata(dev); + ret = pci_load_saved_state(dev, dev_data-pci_saved_state); + if (ret 0) + dev_warn(dev-dev, Could not reload PCI state\n); + else { + __pci_reset_function_locked(dev); + /* +* The usual sequence is pci_save_state pci_restore_state +* but the guest might have messed the configuration space up. +* Use the initial version (when device was bound to us). +*/ + pci_restore_state(dev); + } /* This disables the device. */ xen_pcibk_reset_device(dev); -- 1.9.3 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] frequent lockups in 3.18rc4
On Fri, Nov 21, 2014 at 08:51:43PM +0100, Thomas Gleixner wrote: On Fri, 21 Nov 2014, Linus Torvalds wrote: Here's the simplified end result. Again, this is TOTALLY UNTESTED. I compiled it and verified that the code generation looks like what I'd have expected, but that's literally it. static noinline int vmalloc_fault(unsigned long address) { pgd_t *pgd_dst; pgdval_t pgd_entry; unsigned index = pgd_index(address); if (index KERNEL_PGD_BOUNDARY) return -1; pgd_entry = init_mm.pgd[index].pgd; if (!pgd_entry) return -1; pgd_dst = __va(PAGE_MASK read_cr3()); pgd_dst += index; if (pgd_dst-pgd) return -1; ACCESS_ONCE(pgd_dst-pgd) = pgd_entry; This will break paravirt. set_pgd/set_pmd are paravirt functions. But I'm fine with breaking it, then you just need to change CONFIG_PARAVIRT to 'def_bool n' That is not very nice. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Is: Fixes for xl pci-attach for Xen 4.5 confirmed to fix by Intel.Was:Re: [TestDay] VMX test report for Xen 4.5.0-rc1
On Mon, Nov 24, 2014 at 07:43:49AM +, Hu, Robert wrote: -Original Message- From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com] Sent: Wednesday, November 12, 2014 8:39 PM To: Hu, Robert; xen-devel@lists.xen.org Cc: jbeul...@suse.com Subject: Re: [Xen-devel] [TestDay] VMX test report for Xen 4.5.0-rc1 On 11/12/2014 01:58 AM, Hu, Robert wrote: 2. Failed to hotplug a VT-d device with XEN4.5-RC1 http://bugzilla-archived.xenproject.org/bugzilla/show_bug.cgi?id=1894 This should be addressed by these two: http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00875.html http://lists.xenproject.org/archives/html/xen-devel/2014-11/msg00874.html Tried, these patches works. When will they get in? CC-ed Ian's so they are aware of the independent testing done by Intel. Thank you! -boris ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Configure block device IO rate
On Mon, Nov 24, 2014 at 12:07:39PM +, George Dunlap wrote: On Mon, Nov 24, 2014 at 3:54 AM, aragrawal aragra...@cs.wisc.edu wrote: Hi I am interested in getting the block I/O rate fixed for a Virtual Machine. However, it seems that there are no such options to control the block I/O rate via the VM configuration file. Going through the net, I see suggestions to configure the xen_blkback driver using ionice to get that done. I would like to know if there are any plans to include any such configuration option. Also, are there any particular reason (apart from the fact that there are other ways to get this done) for the absence of such configuration option? I'm pretty sure it's just because nobody had asked for it yet. We can put it on our list of things to think about doing / projects for people new to the community. We also accept patches. :-) Patches for this were in the past posted - it just that nobody explained to the maintainer (me) why the dm-* or io-nice would not do the same job. I sadly don't remember from whom they were or such. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Configure block device IO rate
On Mon, Nov 24, 2014 at 03:11:20PM +, Ian Campbell wrote: On Mon, 2014-11-24 at 10:06 -0500, Konrad Rzeszutek Wilk wrote: On Mon, Nov 24, 2014 at 12:07:39PM +, George Dunlap wrote: On Mon, Nov 24, 2014 at 3:54 AM, aragrawal aragra...@cs.wisc.edu wrote: Hi I am interested in getting the block I/O rate fixed for a Virtual Machine. However, it seems that there are no such options to control the block I/O rate via the VM configuration file. Going through the net, I see suggestions to configure the xen_blkback driver using ionice to get that done. I would like to know if there are any plans to include any such configuration option. Also, are there any particular reason (apart from the fact that there are other ways to get this done) for the absence of such configuration option? I'm pretty sure it's just because nobody had asked for it yet. We can put it on our list of things to think about doing / projects for people new to the community. We also accept patches. :-) Patches for this were in the past posted - it just that nobody explained to the maintainer (me) why the dm-* or io-nice would not do the same job. AIUI the question here is why the toolstack doesn't integrate a guest cfg file setting to configure dm-* and/or io-nice, rather than why blkback doesn't duplicate their functionality. Ah, that is a different question. That certainly could be done by somebody who has the time. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86/cpuidle: don't count C1 multiple times
On Mon, Nov 24, 2014 at 11:47:28AM +, Jan Beulich wrote: Commit 4ca6f9f0 (x86/cpuidle: publish new states only after fully initializing them) resulted in the state counter to be incremented for C1 despite that using a fixed table entry (and the statically initialized counter value already accounting for it and C0). Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Reported-by: Steve Freitas sfl...@ihonk.com Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com (thought it would be good to get from Steve an confirmation that this fixes it - which I believe is 99% the case). Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -1015,7 +1015,7 @@ static void set_cx( cx-target_residency = cx-latency * latency_factor; smp_wmb(); -acpi_power-count++; +acpi_power-count += (cx-type != ACPI_STATE_C1); if ( cx-type == ACPI_STATE_C1 || cx-type == ACPI_STATE_C2 ) acpi_power-safe_state = cx; } x86/cpuidle: don't count C1 multiple times Commit 4ca6f9f0 (x86/cpuidle: publish new states only after fully initializing them) resulted in the state counter to be incremented for C1 despite that using a fixed table entry (and the statically initialized counter value already accounting for it and C0). Signed-off-by: Jan Beulich jbeul...@suse.com --- a/xen/arch/x86/acpi/cpu_idle.c +++ b/xen/arch/x86/acpi/cpu_idle.c @@ -1015,7 +1015,7 @@ static void set_cx( cx-target_residency = cx-latency * latency_factor; smp_wmb(); -acpi_power-count++; +acpi_power-count += (cx-type != ACPI_STATE_C1); if ( cx-type == ACPI_STATE_C1 || cx-type == ACPI_STATE_C2 ) acpi_power-safe_state = cx; } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] virtio leaks cpu mappings, was: qemu crash with virtio on Xen domUs (backtrace included)
On Mon, Nov 24, 2014 at 06:44:45PM +, Stefano Stabellini wrote: On Mon, 24 Nov 2014, Stefano Stabellini wrote: On Mon, 24 Nov 2014, Stefano Stabellini wrote: CC'ing Paolo. Wen, thanks for the logs. I investigated a little bit and it seems to me that the bug occurs when QEMU tries to unmap only a portion of a memory region previously mapped. That doesn't work with xen-mapcache. See these logs for example: DEBUG address_space_map phys_addr=78ed8b44 vaddr=7fab50afbb68 len=0xa DEBUG address_space_unmap vaddr=7fab50afbb68 len=0x6 Sorry the logs don't quite match, it was supposed to be: DEBUG address_space_map phys_addr=78ed8b44 vaddr=7fab50afbb64 len=0xa DEBUG address_space_unmap vaddr=7fab50afbb68 len=0x6 It looks like the problem is caused by iov_discard_front, called by virtio_net_handle_ctrl. By changing iov_base after the sg has already been mapped (cpu_physical_memory_map), it causes a leak in the mapping because the corresponding cpu_physical_memory_unmap will only unmap a portion of the original sg. On Xen the problem is worse because xen-mapcache aborts. Didn't um Andy post patches for ths: http://lists.xen.org/archives/html/xen-devel/2014-09/msg02864.html diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c index 2ac6ce5..b2b5c2d 100644 --- a/hw/net/virtio-net.c +++ b/hw/net/virtio-net.c @@ -775,7 +775,7 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) struct iovec *iov; unsigned int iov_cnt; -while (virtqueue_pop(vq, elem)) { +while (virtqueue_pop_nomap(vq, elem)) { if (iov_size(elem.in_sg, elem.in_num) sizeof(status) || iov_size(elem.out_sg, elem.out_num) sizeof(ctrl)) { error_report(virtio-net ctrl missing headers); @@ -784,8 +784,12 @@ static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq) iov = elem.out_sg; iov_cnt = elem.out_num; -s = iov_to_buf(iov, iov_cnt, 0, ctrl, sizeof(ctrl)); iov_discard_front(iov, iov_cnt, sizeof(ctrl)); + +virtqueue_map_sg(elem.in_sg, elem.in_addr, elem.in_num, 1); +virtqueue_map_sg(elem.out_sg, elem.out_addr, elem.out_num, 0); + +s = iov_to_buf(iov, iov_cnt, 0, ctrl, sizeof(ctrl)); if (s != sizeof(ctrl)) { status = VIRTIO_NET_ERR; } else if (ctrl.class == VIRTIO_NET_CTRL_RX) { diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index 3e4b70c..6a4bd3a 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -446,7 +446,7 @@ void virtqueue_map_sg(struct iovec *sg, hwaddr *addr, } } -int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) +int virtqueue_pop_nomap(VirtQueue *vq, VirtQueueElement *elem) { unsigned int i, head, max; hwaddr desc_pa = vq-vring.desc; @@ -505,9 +505,6 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) } } while ((i = virtqueue_next_desc(desc_pa, i, max)) != max); -/* Now map what we have collected */ -virtqueue_map_sg(elem-in_sg, elem-in_addr, elem-in_num, 1); -virtqueue_map_sg(elem-out_sg, elem-out_addr, elem-out_num, 0); elem-index = head; @@ -517,6 +514,16 @@ int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) return elem-in_num + elem-out_num; } +int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem) +{ +int rc = virtqueue_pop_nomap(vq, elem); +if (rc 0) { +virtqueue_map_sg(elem-in_sg, elem-in_addr, elem-in_num, 1); +virtqueue_map_sg(elem-out_sg, elem-out_addr, elem-out_num, 0); +} +return rc; +} + /* virtio device */ static void virtio_notify_vector(VirtIODevice *vdev, uint16_t vector) { diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h index 3e54e90..40a3977 100644 --- a/include/hw/virtio/virtio.h +++ b/include/hw/virtio/virtio.h @@ -174,6 +174,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement *elem, void virtqueue_map_sg(struct iovec *sg, hwaddr *addr, size_t num_sg, int is_write); int virtqueue_pop(VirtQueue *vq, VirtQueueElement *elem); +int virtqueue_pop_nomap(VirtQueue *vq, VirtQueueElement *elem); int virtqueue_avail_bytes(VirtQueue *vq, unsigned int in_bytes, unsigned int out_bytes); void virtqueue_get_avail_bytes(VirtQueue *vq, unsigned int *in_bytes, ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] vNUMA: rename interface structures
On Tue, Nov 25, 2014 at 02:11:39PM +, Jan Beulich wrote: On 25.11.14 at 13:36, jbeul...@suse.com wrote: No-one (including me) paid attention during review that these structures don't adhere to the naming requirements of the public interface: Consistently use xen_ prefixes at least for all new additions. Signed-off-by: Jan Beulich jbeul...@suse.com Sorry, again forgot to Cc you (for 4.5): No functional change, but avoiding a later (incompatible) interface change. Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Jan --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1264,7 +1264,7 @@ int xc_domain_setvnuma(xc_interface *xch uint32_t nr_vnodes, uint32_t nr_regions, uint32_t nr_vcpus, -vmemrange_t *vmemrange, +xen_vmemrange_t *vmemrange, unsigned int *vdistance, unsigned int *vcpu_to_vnode, unsigned int *vnode_to_pnode); --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -2171,7 +2171,7 @@ int xc_domain_setvnuma(xc_interface *xch uint32_t nr_vnodes, uint32_t nr_vmemranges, uint32_t nr_vcpus, - vmemrange_t *vmemrange, + xen_vmemrange_t *vmemrange, unsigned int *vdistance, unsigned int *vcpu_to_vnode, unsigned int *vnode_to_pnode) --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -345,7 +345,7 @@ static struct vnuma_info *vnuma_alloc(un vnuma-vdistance = xmalloc_array(unsigned int, nr_vnodes * nr_vnodes); vnuma-vcpu_to_vnode = xmalloc_array(unsigned int, nr_vcpus); vnuma-vnode_to_pnode = xmalloc_array(unsigned int, nr_vnodes); -vnuma-vmemrange = xmalloc_array(vmemrange_t, nr_ranges); +vnuma-vmemrange = xmalloc_array(xen_vmemrange_t, nr_ranges); if ( vnuma-vdistance == NULL || vnuma-vmemrange == NULL || vnuma-vcpu_to_vnode == NULL || vnuma-vnode_to_pnode == NULL ) --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -972,7 +972,7 @@ long do_memory_op(unsigned long cmd, XEN case XENMEM_get_vnumainfo: { -struct vnuma_topology_info topology; +struct xen_vnuma_topology_info topology; struct domain *d; unsigned int dom_vnodes, dom_vranges, dom_vcpus; struct vnuma_info tmp; @@ -1033,7 +1033,7 @@ long do_memory_op(unsigned long cmd, XEN read_unlock(d-vnuma_rwlock); tmp.vdistance = xmalloc_array(unsigned int, dom_vnodes * dom_vnodes); -tmp.vmemrange = xmalloc_array(vmemrange_t, dom_vranges); +tmp.vmemrange = xmalloc_array(xen_vmemrange_t, dom_vranges); tmp.vcpu_to_vnode = xmalloc_array(unsigned int, dom_vcpus); if ( tmp.vdistance == NULL || --- a/xen/include/public/domctl.h +++ b/xen/include/public/domctl.h @@ -980,7 +980,7 @@ struct xen_domctl_vnuma { /* * memory rages for each vNUMA node */ -XEN_GUEST_HANDLE_64(vmemrange_t) vmemrange; +XEN_GUEST_HANDLE_64(xen_vmemrange_t) vmemrange; }; typedef struct xen_domctl_vnuma xen_domctl_vnuma_t; DEFINE_XEN_GUEST_HANDLE(xen_domctl_vnuma_t); --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -530,14 +530,13 @@ DEFINE_XEN_GUEST_HANDLE(xen_mem_sharing_ #define XENMEM_get_vnumainfo26 /* vNUMA node memory ranges */ -struct vmemrange { +struct xen_vmemrange { uint64_t start, end; unsigned int flags; unsigned int nid; }; - -typedef struct vmemrange vmemrange_t; -DEFINE_XEN_GUEST_HANDLE(vmemrange_t); +typedef struct xen_vmemrange xen_vmemrange_t; +DEFINE_XEN_GUEST_HANDLE(xen_vmemrange_t); /* * vNUMA topology specifies vNUMA node number, distance table, @@ -548,7 +547,7 @@ DEFINE_XEN_GUEST_HANDLE(vmemrange_t); * copied back to guest. Domain returns expected values of nr_vnodes, * nr_vmemranges and nr_vcpus to guest if the values where incorrect. */ -struct vnuma_topology_info { +struct xen_vnuma_topology_info { /* IN */ domid_t domid; uint16_t pad; @@ -566,12 +565,12 @@ struct vnuma_topology_info { uint64_t pad; } vcpu_to_vnode; union { -XEN_GUEST_HANDLE(vmemrange_t) h; +XEN_GUEST_HANDLE(xen_vmemrange_t) h; uint64_t pad; } vmemrange; }; -typedef struct vnuma_topology_info vnuma_topology_info_t; -DEFINE_XEN_GUEST_HANDLE(vnuma_topology_info_t); +typedef struct xen_vnuma_topology_info xen_vnuma_topology_info_t; +DEFINE_XEN_GUEST_HANDLE(xen_vnuma_topology_info_t); /* Next available subop number is 27
Re: [Xen-devel] [PATCH 0/2 V3] fix rename: xenstore not fully updated
On Tue, Nov 25, 2014 at 02:13:01PM +, Ian Campbell wrote: On Fri, 2014-11-21 at 12:01 -0500, Konrad Rzeszutek Wilk wrote: On Thu, Nov 20, 2014 at 03:28:30PM +, Ian Campbell wrote: On Wed, 2014-11-19 at 16:25 -0500, Konrad Rzeszutek Wilk wrote: On Wed, Nov 19, 2014 at 11:26:32AM +, Ian Jackson wrote: Hi Konrad, I have another release ack request: Chunyan Liu writes ([PATCH 0/2 V3] fix rename: xenstore not fully updated): Currently libxl__domain_rename only update /local/domain/domid/name, still some places in xenstore are not updated, including: /vm/uuid/name and /local/domain/0/backend/device/domid/.../domain. This patch series updates /vm/uuid/name in xenstore, This ([PATCH 2/2 V3] fix rename: xenstore not fully updated) is a bugfix which I think should go into Xen 4.5. The risk WITHOUT this patch is that there are out-of-tree tools which look here for the domain name and will get confused after it is renamed. When was this introduced? Did it exist with Xend? Based on: git grep domain\ RELEASE-4.4.0 tools/python/ git grep domain\' RELEASE-4.4.0 tools/python/ it doesn't appear so, but someone with a xend install would be needed to confirm for sure. Given that this has always been wrong for a libxl domain after migration it seems likely to me that noone is looking at this field. And this is not a regression though. What I am understanding is that we advertise to out-side toolstack users for a couple of releases something which is misleading and wrong. ...and also basically useless, there is really no reason to be looking for the domain's name under a subset of the backend nodes (only vkb, vfb and console have this key at all). None of the helper function which we provide do this. Also these nodes are not advertised as supported either via docs/misc/xenstore-paths.markdown or xen/include/public/io/*.h. My fear is that there such toolstack users out there who will get their pitchforks out when this shows up. But perhaps there is a way to mitigate this. Is there another way (and can it be in the commit description) to get the proper domain name? I presume it is just looking at /local/domain/%d/name ? As such could this be added: The proper way to get the domain name is to get it from /local/domain/%d/name instead from this field. The proper way is to use libxl_domid_to_name, not to go scrobbling around in xenstore. We could say this (or both) in the commit message though if it would help to reassure you. That (both) would most certainly reassure me. Thank you! Either way I think it really would be best to take this fix rather than worrying overly about the infinitesimal possibility that someone might be using this key. Right, so with the reassurance text added on: Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-xen-4.5] libxl: Allow copying smaller bitmap into a larger one
On Tue, Nov 25, 2014 at 09:57:54AM -0500, Boris Ostrovsky wrote: On 11/25/2014 06:41 AM, Dario Faggioli wrote: On Tue, 2014-11-25 at 11:15 +, Wei Liu wrote: And here it is. Boris, can you give it a shot? ---8--- From 77531e31d239887b9f36c03e434300bc30683092 Mon Sep 17 00:00:00 2001 From: Wei Liu wei.l...@citrix.com Date: Tue, 25 Nov 2014 10:59:47 + Subject: [PATCH] libxl: allow copying between bitmaps of different sizes When parsing bitmap objects JSON parser will create libxl_bitmap map of the smallest size needed. This can cause problems when saved image file specifies CPU affinity. For example, if 'vcpu_hard_affinity' in the saved image has only the first CPU specified, just a single byte will be allocated and libxl_bitmap-size will be set to 1. This will result in assertion in libxl_set_vcpuaffinity()-libxl_bitmap_copy() since the destination bitmap is created for maximum number of CPUs. We could allocate that bitmap of the same size as the source, however, it is later passed to xc_vcpu_setaffinity() which expects it to be sized to the max number of CPUs To fix this issue, introduce an internal function to allowing copying between bitmaps of different sizes. Note that this function is only used in libxl_set_vcpuaffinity at the moment. Though NUMA placement logic invoke libxl_bitmap_copy as well there's no need to replace those invocations. NUMA placement logic comes into effect when no vcpu / node pinning is provided, so it always operates on bitmap of the same sizes (that is, size of maximum number of cpus /nodes). Reported-by: Boris Ostrovsky boris.ostrov...@oracle.com Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Dario Faggioli dario.faggi...@citrix.com If this end up being the approach, it can have the following: Reviewed-by: Dario Faggioli dario.faggi...@citrix.com Tested-by: Boris Ostrovsky boris.ostrov...@oracle.com Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Thank you! ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/2 V3] fix rename: xenstore not fully updated
On Tue, Nov 25, 2014 at 03:58:33PM +, Ian Campbell wrote: On Tue, 2014-11-25 at 10:51 -0500, Konrad Rzeszutek Wilk wrote: Right, so with the reassurance text added on: Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Thanks. Chunyan, I propose to commit adding the following to the commit text of [PATCH 1/2 V3] remove domain field in xenstore backend dir: The correct way to obtain a domain's name is via libxl_domid_to_name(), or by reading from /local/domain/$DOMID/name for toolstacks not using libxl. Does that sound OK to you? Yes. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] INSTALL: correct EXTRA_CFLAGS handling
On Tue, Nov 25, 2014 at 05:27:53PM +0100, Olaf Hering wrote: On Tue, Nov 25, Konrad Rzeszutek Wilk wrote: On Tue, Nov 25, 2014 at 05:04:09PM +0100, Olaf Hering wrote: +Using additional CFLAGS to build tools running in dom0 is required when Why the mention of 'buld tools running in dom0'? It sounds like it is required to use dom0 to build tools? Does it really read like dom0 is required?! But I can change it as you It did to me but I am not an native English speaker. Perhaps what you meant was 'build tools which will run in dom0' ? suggest. Olaf ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-xen-4.5] x86/pvh/vpmu: Disable VPMU for PVH guests
On Tue, Nov 25, 2014 at 04:39:54PM +, Jan Beulich wrote: On 25.11.14 at 17:19, boris.ostrov...@oracle.com wrote: On 11/25/2014 09:55 AM, Jan Beulich wrote: Regardless, do you think that disabling VPMU for PVH is worth anyway? That depends on what (bad) consequences not doing so has. I haven't seen anything (besides VAPIC accesses) but I think it would be prudent to prevent any VPMU activity from happening. I can see, for example MSRs and APIC vector being written. All of which look benign on the first sight but who knows... Yeah, it's not really a problem to put it in (if Konrad agrees; remember that PVH is still experimental, and hence fixing bugs caused only by it may be out of scope at this point - in any event I think that if your patch is to go in, mine should too). The beaty of experimental is that we can add it later in the cycle as at worst they will regress something that is unbaked already. From that perspective the bar to put fixes for 'experimental' is lower than normal code. The part that I am worried about is the common paths and this potentially causing regressions on the those. But the potential for that is low that I am OK with these patches going in. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] (4.5-rc1) Problems using xl migrate
On Tue, Nov 25, 2014 at 01:03:34PM -0500, Daniel De Graaf wrote: On 11/25/2014 05:07 AM, George Dunlap wrote: On Mon, Nov 24, 2014 at 10:05 PM, Daniel De Graaf dgde...@tycho.nsa.gov wrote: I do. The error is (XEN) flask_domctl: Unknown op 72 Incidentally, Flask is running in permissive mode. Michael Young This means that the new domctl needs to be added to the switch statement in flask/hooks.c. This error is triggered in permissive mode because it is a code error rather than a policy error (which is what permissive mode is intended to debug). If that's the case, should we make that a BUG_ON()? Or at least an ASSERT() (which will only bug when compiled with debug=y), followed by allow if in permissive mode, and deny if in enforcing mode? Having it default deny, even in permissive mode, breaks the principle of least surprise, I think. :-) -George Either one of these will allow a guest to crash the hypervisor by requesting an undefined domctl, which is not really a good idea. Linux uses a flag in the security policy which defines if unknown permissions are allowed or denied; I will send a patch adding this to Xen's security server and using it instead of -EPERM in the default case of the switch statements. Thought I think that for the DEBUG case we want to still be boldly told about it so we can fix it. The patch adding this feature probably shouldn't be applied to 4.5, but I'll send it anyway. I will also send a separate patch adding the 2 domctls. -- Daniel De Graaf National Security Agency ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] xsm/flask: add two missing domctls
On Tue, Nov 25, 2014 at 06:19:05PM +, Andrew Cooper wrote: On 25/11/14 16:57, Daniel De Graaf wrote: Reported-by: Michael Young m.a.yo...@durham.ac.uk Signed-off-by: Daniel De Graaf dgde...@tycho.nsa.gov Reviewed-by: Andrew Cooper andrew.coop...@citrix.com CC'd Konrad, as this should be accepted into Xen-4.5. Without it, migration/suspend fails with -EPERM in the default case when XSM is compiled into Xen. Yup. Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com Daniel: there are 4 hypercalls for getting/setting bits of PV VCPU state: XEN_DOMCTL_{get,set}vcpucontext XEN_DOMCTL_{get,set}_ext_vcpucontext XEN_DOMCTL_{get,set}vcpuextstate XEN_DOMCTL_{get,set}_vcpu_msrs I see no reason for these to have separate access vectors; you typically either need to use all of them, or none, but I presume it is too late to coalesce the vectors in a backwards compatible way? ~Andrew --- xen/xsm/flask/hooks.c | 2 ++ xen/xsm/flask/policy/access_vectors | 2 ++ 2 files changed, 4 insertions(+) diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c index 0ba2ce9..d48463f 100644 --- a/xen/xsm/flask/hooks.c +++ b/xen/xsm/flask/hooks.c @@ -672,9 +672,11 @@ static int flask_domctl(struct domain *d, int cmd) return current_has_perm(d, SECCLASS_HVM, HVM__CACHEATTR); case XEN_DOMCTL_set_ext_vcpucontext: +case XEN_DOMCTL_set_vcpu_msrs: return current_has_perm(d, SECCLASS_DOMAIN, DOMAIN__SETEXTVCPUCONTEXT); case XEN_DOMCTL_get_ext_vcpucontext: +case XEN_DOMCTL_get_vcpu_msrs: return current_has_perm(d, SECCLASS_DOMAIN, DOMAIN__GETEXTVCPUCONTEXT); case XEN_DOMCTL_setvcpuextstate: diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors index 1cd451e..1da9f63 100644 --- a/xen/xsm/flask/policy/access_vectors +++ b/xen/xsm/flask/policy/access_vectors @@ -151,8 +151,10 @@ class domain # XEN_DOMCTL_sendtrigger trigger # XEN_DOMCTL_get_ext_vcpucontext +# XEN_DOMCTL_set_vcpu_msrs getextvcpucontext # XEN_DOMCTL_set_ext_vcpucontext +# XEN_DOMCTL_get_vcpu_msrs setextvcpucontext # XEN_DOMCTL_getvcpuextstate getvcpuextstate ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] kdump with xen-unstable on efi machine
On Wed, Nov 26, 2014 at 03:01:51PM +0100, Juergen Gross wrote: On 11/26/2014 01:41 PM, Andrew Cooper wrote: On 26/11/14 12:15, Juergen Gross wrote: Hi, I tried to enable kdump on my test-machine with actual xen-unstable booting via EFI. The kdump kernel is not being loaded. I'm seeing the memory being reserved: (XEN) EFI RAM map: (XEN) - 000a (usable) (XEN) 0010 - 4bc0 (usable) (XEN) 4bc0 - 5bc0 (reserved) (XEN) 5bc0 - 5bfec000 (usable) (XEN) 5bfec000 - 5c00 (ACPI NVS) (XEN) 5c00 - 6a429000 (usable) (XEN) 6a429000 - 6a42c000 (reserved) (XEN) 6a42c000 - 6a7a2000 (usable) (XEN) 6a7a2000 - 6a7a8000 (reserved) (XEN) 6a7a8000 - 6a987000 (usable) (XEN) 6a987000 - 6a98d000 (reserved) (XEN) 6a98d000 - 6aa63000 (usable) (XEN) 6aa63000 - 6aa73000 (reserved) (XEN) 6aa73000 - 6ac6 (usable) (XEN) 6ac6 - 6ac61000 (reserved) (XEN) 6ac61000 - 6ac9b000 (ACPI data) (XEN) 6ac9b000 - 6acac000 (reserved) (XEN) 6acac000 - 6acad000 (usable) (XEN) 6acad000 - 6acae000 (reserved) (XEN) 6acae000 - 7189c000 (usable) (XEN) 7189c000 - 71946000 (reserved) (XEN) 71946000 - 72d76000 (ACPI NVS) (XEN) 72d76000 - 72db2000 (ACPI data) (XEN) 72db2000 - 72edc000 (usable) (XEN) 8000 - 9000 (reserved) (XEN) 0001 - 00208000 (usable) (XEN) Kdump: 256MB (262144kB) at 0x206dff4000 I'd expect this area being visible in the efi or e820 map presented to dom0, but I can't see anything: This is expected. The dom0 kernel now has nothing at all do with loading crash kernel. Loading happens via hypercalls straight from the kexec utility. You need kexec-tools 2.0.4 (I think) or later, compiled with Xen support, but it should JustWork. Should. I have kexec 2.0.5 with Xen support. Doesn't work: Excerpt form strace: sysctl operation failed -- need to rebuild the user-space tool set?\n My personal translation: kexec is tightly coupled to the Xen version (this one was built against Xen 4.4.1 AFAIK). Odd, the hypercall interface did not change in Xen 4.5 for kexec? Perhaps it is making some other hypercalls that are tied in to the version of Xen (like sysctl ones?). I presume with recompiling it works? Perhaps we should add kexec to the tools directory? Gosh no. Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen-pciback: Add MODULE_ALIAS for pciback.
On Nov 26, 2014 11:39 AM, Luis R. Rodriguez mcg...@do-not-panic.com wrote: On Wed, Aug 20, 2014 at 1:26 PM, Ian Campbell ian.campb...@citrix.com wrote: On Wed, 2014-08-20 at 13:20 -0400, Konrad Rzeszutek Wilk wrote: On Wed, Aug 20, 2014 at 06:18:52PM +0100, Ian Campbell wrote: On Wed, 2014-08-20 at 12:40 -0400, Konrad Rzeszutek Wilk wrote: The rest of the Xen device drivers use an module alias to load devices when they shop up in XenBus. show. MODULE_LICENSE(Dual BSD/GPL); MODULE_ALIAS(xen-backend:pci); +MODULE_ALIAS(xen:pci); Isn't that xen-backend:pci already the right thing for a backend device? xen: is for frontends, I thought. Oh, you are right. Cool! Thanks! The patch turned out to be even more trivial than expected ;-) Is this what we expected to be pending work for the item device hotplug (MODULE_ALIAS) on the upstream TODO list? http://wiki.xenproject.org/wiki/XenParavirtOps#Upstream_delta_details This was simply to just auto load that driver when needed right? Right which it does now. Also as for actual PCI device hotplug support: http://wiki.xen.org/wiki/Xen_PCI_Passthrough#Hotplug I don't think we need a delta for that do we? Nope. This one is all done. Luis ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] tools/oxenstored: Fix | vs error in fd event handling
On Wed, Nov 26, 2014 at 06:24:11PM +, Dave Scott wrote: On 26 Nov 2014, at 15:38, Zheng Li d...@zheng.li wrote: On 26/11/2014 15:09, Andrew Cooper wrote: This makes fields 0 and 1 true more often than they should be, resulting problems when handling events. Indeed, looks like a mistake I made when rewriting the logic terms lately. The result is POLLUP or POLLERR events being returned in more categories than we'd interest. Thanks for fixing this! Acked-by: Zheng Li d...@zheng.li This also looks fine to me Acked-by: David Scott dave.sc...@citrix.com Would it be possible to get an Reviewed-by please? Thank you. Cheers, Dave Cheers, Zheng --- This was discovered with XenServers internal Coverity instance. I have yet to work out why the issue is not identified by the upstream coverity scanning. Konrad: This is a bug in the default event handling used by oxenstored in 4.5, as the default switched from select() to poll() in the 4.5 timeframe. It would appear that the negative side effects are limited to just logspam about certain clients attempting invalid actions, but I can't rule out anything more problematic. --- tools/ocaml/xenstored/select_stubs.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/ocaml/xenstored/select_stubs.c b/tools/ocaml/xenstored/select_stubs.c index 4a8edb5..af72b84 100644 --- a/tools/ocaml/xenstored/select_stubs.c +++ b/tools/ocaml/xenstored/select_stubs.c @@ -56,8 +56,8 @@ CAMLprim value stub_select_on_poll(value fd_events, value timeo) { events = Field(Field(fd_events, i), 1); if (c_fds[i].revents POLLNVAL) unix_error(EBADF, select, Nothing); - Field(events, 0) = Val_bool(c_fds[i].events | POLLIN c_fds[i].revents (POLLIN |POLLHUP|POLLERR)); - Field(events, 1) = Val_bool(c_fds[i].events | POLLOUT c_fds[i].revents (POLLOUT|POLLHUP|POLLERR)); + Field(events, 0) = Val_bool(c_fds[i].events POLLIN c_fds[i].revents (POLLIN |POLLHUP|POLLERR)); + Field(events, 1) = Val_bool(c_fds[i].events POLLOUT c_fds[i].revents (POLLOUT|POLLHUP|POLLERR)); Field(events, 2) = Val_bool(c_fds[i].revents POLLPRI); } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] docs/commandline: Refresh document for 4.5
On Nov 27, 2014 8:06 AM, Jan Beulich jbeul...@suse.com wrote: On 27.11.14 at 12:06, andrew.coop...@citrix.com wrote: Konrad: A release ack please. It would be nice if the command line documentation was correct for 4.5. Honestly, unless really controversial, I don't think release acks are really needed for purely documentation changes. nods Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [COVERITY ACCESS] Request for access to Coverity
On Nov 27, 2014 6:59 AM, Tim Deegan t...@xen.org wrote: At 11:39 + on 27 Nov (1417084797), George Dunlap wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA512 I agree to the conditions in the XenProject Coverity contribution guidelines [1]. I'm a developer working for Citrix Systems UK, Ltd. I've been active in the Xen community since 2006; I was release coordinator for the 4.3 and 4.4 releases, and I am currently maintainer of the Xen scheduling subsystem. I would like access primarily to be able to write and speak intelligently about Xen and Coverity in blog postings and conference talks. I figure it would be easier to go through the stats and history myself, rather than trying to get some other developer to do it for me. (Although of course once I have access I'll probably end up doing some work looking at scans anyway.) +1 +1 too. Tim. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v3] INSTALL: correct EXTRA_CFLAGS handling
On November 27, 2014 4:26:26 AM EST, Olaf Hering o...@aepfle.de wrote: The already documented configure patch was not applied. Adjust documentation to describe existing behaviour. Signed-off-by: Olaf Hering o...@aepfle.de Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com Reviewed-by: me. Don't need an release ack for it. --- v3: reword as suggested by Konrad, trim Cc list v2: resend due to lack of Cc: tags INSTALL | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/INSTALL b/INSTALL index 6bb9d23..0bc67ea 100644 --- a/INSTALL +++ b/INSTALL @@ -128,13 +128,6 @@ original xenstored will be used. Valid names are xenstored and oxenstored. --with-xenstored=name -Using additional CFLAGS to build tools running in dom0 is required when -building distro packages. This is the option to pass things like -RPM_OPT_FLAGS. - --with-extra-cflags-tools=EXTRA_CFLAGS - --with-extra-cflags-qemu-traditional=EXTRA_CFLAGS - --with-extra-cflags-qemu-upstream=EXTRA_CFLAGS - Instead of starting the tools in dom0 with sysv runlevel scripts they can also be started by systemd. If this option is enabled xenstored will receive the communication socked directly from systemd. So starting it @@ -241,6 +234,13 @@ QEMU_UPSTREAM_URL= QEMU_TRADITIONAL_URL= SEABIOS_UPSTREAM_URL= +Using additional CFLAGS to build tools which will run in dom0 is +required when building distro packages. These variables can be used to +pass RPM_OPT_FLAGS. +EXTRA_CFLAGS_XEN_TOOLS= +EXTRA_CFLAGS_QEMU_TRADITIONAL= +EXTRA_CFLAGS_QEMU_XEN= + This variable can be used to use DIR/include and DIR/lib during build. This is the same as PREPEND_LIB and PREPEND_INCLUDES. APPEND_LIB and APPEND_INCLUDES= will be appended to the CFLAGS/LDFLAGS variable. @@ -310,10 +310,10 @@ sudo make install BOOT_DIR=/ood/path/boot EFI_DIR=/odd/path/efi %build export WGET=$(type -P false) export GIT=$(type -P false) +export EXTRA_CFLAGS_XEN_TOOLS=$RPM_OPT_FLAGS +export EXTRA_CFLAGS_QEMU_TRADITIONAL=$RPM_OPT_FLAGS +export EXTRA_CFLAGS_QEMU_XEN=$RPM_OPT_FLAGS %configure \ ---with-extra-cflags-tools=$RPM_OPT_FLAGS \ ---with-extra-cflags-qemu-traditional=$RPM_OPT_FLAGS \ ---with-extra-cflags-qemu-upstream=$RPM_OPT_FLAGS \ --with-initddir=%{_initddir} unset CFLAGS CXXFLAGS FFLAGS LDFLAGS make ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 3/3] python/xs: Correct the indirection of the NULL xshandle() check
On Thu, Nov 27, 2014 at 12:34:34PM +, Andrew Cooper wrote: The code now now matches its comment, and will actually catch the case of a bad xs handle. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Coverity-ID: 1055948 CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com CC: Xen Coverity Team cover...@xen.org Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- tools/python/xen/lowlevel/xs/xs.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/python/xen/lowlevel/xs/xs.c b/tools/python/xen/lowlevel/xs/xs.c index 84e1711..ec364bb 100644 --- a/tools/python/xen/lowlevel/xs/xs.c +++ b/tools/python/xen/lowlevel/xs/xs.c @@ -816,7 +816,7 @@ static int parse_transaction_path(XsHandle *self, PyObject *args, *xh = xshandle(self); -if (!xh) +if (!*xh) return 0; if (!PyArg_ParseTuple(args, ss, thstr, path)) -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5 2/3] python/xc: Fix multiple issues in pyxc_readconsolering()
On Fri, Nov 28, 2014 at 11:38:52AM +, Ian Campbell wrote: On Thu, 2014-11-27 at 12:34 +, Andrew Cooper wrote: Don't leak a 16k allocation if PyArg_ParseTupleAndKeywords() or the first xc_readconsolering() fail. It is trivial to run throught the processes memory by repeatedly passing junk parameters to this function. In the case that the call to xc_readconsolering() in the while loop fails, reinstate str before breaking out, and passing a spurious pointer to free(). Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Coverity-IDs: 1054984 1055906 CC: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com CC: Xen Coverity Team cover...@xen.org Acked-by: Ian Campbell ian.campb...@citrix.com Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- tools/python/xen/lowlevel/xc/xc.c | 13 ++--- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/tools/python/xen/lowlevel/xc/xc.c b/tools/python/xen/lowlevel/xc/xc.c index c70b388..2aa0dc7 100644 --- a/tools/python/xen/lowlevel/xc/xc.c +++ b/tools/python/xen/lowlevel/xc/xc.c @@ -1089,7 +1089,7 @@ static PyObject *pyxc_readconsolering(XcObject *self, { unsigned int clear = 0, index = 0, incremental = 0; unsigned int count = 16384 + 1, size = count; -char*str = malloc(size), *ptr; +char*str, *ptr; PyObject*obj; int ret; @@ -1097,15 +1097,17 @@ static PyObject *pyxc_readconsolering(XcObject *self, if ( !PyArg_ParseTupleAndKeywords(args, kwds, |iii, kwd_list, clear, index, incremental) || - !str ) + !(str = malloc(size)) ) return NULL; ret = xc_readconsolering(self-xc_handle, str, count, clear, incremental, index); -if ( ret 0 ) +if ( ret 0 ) { +free(str); return pyxc_error_to_exception(self-xc_handle); +} -while ( !incremental count == size ) +while ( !incremental count == size ret = 0 ) { size += count - 1; if ( size count ) @@ -1119,9 +1121,6 @@ static PyObject *pyxc_readconsolering(XcObject *self, count = size - count; ret = xc_readconsolering(self-xc_handle, str, count, clear, 1, index); -if ( ret 0 ) -break; - count += str - ptr; str = ptr; } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxl: Don't derefence null new_name pointer in libxl_domain_rename()
On Mon, Dec 01, 2014 at 02:50:56PM +, Ian Campbell wrote: On Mon, 2014-12-01 at 14:27 +, Euan Harris wrote: libxl__domain_rename() unconditionally dereferences its new_name parameter, to check whether it is an empty string. Add a check to avoid a segfault if new_name is null. Signed-off-by: Euan Harris euan.har...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com I think this is a good fix to have for 4.5, Konrad CCd. Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- tools/libxl/libxl.c |7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index f84f7c2..6e84b5d 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -385,6 +385,13 @@ int libxl__domain_rename(libxl__gc *gc, uint32_t domid, } } +if (!new_name) { +LIBXL__LOG(ctx, LIBXL__LOG_ERROR, +new domain name not specified); +rc = ERROR_INVAL; +goto x_rc; +} + if (new_name[0]) { /* nonempty names must be unique */ uint32_t domid_e; ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 7/7] xen/pciback: Restore configuration space when detaching from a guest.
On Mon, Dec 01, 2014 at 02:14:56PM +, David Vrabel wrote: On 21/11/14 22:17, Konrad Rzeszutek Wilk wrote: The commit xen/pciback: Don't deadlock when unbinding. was using the version of pci_reset_function which would lock the device lock. That is no good as we can dead-lock. As such we swapped to using the lock-less version and requiring that the callers of 'pcistub_put_pci_dev' take the device lock. And as such this bug got exposed. Using the lock-less version is OK, except that we tried to use 'pci_restore_state' after the lock-less version of __pci_reset_function_locked - which won't work as 'state_saved' is set to false. Said 'state_saved' is a toggle boolean that is to be used by the sequence of a) pci_save_state/pci_restore_state or b) pci_load_and_free_saved_state/pci_restore_state. We don't want to use a) as the guest might have messed up the PCI configuration space and we want it to revert to the state when the PCI device was binded to us. Therefore we pick b) to restore the configuration space. We restore from our 'golden' version of PCI configuration space, when an: - Device is unbinded from pciback - Device is detached from a guest. Reported-by: Sander Eikelenboom li...@eikelenboom.it Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- drivers/xen/xen-pciback/pci_stub.c | 20 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 843a2ba..eb8b58e 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -105,7 +105,7 @@ static void pcistub_device_release(struct kref *kref) */ __pci_reset_function_locked(dev); if (pci_load_and_free_saved_state(dev, dev_data-pci_saved_state)) - dev_dbg(dev-dev, Could not reload PCI state\n); + dev_info(dev-dev, Could not reload PCI state\n); Why dev_info when... @@ -279,9 +281,19 @@ void pcistub_put_pci_dev(struct pci_dev *dev) * (so it's ready for the next domain) */ device_lock_assert(dev-dev); - __pci_reset_function_locked(dev); - pci_restore_state(dev); - + dev_data = pci_get_drvdata(dev); + ret = pci_load_saved_state(dev, dev_data-pci_saved_state); + if (ret 0) + dev_warn(dev-dev, Could not reload PCI state\n); ... this one is dev_warn? Should be the same, dev_info I think? + else { + __pci_reset_function_locked(dev); I think the reset should always be attempted regardless of whether the correct state was loaded or not. /me nods. Will redo it as such. David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4 04/10] x86: paravirt: Wrap initialization of set_iopl_mask in a macro
On Sun, Nov 02, 2014 at 09:32:20AM -0800, Josh Triplett wrote: This will allow making set_iopl_mask optional later. Signed-off-by: Josh Triplett j...@joshtriplett.org Reviewed-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- arch/x86/include/asm/paravirt_types.h | 1 + arch/x86/kernel/paravirt.c| 2 +- arch/x86/xen/enlighten.c | 4 ++-- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 7549b8b..3caf2a8 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -143,6 +143,7 @@ struct pv_cpu_ops { void (*load_sp0)(struct tss_struct *tss, struct thread_struct *t); void (*set_iopl_mask)(unsigned mask); +#define INIT_SET_IOPL_MASK(f) .set_iopl_mask = f, void (*wbinvd)(void); void (*io_delay)(void); diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 548d25f..e7969d4 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -383,7 +383,7 @@ __visible struct pv_cpu_ops pv_cpu_ops = { .iret = native_iret, .swapgs = native_swapgs, - .set_iopl_mask = native_set_iopl_mask, + INIT_SET_IOPL_MASK(native_set_iopl_mask) .io_delay = native_io_delay, .start_context_switch = paravirt_nop, diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index 1a3f044..8ad0778 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -912,7 +912,7 @@ static void xen_load_sp0(struct tss_struct *tss, xen_mc_issue(PARAVIRT_LAZY_CPU); } -static void xen_set_iopl_mask(unsigned mask) +static void __maybe_unused xen_set_iopl_mask(unsigned mask) { struct physdev_set_iopl set_iopl; @@ -1279,7 +1279,7 @@ static const struct pv_cpu_ops xen_cpu_ops __initconst = { .write_idt_entry = xen_write_idt_entry, .load_sp0 = xen_load_sp0, - .set_iopl_mask = xen_set_iopl_mask, + INIT_SET_IOPL_MASK(xen_set_iopl_mask) .io_delay = xen_io_delay, /* Xen takes care of %gs when switching to usermode for us */ -- 2.1.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Q77 IGD instantly crashes on xen-pciback bind.
On Fri, Nov 28, 2014 at 08:00:40PM -0600, Dr. Greg Wettstein wrote: On Nov 27, 12:11pm, Sander Eikelenboom wrote: } Subject: Re: [Xen-devel] Q77 IGD instantly crashes on xen-pciback bind. Hi, hope the week has gone well for everyone. So we are obviously working with qemu-dm-traditional and with the IGD/LVDS BIOS configuration issue fixed the adapater passthrough is working and Windows7 is coming up and detecting the IGD as a standard VGA display adapter. Additional invocations of the VM after the first one result in failed passthrough with a garbled display. This is probably due to the current lack of slot/bus reset in xen-pciback, Konrad has a preliminary kernel patch for xen-pciback that does this. I have attached the patch, though it has some rough edges in the design :-) I'm currently running with his 3.19 xen-pciback patches series + the preliminary patch for slot/bus reset and rebooting a guest with vga/pci passthrough now works. (i'm running with a radeon card, passed through as a secondary card to the emulated qemu one, in a linux guest using qemu-xen, so i can't help you with your other questions and problems). Thanks for taking the time for respond and forward along the patch. I back ported the do_flr patch into the 3.14.x kernel and spent some time working with it. I thought it might be useful to others to document what we ran into. First of all the issue with the unsuccessful boot of Windows after the first invocation doesn't appear to have anything to do with resetting the card. This was fixed by installing the most recent version of the Intel HD drivers in the Windows guest. If IGD passthrough is done without the HD drivers Windows 7 appears to use its standard VGA driver which seems to be able to initialize and run the IGD device but does not appear to shutdown the device in a manner in which it can be re-started. After the first invocation of the guest is shutdown the screen goes to a solid color. Subsequent invocations result in the flashing multi-color screens which others have documented. With the HD drivers installed IGD passthrough works fine through multiple invocations of a guest with the stock xen-pciback in 3.14.x. We ran 40-50 repetitive Windows guest invocations and every one was completely deterministic. That being said we ran into an issue which we wanted to bounce off the list in the context of this thread. One of the things we were not able to do was to successfuly re-start the IGD display for dom0. After spending a lot of time going through Konrad's reset code it appears the IGD devices in the Q77 and Q87 chipset implementations we are looking at are not advertising reset functions which can be used. It appears that the IGD devices are advertising that they need a Device Specific Initialization (DSI) sequence in order to be enabled or powered up, if we interpret the output of lscpi properly, ie: --- 00:02.0 VGA compatible controller: Intel Corporation Device 0152 (rev 09) (prog-if 00 [VGA controller]) Subsystem: Intel Corporation Device 2036 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- INTx- Latency: 0 Interrupt: pin A routed to IRQ 11 Region 0: Memory at f780 (64-bit, non-prefetchable) [size=4M] Region 2: Memory at e000 (64-bit, prefetchable) [size=256M] Region 4: I/O ports at f000 [size=64] Expansion ROM at unassigned [disabled] Capabilities: [90] MSI: Mask- 64bit- Count=1/1 Enable- Address: Data: Capabilities: [d0] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- Capabilities: [a4] PCIe advanced features ? --- The ATI adapters which we have a lot of passthrough experience with are FLR capable and can be re-enabled after passthrough by writing a '1' to the enable pseudo-file in the /sys/bus/pci/devices/BDF directory. The IGD adapters seem to fail to respond to a request to be re-enabled. We would certainly be interested in any suggestions the collective may have with respect to how to get the adapter turned back on and/or implement a reset. CC-ing Chien who has been looking at adding IGD passthrough support in the qemu-upstream. Perhaps he can provide some ideas. Thanks. Sander Have a good weekend. Greg }-- End of excerpt from Sander Eikelenboom As always, Dr. G.W. Wettstein, Ph.D. Enjellic Systems Development, LLC. 4206 N. 19th Ave. Specializing in
Re: [Xen-devel] [PATCH] xen/arm: Handle platforms with edge-triggered virtual timer
On Fri, Nov 28, 2014 at 03:17:06PM +, Julien Grall wrote: Some platforms (such as Xgene and ARMv8 models) use an edge-triggered interrupt for the virtual timer. Even if the timer output signal is masked in the context switch, the GIC will keep track that of any interrupts raised while IRQs are disabled. As soon as IRQs are re-enabled, the virtual interrupt timer will be injected to Xen. If an idle vVCPU was scheduled next then the interrupt handler doesn't expect to the receive the IRQ and will crash: (XEN)[00228388] _spin_lock_irqsave+0x28/0x94 (PC) (XEN)[00228380] _spin_lock_irqsave+0x20/0x94 (LR) (XEN)[00250510] vgic_vcpu_inject_irq+0x40/0x1b0 (XEN)[0024bcd0] vtimer_interrupt+0x4c/0x54 (XEN)[00247010] do_IRQ+0x1a4/0x220 (XEN)[00244864] gic_interrupt+0x50/0xec (XEN)[0024fbac] do_trap_irq+0x20/0x2c (XEN)[00255240] hyp_irq+0x5c/0x60 (XEN)[00241084] context_switch+0xb8/0xc4 (XEN)[0022482c] schedule+0x684/0x6d0 (XEN)[0022785c] __do_softirq+0xcc/0xe8 (XEN)[002278d4] do_softirq+0x14/0x1c (XEN)[00240fac] idle_loop+0x134/0x154 (XEN)[0024c160] start_secondary+0x14c/0x15c (XEN)[0001] 0001 The proper solution is to context switch the virtual interrupt state at the GIC level. This would also avoid masking the output signal which requires specific handling in the guest OS and more complex code in Xen to deal with EOIs, and so is desirable for that reason too. Sadly, this solution requires some refactoring which would not be suitable for a freeze exception for the Xen 4.5 release. For now implement a temporary solution which ignores the virtual timer interrupt when the idle VCPU is running. Signed-off-by: Julien Grall julien.gr...@linaro.org --- Changes in v2: - Reword the commit message and comment in the code to explain the real bug. Based on Ian's reword. - Use unlikely This patch is a bug fix candidate for Xen 4.5 and backport for Xen 4.4. Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com It affects at least Xgene platform and ARMv8 models where Xen may randomly crash. This patch don't inject the virtual timer interrupt if the current VCPU is the idle one. For now, I think this patch is the safest way to resolve the problem. I will work on a proper solution for Xen 4.6. --- xen/arch/arm/time.c | 13 + 1 file changed, 13 insertions(+) diff --git a/xen/arch/arm/time.c b/xen/arch/arm/time.c index a6436f1..471d7a9 100644 --- a/xen/arch/arm/time.c +++ b/xen/arch/arm/time.c @@ -169,6 +169,19 @@ static void timer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) static void vtimer_interrupt(int irq, void *dev_id, struct cpu_user_regs *regs) { +/* + * Edge-triggered interrupt can be used for the virtual timer. Even + * if the timer output signal is masked in the context switch, the + * GIC will keep track that of any interrupts raised while IRQS as + * disabled. As soon as IRQs are re-enabled, the virtual interrupt + * will be injected to Xen. + * + * If an IDLE vCPU was scheduled next then we should ignore the + * interrupt. + */ +if ( unlikely(is_idle_vcpu(current)) ) +return; + current-arch.virt_timer.ctl = READ_SYSREG32(CNTV_CTL_EL0); WRITE_SYSREG32(current-arch.virt_timer.ctl | CNTx_CTL_MASK, CNTV_CTL_EL0); vgic_vcpu_inject_irq(current, current-arch.virt_timer.irq); -- 2.1.3 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v13 09/11] pvqspinlock, x86: Add para-virtualization support
On Wed, Oct 29, 2014 at 04:19:09PM -0400, Waiman Long wrote: This patch adds para-virtualization support to the queue spinlock code base with minimal impact to the native case. There are some minor code changes in the generic qspinlock.c file which should be usable in other architectures. The other code changes are specific to x86 processors and so are all put under the arch/x86 directory. On the lock side, the slowpath code is split into 2 separate functions generated from the same code - one for bare metal and one for PV guest. The switching is done in the _raw_spin_lock* functions. This makes sure that the performance impact to the bare metal case is minimal, just a few NOPs in the _raw_spin_lock* functions. In the PV slowpath code, there are 2 paravirt callee saved calls that minimize register pressure. On the unlock side, however, the disabling of unlock function inlining does have some slight impact on bare metal performance. The actual paravirt code comes in 5 parts; - init_node; this initializes the extra data members required for PV state. PV state data is kept 1 cacheline ahead of the regular data. - link_and_wait_node; this replaces the regular MCS queuing code. CPU halting can happen if the wait is too long. - wait_head; this waits until the lock is avialable and the CPU will be halted if the wait is too long. - wait_check; this is called after acquiring the lock to see if the next queue head CPU is halted. If this is the case, the lock bit is changed to indicate the queue head will have to be kicked on unlock. - queue_unlock; this routine has a jump label to check if paravirt is enabled. If yes, it has to do an atomic cmpxchg to clear the lock bit or call the slowpath function to kick the queue head cpu. Tracking the head is done in two parts, firstly the pv_wait_head will store its cpu number in whichever node is pointed to by the tail part of the lock word. Secondly, pv_link_and_wait_node() will propagate the existing head from the old to the new tail node. Signed-off-by: Waiman Long waiman.l...@hp.com --- arch/x86/include/asm/paravirt.h | 19 ++ arch/x86/include/asm/paravirt_types.h | 20 ++ arch/x86/include/asm/pvqspinlock.h| 411 + arch/x86/include/asm/qspinlock.h | 71 ++- arch/x86/kernel/paravirt-spinlocks.c |6 + include/asm-generic/qspinlock.h |2 + kernel/locking/qspinlock.c| 69 +- 7 files changed, 591 insertions(+), 7 deletions(-) create mode 100644 arch/x86/include/asm/pvqspinlock.h diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index cd6e161..7e296e6 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -712,6 +712,24 @@ static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, #if defined(CONFIG_SMP) defined(CONFIG_PARAVIRT_SPINLOCKS) +#ifdef CONFIG_QUEUE_SPINLOCK + +static __always_inline void pv_kick_cpu(int cpu) +{ + PVOP_VCALLEE1(pv_lock_ops.kick_cpu, cpu); +} + +static __always_inline void pv_lockwait(u8 *lockbyte) +{ + PVOP_VCALLEE1(pv_lock_ops.lockwait, lockbyte); +} + +static __always_inline void pv_lockstat(enum pv_lock_stats type) +{ + PVOP_VCALLEE1(pv_lock_ops.lockstat, type); +} + +#else static __always_inline void __ticket_lock_spinning(struct arch_spinlock *lock, __ticket_t ticket) { @@ -723,6 +741,7 @@ static __always_inline void __ticket_unlock_kick(struct arch_spinlock *lock, { PVOP_VCALL2(pv_lock_ops.unlock_kick, lock, ticket); } +#endif #endif diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index 7549b8b..49e4b76 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -326,6 +326,9 @@ struct pv_mmu_ops { phys_addr_t phys, pgprot_t flags); }; +struct mcs_spinlock; +struct qspinlock; + struct arch_spinlock; #ifdef CONFIG_SMP #include asm/spinlock_types.h @@ -333,9 +336,26 @@ struct arch_spinlock; typedef u16 __ticket_t; #endif +#ifdef CONFIG_QUEUE_SPINLOCK +enum pv_lock_stats { + PV_HALT_QHEAD, /* Queue head halting */ + PV_HALT_QNODE, /* Other queue node halting */ + PV_HALT_ABORT, /* Halting aborted */ + PV_WAKE_KICKED, /* Wakeup by kicking*/ + PV_WAKE_SPURIOUS, /* Spurious wakeup */ + PV_KICK_NOHALT /* Kick but CPU not halted */ +}; +#endif + struct pv_lock_ops { +#ifdef CONFIG_QUEUE_SPINLOCK + struct paravirt_callee_save kick_cpu; + struct paravirt_callee_save lockstat; + struct paravirt_callee_save lockwait; +#else struct paravirt_callee_save lock_spinning; void (*unlock_kick)(struct
Re: [Xen-devel] [PATCH for-4.5] pygrub: Fix regression from c/s d1b93ea, attempt 2
On Tue, Dec 02, 2014 at 02:02:58PM +, Ian Campbell wrote: On Tue, 2014-12-02 at 02:43 +, Andrew Cooper wrote: On 01/12/2014 20:30, Konrad Rzeszutek Wilk wrote: On Fri, Nov 28, 2014 at 11:31:24AM +, Ian Campbell wrote: On Tue, 2014-11-25 at 11:11 -0500, Boris Ostrovsky wrote: c/s d1b93ea causes substantial functional regressions in pygrub's ability to parse bootloader configuration files. c/s d1b93ea itself changed an an interface which previously used exclusively integers, to using strings in the case of a grub configuration with explicit default set, along with changing the code calling the interface to require a string. The default value for default remained as an integer. As a result, any Extlinux or Lilo configuration (which drives this interface exclusively with integers), or Grub configuration which doesn't explicitly declare a default will die with an AttributeError when attempting to call self.cf.default.isdigit() where default is an integer. Sadly, this AttributeError gets swallowed by the blanket ignore in the loop which searches partitions for valid bootloader configurations, causing the issue to be reported as Unable to find partition containing kernel We should explicitly check type of default in image_index() and process it appropriately. Reported-by: Andrew Cooper andrew.coop...@citrix.com Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Acked-by: Ian Campbell ian.campb...@citrix.com In general I would prefer the first line of the commit message to be a short description of the actual functional change and not a reference to fixing some other commit, which is basically meaningless to anyone reading the log even now, nevermind in six months. I think I'm going to start picking up on this more frequently from now on. CCing Konrad for RM input. I'm much less worried about corner cases etc wrt the freeze etc than I was with the larger fix proposed earlier. I am bit lost. I thought Andrew NACKed this? I didn't, as I am not in a position to. I have not tested the result, but believe it is sufficient to fix the exact error at hand. If the maintainers think it is the best solution then so be it, but I am still of the opinion that it is is still a hack upon a hack. At this point in a freeze I'm much happier with: tools/pygrub/src/pygrub |4 +++- 1 files changed, 3 insertions(+), 1 deletions(-) The same here. This is now the season of handing out band-aids. As such Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com than tools/pygrub/src/ExtLinuxConf.py |6 +++--- tools/pygrub/src/GrubConf.py |7 ++- tools/pygrub/src/LiloConf.py |6 +++--- 3 files changed, 8 insertions(+), 11 deletions(-) Plus Boris' patch is far easier to reason about in isolation in a dynamically/duck typed language. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PVHVM drivers in upstream linux kernel
On Tue, Dec 02, 2014 at 11:05:14AM +, Ian Campbell wrote: On Tue, 2014-12-02 at 10:54 +, David Vrabel wrote: On 02/12/14 09:39, Juergen Gross wrote: Hi, looking into the upstream linux sources I realized that the PVHVM drivers of XEN are only available with the pvops kernel. Is this on purpose? Shouldn't the frontend drivers, xen/platform-pci.c etc. be configurable without having to enable CONFIG_PARAVIRT? I suppose that would be possible but I don't think it's a useful configuration because you would lose PV spinlocks for example. IIRC the reason this hasn't been implemented until now is that refactoring would be required to the various bits of driver code which assumes PAE + PARAVIRT when they aren't strictly needed, e.g. grant table code. Whether its worth the churn at this stage is debatable, but I think the (in)ability to use PV spinlocks is a red-herring. Adding PV drivers to an HVM guest is a useful thing to do, even without PV spinlocks. PV IO gets you far more incremental benefit than the locks do, adding PV IO paths is the number 1 thing which should be done to any guest. One actual usecase is installing from a distro installer which isn't PAE, let alone PARAVIRT enabled[0], to get far enough that you can install a more capable PVHVM kernel with more bells and whistles. If there were distros around who refused wholesale to enable PARAVIRT even in a non-default kernel then it would be more likely that they could be convinced to enable a set of PV IO drivers, since they have 0 impact on a non-PARAVIRT system, and still give significant benefit to Xen users. I don't know of any of the major distros are refusing PARAVIRT in this way though. Ian. [0] The default i386 Debian installer falls into this camp, but you can use the special PV Xen variant to install as PVHVM too so it's not so critical. And the Fedora 21 LiveISO (32-bit) does too. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel