Re: [Xen-devel] Backport request for tools/hotplug: set mtu from bridge for tap interface
On Tue, Nov 18, 2014 at 06:24:31PM +, Ian Jackson wrote: Daniel Kiper writes (Re: [Xen-devel] delaying 4.4.2 and 4.3.4): By the way, what should I do to have commit f3f5f1927f0d3aef9e3d2ce554dbfa0de73487d5 (tools/hotplug: set mtu from bridge for tap interface) in at least Xen 4.3? I have been asking about this for more than five months. This patch fixes a real bug. I don't seem to be able to find these mails from you, but my mailbox is very big. The normal thing ought to be for you to post a backport request and CC the stable tools maintainer (i.e. me). I'm sorry if I dropped your message. The patch looks reasonable to backport. I have put it on my list for backporting later. I'll wait a bit to see if anyone objects. (I have also CC'd the patch's original author and also Ian C because he acked it for unstable.) Does it apply cleanly to 4.3 and 4.4? I haven't checked. Daniel, if you could check that, that would be helpful. If it doesn't, then the normal process would be for the backport requestor (i.e. you) to post the revised patch against 4.3 and/or 4.4. 4.4 and later have this patch. 4.3 and earlier do not. It could be cherry-picked to 4.3 and 4.2 without any issues. Daniel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] fix restore: xenstore entries left when restore failed
On Wed, 2014-11-19 at 15:12 +0800, Chunyan Liu wrote: While running the libvirt-tck domain/102-broken-save-restore.t test (save the domain, corrupt the saved file by truncating the last 512k, then restore), I found that restoring the domain failed, but domain-related xenstore entries still existed in xenstore. Add a patch to clear xenstore entries in this case. Signed-off-by: Chunyan Liu cy...@suse.com --- tools/libxl/libxl.c | 52 1 file changed, 52 insertions(+) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index de23fec..447840d 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1525,6 +1525,54 @@ static void devices_destroy_cb(libxl__egc *egc, libxl__devices_remove_state *drs, int rc); +static void libxl_clear_xs_entry(libxl__gc *gc, uint32_t domid) This seems to duplicate a bunch of stuff already done by libxl__destroy_domid and libxl__devices_destroy etc. I think rather than duplicating this, libxl__destroy_domid should be made to Do The Right Thing by trying to clean up any remnants of the domid even if it doesn't currently exist. Alternatively, perhaps the real bug is in the error path of the restore functionality, which isn't calling the correct unwind path. Ian, any thoughts? Ian.
+    const char *dom_path, *vm_path;
+    char *path;
+    unsigned int num_kinds, num_dev_xsentries;
+    char **kinds = NULL, **devs = NULL;
+    int i, j;
+
+    /* remove libxl path */
+    libxl__xs_rm_checked(gc, XBT_NULL, libxl__xs_libxl_path(gc, domid));
+
+    dom_path = libxl__xs_get_dompath(gc, domid);
+    if (!dom_path)
+        return;
+
+    /* remove backend entries */
+    path = GCSPRINTF("%s/device", dom_path);
+    kinds = libxl__xs_directory(gc, XBT_NULL, path, &num_kinds);
+    if (kinds && num_kinds) {
+        for (i = 0; i < num_kinds; i++) {
+            path = GCSPRINTF("%s/device/%s", dom_path, kinds[i]);
+            devs = libxl__xs_directory(gc, XBT_NULL, path, &num_dev_xsentries);
+            if (!devs)
+                continue;
+            for (j = 0; j < num_dev_xsentries; j++) {
+                path = GCSPRINTF("%s/device/%s/%s/backend",
+                                 dom_path, kinds[i], devs[j]);
+                path = libxl__xs_read(gc, XBT_NULL, path);
+                if (path)
+                    libxl__xs_rm_checked(gc, XBT_NULL, path);
+            }
+        }
+    }
+
+    path = GCSPRINTF("%s/console/backend", dom_path);
+    path = libxl__xs_read(gc, XBT_NULL, path);
+    if (path)
+        libxl__xs_rm_checked(gc, XBT_NULL, path);
+
+    /* remove vm path */
+    vm_path = libxl__xs_read(gc, XBT_NULL, GCSPRINTF("%s/vm", dom_path));
+    if (vm_path)
+        libxl__xs_rm_checked(gc, XBT_NULL, vm_path);
+
+    /* remove dom path */
+    libxl__xs_rm_checked(gc, XBT_NULL, dom_path);
+}
+
 void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis)
 {
     STATE_AO_GC(dis->ao);
@@ -1540,6 +1588,10 @@ void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis)
         break;
     case ERROR_INVAL:
         LIBXL__LOG(ctx, LIBXL__LOG_ERROR, "non-existant domain %d", domid);
+        /* domain may not have started successfully but some xenstore entries
+         * might have been created already in an earlier stage. We need to
+         * clear those entries. */
+        libxl_clear_xs_entry(gc, domid);
     default:
         goto out;
     }
Re: [Xen-devel] Xen 4.5 random freeze question
Hi Stefano, Thank you for your support. You are right - with the latest change you've proposed I got continuous prints during the platform hang: (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 Looks like the issue needs further, deeper debugging. Regards, Andrii On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: Hello Andrii, we are getting closer :-) It would help if you post the output with GIC_DEBUG defined but without the other change that fixes the issue. I think the problem is probably due to software irqs. You are getting too many "gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending" messages. That means you are losing virtual SGIs (guest VCPU to guest VCPU). It would be best to investigate why, especially if you get many more of the same messages without the MAINTENANCE_IRQ change I suggested. This patch might also help understanding the problem more:
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..5eaeca2 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
     list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
     {
         i = find_first_zero_bit(&this_cpu(lr_mask), nr_lrs);
-        if ( i >= nr_lrs ) return;
+        if ( i >= nr_lrs )
+        {
+            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
+                     p->irq, v->domain->domain_id, v->vcpu_id);
+            continue;
+        }
         spin_lock_irqsave(&gic.lock, flags);
         gic_set_lr(i, p, GICH_LR_PENDING);
On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, No hangs with this change.
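For readers unfamiliar with the GIC virtualization model being debugged here: the hypervisor tracks occupied List Registers (LRs) in a per-CPU bitmask, and an interrupt can only be injected if a free LR exists below nr_lrs. A self-contained sketch of that allocation check, using our own simplified 32-bit types rather than Xen's actual code:

```c
#include <stdint.h>

/* Each set bit in *lr_mask models an occupied List Register. An index at
 * or beyond nr_lrs means "LRs full": the caller must leave the interrupt
 * on its lr_pending queue, which is exactly what the debug patch logs. */
static unsigned find_first_zero_bit32(uint32_t mask)
{
    unsigned i;
    for (i = 0; i < 32; i++)
        if (!(mask & (1u << i)))
            return i;
    return 32;
}

static int try_alloc_lr(uint32_t *lr_mask, unsigned nr_lrs)
{
    unsigned i = find_first_zero_bit32(*lr_mask);
    if (i >= nr_lrs)
        return -1;          /* full: keep the IRQ pending */
    *lr_mask |= 1u << i;    /* claim this LR */
    return (int)i;
}
```

The repeated "LRs full" messages above mean this allocation keeps failing, i.e. LRs are never being released, which is why the thread suspects lost maintenance interrupts / virtual SGIs.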
Complete log is the following: U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26) DRA752 ES1.0 ethaddr not set. Validating first E-fuse MAC cpsw - UART enabled - - CPU booting - - Xen starting in Hyp mode - - Zero BSS - - Setting up control registers - - Turning on paging - - Ready - (XEN) Checking for initrd in /chosen (XEN) RAM: 8000 - 9fff (XEN) RAM: a000 - bfff (XEN) RAM: c000 - dfff (XEN) (XEN) MODULE[1]: c200 - c20069aa (XEN) MODULE[2]: c000 - c200 (XEN) MODULE[3]: - (XEN) MODULE[4]: c300 - c301 (XEN) RESVD[0]: ba30 - bfd0 (XEN) RESVD[1]: 9580 - 9590 (XEN) RESVD[2]: 98a0 - 98b0 (XEN) RESVD[3]: 95f0 - 98a0 (XEN) RESVD[4]: 9590 - 95f0 (XEN) (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1 (XEN) Placing Xen at 0xdfe0-0xe000 (XEN) Xen heap: d200-de00 (49152 pages) (XEN) Dom heap: 344064 pages (XEN) Domain heap initialised (XEN) Looking for UART console serial0 Xen 4.5-unstable (XEN) Xen version 4.5-unstable (atseglytskyi@) (arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 20130328 (prerelease)) debu4 (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty (XEN) Processor: 412fc0f2: ARM Limited, variant: 0x2, part 0xc0f, rev 0x2 (XEN) 32-bit Execution: (XEN) Processor Features: 1131:00011011 (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle (XEN) Extensions: GenericTimer Security (XEN) Debug Features: 02010555 (XEN) Auxiliary Features: (XEN) Memory Model Features: 10201105 2000 0124 02102211 (XEN) ISA Features: 02101110 13112111 21232041 2131 10011142 (XEN) Platform: TI DRA7 (XEN) /psci method must be smc, but is: hvc (XEN) Set AuxCoreBoot1 to dfe0004c (0020004c) (XEN) Set AuxCoreBoot0 to 0x20 (XEN) Generic Timer IRQ: phys=30 hyp=26 virt=27 (XEN) Using generic timer at 6144 KHz (XEN) GIC initialization: (XEN) gic_dist_addr=48211000 (XEN) gic_cpu_addr=48212000 (XEN) gic_hyp_addr=48214000 (XEN) gic_vcpu_addr=48216000 
(XEN) gic_maintenance_irq=25 (XEN) GIC: 192 lines, 2 cpus, secure (IID 043b). (XEN) Using scheduler: SMP Credit Scheduler (credit) (XEN) I/O virtualisation disabled (XEN) Allocated console ring of 16 KiB. (XEN) VFP implementer 0x41 architecture 4 part 0x30 variant 0xf rev 0x0 (XEN) Bringing up CPU1 - CPU
Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps
From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Wednesday, November 12, 2014 5:57 PM On 12.11.14 at 10:13, tiejun.c...@intel.com wrote: On 2014/11/12 17:02, Jan Beulich wrote: On 12.11.14 at 09:45, tiejun.c...@intel.com wrote: #2 The flags field in each specific device of the new domctl would control whether this device needs to check/reserve its own RMRR range. But it's not dependent on the current device assignment domctl, so the user can use them to control which devices need to work as hotplug later, separately. And this could be left as a second step, in order for what needs to be done now to not get more complicated than necessary. Do you mean currently we still rely on the device assignment domctl to provide the SBDF? So it looks like nothing should be changed in our policy. I can't connect your question to what I said. What I tried to tell you Something is being misunderstood here. was that I don't currently see a need to make this overly complicated: Having the option to punch holes for all devices and (by default) dealing with just the devices assigned at boot may be sufficient as a first step. Yet (repeating just to avoid any misunderstanding) that makes things easier only if we decide to require device assignment to happen before memory getting populated (since in that case there's Here what do you mean by 'if we decide to require device assignment to happen before memory getting populated'? Because -quote- at present the device assignment always happens after memory population. And I also mentioned previously that I double-checked this sequence with printk. Or do you already plan, or have decided, to change this sequence? So it is now the 3rd time that I'm telling you that part of your decision making as to which route to follow should be to re-consider whether the current sequence of operations shouldn't be changed. Please also consult with the VT-d maintainers (hint to them: participating in this discussion publicly would be really nice) on _all_ decisions to be made here.
There's no decision being made privately; we want all the discussions to happen publicly. We will get back with our thoughts soon. Thanks Kevin
Re: [Xen-devel] [PATCH for-4.5 2/4] xen: arm: correct off by one in xgene-storm's map_one_mmio
On Tue, 2014-11-18 at 17:01 +, Julien Grall wrote: Hi Ian, On 11/18/2014 04:44 PM, Ian Campbell wrote: The callers pass the end as the pfn immediately *after* the last page to be mapped, therefore adding one is incorrect and causes an additional page to be mapped. At the same time correct the printing of the mfn values, zero-padding them to 16 digits as for a paddr when they are frame numbers is just confusing. Signed-off-by: Ian Campbell ian.campb...@citrix.com --- xen/arch/arm/platforms/xgene-storm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/xen/arch/arm/platforms/xgene-storm.c b/xen/arch/arm/platforms/xgene-storm.c
index 29c4752..38674cd 100644
--- a/xen/arch/arm/platforms/xgene-storm.c
+++ b/xen/arch/arm/platforms/xgene-storm.c
@@ -45,9 +45,9 @@ static int map_one_mmio(struct domain *d, const char *what,
 {
     int ret;
-    printk("Additional MMIO %"PRIpaddr"-%"PRIpaddr" (%s)\n",
+    printk("Additional MMIO %lx-%lx (%s)\n",
            start, end, what);
-    ret = map_mmio_regions(d, start, end - start + 1, start);
+    ret = map_mmio_regions(d, start, end - start, start);
     if ( ret )
         printk("Failed to map %s @ %"PRIpaddr" to dom%d\n",
                what, start, d->domain_id);
As you fixed the previous printf format, I would fix this one too. Yes, good idea. Ian.
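The off-by-one being fixed is the classic inclusive-end vs. half-open-range confusion: when callers pass `end` as the pfn just past the last page, the page count is `end - start`, and the old `end - start + 1` maps one extra page. A tiny illustration with hypothetical helper names:

```c
#include <stdint.h>

typedef uint64_t pfn_t;

/* Correct count for a half-open range [start, end):
 * end is the first pfn NOT to be mapped. */
static uint64_t nr_pages_half_open(pfn_t start, pfn_t end)
{
    return end - start;
}

/* The buggy variant treats end as inclusive and over-counts by one,
 * which is what made map_one_mmio() map an additional page. */
static uint64_t nr_pages_off_by_one(pfn_t start, pfn_t end)
{
    return end - start + 1;
}
```

For the range [0x100, 0x104) the correct count is 4 pages; the buggy formula yields 5.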
Re: [Xen-devel] [PATCH] libxc: Expose the pdpe1gb cpuid flag to guest
At 01:29 + on 19 Nov (1416356943), Zhang, Yang Z wrote: Tim Deegan wrote on 2014-11-18: In this case, the guest is entitled to _expect_ pagefaults on 1GB mappings if CPUID claims they are not supported. That sounds like an unlikely thing for the guest to be relying on, but Xen itself does something similar for the SHOPT_FAST_FAULT_PATH (and now also for IOMMU entries for the deferred caching attribute updates). Indeed. How about adding the software check (as Andrew mentioned) first and leaving the hardware problem aside (actually, I don't think we can solve it currently)? I don't think we should change the software path unless we can change the hardware behaviour too. It's better to be consistent, and it saves us some cycles in the pt walker. Cheers, Tim.
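For context on the flag being discussed: pdpe1gb (1GB page support) is reported in CPUID leaf 0x80000001, EDX bit 26, per the Intel/AMD manuals. A minimal bit-test helper over a caller-supplied EDX value (the helper and macro names here are ours, not libxc's):

```c
#include <stdint.h>

/* CPUID leaf 0x80000001, EDX bit 26: pdpe1gb (1GB page support). */
#define CPUID_80000001_EDX_PDPE1GB (1u << 26)

/* Pure bit test: pass the EDX value returned by CPUID leaf 0x80000001. */
static int has_pdpe1gb(uint32_t edx)
{
    return !!(edx & CPUID_80000001_EDX_PDPE1GB);
}
```

The software check discussed in the thread would be the hypervisor consulting a bit like this (as masked for the guest) before honouring 1GB mappings in its walker.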
Re: [Xen-devel] [PATCH for-4.5 1/4] xen: arm: Add earlyprintk for McDivitt.
On Tue, 2014-11-18 at 16:59 +, Julien Grall wrote: Hi Ian, On 11/18/2014 04:44 PM, Ian Campbell wrote: Signed-off-by: Ian Campbell ian.campb...@citrix.com --- xen/arch/arm/Rules.mk | 6 ++ 1 file changed, 6 insertions(+) diff --git a/xen/arch/arm/Rules.mk b/xen/arch/arm/Rules.mk index 572d854..ef887a5 100644 --- a/xen/arch/arm/Rules.mk +++ b/xen/arch/arm/Rules.mk @@ -95,6 +95,12 @@ EARLY_PRINTK_BAUD := 115200 EARLY_UART_BASE_ADDRESS := 0x1c02 EARLY_UART_REG_SHIFT := 2 endif +ifeq ($(CONFIG_EARLY_PRINTK), xgene-mcdivitt) +EARLY_PRINTK_INC := 8250 +EARLY_PRINTK_BAUD := 9600 EARLY_PRINTK_BAUD is not necessary as we don't use the initialization function (EARLY_PRINTK_INIT_UART is not set). Oh yes, oops. Also the baud is not even what is actually used, so it's not even serving a documentary purpose. With the EARLY_PRINTK_BAUD dropped, this could be merged with the xgene-storm early printk It's at a different base address. Long term I either want to make this (somewhat) runtime configurable or at least to rationalise the options into the form soc/soc-family-uartN, or perhaps even 8250|pl011| etc@address[,ratesettings], if it's not too skanky to arrange to parse that somewhere in the build system. Not for 4.5 though. (I didn't really understand why the baud rate is different). Different hardware might potentially have different baud rates configured in firmware which we would want to seamlessly follow, but it's moot since the right thing to do in most cases is leave the bootloader-provided cfg alone. But I don't think it's 4.5 material. You mean the patch generally or the merging? Ian.
Re: [Xen-devel] [PATCH for-4.5 4/4] xen: arm: Support the other 4 PCI buses on Xgene
On Tue, 2014-11-18 at 17:15 +, Julien Grall wrote: +default: +/* Ignore unknown PCI busses */ I would add a printk("Ignoring PCI busses %s\n", dt_node_full_name(dev)); +ret = 0; +break; continue? Yes, that makes sense (probably the ret = 0 is then unnecessary). You can't assume the order of the PCI busses in the device tree. But, I don't understand what this has to do with using continue. +} + +if ( ret < 0 ) +return ret; + +printk("Mapped additional regions for PCIe device at 0x%"PRIx64"\n", + addr); Printing the device tree path would be more helpful than the address. OK.
Re: [Xen-devel] [PATCH for-4.5 4/4] xen: arm: Support the other 4 PCI buses on Xgene
Hi Ian, On 19/11/2014 09:56, Ian Campbell wrote: On Tue, 2014-11-18 at 17:15 +, Julien Grall wrote: +default: +/* Ignore unknown PCI busses */ I would add a printk("Ignoring PCI busses %s\n", dt_node_full_name(dev)); +ret = 0; +break; continue? Yes, that makes sense (probably the ret = 0 is then unnecessary). You can't assume the order of the PCI busses in the device tree. But, I don't understand what this has to do with using continue. The current xgene-storm DTS has the different PCI busses ordered. So as soon as you don't find the PCI range, it means there are no more PCI busses. Without the continue, this patch gives the impression that you rely on the node order in the device tree. Regards, -- Julien Grall
[Xen-devel] Strangeness in generated xen-command-line.html
http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html has a bunch of random SHA ids in it, where the 4.4-testing version does not. They seem to have replaced the various `= boolean` Default: `true` bits. Andy, any thoughts or should I investigate? I don't see anything since 4.4 touching the html generation itself (we added pandoc for pdf but didn't touch HTML afaict). Ian.
Re: [Xen-devel] [PATCH for-4.5 4/4] xen: arm: Support the other 4 PCI buses on Xgene
On Wed, 2014-11-19 at 10:06 +, Julien Grall wrote: Hi Ian, On 19/11/2014 09:56, Ian Campbell wrote: On Tue, 2014-11-18 at 17:15 +, Julien Grall wrote: +default: +/* Ignore unknown PCI busses */ I would add a printk("Ignoring PCI busses %s\n", dt_node_full_name(dev)); +ret = 0; +break; continue? Yes, that makes sense (probably the ret = 0 is then unnecessary). You can't assume the order of the PCI busses in the device tree. But, I don't understand what this has to do with using continue. The current xgene-storm DTS has the different PCI busses ordered. So as soon as you don't find the PCI range, it means there are no more PCI busses. I don't think it does; the patch iterates over all of the buses, even ones we don't understand, and we don't give up at the first one we don't grok. Without the continue, this patch gives the impression that you rely on the node order in the device tree. Regards,
Re: [Xen-devel] Strangeness in generated xen-command-line.html
On Wed, 2014-11-19 at 10:12 +, Ian Campbell wrote: http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html has a bunch of random SHA ids in it, where the 4.4-testing version does not. They seem to have replaced the various `= boolean` Default: `true` bits. Andy, any thoughts or should I investigate? I don't see anything since 4.4 touching the html generation itself (we added pandoc for pdf but didn't touch HTML afaict). FWIW it seems to happen from the conring_size entry onwards; the com1,com2 and earlier entries are OK. I can't see anything about the com1,com2 entry which would be causing this... Ian.
Re: [Xen-devel] Strangeness in generated xen-command-line.html
On 19/11/14 10:12, Ian Campbell wrote: http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html has a bunch of random SHA ids in it, where the 4.4-testing version does not. They seem to have replaced the various `= boolean` Default: `true` bits. Andy, any thoughts or should I investigate? I don't see anything since 4.4 touching the html generation itself (we added pandoc for pdf but didn't touch HTML afaict). Ian. I have looked into it before but didn't get very far. I suspect it might be a bug in wheezy's markdown. It doesn't reproduce when building using other versions of markdown. I had planned (given some non-existent free time) to see about converting it from markdown to pandoc, which leads to a far more nicely formatted document. ~Andrew
Re: [Xen-devel] [PATCH for-4.5 4/4] xen: arm: Support the other 4 PCI buses on Xgene
On 19/11/2014 10:18, Ian Campbell wrote: On Wed, 2014-11-19 at 10:06 +, Julien Grall wrote: Hi Ian, On 19/11/2014 09:56, Ian Campbell wrote: On Tue, 2014-11-18 at 17:15 +, Julien Grall wrote: +default: +/* Ignore unknown PCI busses */ I would add a printk("Ignoring PCI busses %s\n", dt_node_full_name(dev)); +ret = 0; +break; continue? Yes, that makes sense (probably the ret = 0 is then unnecessary). You can't assume the order of the PCI busses in the device tree. But, I don't understand what this has to do with using continue. The current xgene-storm DTS has the different PCI busses ordered. So as soon as you don't find the PCI range, it means there are no more PCI busses. I don't think it does, the patch iterates over all of the buses, even ones we don't understand, we don't give up at the first one we don't grok. Hrmm, you are right. I don't know why I thought the break was bound to the loop and not the switch. Sorry for the noise. Regards, -- Julien Grall
Re: [Xen-devel] Strangeness in generated xen-command-line.html
On 19/11/14 10:30, Ian Campbell wrote: On Wed, 2014-11-19 at 10:24 +, Andrew Cooper wrote: On 19/11/14 10:12, Ian Campbell wrote: http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html has a bunch of random SHA ids in it, where the 4.4-testing version does not. They seem to have replaced the various `= boolean` Default: `true` bits. Andy, any thoughts or should I investigate? I don't see anything since 4.4 touching the html generation itself (we added pandoc for pdf but didn't touch HTML afaict). Ian. I have looked into it before but didn't get very far. I suspect it might be a bug in wheezy's markdown. It doesn't reproduce when building using other versions of markdown. Right. It seems to be triggered by the line: `S` is an integer 1 or 2 for the number of stop bits. just removing that makes the issue go away. It's not the `s since removing just those retains the issue. WTAF! Ian. So it does. As best as I can tell, that is all legal markdown for a nested block. ~Andrew
Re: [Xen-devel] Strangeness in generated xen-command-line.html
On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote: I've not been able to find a workaround... This works for me... 8--- From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Wed, 19 Nov 2014 10:42:18 + Subject: [PATCH] docs: workaround markdown parser error in xen-command-line.markdown Some versions of markdown (specifically the one in Debian Wheezy, currently used to generate http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be confused by nested lists in the middle of multi-paragraph parent list entries as seen in the com1,com2 entry. The effect is that the Default section of all following entries is replaced by some sort of hash or checksum (at least, a string of 32 random-seeming hex digits). Work around this issue by making the descriptions of the DPS options a nested list, moving the existing nested list describing the options for S into a third-level list. This seems to avoid the issue, and is arguably better formatting in its own right (at least it's not a regression IMHO) Signed-off-by: Ian Campbell ian.campb...@citrix.com --- docs/misc/xen-command-line.markdown | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 0830e5f..c40f89b 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -248,17 +248,17 @@ Both option `com1` and `com2` follow the same format. * `DPS` represents the number of data bits, the parity, and the number of stop bits. - `D` is an integer between 5 and 8 for the number of data bits.
+ * `D` is an integer between 5 and 8 for the number of data bits. - `P` is a single character representing the type of parity: + * `P` is a single character representing the type of parity: - * `n` No - * `o` Odd - * `e` Even - * `m` Mark - * `s` Space + * `n` No + * `o` Odd + * `e` Even + * `m` Mark + * `s` Space - `S` is an integer 1 or 2 for the number of stop bits. + * `S` is an integer 1 or 2 for the number of stop bits. * `io-base` is an integer which specifies the IO base port for UART registers. -- 1.7.10.4
Re: [Xen-devel] Strangeness in generated xen-command-line.html
On 19/11/14 10:46, Ian Campbell wrote: On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote: I've not been able to find a workaround... This works for me... 8--- From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Wed, 19 Nov 2014 10:42:18 + Subject: [PATCH] docs: workaround markdown parser error in xen-command-line.markdown Some versions of markdown (specifically the one in Debian Wheezy, currently used to generate http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be confused by nested lists in the middle of multi-paragraph parent list entries as seen in the com1,com2 entry. The effect is that the Default section of all following entries is replaced by some sort of hash or checksum (at least, a string of 32 random-seeming hex digits). Work around this issue by making the descriptions of the DPS options a nested list, moving the existing nested list describing the options for S into a third-level list. This seems to avoid the issue, and is arguably better formatting in its own right (at least it's not a regression IMHO) Signed-off-by: Ian Campbell ian.campb...@citrix.com I had just identified a different way, but this way is slightly better. If you take out all the blank lines visible in the context below, the resulting HTML will be correctly formatted and rather neater (i.e. without sporadic blank lines). ~Andrew --- docs/misc/xen-command-line.markdown | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 0830e5f..c40f89b 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -248,17 +248,17 @@ Both option `com1` and `com2` follow the same format. * `DPS` represents the number of data bits, the parity, and the number of stop bits. - `D` is an integer between 5 and 8 for the number of data bits.
+ * `D` is an integer between 5 and 8 for the number of data bits. - `P` is a single character representing the type of parity: + * `P` is a single character representing the type of parity: - * `n` No - * `o` Odd - * `e` Even - * `m` Mark - * `s` Space + * `n` No + * `o` Odd + * `e` Even + * `m` Mark + * `s` Space - `S` is an integer 1 or 2 for the number of stop bits. + * `S` is an integer 1 or 2 for the number of stop bits. * `io-base` is an integer which specifies the IO base port for UART registers.
Re: [Xen-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration
ping? On Tue, 18 Nov 2014, Stefano Stabellini wrote: Konrad, I think we should have this fix in Xen 4.5. Should I go ahead and backport it? On Mon, 17 Nov 2014, Don Slutz wrote: The other callers to blk_set_enable_write_cache() in this file already check for s->blk == NULL. Signed-off-by: Don Slutz dsl...@verizon.com --- I think this is a bugfix that should be back-ported to stable releases. I also think this should be done in xen's copy of QEMU for 4.5 with back port(s) to active stable releases. Note: In 2.1 and earlier the routine is bdrv_set_enable_write_cache(); the variable is s->bs. hw/ide/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 00e21cf..d4af5e2 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int version_id)
 {
     IDEState *s = opaque;
-    if (s->identify_set) {
+    if (s->blk && s->identify_set) {
         blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
     }
     return 0;
-- 1.8.4
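The shape of the fix is just a NULL guard added in front of the existing identify_set check. A self-contained sketch with stand-in types rather than QEMU's real IDEState/BlockBackend:

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for QEMU's BlockBackend / IDEState (sketch only). */
struct blk { int write_cache; };
struct ide_state {
    struct blk *blk;            /* may be NULL during a failed migration */
    int identify_set;
    uint16_t identify_data[256];
};

static int ide_drive_post_load_fixed(struct ide_state *s)
{
    /* was: if (s->identify_set) { ... } -- dereferenced a NULL s->blk */
    if (s->blk && s->identify_set)
        s->blk->write_cache = !!(s->identify_data[85] & (1 << 5));
    return 0;
}
```

With the guard, a post-load call on a state whose backend pointer is NULL returns cleanly instead of faulting, matching the checks the other callers in the file already do.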
Re: [Xen-devel] Strangeness in generated xen-command-line.html
On Wed, 2014-11-19 at 10:52 +, Andrew Cooper wrote: On 19/11/14 10:46, Ian Campbell wrote: On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote: I've not been able to find a workaround... This works for me... 8--- From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Wed, 19 Nov 2014 10:42:18 + Subject: [PATCH] docs: workaround markdown parser error in xen-command-line.markdown Some versions of markdown (specifically the one in Debian Wheezy, currently used to generate http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be confused by nested lists in the middle of multi-paragraph parent list entries as seen in the com1,com2 entry. The effect is that the Default section of all following entries is replaced by some sort of hash or checksum (at least, a string of 32 random-seeming hex digits). Work around this issue by making the descriptions of the DPS options a nested list, moving the existing nested list describing the options for S into a third-level list. This seems to avoid the issue, and is arguably better formatting in its own right (at least it's not a regression IMHO) Signed-off-by: Ian Campbell ian.campb...@citrix.com I had just identified a different way, but this way is slightly better. If you take out all the blank lines visible in the context below, the resulting HTML will be correctly formatted and rather neater (i.e. without sporadic blank lines). Agreed. 8-- From 53398a9729d391f1fb7b6f753a0032b1f3604d4d Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Wed, 19 Nov 2014 10:42:18 + Subject: [PATCH] docs: workaround markdown parser error in xen-command-line.markdown Some versions of markdown (specifically the one in Debian Wheezy, currently used to generate http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be confused by nested lists in the middle of multi-paragraph parent list entries as seen in the com1,com2 entry.
The effect is that the Default section of all following entries is replaced by some sort of hash or checksum (at least, a string of 32 random-seeming hex digits). Work around this issue by making the descriptions of the DPS options a nested list, moving the existing nested list describing the options for S into a third-level list. This seems to avoid the issue, and is arguably better formatting in its own right (at least it's not a regression IMHO) Signed-off-by: Ian Campbell ian.campb...@citrix.com --- v2: Less blank lines == nicer output. --- docs/misc/xen-command-line.markdown | 21 - 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 0830e5f..b7eaeea 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -247,19 +247,14 @@ Both option `com1` and `com2` follow the same format. * Optionally, a clock speed measured in hz can be specified. * `DPS` represents the number of data bits, the parity, and the number of stop bits. - - `D` is an integer between 5 and 8 for the number of data bits. - - `P` is a single character representing the type of parity: - - * `n` No - * `o` Odd - * `e` Even - * `m` Mark - * `s` Space - - `S` is an integer 1 or 2 for the number of stop bits. - + * `D` is an integer between 5 and 8 for the number of data bits. + * `P` is a single character representing the type of parity: + * `n` No + * `o` Odd + * `e` Even + * `m` Mark + * `s` Space + * `S` is an integer 1 or 2 for the number of stop bits. * `io-base` is an integer which specifies the IO base port for UART registers. * `irq` is the IRQ number to use, or `0` to use the UART in poll -- 1.7.10.4
Re: [Xen-devel] Strangeness in generated xen-command-line.html
On 19/11/14 11:04, Ian Campbell wrote: On Wed, 2014-11-19 at 10:52 +, Andrew Cooper wrote: On 19/11/14 10:46, Ian Campbell wrote: On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote: I've not been able to find a workaround... This works for me... 8--- From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Wed, 19 Nov 2014 10:42:18 + Subject: [PATCH] docs: workaround markdown parser error in xen-command-line.markdown Some versions of markdown (specifically the one in Debian Wheezy, currently used to generate http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be confused by nested lists in the middle of multi-paragraph parent list entries as seen in the com1,com2 entry. The effect is that the Default section of all following entries are replace by some sort of hash or checksum (at least, a string of 32 random seeming hex digits). Workaround this issue by making the decriptions of the DPS options a nested list, moving the existing nested list describing the options for S into a third level list. This seems to avoid the issue, and is arguably better formatting in its own right (at least its not a regression IMHO) Signed-off-by: Ian Campbell ian.campb...@citrix.com I had just identified a different way, but this way is slightly better. If you take out all the blank lines visible in the context below, the resulting HTML will be correctly formatted and rather neater (i.e. without sporadic blank lines). Agreed. 
8-- From 53398a9729d391f1fb7b6f753a0032b1f3604d4d Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Wed, 19 Nov 2014 10:42:18 + Subject: [PATCH] docs: workaround markdown parser error in xen-command-line.markdown Some versions of markdown (specifically the one in Debian Wheezy, currently used to generate http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be confused by nested lists in the middle of multi-paragraph parent list entries as seen in the com1,com2 entry. The effect is that the Default section of all following entries are replace by some sort of hash or checksum (at least, a string of 32 random seeming hex digits). Workaround this issue by making the decriptions of the DPS options a nested list, moving the existing nested list describing the options for S into a third level list. This seems to avoid the issue, and is arguably better formatting in its own right (at least its not a regression IMHO) Signed-off-by: Ian Campbell ian.campb...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com --- v2: Less blank lines == nicer output. --- docs/misc/xen-command-line.markdown | 21 - 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 0830e5f..b7eaeea 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -247,19 +247,14 @@ Both option `com1` and `com2` follow the same format. * Optionally, a clock speed measured in hz can be specified. * `DPS` represents the number of data bits, the parity, and the number of stop bits. - - `D` is an integer between 5 and 8 for the number of data bits. - - `P` is a single character representing the type of parity: - - * `n` No - * `o` Odd - * `e` Even - * `m` Mark - * `s` Space - - `S` is an integer 1 or 2 for the number of stop bits. - + * `D` is an integer between 5 and 8 for the number of data bits. 
+ * `P` is a single character representing the type of parity: + * `n` No + * `o` Odd + * `e` Even + * `m` Mark + * `s` Space + * `S` is an integer 1 or 2 for the number of stop bits. * `io-base` is an integer which specifies the IO base port for UART registers. * `irq` is the IRQ number to use, or `0` to use the UART in poll ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration
On November 19, 2014 5:52:58 AM EST, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: ping? On Tue, 18 Nov 2014, Stefano Stabellini wrote: Konrad, I think we should have this fix in Xen 4.5. Should I go ahead and backport it? Go for it. Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com) On Mon, 17 Nov 2014, Don Slutz wrote: The other callers to blk_set_enable_write_cache() in this file already check for s->blk == NULL. Signed-off-by: Don Slutz dsl...@verizon.com --- I think this is a bugfix that should be back ported to stable releases. I also think this should be done in xen's copy of QEMU for 4.5 with back port(s) to active stable releases. Note: In 2.1 and earlier the routine is bdrv_set_enable_write_cache(); variable is s->bs.

 hw/ide/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 00e21cf..d4af5e2 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int version_id)
 {
     IDEState *s = opaque;

-    if (s->identify_set) {
+    if (s->blk && s->identify_set) {
         blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
     }
     return 0;
-- 
1.8.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [qemu-mainline test] 31668: regressions - FAIL
flight 31668 qemu-mainline real [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/31668/ Regressions :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail REGR. vs. 30603 Tests which did not succeed, but are not blocking: test-armhf-armhf-libvirt 9 guest-start fail never pass test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass test-amd64-i386-libvirt 9 guest-start fail never pass test-armhf-armhf-xl 10 migrate-support-checkfail never pass test-amd64-amd64-libvirt 9 guest-start fail never pass test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-xl-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-amd64-xl-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass test-amd64-amd64-xl-qemuu-winxpsp3 14 guest-stop fail never pass test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop fail never pass test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass test-amd64-i386-xl-winxpsp3 14 guest-stop fail never pass version targeted for testing: qemuuf874bf905ff2f8dcc17acbfc61e49a92a6f4d04b baseline version: qemuub00a0ddb31a393b8386d30a9bef4d9bbb249e7ec People who touched revisions under test: Adam Crume adamcr...@gmail.com Alex Bennée alex.ben...@linaro.org Alex Williamson alex.william...@redhat.com Alexander Graf ag...@suse.de Alexey Kardashevskiy a...@ozlabs.ru Amit Shah amit.s...@redhat.com Amos Kong ak...@redhat.com Andreas 
Färber afaer...@suse.de Andrew Jones drjo...@redhat.com Ard Biesheuvel ard.biesheu...@linaro.org Aurelien Jarno aurel...@aurel32.net Bastian Koppelmann kbast...@mail.uni-paderborn.de Bharata B Rao bhar...@linux.vnet.ibm.com Bin Wu wu.wu...@huawei.com Chao Peng chao.p.p...@linux.intel.com Chen Fan chen.fan.f...@cn.fujitsu.com Chen Gang gang.chen.5...@gmail.com Chenliang chenlian...@huawei.com Chris Johns chr...@rtems.org Chris Spiegel chris.spie...@cypherpath.com Christian Borntraeger borntrae...@de.ibm.com Claudio Fontana claudio.font...@huawei.com Cole Robinson crobi...@redhat.com Corey Minyard cminy...@mvista.com Cornelia Huck cornelia.h...@de.ibm.com David Gibson da...@gibson.dropbear.id.au David Hildenbrand d...@linux.vnet.ibm.com Denis V. Lunev d...@openvz.org Don Slutz dsl...@verizon.com Dongxue Zhang elta@gmail.com Dr. David Alan Gilbert dgilb...@redhat.com Edgar E. Iglesias edgar.igles...@xilinx.com Eduardo Habkost ehabk...@redhat.com Eduardo Otubo eduardo.ot...@profitbricks.com Fabian Aggeler aggel...@ethz.ch Fam Zheng f...@redhat.com Frank Blaschka blasc...@linux.vnet.ibm.com Gal Hammer gham...@redhat.com Gerd Hoffmann kra...@redhat.com Gonglei arei.gong...@huawei.com Greg Bellows greg.bell...@linaro.org Gu Zheng guz.f...@cn.fujitsu.com Hannes Reinecke h...@suse.de Heinz Graalfs graa...@linux.vnet.ibm.com Igor Mammedov imamm...@redhat.com James Harper james.har...@ejbdigital.com.au James Harper ja...@ejbdigital.com.au Jan Kiszka jan.kis...@siemens.com Jan Vesely jano.ves...@gmail.com Jens Freimann jf...@linux.vnet.ibm.com Joel Schopp jsch...@linux.vnet.ibm.com John Snow js...@redhat.com Jonas Gorski j...@openwrt.org Jonas Maebe jonas.ma...@elis.ugent.be Juan Quintela quint...@redhat.com Juan Quintela quint...@trasno.org Jun Li junm...@gmail.com Kevin Wolf kw...@redhat.com KONRAD Frederic fred.kon...@greensocs.com Laszlo Ersek ler...@redhat.com Leon Alrae leon.al...@imgtec.com Li Liang liang.z...@intel.com Li Liu john.li...@huawei.com Luiz Capitulino 
lcapitul...@redhat.com Maciej W. Rozycki ma...@codesourcery.com Magnus Reftel ref...@spotify.com Marc-André Lureau marcandre.lur...@gmail.com Marcel Apfelbaum marce...@redhat.com Mark Cave-Ayland mark.cave-ayl...@ilande.co.uk Markus Armbruster arm...@redhat.com Martin Decky mar...@decky.cz Martin Simmons mar...@lispworks.com Max Filippov jcmvb...@gmail.com Max Reitz mre...@redhat.com Michael
Re: [Xen-devel] Strangeness in generated xen-command-line.html
On November 19, 2014 6:05:33 AM EST, Andrew Cooper andrew.coop...@citrix.com wrote: On 19/11/14 11:04, Ian Campbell wrote: On Wed, 2014-11-19 at 10:52 +, Andrew Cooper wrote: On 19/11/14 10:46, Ian Campbell wrote: On Wed, 2014-11-19 at 10:38 +, Ian Campbell wrote: I've not been able to find a workaround... This works for me... 8--- From 3483179d333c47deacfc8c2eb195bf7dc4a555ff Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Wed, 19 Nov 2014 10:42:18 + Subject: [PATCH] docs: workaround markdown parser error in xen-command-line.markdown Some versions of markdown (specifically the one in Debian Wheezy, currently used to generate http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be confused by nested lists in the middle of multi-paragraph parent list entries as seen in the com1,com2 entry. The effect is that the Default section of all following entries are replace by some sort of hash or checksum (at least, a string of 32 random seeming hex digits). Workaround this issue by making the decriptions of the DPS options a nested list, moving the existing nested list describing the options for S into a third level list. This seems to avoid the issue, and is arguably better formatting in its own right (at least its not a regression IMHO) Signed-off-by: Ian Campbell ian.campb...@citrix.com I had just identified a different way, but this way is slightly better. If you take out all the blank lines visible in the context below, the resulting HTML will be correctly formatted and rather neater (i.e. without sporadic blank lines). Agreed. 
8-- From 53398a9729d391f1fb7b6f753a0032b1f3604d4d Mon Sep 17 00:00:00 2001 From: Ian Campbell ian.campb...@citrix.com Date: Wed, 19 Nov 2014 10:42:18 + Subject: [PATCH] docs: workaround markdown parser error in xen-command-line.markdown Some versions of markdown (specifically the one in Debian Wheezy, currently used to generate http://xenbits.xen.org/docs/unstable/misc/xen-command-line.html) seem to be confused by nested lists in the middle of multi-paragraph parent list entries as seen in the com1,com2 entry. The effect is that the Default section of all following entries are replace by some sort of hash or checksum (at least, a string of 32 random seeming hex digits). Workaround this issue by making the decriptions of the DPS options a nested list, moving the existing nested list describing the options for S into a third level list. This seems to avoid the issue, and is arguably better formatting in its own right (at least its not a regression IMHO) Signed-off-by: Ian Campbell ian.campb...@citrix.com Reviewed-by: Andrew Cooper andrew.coop...@citrix.com Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com) In case you were thinking of putting in 4.5 --- v2: Less blank lines == nicer output. --- docs/misc/xen-command-line.markdown | 21 - 1 file changed, 8 insertions(+), 13 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 0830e5f..b7eaeea 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -247,19 +247,14 @@ Both option `com1` and `com2` follow the same format. * Optionally, a clock speed measured in hz can be specified. * `DPS` represents the number of data bits, the parity, and the number of stop bits. - - `D` is an integer between 5 and 8 for the number of data bits. - - `P` is a single character representing the type of parity: - - * `n` No - * `o` Odd - * `e` Even - * `m` Mark - * `s` Space - - `S` is an integer 1 or 2 for the number of stop bits. 
- + * `D` is an integer between 5 and 8 for the number of data bits. + * `P` is a single character representing the type of parity: + * `n` No + * `o` Odd + * `e` Even + * `m` Mark + * `s` Space + * `S` is an integer 1 or 2 for the number of stop bits. * `io-base` is an integer which specifies the IO base port for UART registers. * `irq` is the IRQ number to use, or `0` to use the UART in poll ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2 V3] remove domain field in xenstore backend dir
Chunyan Liu writes ([PATCH 1/2 V3] remove domain field in xenstore backend dir): Remove the unusual 'domain' field under backend directory. The affected are backend/console, backend/vfb, backend/vkbd. Thanks. Acked-by: Ian Jackson ian.jack...@eu.citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, Thank you for your support. You are right - with the latest change you proposed I got continuous prints during the platform hang: (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 Looks like the issue needs further, deeper debugging. Cool! You could simply print what irqs are in all LRs when they are full, for example you could call gic_dump_info. That would tell us what is taking all the LR space we have. How many LRs are available on omap5 anyway? I doubt you have so much interrupt traffic to actually fill all the LRs, so I am thinking that a few LRs might not be cleared properly (that should happen on hypervisor entry; gic_update_one_lr should take care of it). Regards, Andrii On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: Hello Andrii, we are getting closer :-) It would help if you post the output with GIC_DEBUG defined but without the other change that fixes the issue. I think the problem is probably due to software irqs. You are getting too many "gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending" messages. That means you are losing virtual SGIs (guest VCPU to guest VCPU). It would be best to investigate why, especially if you get many more of the same messages without the MAINTENANCE_IRQ change I suggested. 
This patch might also help understanding the problem more:

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..5eaeca2 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
     list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
     {
         i = find_first_zero_bit(this_cpu(lr_mask), nr_lrs);
-        if ( i >= nr_lrs ) return;
+        if ( i >= nr_lrs )
+        {
+            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
+                     p->irq, v->domain->domain_id, v->vcpu_id);
+            continue;
+        }
         spin_lock_irqsave(&gic.lock, flags);
         gic_set_lr(i, p, GICH_LR_PENDING);

On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, No hangs with this change. Complete log is the following: U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26) DRA752 ES1.0 ethaddr not set. Validating first E-fuse MAC cpsw - UART enabled - - CPU booting - - Xen starting in Hyp mode - - Zero BSS - - Setting up control registers - - Turning on paging - - Ready - (XEN) Checking for initrd in /chosen (XEN) RAM: 8000 - 9fff (XEN) RAM: a000 - bfff (XEN) RAM: c000 - dfff (XEN) (XEN) MODULE[1]: c200 - c20069aa (XEN) MODULE[2]: c000 - c200 (XEN) MODULE[3]: - (XEN) MODULE[4]: c300 - c301 (XEN) RESVD[0]: ba30 - bfd0 (XEN) RESVD[1]: 9580 - 9590 (XEN) RESVD[2]: 98a0 - 98b0 (XEN) RESVD[3]: 95f0 - 98a0 (XEN) RESVD[4]: 9590 - 95f0 (XEN) (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1 (XEN) Placing Xen at 0xdfe0-0xe000 (XEN) Xen heap: d200-de00 (49152 pages) (XEN) Dom heap: 344064 pages (XEN) Domain heap initialised (XEN) Looking for UART console serial0 Xen 4.5-unstable (XEN) Xen version 4.5-unstable (atseglytskyi@) (arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 20130328 (prerelease)) debu4 (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty (XEN) Processor: 412fc0f2: ARM Limited, variant: 0x2, part 0xc0f, rev 
0x2 (XEN) 32-bit Execution: (XEN) Processor Features: 1131:00011011 (XEN) Instruction Sets: AArch32 Thumb Thumb-2 ThumbEE Jazelle (XEN) Extensions: GenericTimer Security (XEN) Debug Features: 02010555 (XEN) Auxiliary Features: (XEN) Memory Model Features: 10201105 2000 0124 02102211 (XEN) ISA Features: 02101110 13112111 21232041 2131 10011142 (XEN) Platform: TI DRA7 (XEN) /psci method must be smc, but is: hvc (XEN) Set AuxCoreBoot1 to dfe0004c (0020004c) (XEN) Set AuxCoreBoot0 to 0x20 (XEN) Generic
Re: [Xen-devel] [PATCH 2/2 V3] fix rename: xenstore not fully updated
Chunyan Liu writes ([PATCH 2/2 V3] fix rename: xenstore not fully updated): libxl__domain_rename only updates /local/domain/domid/name, /vm/uuid/name in xenstore are not updated. Add code in libxl__domain_rename to update /vm/uuid/name too. Acked-by: Ian Jackson ian.jack...@eu.citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, Nov 19, 2014 at 1:12 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, Thank you for your support. You are right - with latest change you've proposed I got a continuous prints during platform hang: (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 (XEN) gic.c:725:d0v0 LRs full, not injecting irq=2 into d0v0 Looks line issue needs further deeper debugging. Cool! You could simply print what irqs are in all LRs when they are full, for example you could call gic_dump_info. That would tell us what is taking all the LRs space we have. How many LRs are available on omap5 anyway? :) Already done this: (XEN) gic.c:725:d0v0 LRs full, not injecting irq=27 nr_lrs 4 i 4 into d0v0 (XEN) GICH_LRs (vcpu 0) mask=f (XEN)HW_LR[0]=1a1f (XEN)HW_LR[1]=9a00e439 (XEN)HW_LR[2]=1a02 (XEN)HW_LR[3]=9a015856 (XEN) Inflight irq=31 lr=0 (XEN) Inflight irq=57 lr=1 (XEN) Inflight irq=2 lr=2 (XEN) Inflight irq=86 lr=3 (XEN) Inflight irq=27 lr=255 (XEN) Pending irq=27 I doubt you have so much interrupt traffic to actually fill all the LRs, so I am thinking that a few LRs might not be cleared properly (that should happen on hypervisor entry, gic_update_one_lr should take care of it). This actually explains why this happens during domU start - SGI traffic might be very heavy this time Regards, Andrii On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: Hello Andrii, we are getting closer :-) It would help if you post the output with GIC_DEBUG defined but without the other change that fixes the issue. I think the problem is probably due to software irqs. 
You are getting too many "gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending" messages. That means you are losing virtual SGIs (guest VCPU to guest VCPU). It would be best to investigate why, especially if you get many more of the same messages without the MAINTENANCE_IRQ change I suggested. This patch might also help understanding the problem more:

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..5eaeca2 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v)
     list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue )
     {
         i = find_first_zero_bit(this_cpu(lr_mask), nr_lrs);
-        if ( i >= nr_lrs ) return;
+        if ( i >= nr_lrs )
+        {
+            gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n",
+                     p->irq, v->domain->domain_id, v->vcpu_id);
+            continue;
+        }
         spin_lock_irqsave(&gic.lock, flags);
         gic_set_lr(i, p, GICH_LR_PENDING);

On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, No hangs with this change. Complete log is the following: U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26) DRA752 ES1.0 ethaddr not set. 
Validating first E-fuse MAC cpsw - UART enabled - - CPU booting - - Xen starting in Hyp mode - - Zero BSS - - Setting up control registers - - Turning on paging - - Ready - (XEN) Checking for initrd in /chosen (XEN) RAM: 8000 - 9fff (XEN) RAM: a000 - bfff (XEN) RAM: c000 - dfff (XEN) (XEN) MODULE[1]: c200 - c20069aa (XEN) MODULE[2]: c000 - c200 (XEN) MODULE[3]: - (XEN) MODULE[4]: c300 - c301 (XEN) RESVD[0]: ba30 - bfd0 (XEN) RESVD[1]: 9580 - 9590 (XEN) RESVD[2]: 98a0 - 98b0 (XEN) RESVD[3]: 95f0 - 98a0 (XEN) RESVD[4]: 9590 - 95f0 (XEN) (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1 (XEN) Placing Xen at 0xdfe0-0xe000 (XEN) Xen heap: d200-de00 (49152 pages) (XEN) Dom heap: 344064 pages (XEN) Domain heap initialised (XEN) Looking for UART console serial0 Xen 4.5-unstable (XEN) Xen version 4.5-unstable (atseglytskyi@) (arm-linux-gnueabihf-gcc (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) 4.7.3 20130328 (prerelease)) debu4 (XEN) Latest ChangeSet: Thu Jul 3 12:55:26 2014 +0300 git:3ee354f-dirty (XEN) Processor: 412fc0f2: ARM Limited, variant: 0x2,
Re: [Xen-devel] RFC: vNUMA project
On Tue, Nov 11, 2014 at 5:36 PM, Wei Liu wei.l...@citrix.com wrote: Third stage:

         Basic  PoD  Ballooning  Mem_relocation
 PV/PVH    Y    na       Y            na
 HVM       Y    Y        Y            X

NUMA-aware PoD? Hmm, that will certainly be interesting. :-) The point of PoD is to allocate a chunk of memory at guest creation time and have the VM balloon down to fit that amount of memory. If we assume that vnodes correspond to some set of pnodes, then the initial allocation will (ideally) have to come from *some* subset of those pnodes; but depending on the situation, it may be any combination. So for example, a guest with 2 vnodes of 2GiB each might end up with 1G on each pnode, or 2G on one pnode and none on another. In this case, the only way to get an ideal memory layout is to communicate back to the balloon driver how much memory to free on each virtual node. If the split is 1G / 1G, then the balloon driver will need to allocate 1G for each vnode. If the split was 0.5G / 1.5G, then it would have to allocate 1.5G / 0.5G, &c. -George ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] docs/commandline: Fix formatting issues
On Wed, 2014-11-19 at 11:17 +, Andrew Cooper wrote: In both of these cases, markdown was interpreting the text as regular text, and reflowing it as a regular paragraph, leading to a single line as output. Reformat them as code blocks inside blockquote blocks, which causes them to take their precise whitespace layout. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- Konrad: this is a documentation fix, so requesting a 4.5 ack please. FWIW IMHO documentation fixes in general should have a very low bar to cross until very late in the release cycle... --- docs/misc/xen-command-line.markdown | 38 +-- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index f054d4b..e3a5a15 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -475,13 +475,13 @@ defaults of 1 and unlimited respectively are used instead. For example, with `dom0_max_vcpus=4-8`: - Number of - PCPUs | Dom0 VCPUs - 2| 4 - 4| 4 - 6| 6 - 8| 8 - 10| 8 +Number of + PCPUs | Dom0 VCPUs + 2| 4 + 4| 4 + 6| 6 + 8| 8 + 10| 8 ### dom0\_mem `= List of ( min:size | max:size | size )` @@ -684,18 +684,18 @@ supported only when compiled with XSM\_ENABLE=y on x86. 
The specified value is a bit mask with the individual bits having the following meaning: -Bit 0 - debug level 0 (unused at present) -Bit 1 - debug level 1 (Control Register logging) -Bit 2 - debug level 2 (VMX logging of MSR restores when context switching) -Bit 3 - debug level 3 (unused at present) -Bit 4 - I/O operation logging -Bit 5 - vMMU logging -Bit 6 - vLAPIC general logging -Bit 7 - vLAPIC timer logging -Bit 8 - vLAPIC interrupt logging -Bit 9 - vIOAPIC logging -Bit 10 - hypercall logging -Bit 11 - MSR operation logging + Bit 0 - debug level 0 (unused at present) + Bit 1 - debug level 1 (Control Register logging) + Bit 2 - debug level 2 (VMX logging of MSR restores when context switching) + Bit 3 - debug level 3 (unused at present) + Bit 4 - I/O operation logging + Bit 5 - vMMU logging + Bit 6 - vLAPIC general logging + Bit 7 - vLAPIC timer logging + Bit 8 - vLAPIC interrupt logging + Bit 9 - vIOAPIC logging + Bit 10 - hypercall logging + Bit 11 - MSR operation logging Recognized in debug builds of the hypervisor only. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 0/2 V3] fix rename: xenstore not fully updated
Hi Konrad, I have another release ack request: Chunyan Liu writes ([PATCH 0/2 V3] fix rename: xenstore not fully updated): Currently libxl__domain_rename only update /local/domain/domid/name, still some places in xenstore are not updated, including: /vm/uuid/name and /local/domain/0/backend/device/domid/.../domain. This patch series updates /vm/uuid/name in xenstore, This ([PATCH 2/2 V3] fix rename: xenstore not fully updated) is a bugfix which I think should go into Xen 4.5. The risk WITHOUT this patch is that there are out-of-tree tools which look here for the domain name and will get confused after it is renamed. The risk WITH this patch is that the implementation could be wrong somehow, in which case the code would need to be updated again. But it's a very small patch and has been fully reviewed. and removes the unusual 'domain' field under backend directory. This is a reference to [PATCH 1/2 V3] remove domain field in xenstore backend dir. The change to libxl is that it no longer writes /local/domain/0/backend/vfb/3/0/domain = name of frontend domain It seems hardly conceivable that anyone could be using this field. Existing users will not work after the domain is renamed, anyway. The risk on both sides of the decision lies entirely with out-of-tree software which looks here for the domain name for some reason. We don't think any such tools exist. Note that the domain name cannot be used directly by a non-dom0 programs because the mapping between domids and domain names is in a part of xenstore which is not accessible to guests. (It is possible that a guest would read this value merely to display it.) If such out-of-tree software exists: The risk WITHOUT this patch is that it might report, or (worse) operate on, the wrong domain entirely. The risk WITH this patch is that it (or some subset of its functionality) would stop working right away. An alternative would be to update all of these entries on rename. 
That's a large and somewhat fiddly patch which we don't think is appropriate given that the presence of this key is a mistake. Thanks, ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [BUGFIX][PATCH for 2.2 1/1] hw/ide/core.c: Prevent SIGSEGV during migration
On Wed, 19 Nov 2014, Konrad Rzeszutek Wilk wrote: On November 19, 2014 5:52:58 AM EST, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: ping? On Tue, 18 Nov 2014, Stefano Stabellini wrote: Konrad, I think we should have this fix in Xen 4.5. Should I go ahead and backport it? Go for it. Release-Acked-by: Konrad Rzeszutek Wilk (konrad.w...@oracle.com) Done, thanks! On Mon, 17 Nov 2014, Don Slutz wrote: The other callers to blk_set_enable_write_cache() in this file already check for s->blk == NULL. Signed-off-by: Don Slutz dsl...@verizon.com --- I think this is a bugfix that should be back ported to stable releases. I also think this should be done in xen's copy of QEMU for 4.5 with back port(s) to active stable releases. Note: In 2.1 and earlier the routine is bdrv_set_enable_write_cache(); variable is s->bs.

 hw/ide/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 00e21cf..d4af5e2 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2401,7 +2401,7 @@ static int ide_drive_post_load(void *opaque, int version_id)
 {
     IDEState *s = opaque;

-    if (s->identify_set) {
+    if (s->blk && s->identify_set) {
         blk_set_enable_write_cache(s->blk, !!(s->identify_data[85] & (1 << 5)));
     }
     return 0;
-- 
1.8.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote: So it looks like there is not actually anything wrong, it is just that you have too many inflight irqs? It shouldn't cause problems, because in that case GICH_HCR_UIE should be set and you should get a maintenance interrupt when LRs become available (actually when none, or only one, of the List register entries is marked as a valid interrupt). Maybe GICH_HCR_UIE is the one that doesn't work properly. How much testing did this aspect get when the no-maint-irq series originally went in? Did you manage to find a workload which filled all the LRs or try artificially limiting the number of LRs somehow in order to provoke it? I ask because my intuition is that this won't happen very much, meaning those code paths may not be as well tested... It might be worth checking that you are receiving maintenance interrupts:

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index b7516c0..b3eaa44 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -868,6 +868,8 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r
      * on return to guest that is going to clear the old LRs and inject
      * new interrupts.
      */
+
+    gdprintk(XENLOG_DEBUG, "maintenance interrupt\n");
 }

 void gic_dump_info(struct vcpu *v)

You could also try to replace GICH_HCR_UIE with GICH_HCR_NPIE, you should still be receiving maintenance interrupts when one or more LRs become available. I doubt you have so much interrupt traffic to actually fill all the LRs, so I am thinking that a few LRs might not be cleared properly (that should happen on hypervisor entry, gic_update_one_lr should take care of it). 
This actually explains why this happens during domU start - SGI traffic might be very heavy this time Regards, Andrii On Tue, Nov 18, 2014 at 7:51 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: Hello Andrii, we are getting closer :-) It would help if you post the output with GIC_DEBUG defined but without the other change that fixes the issue. I think the problem is probably due to software irqs. You are getting too many gic.c:617:d0v1 trying to inject irq=2 into d0v0, when it is still lr_pending messages. That means you are losing virtual SGIs (guest VCPU to guest VCPU). It would be best to investigate why, especially if you get many more of the same messages without the MAINTENANCE_IRQ change I suggested. This patch might also help understanding the problem more: diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index b7516c0..5eaeca2 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -717,7 +717,12 @@ static void gic_restore_pending_irqs(struct vcpu *v) list_for_each_entry_safe ( p, t, &v->arch.vgic.lr_pending, lr_queue ) { i = find_first_zero_bit(this_cpu(lr_mask), nr_lrs); -if ( i >= nr_lrs ) return; +if ( i >= nr_lrs ) +{ +gdprintk(XENLOG_DEBUG, "LRs full, not injecting irq=%u into d%dv%d\n", +p->irq, v->domain->domain_id, v->vcpu_id); +continue; +} spin_lock_irqsave(&gic.lock, flags); gic_set_lr(i, p, GICH_LR_PENDING); On Tue, 18 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, No hangs with this change. Complete log is the following: U-Boot SPL 2013.10-00499-g062782f (Oct 14 2014 - 11:36:26) DRA752 ES1.0 ethaddr not set. 
Validating first E-fuse MAC cpsw - UART enabled - - CPU booting - - Xen starting in Hyp mode - - Zero BSS - - Setting up control registers - - Turning on paging - - Ready - (XEN) Checking for initrd in /chosen (XEN) RAM: 8000 - 9fff (XEN) RAM: a000 - bfff (XEN) RAM: c000 - dfff (XEN) (XEN) MODULE[1]: c200 - c20069aa (XEN) MODULE[2]: c000 - c200 (XEN) MODULE[3]: - (XEN) MODULE[4]: c300 - c301 (XEN) RESVD[0]: ba30 - bfd0 (XEN) RESVD[1]: 9580 - 9590 (XEN) RESVD[2]: 98a0 - 98b0 (XEN) RESVD[3]: 95f0 - 98a0 (XEN) RESVD[4]: 9590 - 95f0 (XEN) (XEN) Command line: dom0_mem=128M console=dtuart dtuart=serial0 dom0_max_vcpus=2 bootscrub=0 flask_enforcing=1 (XEN) Placing Xen at 0xdfe0-0xe000 (XEN) Xen heap: d200-de00 (49152 pages) (XEN) Dom heap: 344064 pages (XEN) Domain heap initialised (XEN) Looking for UART console serial0 Xen
Re: [Xen-devel] Xen 4.5 random freeze question
Hi Stefano, if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] &= ~GICH_HCR_UIE; +GICH[GICH_HCR] &= ~GICH_HCR_NPIE; } Yes, exactly I tried, hang still occurs with this change Regards, Andrii -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
Hi Julien, On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall julien.gr...@linaro.org wrote: On 11/19/2014 12:17 PM, Stefano Stabellini wrote: On Wed, 19 Nov 2014, Ian Campbell wrote: On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote: So it looks like there is not actually anything wrong, is just that you have too many inflight irqs? It shouldn't cause problems because in that case GICH_HCR_UIE should be set and you should get a maintenance interrupt when LRs become available (actually when none, or only one, of the List register entries is marked as a valid interrupt). Maybe GICH_HCR_UIE is the one that doesn't work properly. How much testing did this aspect get when the no-maint-irq series originally went in? Did you manage to find a workload which filled all the LRs or try artificially limiting the number of LRs somehow in order to provoke it? I ask because my intuition is that this won't happen very much, meaning those code paths may not be as well tested... I did test it by artificially limiting the number of LRs to 1. However there have been many iterations of that series and I didn't run this test at every iteration. am I the only one to think this may not be related to this bug? All the LRs are full with IRQ of the same priority. So it's valid. As gic_restore_pending_irqs is called every time that we return to the guest. It could be anything else. It would be interesting to see why we are trapping all the time in Xen. I may perform any test if you have some specific scenario. Regards, -- Julien Grall -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote: Hi Julien, On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall julien.gr...@linaro.org wrote: On 11/19/2014 12:17 PM, Stefano Stabellini wrote: On Wed, 19 Nov 2014, Ian Campbell wrote: On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote: So it looks like there is not actually anything wrong, is just that you have too many inflight irqs? It shouldn't cause problems because in that case GICH_HCR_UIE should be set and you should get a maintenance interrupt when LRs become available (actually when none, or only one, of the List register entries is marked as a valid interrupt). Maybe GICH_HCR_UIE is the one that doesn't work properly. How much testing did this aspect get when the no-maint-irq series originally went in? Did you manage to find a workload which filled all the LRs or try artificially limiting the number of LRs somehow in order to provoke it? I ask because my intuition is that this won't happen very much, meaning those code paths may not be as well tested... I did test it by artificially limiting the number of LRs to 1. However there have been many iterations of that series and I didn't run this test at every iteration. am I the only one to think this may not be related to this bug? All the LRs are full with IRQ of the same priority. So it's valid. As gic_restore_pending_irqs is called every time that we return to the guest. It could be anything else. It would be interesting to see why we are trapping all the time in Xen. I may perform any test if you have some specific scenario. I have no specific scenario in my mind :/. It looks like I'm able to reproduce it on my ARM board by restricting the number of LRs to 1. I will investigate. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)
On 14/11/2014 12:25, Fabio Fantoni wrote: dom0 xen-unstable from staging git with x86/hvm: Extend HVM cpuid leaf with vcpu id and x86/hvm: Add per-vcpu evtchn upcalls patches, and qemu 2.2 from spice git (spice/next commit e779fa0a715530311e6f59fc8adb0f6eca914a89): https://github.com/Fantu/Xen/commits/rebase/m2r-staging I tried with qemu tag v2.2.0-rc2 and the crash still happens, here is the full backtrace of the latest test: Program received signal SIGSEGV, Segmentation fault. 0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 73 eax = env->regs[R_EAX]; (gdb) bt full #0 0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 s = 0x564443a0 cs = 0x0 cpu = 0x0 __func__ = "vmport_ioport_read" env = 0x8250 command = 0 '\000' eax = 0 #1 0x55655fc4 in memory_region_read_accessor (mr=0x5628, addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410 tmp = 0 #2 0x556562b7 in access_with_adjusted_size (addr=0, value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4, access=0x55655f62 memory_region_read_accessor, mr=0x5628) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480 access_mask = 4294967295 access_size = 4 i = 0 #3 0x556590e9 in memory_region_dispatch_read1 (mr=0x5628, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077 data = 0 #4 0x556591b1 in memory_region_dispatch_read (mr=0x5628, addr=0, pval=0x7fffd9a8, size=4) ---Type return to continue, or q return to quit--- at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099 No locals. #5 0x5565cbbc in io_mem_read (mr=0x5628, addr=0, pval=0x7fffd9a8, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1962 No locals. 
#6 0x5560a1ca in address_space_rw (as=0x55eaf920, addr=22104, buf=0x7fffda50 \377\377\377\377, len=4, is_write=false) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2167 l = 4 ptr = 0x55a92d87 %s/%d:\n val = 7852232130387826944 addr1 = 0 mr = 0x5628 error = false #7 0x5560a38f in address_space_read (as=0x55eaf920, addr=22104, buf=0x7fffda50 \377\377\377\377, len=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2205 No locals. #8 0x5564fd4b in cpu_inl (addr=22104) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/ioport.c:117 buf = \377\377\377\377 val = 21845 #9 0x55670c73 in do_inp (addr=22104, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:684 ---Type return to continue, or q return to quit--- No locals. #10 0x55670ee0 in cpu_ioreq_pio (req=0x77ff3020) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:747 i = 1 #11 0x556714b3 in handle_ioreq (state=0x563c2510, req=0x77ff3020) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:853 No locals. #12 0x55671826 in cpu_handle_ioreq (opaque=0x563c2510) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:931 state = 0x563c2510 req = 0x77ff3020 #13 0x5596e240 in qemu_iohandler_poll (pollfds=0x56389a30, ret=1) at iohandler.c:143 revents = 1 pioh = 0x563f7610 ioh = 0x56450a40 #14 0x5596de1c in main_loop_wait (nonblocking=0) at main-loop.c:495 ret = 1 timeout = 4294967295 timeout_ns = 3965432 #15 0x55756d3f in main_loop () at vl.c:1882 nonblocking = false last_io = 0 #16 0x5575ea49 in main (argc=62, argv=0x7fffe048, envp=0x7fffe240) at vl.c:4400 ---Type return to continue, or q return to quit--- i = 128 snapshot = 0 linux_boot = 0 initrd_filename = 0x0 kernel_filename = 0x0 kernel_cmdline = 0x55a48f86 boot_order = 0x56387460 dc ds = 0x564b2040 cyls = 0 heads = 0 secs = 0 translation = 0 hda_opts = 0x0 opts = 0x563873b0 machine_opts = 0x56389010 icount_opts = 0x0 olist = 0x55e57e80 optind = 62 optarg = 0x7fffe914 file=/mnt/vm/disks/FEDORA19.disk1.xm,if=ide,index=0,media=disk,format=raw,cache=writeback loadvm = 0x0 machine_class = 0x5637d5c0 
cpu_model = 0x0 vga_model = 0x0 qtest_chrdev = 0x0 ---Type return to continue, or q return to quit--- qtest_log = 0x0 pid_file = 0x0 incoming = 0x0 show_vnc_port = 0 defconfig = true userconfig = true log_mask = 0x0 log_file = 0x0
Re: [Xen-devel] Xen 4.5 random freeze question
On 11/19/2014 01:30 PM, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 3:26 PM, Julien Grall julien.gr...@linaro.org wrote: On 11/19/2014 12:40 PM, Andrii Tseglytskyi wrote: Hi Julien, On Wed, Nov 19, 2014 at 2:23 PM, Julien Grall julien.gr...@linaro.org wrote: On 11/19/2014 12:17 PM, Stefano Stabellini wrote: On Wed, 19 Nov 2014, Ian Campbell wrote: On Wed, 2014-11-19 at 11:42 +, Stefano Stabellini wrote: So it looks like there is not actually anything wrong, is just that you have too many inflight irqs? It shouldn't cause problems because in that case GICH_HCR_UIE should be set and you should get a maintenance interrupt when LRs become available (actually when none, or only one, of the List register entries is marked as a valid interrupt). Maybe GICH_HCR_UIE is the one that doesn't work properly. How much testing did this aspect get when the no-maint-irq series originally went in? Did you manage to find a workload which filled all the LRs or try artificially limiting the number of LRs somehow in order to provoke it? I ask because my intuition is that this won't happen very much, meaning those code paths may not be as well tested... I did test it by artificially limiting the number of LRs to 1. However there have been many iterations of that series and I didn't run this test at every iteration. am I the only one to think this may not be related to this bug? All the LRs are full with IRQ of the same priority. So it's valid. As gic_restore_pending_irqs is called every time that we return to the guest. It could be anything else. It would be interesting to see why we are trapping all the time in Xen. I may perform any test if you have some specific scenario. I have no specific scenario in my mind :/. It looks like I'm able to reproduce it on my ARM board by restricting the number of LRs to 1. Do you mean that you got a hang with current xen/master branch ? Yes but I forgot to update another part of the code. 
With the patch below to restrict the number of LRs I'm still able to boot. And don't see any maintenance interrupt. Stefano, is it valid? diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c index faad1ff..c1c0f7ff 100644 --- a/xen/arch/arm/gic-v2.c +++ b/xen/arch/arm/gic-v2.c @@ -327,6 +327,7 @@ static void __cpuinit gicv2_hyp_init(void) vtr = readl_gich(GICH_VTR); nr_lrs = (vtr & GICH_V2_VTR_NRLRGS) + 1; gicv2_info.nr_lrs = nr_lrs; +gicv2_info.nr_lrs = 1; writel_gich(GICH_MISR_EOI, GICH_MISR); } @@ -488,6 +489,16 @@ static void gicv2_write_lr(int lr, const struct gic_lr *lr_reg) static void gicv2_hcr_status(uint32_t flag, bool_t status) { +uint32_t lr = readl_gich(GICH_LR + 0); + +if ( status ) +lr |= GICH_V2_LR_MAINTENANCE_IRQ; +else +lr &= ~GICH_V2_LR_MAINTENANCE_IRQ; + +writel_gich(lr, GICH_LR + 0); + +#if 0 uint32_t hcr = readl_gich(GICH_HCR); if ( status ) @@ -496,6 +507,7 @@ static void gicv2_hcr_status(uint32_t flag, bool_t status) hcr &= (~flag); writel_gich(hcr, GICH_HCR); +#endif } static unsigned int gicv2_read_vmcr_priority(void) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 70d10d6..c726d7a 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -599,6 +599,7 @@ static void maintenance_interrupt(int irq, void *dev_id, struct cpu_user_regs *r * on return to guest that is going to clear the old LRs and inject * new interrupts. */ +gdprintk(XENLOG_DEBUG, "\n"); } void gic_dump_info(struct vcpu *v) -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] &= ~GICH_HCR_UIE; +GICH[GICH_HCR] &= ~GICH_HCR_NPIE; } Yes, exactly I tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. Could you please call gic_dump_info(current) from maintenance_interrupt, and post the output during the hang? Remove the other gic_dump_info to avoid confusion, we want to understand what is the status of the LRs after clearing them upon receiving a maintenance interrupt at busy times. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)
I think I know what is happening here. But you are pointing at the wrong change. commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4 Is what I am guessing at this time is the issue. I think that xen_enabled() is returning false in pc_machine_initfn. Whereas in pc_init1 it is returning true. I am thinking that: diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 7bb97a4..3268c29 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = { .desc = "Xen Fully-virtualized PC", .init = pc_xen_hvm_init, .max_cpus = HVM_MAX_VCPUS, -.default_machine_opts = "accel=xen", +.default_machine_opts = "accel=xen,vmport=off", .hot_add_cpu = pc_hot_add_cpu, }; #endif Will fix your issue. I have not tested this yet. -Don Slutz On 11/19/14 09:04, Fabio Fantoni wrote: On 14/11/2014 12:25, Fabio Fantoni wrote: dom0 xen-unstable from staging git with x86/hvm: Extend HVM cpuid leaf with vcpu id and x86/hvm: Add per-vcpu evtchn upcalls patches, and qemu 2.2 from spice git (spice/next commit e779fa0a715530311e6f59fc8adb0f6eca914a89): https://github.com/Fantu/Xen/commits/rebase/m2r-staging I tried with qemu tag v2.2.0-rc2 and the crash still happens, here is the full backtrace of the latest test: Program received signal SIGSEGV, Segmentation fault. 
0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 73 eax = env->regs[R_EAX]; (gdb) bt full #0 0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 s = 0x564443a0 cs = 0x0 cpu = 0x0 __func__ = "vmport_ioport_read" env = 0x8250 command = 0 '\000' eax = 0 #1 0x55655fc4 in memory_region_read_accessor (mr=0x5628, addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410 tmp = 0 #2 0x556562b7 in access_with_adjusted_size (addr=0, value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4, access=0x55655f62 memory_region_read_accessor, mr=0x5628) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480 access_mask = 4294967295 access_size = 4 i = 0 #3 0x556590e9 in memory_region_dispatch_read1 (mr=0x5628, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077 data = 0 #4 0x556591b1 in memory_region_dispatch_read (mr=0x5628, addr=0, pval=0x7fffd9a8, size=4) ---Type return to continue, or q return to quit--- at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099 No locals. #5 0x5565cbbc in io_mem_read (mr=0x5628, addr=0, pval=0x7fffd9a8, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1962 No locals. #6 0x5560a1ca in address_space_rw (as=0x55eaf920, addr=22104, buf=0x7fffda50 "\377\377\377\377", len=4, is_write=false) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2167 l = 4 ptr = 0x55a92d87 "%s/%d:\n" val = 7852232130387826944 addr1 = 0 mr = 0x5628 error = false #7 0x5560a38f in address_space_read (as=0x55eaf920, addr=22104, buf=0x7fffda50 "\377\377\377\377", len=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2205 No locals. 
#8 0x5564fd4b in cpu_inl (addr=22104) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/ioport.c:117 buf = \377\377\377\377 val = 21845 #9 0x55670c73 in do_inp (addr=22104, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:684 ---Type return to continue, or q return to quit--- No locals. #10 0x55670ee0 in cpu_ioreq_pio (req=0x77ff3020) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:747 i = 1 #11 0x556714b3 in handle_ioreq (state=0x563c2510, req=0x77ff3020) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:853 No locals. #12 0x55671826 in cpu_handle_ioreq (opaque=0x563c2510) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:931 state = 0x563c2510 req = 0x77ff3020 #13 0x5596e240 in qemu_iohandler_poll (pollfds=0x56389a30, ret=1) at iohandler.c:143 revents = 1 pioh = 0x563f7610 ioh = 0x56450a40 #14 0x5596de1c in main_loop_wait (nonblocking=0) at main-loop.c:495 ret = 1 timeout = 4294967295 timeout_ns = 3965432 #15 0x55756d3f in main_loop () at vl.c:1882 nonblocking = false last_io = 0 #16 0x5575ea49 in main (argc=62, argv=0x7fffe048, envp=0x7fffe240) at vl.c:4400 ---Type return to continue, or q return to quit--- i = 128 snapshot = 0 linux_boot = 0 initrd_filename = 0x0 kernel_filename =
Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq
On Wed, Nov 19, 2014 at 12:16:44PM +0100, Sander Eikelenboom wrote: Wednesday, November 19, 2014, 2:55:41 AM, you wrote: On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote: Tuesday, November 18, 2014, 9:56:33 PM, you wrote: Uhmm I thought I had these switched off (due to problems earlier and then forgot about them .. however looking at the earlier reports these lines were also in those reports). The xen-syms and these last runs are all with a pristine xen tree cloned today (staging branch), so the qemu-xen and seabios defined with that were also freshly cloned and had a new default seabios config. (just to rule out anything stale in my tree) If you don't see those messages .. perhaps your seabios and qemu trees (and at least the seabios config) are not the most recent (they don't get updated automatically when you just do a git pull on the main tree) ? In /tools/firmware/seabios-dir/.config I have: CONFIG_USB=y CONFIG_USB_UHCI=y CONFIG_USB_OHCI=y CONFIG_USB_EHCI=y CONFIG_USB_XHCI=y CONFIG_USB_MSC=y CONFIG_USB_UAS=y CONFIG_USB_HUB=y CONFIG_USB_KEYBOARD=y CONFIG_USB_MOUSE=y I seem to have the same thing. Perhaps it is my XHCI controller being wonky. And this is all just from a: - git clone git://xenbits.xen.org/xen.git -b staging - make clean ./configure make -j6 make -j6 install Aye. .. snip.. 1) test_and_[set|clear]_bit sometimes return unexpected values. [But this might be invalid as the addition of the 8303faaf25a8 might be correct - as the second dpci the softirq is processing could be the MSI one] Would there be an easy way to stress test this function separately in some debugging function to see if it indeed is returning unexpected values ? Sadly no. But you got me looking in the right direction when you mentioned 'timeout'. 2) INIT_LIST_HEAD operations on the same CPU are not honored. Just curious, have you also tested the patches on AMD hardware ? Yes. To reproduce this the first thing I did was to get an AMD box. 
When I look at the combination of (2) and (3), it seems it could be an interaction between the two passed through devices and/or different IRQ types. Could be - as in it is causing this issue to show up faster than expected. Or it is the one that triggers more than one dpci happening at the same time. Well that didn't seem to be it (see separate amendment I mailed previously) Right, the current theory I have is that the interrupts are not being Acked within 8 milliseconds and we reset the 'state' - and at the same time we get an interrupt and schedule it - while we are still processing the same interrupt. This would explain why the 'test_and_clear_bit' got the wrong value. In regards to the list poison - following this thread of logic - with the 'state = 0' set we open the floodgates for any CPU to put the same 'struct hvm_pirq_dpci' on its list. We do reset the 'state' on _every_ GSI that is mapped to a guest - so we also reset the 'state' for the MSI one (XHCI). Anyhow in your case:

CPUX:                                    CPUY:
pt_irq_time_out:
    state = 0;
raise_softirq                            [out of timer code, the
    [adds the pirq_dpci as state == 0]    pirq_dpci is on the dpci_list]
softirq_dpci:                            softirq_dpci:
    list_del
    [entries poison]
                                         list_del <= BOOM

Is what I believe is happening. The INTX device - once I put a load on it - does not trigger any pt_irq_time_out, so that would explain why I cannot hit this. But I believe your card hits these hiccups. Hi Konrad, I just tested your 5 patches and as a result I still got an(other) host crash: (complete serial log attached) (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) [2014-11-18 21:55:41.591] CPU:0 (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08012c7e7]CPU:2 (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08014a461] hvm_do_IRQ_dpci+0xbd/0x13c (XEN) [2014-11-18 21:55:41.591] RFLAGS: 00010006 _spin_unlock+0x1f/0x30CONTEXT: hypervisor Duh! 
Here is another patch on top of the five you have (attached and inline). Hi Konrad, Happy to report it has been running with this additional patch for 2 hours now without any problems. I think you nailed it :-) Could you also do an 'xl debug-keys k' and send that please? More than happy to test the definitive patch as well.
Re: [Xen-devel] Problems accessing passthrough PCI device
Hello Jan and Konrad, Tuesday, November 18, 2014, 1:49:13 PM, you wrote: I've just checked this with lspci. I see that the IO is being enabled. Memory you mean. Yes. Sorry. Any other idea on why I might be reading back 0xff for all PCI memory area reads? The lspci output follows. Since this isn't behind a bridge - no, not really. Did you try this with any other device for comparison purposes? This is getting more interesting. It seems that something is overwriting the pci-back configuration data. Starting from a fresh reboot I checked the Dom0 pci configuration and got this: root@smartin-xen:~# lspci -s 00:19.0 -x 00:19.0 Ethernet controller: Intel Corporation Device 1559 (rev 04) 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20 30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00 I then start/stop my DomU and checked the Dom0 pci configuration again and got this: root@smartin-xen:~# lspci -s 00:19.0 -x 00:19.0 Ethernet controller: Intel Corporation Device 1559 (rev 04) 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00 10: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20 30: 00 00 00 00 c8 00 00 00 00 00 00 00 05 01 00 00 Inside my DomU I added code to print the PCI configuration registers and what I get after restarting the DomU is: (d18) 14:57:04.042 src/e1000e.c@00150: 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00 (d18) 14:57:04.042 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00 (d18) 14:57:04.042 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20 (d18) 14:57:04.043 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00 (d18) 14:57:04.043 src/e1000e.c@00324: Enable PCI Memory Access (d18) 14:57:05.043 src/e1000e.c@00150: 00: 86 80 59 15 03 00 10 00 04 00 00 02 00 00 00 00 (d18) 14:57:05.044 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 
81 f0 00 00 00 00 00 00 (d18) 14:57:05.044 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20 (d18) 14:57:05.045 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00 As you can see the pci configuration read from the pci-back driver by my DomU is different to the data in the Dom0 pci configuration! Just before leaving my DomU I disable the pci memory access and this is what I see (d18) 15:01:02.051 src/e1000e.c@00150: 00: 86 80 59 15 03 00 10 00 04 00 00 02 00 00 00 00 (d18) 15:01:02.051 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00 (d18) 15:01:02.051 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20 (d18) 15:01:02.052 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00 (d18) 15:01:02.052 src/e1000e.c@00541: Disable PCI Memory Access (d18) 15:01:02.052 src/e1000e.c@00150: 00: 86 80 59 15 00 00 10 00 04 00 00 02 00 00 00 00 (d18) 15:01:02.052 src/e1000e.c@00150: 10: 00 00 d0 f7 00 c0 d3 f7 81 f0 00 00 00 00 00 00 (d18) 15:01:02.052 src/e1000e.c@00150: 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 54 20 (d18) 15:01:02.053 src/e1000e.c@00150: 30: 00 00 00 00 c8 00 00 00 00 00 00 00 14 01 00 00 As you can see the data is consistent with just writing to the pci control register. This is the output from the debug version of the xen-pciback module. 
[ 5429.351231] pciback :00:19.0: enabling device (0000 -> 0003) [ 5429.351367] xen: registering gsi 20 triggering 0 polarity 1 [ 5429.351373] Already setup the GSI :20 [ 5429.351387] pciback :00:19.0: xen-pciback[:00:19.0]: #20 on disable->enable [ 5429.351436] pciback :00:19.0: xen-pciback[:00:19.0]: #20 on enabled [ 5434.360078] pciback :00:19.0: xen-pciback[:00:19.0]: #20 off enable->disable [ 5434.360116] pciback :00:19.0: xen-pciback[:00:19.0]: #0 off disabled [ 5434.361491] xen-pciback pci-20-0: fe state changed 5 [ 5434.362473] xen-pciback pci-20-0: fe state changed 6 [ 5434.363540] xen-pciback pci-20-0: fe state changed 0 [ 5434.363544] xen-pciback pci-20-0: frontend is gone! unregister device [ 5434.467359] pciback :00:19.0: resetting virtual configuration space [ 5434.467376] pciback :00:19.0: free-ing dynamically allocated virtual configuration space fields Does this make any sense to you? -- Best regards, Simon mailto:furryfutt...@gmail.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 0/5 v2 for-4.5] xen: arm: xgene bug fixes + support for McDivitt
These patches:
* fix up an off by one bug in the xgene mapping of additional PCI bus resources, which would cause an additional extra page to be mapped
* correct the size of the mapped regions to match the docs
* add support for the other 4 PCI buses on the chip, which enables mcdivitt and presumably most other Xgene based platforms which use PCI buses other than pcie0.
* add earlyprintk for the mcdivitt platform
They can also be found at: git://xenbits.xen.org/people/ianc/xen.git mcdivitt-v2 McDivitt is the X-Gene based HP Moonshot cartridge (McDivitt is the code name, I think the product is called m400, not quite sure). Other than the bug fixes I'd like to see the mcdivitt support (specifically the other 4 PCI buses one) in 4.5 because Moonshot is an interesting and exciting platform for arm64. It is also being used for ongoing work on Xen on ARM on Openstack in Linaro. The earlyprintk patch is totally harmless unless it's explicitly enabled at compile time, IMHO if we are taking the rest we may as well throw it in... The risk here is that we break the existing support for the Mustang platform, which would be the most likely failure case for the second patch. I've tested these on a Mustang, including firing up a PCI NIC device. The new mappings are a superset of the existing ones so the potential for breakage should be quite small. I've also successfully tested on a McDivitt. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 for-4.5 1/5] xen: arm: Add earlyprintk for McDivitt.
Signed-off-by: Ian Campbell ian.campb...@citrix.com --- v2: Remove pointless/unused baud rate setting. A bunch of other entries have these, but cleaning them up is out of scope here I think. --- xen/arch/arm/Rules.mk |5 + 1 file changed, 5 insertions(+) diff --git a/xen/arch/arm/Rules.mk b/xen/arch/arm/Rules.mk index 572d854..30c7823 100644 --- a/xen/arch/arm/Rules.mk +++ b/xen/arch/arm/Rules.mk @@ -95,6 +95,11 @@ EARLY_PRINTK_BAUD := 115200 EARLY_UART_BASE_ADDRESS := 0x1c02 EARLY_UART_REG_SHIFT := 2 endif +ifeq ($(CONFIG_EARLY_PRINTK), xgene-mcdivitt) +EARLY_PRINTK_INC := 8250 +EARLY_UART_BASE_ADDRESS := 0x1c021000 +EARLY_UART_REG_SHIFT := 2 +endif ifeq ($(CONFIG_EARLY_PRINTK), juno) EARLY_PRINTK_INC := pl011 EARLY_PRINTK_BAUD := 115200 -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 for-4.5 2/5] xen: arm: Drop EARLY_PRINTK_BAUD from entries which don't set ..._INIT_UART
EARLY_PRINTK_BAUD doesn't do anything unless EARLY_PRINTK_INIT_UART is set. Furthermore only the pl011 driver implements the init routine at all, so the entries which use 8250 and specified a BAUD were doubly wrong. Signed-off-by: Ian Campbell ian.campb...@citrix.com --- v2: New patch. --- xen/arch/arm/Rules.mk |7 --- 1 file changed, 7 deletions(-) diff --git a/xen/arch/arm/Rules.mk b/xen/arch/arm/Rules.mk index 30c7823..4ee51a9 100644 --- a/xen/arch/arm/Rules.mk +++ b/xen/arch/arm/Rules.mk @@ -45,7 +45,6 @@ ifeq ($(debug),y) # Early printk for versatile express ifeq ($(CONFIG_EARLY_PRINTK), vexpress) EARLY_PRINTK_INC := pl011 -EARLY_PRINTK_BAUD := 38400 EARLY_UART_BASE_ADDRESS := 0x1c09 endif ifeq ($(CONFIG_EARLY_PRINTK), fastmodel) @@ -56,12 +55,10 @@ EARLY_UART_BASE_ADDRESS := 0x1c09 endif ifeq ($(CONFIG_EARLY_PRINTK), exynos5250) EARLY_PRINTK_INC := exynos4210 -EARLY_PRINTK_BAUD := 115200 EARLY_UART_BASE_ADDRESS := 0x12c2 endif ifeq ($(CONFIG_EARLY_PRINTK), midway) EARLY_PRINTK_INC := pl011 -EARLY_PRINTK_BAUD := 115200 EARLY_UART_BASE_ADDRESS := 0xfff36000 endif ifeq ($(CONFIG_EARLY_PRINTK), omap5432) @@ -91,7 +88,6 @@ EARLY_UART_REG_SHIFT := 2 endif ifeq ($(CONFIG_EARLY_PRINTK), xgene-storm) EARLY_PRINTK_INC := 8250 -EARLY_PRINTK_BAUD := 115200 EARLY_UART_BASE_ADDRESS := 0x1c02 EARLY_UART_REG_SHIFT := 2 endif @@ -102,18 +98,15 @@ EARLY_UART_REG_SHIFT := 2 endif ifeq ($(CONFIG_EARLY_PRINTK), juno) EARLY_PRINTK_INC := pl011 -EARLY_PRINTK_BAUD := 115200 EARLY_UART_BASE_ADDRESS := 0x7ff8 endif ifeq ($(CONFIG_EARLY_PRINTK), hip04-d01) EARLY_PRINTK_INC := 8250 -EARLY_PRINTK_BAUD := 115200 EARLY_UART_BASE_ADDRESS := 0xE4007000 EARLY_UART_REG_SHIFT := 2 endif ifeq ($(CONFIG_EARLY_PRINTK), seattle) EARLY_PRINTK_INC := pl011 -EARLY_PRINTK_BAUD := 115200 EARLY_UART_BASE_ADDRESS := 0xe101 endif -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 for-4.5 4/5] xen: arm: correct specific mappings for PCIE0 on X-Gene
The region assigned to PCIE0, according to the docs, is 0x0e0 to 0x100. They make no distinction between PCI CFG and PCI IO mem within this range (in fact, I'm not sure that isn't up to the driver). Signed-off-by: Ian Campbell ian.campb...@citrix.com Reviewed-by: Julien Grall julien.gr...@linaro.org --- xen/arch/arm/platforms/xgene-storm.c | 18 ++---- 1 file changed, 2 insertions(+), 16 deletions(-) diff --git a/xen/arch/arm/platforms/xgene-storm.c b/xen/arch/arm/platforms/xgene-storm.c index 8685c93..8c27f24 100644 --- a/xen/arch/arm/platforms/xgene-storm.c +++ b/xen/arch/arm/platforms/xgene-storm.c @@ -89,22 +89,8 @@ static int xgene_storm_specific_mapping(struct domain *d) int ret; /* Map the PCIe bus resources */ -ret = map_one_mmio(d, "PCI MEM REGION", paddr_to_pfn(0xe0UL), -paddr_to_pfn(0xe01000UL)); -if ( ret ) -goto err; - -ret = map_one_mmio(d, "PCI IO REGION", paddr_to_pfn(0xe08000UL), - paddr_to_pfn(0xe08001UL)); -if ( ret ) -goto err; - -ret = map_one_mmio(d, "PCI CFG REGION", paddr_to_pfn(0xe0d000UL), -paddr_to_pfn(0xe0d020UL)); -if ( ret ) -goto err; -ret = map_one_mmio(d, "PCI MSI REGION", paddr_to_pfn(0xe01000UL), -paddr_to_pfn(0xe01080UL)); +ret = map_one_mmio(d, "PCI MEMORY", paddr_to_pfn(0x0e0UL), +paddr_to_pfn(0x010UL)); if ( ret ) goto err; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] &= ~GICH_HCR_UIE; +GICH[GICH_HCR] &= ~GICH_HCR_NPIE; } Yes, that is exactly what I tried; the hang still occurs with this change. We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during the maintenance interrupt: (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I get the hang, maintenance interrupts are generated continuously. The platform continues printing the same log until reboot. Exactly the same log? As in the one above you just pasted? That is very very suspicious. I am thinking that we are not handling GICH_HCR_UIE correctly and something we do in Xen, maybe writing to an LR register, might trigger a new maintenance interrupt immediately, causing an infinite loop. Could you please try this patch? It disables GICH_HCR_UIE immediately on hypervisor entry.
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4d2a92d..6ae8dc4 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +GICH[GICH_HCR] &= ~GICH_HCR_UIE; + spin_lock_irqsave(&v->arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask), @@ -821,12 +823,8 @@ void gic_inject(void) gic_restore_pending_irqs(current); - if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() ) GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] &= ~GICH_HCR_UIE; - } static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)
On 19/11/2014 15:56, Don Slutz wrote: I think I know what is happening here. But you are pointing at the wrong change. commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4 is what I am guessing is the issue at this time. I think that xen_enabled() is returning false in pc_machine_initfn, whereas in pc_init1 it is returning true. I am thinking that: diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 7bb97a4..3268c29 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = { .desc = "Xen Fully-virtualized PC", .init = pc_xen_hvm_init, .max_cpus = HVM_MAX_VCPUS, -.default_machine_opts = "accel=xen", +.default_machine_opts = "accel=xen,vmport=off", .hot_add_cpu = pc_hot_add_cpu, }; #endif will fix your issue. I have not tested this yet. Tested now, and it solves the regression of Linux HVM domUs with qemu 2.2, thanks. I think that I'm not the only one with this regression and that this patch (or a fix to the cause in vmport) should be applied before qemu 2.2 final. -Don Slutz On 11/19/14 09:04, Fabio Fantoni wrote: On 14/11/2014 12:25, Fabio Fantoni wrote: dom0 xen-unstable from staging git with "x86/hvm: Extend HVM cpuid leaf with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls" patches, and qemu 2.2 from spice git (spice/next commit e779fa0a715530311e6f59fc8adb0f6eca914a89): https://github.com/Fantu/Xen/commits/rebase/m2r-staging I tried with qemu tag v2.2.0-rc2 and the crash still happens; here is the full backtrace of the latest test: Program received signal SIGSEGV, Segmentation fault.
0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 73 eax = env-regs[R_EAX]; (gdb) bt full #0 0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 s = 0x564443a0 cs = 0x0 cpu = 0x0 __func__ = vmport_ioport_read env = 0x8250 command = 0 '\000' eax = 0 #1 0x55655fc4 in memory_region_read_accessor (mr=0x5628, addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410 tmp = 0 #2 0x556562b7 in access_with_adjusted_size (addr=0, value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4, access=0x55655f62 memory_region_read_accessor, mr=0x5628) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480 access_mask = 4294967295 access_size = 4 i = 0 #3 0x556590e9 in memory_region_dispatch_read1 (mr=0x5628, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077 data = 0 #4 0x556591b1 in memory_region_dispatch_read (mr=0x5628, addr=0, pval=0x7fffd9a8, size=4) ---Type return to continue, or q return to quit--- at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099 No locals. #5 0x5565cbbc in io_mem_read (mr=0x5628, addr=0, pval=0x7fffd9a8, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1962 No locals. #6 0x5560a1ca in address_space_rw (as=0x55eaf920, addr=22104, buf=0x7fffda50 \377\377\377\377, len=4, is_write=false) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2167 l = 4 ptr = 0x55a92d87 %s/%d:\n val = 7852232130387826944 addr1 = 0 mr = 0x5628 error = false #7 0x5560a38f in address_space_read (as=0x55eaf920, addr=22104, buf=0x7fffda50 \377\377\377\377, len=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2205 No locals. 
#8 0x5564fd4b in cpu_inl (addr=22104) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/ioport.c:117 buf = \377\377\377\377 val = 21845 #9 0x55670c73 in do_inp (addr=22104, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:684 ---Type return to continue, or q return to quit--- No locals. #10 0x55670ee0 in cpu_ioreq_pio (req=0x77ff3020) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:747 i = 1 #11 0x556714b3 in handle_ioreq (state=0x563c2510, req=0x77ff3020) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:853 No locals. #12 0x55671826 in cpu_handle_ioreq (opaque=0x563c2510) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:931 state = 0x563c2510 req = 0x77ff3020 #13 0x5596e240 in qemu_iohandler_poll (pollfds=0x56389a30, ret=1) at iohandler.c:143 revents = 1 pioh = 0x563f7610 ioh = 0x56450a40 #14 0x5596de1c in main_loop_wait (nonblocking=0) at main-loop.c:495 ret = 1 timeout = 4294967295 timeout_ns = 3965432 #15 0x55756d3f in main_loop () at vl.c:1882 nonblocking = false
Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)
On Wed, 19 Nov 2014, Fabio Fantoni wrote: On 19/11/2014 15:56, Don Slutz wrote: I think I know what is happening here. But you are pointing at the wrong change. commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4 is what I am guessing is the issue at this time. I think that xen_enabled() is returning false in pc_machine_initfn, whereas in pc_init1 it is returning true. I am thinking that: diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 7bb97a4..3268c29 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = { .desc = "Xen Fully-virtualized PC", .init = pc_xen_hvm_init, .max_cpus = HVM_MAX_VCPUS, -.default_machine_opts = "accel=xen", +.default_machine_opts = "accel=xen,vmport=off", .hot_add_cpu = pc_hot_add_cpu, }; #endif will fix your issue. I have not tested this yet. Tested now, and it solves the regression of Linux HVM domUs with qemu 2.2, thanks. I think that I'm not the only one with this regression and that this patch (or a fix to the cause in vmport) should be applied before qemu 2.2 final. Don, please submit a proper patch with a Signed-off-by. Thanks! - Stefano -Don Slutz On 11/19/14 09:04, Fabio Fantoni wrote: On 14/11/2014 12:25, Fabio Fantoni wrote: dom0 xen-unstable from staging git with "x86/hvm: Extend HVM cpuid leaf with vcpu id" and "x86/hvm: Add per-vcpu evtchn upcalls" patches, and qemu 2.2 from spice git (spice/next commit e779fa0a715530311e6f59fc8adb0f6eca914a89): https://github.com/Fantu/Xen/commits/rebase/m2r-staging I tried with qemu tag v2.2.0-rc2 and the crash still happens; here is the full backtrace of the latest test: Program received signal SIGSEGV, Segmentation fault.
0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 73 eax = env-regs[R_EAX]; (gdb) bt full #0 0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 s = 0x564443a0 cs = 0x0 cpu = 0x0 __func__ = vmport_ioport_read env = 0x8250 command = 0 '\000' eax = 0 #1 0x55655fc4 in memory_region_read_accessor (mr=0x5628, addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410 tmp = 0 #2 0x556562b7 in access_with_adjusted_size (addr=0, value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4, access=0x55655f62 memory_region_read_accessor, mr=0x5628) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480 access_mask = 4294967295 access_size = 4 i = 0 #3 0x556590e9 in memory_region_dispatch_read1 (mr=0x5628, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077 data = 0 #4 0x556591b1 in memory_region_dispatch_read (mr=0x5628, addr=0, pval=0x7fffd9a8, size=4) ---Type return to continue, or q return to quit--- at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099 No locals. #5 0x5565cbbc in io_mem_read (mr=0x5628, addr=0, pval=0x7fffd9a8, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1962 No locals. #6 0x5560a1ca in address_space_rw (as=0x55eaf920, addr=22104, buf=0x7fffda50 \377\377\377\377, len=4, is_write=false) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2167 l = 4 ptr = 0x55a92d87 %s/%d:\n val = 7852232130387826944 addr1 = 0 mr = 0x5628 error = false #7 0x5560a38f in address_space_read (as=0x55eaf920, addr=22104, buf=0x7fffda50 \377\377\377\377, len=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/exec.c:2205 No locals. 
#8 0x5564fd4b in cpu_inl (addr=22104) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/ioport.c:117 buf = \377\377\377\377 val = 21845 #9 0x55670c73 in do_inp (addr=22104, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:684 ---Type return to continue, or q return to quit--- No locals. #10 0x55670ee0 in cpu_ioreq_pio (req=0x77ff3020) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:747 i = 1 #11 0x556714b3 in handle_ioreq (state=0x563c2510, req=0x77ff3020) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:853 No locals. #12 0x55671826 in cpu_handle_ioreq (opaque=0x563c2510) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/xen-hvm.c:931
[Xen-devel] [PATCHv3 0/4]: dma, x86, xen: reduce SWIOTLB usage in Xen guests
On systems where DMA addresses and physical addresses are not 1:1 (such as Xen PV guests), the generic dma_get_required_mask() will not return the correct mask (since it uses max_pfn). Some device drivers (such as mptsas, mpt2sas) use dma_get_required_mask() to set the device's DMA mask to allow them to use only 32-bit DMA addresses in hardware structures. This results in unnecessary use of the SWIOTLB if DMA addresses are more than 32-bits, impacting performance significantly. This series allows Xen PV guests to override the default dma_get_required_mask() with one that calculates the DMA mask from the maximum MFN (and not the PFN). Changes in v3: - fix off-by-one in xen_dma_get_required_mask() - split ia64 changes into separate patch. Changes in v2: - split x86 and xen changes into separate patches David ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 4/4] x86/xen: use the maximum MFN to calculate the required DMA mask
On a Xen PV guest, DMA addresses and physical addresses are not 1:1, so the generic dma_get_required_mask() does not return the correct mask (since it uses max_pfn). Some device drivers (such as mptsas, mpt2sas) use dma_get_required_mask() to set the device's DMA mask to allow them to use only 32-bit DMA addresses in hardware structures. This results in unnecessary use of the SWIOTLB if DMA addresses are more than 32 bits wide, impacting performance significantly. Provide a get_required_mask op that uses the maximum MFN to calculate the DMA mask. Signed-off-by: David Vrabel david.vra...@citrix.com --- arch/x86/xen/pci-swiotlb-xen.c | 1 + drivers/xen/swiotlb-xen.c | 13 + include/xen/swiotlb-xen.h | 4 + 3 files changed, 18 insertions(+) diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c index 0e98e5d..a5d180a 100644 --- a/arch/x86/xen/pci-swiotlb-xen.c +++ b/arch/x86/xen/pci-swiotlb-xen.c @@ -31,6 +31,7 @@ static struct dma_map_ops xen_swiotlb_dma_ops = { .map_page = xen_swiotlb_map_page, .unmap_page = xen_swiotlb_unmap_page, .dma_supported = xen_swiotlb_dma_supported, + .get_required_mask = xen_swiotlb_get_required_mask, }; /* diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index ebd8f21..654587d 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -42,9 +42,11 @@ #include <xen/page.h> #include <xen/xen-ops.h> #include <xen/hvc-console.h> +#include <xen/interface/memory.h> #include <asm/dma-mapping.h> #include <asm/xen/page-coherent.h> +#include <asm/xen/hypercall.h> #include <trace/events/swiotlb.h> /* @@ -683,3 +685,14 @@ xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask) return 0; } EXPORT_SYMBOL_GPL(xen_swiotlb_set_dma_mask); + +u64 +xen_swiotlb_get_required_mask(struct device *dev) +{ + unsigned long max_mfn; + + max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL); + + return DMA_BIT_MASK(fls_long(max_mfn - 1) + PAGE_SHIFT); +}
+EXPORT_SYMBOL_GPL(xen_swiotlb_get_required_mask); diff --git a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h index 8b2eb93..640 100644 --- a/include/xen/swiotlb-xen.h +++ b/include/xen/swiotlb-xen.h @@ -58,4 +58,8 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask); extern int xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask); + +extern u64 +xen_swiotlb_get_required_mask(struct device *dev); + #endif /* __LINUX_SWIOTLB_XEN_H */ -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] &= ~GICH_HCR_UIE; +GICH[GICH_HCR] &= ~GICH_HCR_NPIE; } Yes, that is exactly what I tried; the hang still occurs with this change. We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during the maintenance interrupt: (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I get the hang, maintenance interrupts are generated continuously. The platform continues printing the same log until reboot. Exactly the same log? As in the one above you just pasted? That is very very suspicious. Yes, exactly the same log. And it looks like that means the LRs are flushed correctly. I am thinking that we are not handling GICH_HCR_UIE correctly and something we do in Xen, maybe writing to an LR register, might trigger a new maintenance interrupt immediately, causing an infinite loop. Yes, this is what I'm thinking about. Taking into account all the collected debug info, it looks like once the LRs are overloaded with SGIs, a maintenance interrupt occurs. It is then not handled properly and occurs again and again, so the platform hangs inside its handler. Could you please try this patch? It disables GICH_HCR_UIE immediately on hypervisor entry. Now trying.
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4d2a92d..6ae8dc4 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +GICH[GICH_HCR] &= ~GICH_HCR_UIE; + spin_lock_irqsave(&v->arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask), @@ -821,12 +823,8 @@ void gic_inject(void) gic_restore_pending_irqs(current); - if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() ) GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] &= ~GICH_HCR_UIE; - } static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi) -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 1/4] dma: add dma_get_required_mask_from_max_pfn()
A generic dma_get_required_mask() is useful even for architectures (such as ia64) that define ARCH_HAS_DMA_GET_REQUIRED_MASK. Signed-off-by: David Vrabel david.vra...@citrix.com Reviewed-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- drivers/base/platform.c | 10 ++++++++-- include/linux/dma-mapping.h | 1 + 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/base/platform.c b/drivers/base/platform.c index b2afc29..f9f3930 100644 --- a/drivers/base/platform.c +++ b/drivers/base/platform.c @@ -1009,8 +1009,7 @@ int __init platform_bus_init(void) return error; } -#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK -u64 dma_get_required_mask(struct device *dev) +u64 dma_get_required_mask_from_max_pfn(struct device *dev) { u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT); u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT)); @@ -1028,6 +1027,13 @@ u64 dma_get_required_mask(struct device *dev) } return mask; } +EXPORT_SYMBOL_GPL(dma_get_required_mask_from_max_pfn); + +#ifndef ARCH_HAS_DMA_GET_REQUIRED_MASK +u64 dma_get_required_mask(struct device *dev) +{ + return dma_get_required_mask_from_max_pfn(dev); +} EXPORT_SYMBOL_GPL(dma_get_required_mask); #endif diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index d5d3881..6e2fdfc 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -127,6 +127,7 @@ static inline int dma_coerce_mask_and_coherent(struct device *dev, u64 mask) return dma_set_mask_and_coherent(dev, mask); } +extern u64 dma_get_required_mask_from_max_pfn(struct device *dev); extern u64 dma_get_required_mask(struct device *dev); #ifndef set_arch_dma_coherent_ops -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 3/4] x86: allow dma_get_required_mask() to be overridden
Use dma_ops->get_required_mask() if provided, defaulting to dma_get_required_mask_from_max_pfn(). This is needed on systems (such as Xen PV guests) where the DMA address and the physical address are not equal. ARCH_HAS_DMA_GET_REQUIRED_MASK is defined in asm/device.h instead of asm/dma-mapping.h because linux/dma-mapping.h uses the define before including asm/dma-mapping.h. Signed-off-by: David Vrabel david.vra...@citrix.com Reviewed-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- arch/x86/include/asm/device.h | 2 ++ arch/x86/kernel/pci-dma.c | 8 ++++++++ 2 files changed, 10 insertions(+) diff --git a/arch/x86/include/asm/device.h b/arch/x86/include/asm/device.h index 03dd729..10bc628 100644 --- a/arch/x86/include/asm/device.h +++ b/arch/x86/include/asm/device.h @@ -13,4 +13,6 @@ struct dev_archdata { struct pdev_archdata { }; +#define ARCH_HAS_DMA_GET_REQUIRED_MASK + #endif /* _ASM_X86_DEVICE_H */ diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c index a25e202..5154400 100644 --- a/arch/x86/kernel/pci-dma.c +++ b/arch/x86/kernel/pci-dma.c @@ -140,6 +140,14 @@ void dma_generic_free_coherent(struct device *dev, size_t size, void *vaddr, free_pages((unsigned long)vaddr, get_order(size)); } +u64 dma_get_required_mask(struct device *dev) +{ + if (dma_ops->get_required_mask) + return dma_ops->get_required_mask(dev); + return dma_get_required_mask_from_max_pfn(dev); +} +EXPORT_SYMBOL_GPL(dma_get_required_mask); + /* * See Documentation/x86/x86_64/boot-options.txt for the iommu kernel * parameter documentation. -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/4] ia64: use common dma_get_required_mask_from_pfn()
Signed-off-by: David Vrabel david.vra...@citrix.com Cc: Tony Luck tony.l...@intel.com Cc: Fenghua Yu fenghua...@intel.com Cc: linux-i...@vger.kernel.org --- arch/ia64/include/asm/machvec.h | 2 +- arch/ia64/include/asm/machvec_init.h | 1 - arch/ia64/pci/pci.c | 20 ----- 3 files changed, 1 insertion(+), 22 deletions(-) diff --git a/arch/ia64/include/asm/machvec.h b/arch/ia64/include/asm/machvec.h index 9c39bdf..beaa47d 100644 --- a/arch/ia64/include/asm/machvec.h +++ b/arch/ia64/include/asm/machvec.h @@ -287,7 +287,7 @@ extern struct dma_map_ops *dma_get_ops(struct device *); # define platform_dma_get_ops dma_get_ops #endif #ifndef platform_dma_get_required_mask -# define platform_dma_get_required_mask ia64_dma_get_required_mask +# define platform_dma_get_required_mask dma_get_required_mask_from_max_pfn #endif #ifndef platform_irq_to_vector # define platform_irq_to_vector __ia64_irq_to_vector diff --git a/arch/ia64/include/asm/machvec_init.h b/arch/ia64/include/asm/machvec_init.h index 37a4698..ef964b2 100644 --- a/arch/ia64/include/asm/machvec_init.h +++ b/arch/ia64/include/asm/machvec_init.h @@ -3,7 +3,6 @@ extern ia64_mv_send_ipi_t ia64_send_ipi; extern ia64_mv_global_tlb_purge_t ia64_global_tlb_purge; -extern ia64_mv_dma_get_required_mask ia64_dma_get_required_mask; extern ia64_mv_irq_to_vector __ia64_irq_to_vector; extern ia64_mv_local_vector_to_irq __ia64_local_vector_to_irq; extern ia64_mv_pci_get_legacy_mem_t ia64_pci_get_legacy_mem; diff --git a/arch/ia64/pci/pci.c b/arch/ia64/pci/pci.c index 291a582..79da21b 100644 --- a/arch/ia64/pci/pci.c +++ b/arch/ia64/pci/pci.c @@ -791,26 +791,6 @@ static void __init set_pci_dfl_cacheline_size(void) pci_dfl_cache_line_size = (1 << cci.pcci_line_size) / 4; } -u64 ia64_dma_get_required_mask(struct device *dev) -{ - u32 low_totalram = ((max_pfn - 1) << PAGE_SHIFT); - u32 high_totalram = ((max_pfn - 1) >> (32 - PAGE_SHIFT)); - u64 mask; - - if (!high_totalram) { - /* convert to mask just covering totalram */ - low_totalram = (1 << (fls(low_totalram) - 1)); - low_totalram += low_totalram - 1; - mask = low_totalram; - } else { - high_totalram = (1 << (fls(high_totalram) - 1)); - high_totalram += high_totalram - 1; - mask = (((u64)high_totalram) << 32) + 0xffffffff; - } - return mask; -} -EXPORT_SYMBOL_GPL(ia64_dma_get_required_mask); - u64 dma_get_required_mask(struct device *dev) { return platform_dma_get_required_mask(dev); -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] = ~GICH_HCR_UIE; +GICH[GICH_HCR] = ~GICH_HCR_NPIE; } Yes, exactly I tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during maintenance interrupt (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I got hang - maintenance interrupts are generated continuously. Platform continues printing the same log till reboot. Exactly the same log? As in the one above you just pasted? That is very very suspicious. Yes exactly the same log. And looks like it means that LRs are flushed correctly. I am thinking that we are not handling GICH_HCR_UIE correctly and something we do in Xen, maybe writing to an LR register, might trigger a new maintenance interrupt immediately causing an infinite loop. Yes, this is what I'm thinking about. Taking in account all collected debug info it looks like once LRs are overloaded with SGIs - maintenance interrupt occurs. And then it is not handled properly, and occurs again and again - so platform hangs inside its handler. Could you please try this patch? 
It disables GICH_HCR_UIE immediately on hypervisor entry. Now trying. diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4d2a92d..6ae8dc4 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +GICH[GICH_HCR] &= ~GICH_HCR_UIE; + spin_lock_irqsave(&v->arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask), @@ -821,12 +823,8 @@ void gic_inject(void) gic_restore_pending_irqs(current); - if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() ) GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] &= ~GICH_HCR_UIE; - } static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi) Heh - I don't see hangs with this patch :) But I also see that the maintenance interrupt doesn't occur (and no hang as a result). Stefano - is this expected? No maintenance interrupts at all? That's strange. You should be receiving them when the LRs are full and you still have interrupts pending to be added to them. You could add another printk here to see if you should be receiving them: if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() ) +{ +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n"); GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] &= ~GICH_HCR_UIE; - +} } -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] = ~GICH_HCR_UIE; +GICH[GICH_HCR] = ~GICH_HCR_NPIE; } Yes, exactly I tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during maintenance interrupt (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I got hang - maintenance interrupts are generated continuously. Platform continues printing the same log till reboot. Exactly the same log? As in the one above you just pasted? That is very very suspicious. Yes exactly the same log. And looks like it means that LRs are flushed correctly. I am thinking that we are not handling GICH_HCR_UIE correctly and something we do in Xen, maybe writing to an LR register, might trigger a new maintenance interrupt immediately causing an infinite loop. Yes, this is what I'm thinking about. Taking in account all collected debug info it looks like once LRs are overloaded with SGIs - maintenance interrupt occurs. 
And then it is not handled properly; it occurs again and again, so the platform hangs inside its handler. Could you please try this patch? It disables GICH_HCR_UIE immediately on hypervisor entry. Now trying. diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4d2a92d..6ae8dc4 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +GICH[GICH_HCR] &= ~GICH_HCR_UIE; + spin_lock_irqsave(&v->arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask), @@ -821,12 +823,8 @@ void gic_inject(void) gic_restore_pending_irqs(current); - if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() ) GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] &= ~GICH_HCR_UIE; - } static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi) Heh - I don't see hangs with this patch :) But I also see that the maintenance interrupt doesn't occur (and no hang as a result). Stefano - is this expected? No maintenance interrupts at all? That's strange. You should be receiving them when the LRs are full and you still have interrupts pending to be added to them.
You could add another printk here to see if you should be receiving them: if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() ) +{ +gdprintk(XENLOG_DEBUG, "requesting maintenance interrupt\n"); GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] &= ~GICH_HCR_UIE; - +} } Requested properly: (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt But it does not occur. -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
Gic dump during interrupt requesting: (XEN) GICH_LRs (vcpu 0) mask=f (XEN)HW_LR[0]=3a1f (XEN)HW_LR[1]=9a015856 (XEN)HW_LR[2]=1a1b (XEN)HW_LR[3]=9a00e439 (XEN) Inflight irq=31 lr=0 (XEN) Inflight irq=86 lr=1 (XEN) Inflight irq=27 lr=2 (XEN) Inflight irq=57 lr=3 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] = ~GICH_HCR_UIE; +GICH[GICH_HCR] = ~GICH_HCR_NPIE; } Yes, exactly I tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during maintenance interrupt (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I got hang - maintenance interrupts are generated continuously. Platform continues printing the same log till reboot. Exactly the same log? As in the one above you just pasted? That is very very suspicious. Yes exactly the same log. And looks like it means that LRs are flushed correctly. 
I am thinking that we are not handling GICH_HCR_UIE correctly and something we do in Xen, maybe writing to an LR register, might trigger a new maintenance interrupt immediately causing an infinite loop. Yes, this is what I'm thinking about. Taking in account all collected debug info it looks like once LRs are overloaded with SGIs - maintenance interrupt occurs. And then it is not handled properly, and occurs again and again - so platform hangs inside its handler. Could you please try this patch? It disable GICH_HCR_UIE immediately on hypervisor entry. Now trying. diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4d2a92d..6ae8dc4 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +GICH[GICH_HCR] = ~GICH_HCR_UIE; + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -821,12 +823,8 @@ void gic_inject(void) gic_restore_pending_irqs(current); - if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] = ~GICH_HCR_UIE; - } static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi) Heh - I don't see hangs with this patch :) But also I see that maintenance interrupt doesn't occur (and no hang as result) Stefano - is this expected? No maintenance interrupts at all? That's strange. You should be receiving them when LRs are full and you still have interrupts pending to be added to them. 
You could add another printk here to see if you should be receiving them: if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) +{ +gdprintk(XENLOG_DEBUG, requesting maintenance interrupt\n); GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] = ~GICH_HCR_UIE; - +} } Requested properly: (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt But does not occur -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com
Re: [Xen-devel] Xen 4.5 random freeze question
BTW - shouldn't this flag GICH_LR_MAINTENANCE_IRQ be set after maintenance interrupt requesting ? On Wed, Nov 19, 2014 at 6:32 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: Gic dump during interrupt requesting: (XEN) GICH_LRs (vcpu 0) mask=f (XEN)HW_LR[0]=3a1f (XEN)HW_LR[1]=9a015856 (XEN)HW_LR[2]=1a1b (XEN)HW_LR[3]=9a00e439 (XEN) Inflight irq=31 lr=0 (XEN) Inflight irq=86 lr=1 (XEN) Inflight irq=27 lr=2 (XEN) Inflight irq=57 lr=3 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 On Wed, Nov 19, 2014 at 6:29 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] = ~GICH_HCR_UIE; +GICH[GICH_HCR] = ~GICH_HCR_NPIE; } Yes, exactly I tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during maintenance interrupt (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I got hang - maintenance interrupts are generated continuously. Platform continues printing the same log till reboot. Exactly the same log? 
As in the one above you just pasted? That is very very suspicious. Yes exactly the same log. And looks like it means that LRs are flushed correctly. I am thinking that we are not handling GICH_HCR_UIE correctly and something we do in Xen, maybe writing to an LR register, might trigger a new maintenance interrupt immediately causing an infinite loop. Yes, this is what I'm thinking about. Taking in account all collected debug info it looks like once LRs are overloaded with SGIs - maintenance interrupt occurs. And then it is not handled properly, and occurs again and again - so platform hangs inside its handler. Could you please try this patch? It disable GICH_HCR_UIE immediately on hypervisor entry. Now trying. diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4d2a92d..6ae8dc4 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +GICH[GICH_HCR] = ~GICH_HCR_UIE; + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -821,12 +823,8 @@ void gic_inject(void) gic_restore_pending_irqs(current); - if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] = ~GICH_HCR_UIE; - } static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi) Heh - I don't see hangs with this patch :) But also I see that maintenance interrupt doesn't occur (and no hang as result) Stefano - is this expected? No maintenance interrupts at all? That's strange. You should be receiving them when LRs are full and you still have interrupts pending to be added to them. 
You could add another printk here to see if you should be receiving them: if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) +{ +gdprintk(XENLOG_DEBUG, requesting maintenance interrupt\n); GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] = ~GICH_HCR_UIE; - +} } Requested properly: (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt But does not occur -- Andrii Tseglytskyi | Embedded Dev GlobalLogic www.globallogic.com -- Andrii Tseglytskyi | Embedded Dev
Re: [Xen-devel] [PATCH v10 for-xen-4.5 2/2] dpci: Replace tasklet with an softirq
On Fri, Nov 14, 2014 at 11:11:46AM -0500, Konrad Rzeszutek Wilk wrote: On Fri, Nov 14, 2014 at 03:13:42PM +, Jan Beulich wrote: On 12.11.14 at 03:23, konrad.w...@oracle.com wrote:

+static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+{
+    struct domain *d = pirq_dpci->dom;
+
+    ASSERT(spin_is_locked(&d->event_lock));
+
+    switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) )
+    {
+    case (1 << STATE_SCHED):
+        /*
+         * We are going to try to de-schedule the softirq before it goes in
+         * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'.
+         */
+        put_domain(d);
+        /* fallthrough. */

Considering Sander's report, the only suspicious place I find is this one: when the STATE_SCHED flag is set, pirq_dpci is on some CPU's list. What guarantees it to get removed from that list before getting inserted on another one?

None. The moment that STATE_SCHED is cleared, 'raise_softirq_for' is free to manipulate the list.

I was too quick to say this. A bit more inspection shows that while 'raise_softirq_for' is free to manipulate the list - it won't be called. The reason is that pt_pirq_softirq_reset is called _after_ the IRQ action handlers are removed for this IRQ. That means we will not receive any interrupts for it and call 'raise_softirq_for'. At least until 'pt_irq_create_bind' is called. And said function has a check for this too:

242      * A crude 'while' loop with us dropping the spinlock and giving
243      * the softirq_dpci a chance to run.
244      * We MUST check for this condition as the softirq could be scheduled
245      * and hasn't run yet. Note that this code replaced tasklet_kill which
246      * would have spun forever and would do the same thing (wait to flush out
247      * outstanding hvm_dirq_assist calls).
248      */
249     if ( pt_pirq_softirq_active(pirq_dpci) )

Hence the patch below is not needed.
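The 'state' dialogue Konrad refers to can be sketched with C11 atomics. This is a stand-in model with hypothetical names, not the Xen implementation; the point it illustrates is that both queueing and de-scheduling are guarded by a compare-and-exchange, so exactly one party wins each transition and therefore owns the refcount duty.

```c
#include <assert.h>
#include <stdatomic.h>

#define SKETCH_STATE_SCHED (1 << 0)

_Atomic int sketch_state;

/* Like raise_softirq_for(): only the winner of 0 -> STATE_SCHED may put
 * the entry on this CPU's list (and take the domain refcount). */
int sketch_raise(void)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&sketch_state, &expected,
                                          SKETCH_STATE_SCHED);
}

/* Like pt_pirq_softirq_reset(): only the winner of STATE_SCHED -> 0 has
 * de-scheduled the entry and must do the put_domain(). */
int sketch_reset(void)
{
    int expected = SKETCH_STATE_SCHED;
    return atomic_compare_exchange_strong(&sketch_state, &expected, 0);
}
```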
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] = ~GICH_HCR_UIE; +GICH[GICH_HCR] = ~GICH_HCR_NPIE; } Yes, exactly I tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during maintenance interrupt (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I got hang - maintenance interrupts are generated continuously. Platform continues printing the same log till reboot. Exactly the same log? As in the one above you just pasted? That is very very suspicious. Yes exactly the same log. And looks like it means that LRs are flushed correctly. I am thinking that we are not handling GICH_HCR_UIE correctly and something we do in Xen, maybe writing to an LR register, might trigger a new maintenance interrupt immediately causing an infinite loop. Yes, this is what I'm thinking about. Taking in account all collected debug info it looks like once LRs are overloaded with SGIs - maintenance interrupt occurs. 
And then it is not handled properly, and occurs again and again - so platform hangs inside its handler. Could you please try this patch? It disable GICH_HCR_UIE immediately on hypervisor entry. Now trying. diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4d2a92d..6ae8dc4 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +GICH[GICH_HCR] = ~GICH_HCR_UIE; + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -821,12 +823,8 @@ void gic_inject(void) gic_restore_pending_irqs(current); - if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] = ~GICH_HCR_UIE; - } static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi) Heh - I don't see hangs with this patch :) But also I see that maintenance interrupt doesn't occur (and no hang as result) Stefano - is this expected? No maintenance interrupts at all? That's strange. You should be receiving them when LRs are full and you still have interrupts pending to be added to them. 
You could add another printk here to see if you should be receiving them: if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) +{ +gdprintk(XENLOG_DEBUG, requesting maintenance interrupt\n); GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] = ~GICH_HCR_UIE; - +} } Requested properly: (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt (XEN) gic.c:756:d0v0 requesting maintenance interrupt But does not occur OK, let's see what's going on then by printing the irq number of the maintenance interrupt: diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4d2a92d..fed3167 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -55,6 +55,7 @@ static struct { static DEFINE_PER_CPU(uint64_t, lr_mask); static uint8_t nr_lrs; +static bool uie_on; #define lr_all_full() (this_cpu(lr_mask) == ((1 nr_lrs) - 1)) /* The GIC mapping of CPU interfaces does not necessarily match the @@ -694,6 +695,7 @@ void gic_clear_lrs(struct vcpu *v) { int i = 0; unsigned long flags; +unsigned
Re: [Xen-devel] Xen 4.5 random freeze question
I think that's OK: it looks like on your board, for some reason, when UIE is set you get irq 1023 (spurious interrupt) instead of your normal maintenance interrupt. But everything should work anyway without issues.

This is the same patch as before, but on top of the latest xen-unstable tree. Please confirm if it works.

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..df140b9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;

+    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);

     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@ void gic_inject(void)

     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-    else
-        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }

 static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)

On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: I got this strange log:

(XEN) received maintenance interrupt irq=1023

And the platform does not hang due to this:

+    hcr = GICH[GICH_HCR];
+    if ( hcr & GICH_HCR_UIE )
+    {
+        GICH[GICH_HCR] &= ~GICH_HCR_UIE;
+        uie_on = 1;
+    }

On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano,

     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
-        GICH[GICH_HCR] |=
GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] = ~GICH_HCR_UIE; +GICH[GICH_HCR] = ~GICH_HCR_NPIE; } Yes, exactly I tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during maintenance interrupt (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I got hang - maintenance interrupts are generated continuously. Platform continues printing the same log till reboot. Exactly the same log? As in the one above you just pasted? That is very very suspicious. Yes exactly the same log. And looks like it means that LRs are flushed correctly. I am thinking that we are not handling GICH_HCR_UIE correctly and something we do in Xen, maybe writing to an LR register, might trigger a new maintenance interrupt immediately causing an infinite loop. Yes, this is what I'm thinking about. Taking in account all collected debug info it looks like once LRs are overloaded with SGIs - maintenance interrupt occurs. And then it is not handled properly, and occurs again and again - so platform hangs inside its handler. Could you please try this patch? It disable GICH_HCR_UIE immediately on hypervisor entry. Now trying. 
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4d2a92d..6ae8dc4 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +GICH[GICH_HCR] = ~GICH_HCR_UIE; + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -821,12 +823,8 @@ void gic_inject(void) gic_restore_pending_irqs(current); - if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) GICH[GICH_HCR] |= GICH_HCR_UIE; -else -GICH[GICH_HCR] = ~GICH_HCR_UIE; - } static void do_sgi(struct cpu_user_regs *regs, int othercpu, enum gic_sgi sgi) Heh - I don't see hangs with this patch :) But also I see that maintenance interrupt doesn't occur (and no hang as
Re: [Xen-devel] Xen-unstable: xen panic RIP: dpci_softirq
Wednesday, November 19, 2014, 4:04:59 PM, you wrote: On Wed, Nov 19, 2014 at 12:16:44PM +0100, Sander Eikelenboom wrote: Wednesday, November 19, 2014, 2:55:41 AM, you wrote: On Tue, Nov 18, 2014 at 11:12:54PM +0100, Sander Eikelenboom wrote: Tuesday, November 18, 2014, 9:56:33 PM, you wrote:

Uhmm, I thought I had these switched off (due to problems earlier, and then forgot about them .. however, looking at the earlier reports, these lines were also in those reports). The xen-syms and these last runs are all with a pristine xen tree cloned today (staging branch), so the qemu-xen and seabios defined with that were also freshly cloned and had a new default seabios config (just to rule out anything stale in my tree). If you don't see those messages .. perhaps your seabios and qemu trees (and at least the seabios config) are not the most recent (they don't get updated automatically when you just do a git pull on the main tree)?

In /tools/firmware/seabios-dir/.config I have:

CONFIG_USB=y
CONFIG_USB_UHCI=y
CONFIG_USB_OHCI=y
CONFIG_USB_EHCI=y
CONFIG_USB_XHCI=y
CONFIG_USB_MSC=y
CONFIG_USB_UAS=y
CONFIG_USB_HUB=y
CONFIG_USB_KEYBOARD=y
CONFIG_USB_MOUSE=y

I seem to have the same thing. Perhaps it is my XHCI controller being wonky.

And this is all just from a:

- git clone git://xenbits.xen.org/xen.git -b staging
- make clean
- ./configure
- make -j6
- make -j6 install

Aye.

.. snip..

1) test_and_[set|clear]_bit sometimes return unexpected values. [But this might be invalid, as the addition of 8303faaf25a8 might be correct - as the second dpci the softirq is processing could be the MSI one]

Would there be an easy way to stress test this function separately in some debugging function, to see if it indeed is returning unexpected values?

Sadly no. But you got me looking in the right direction when you mentioned 'timeout'.

2) INIT_LIST_HEAD operations on the same CPU are not honored.

Just curious, have you also tested the patches on AMD hardware? Yes.
To reproduce this, the first thing I did was to get an AMD box.

When I look at the combination of (2) and (3), it seems it could be an interaction between the two passed-through devices and/or different IRQ types.

Could be - as in, it is causing this issue to show up faster than expected. Or it is the one that triggers more than one dpci happening at the same time.

Well, that didn't seem to be it (see the separate amendment I mailed previously).

Right. The current theory I have is that the interrupts are not being Acked within 8 milliseconds and we reset the 'state' - and at the same time we get an interrupt and schedule it - while we are still processing the same interrupt. This would explain why 'test_and_clear_bit' got the wrong value. In regards to the list poison - following this thread of logic - with 'state = 0' set, we open the floodgates for any CPU to put the same 'struct hvm_pirq_dpci' on its list. We do reset the 'state' on _every_ GSI that is mapped to a guest - so we also reset the 'state' for the MSI one (XHCI). Anyhow, in your case:

CPU X:                                  CPU Y:

pt_irq_time_out:
    state = 0;
[out of timer code, the                 raise_softirq
 pirq_dpci is on the dpci_list]         [adds the pirq_dpci as state == 0]

softirq_dpci:                           softirq_dpci:
    list_del
[entries poisoned]                      list_del <= BOOM

That is what I believe is happening. The INTx device - once I put a load on it - does not trigger any pt_irq_time_out, so that would explain why I cannot hit this. But I believe your card hits these hiccups.
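The "entries poisoned" step above is the Linux-style list_del() behaviour that Xen's list implementation also follows: deletion writes poison values into the removed entry, so a second list_del() on the same entry dereferences those poison pointers and crashes. A minimal sketch of that mechanism (the poison addresses here are arbitrary illustrative values, not the real constants):

```c
#include <assert.h>
#include <stddef.h>

struct sketch_list { struct sketch_list *next, *prev; };

/* Arbitrary illustrative poison pointers (the real headers use fixed
 * non-NULL constants so a bad dereference faults loudly). */
#define SKETCH_POISON1 ((struct sketch_list *)0x100)
#define SKETCH_POISON2 ((struct sketch_list *)0x200)

void sketch_list_init(struct sketch_list *h) { h->next = h->prev = h; }

/* Insert e right after head h (doubly linked, circular). */
void sketch_list_add(struct sketch_list *e, struct sketch_list *h)
{
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

/* Unlink e and poison it: a second sketch_list_del(e) would follow the
 * poison pointers - which is exactly the BOOM in the diagram above. */
void sketch_list_del(struct sketch_list *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = SKETCH_POISON1;
    e->prev = SKETCH_POISON2;
}
```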
Hi Konrad, I just tested you 5 patches and as a result i still got an(other) host crash: (complete serial log attached) (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) [2014-11-18 21:55:41.591] CPU:0 (XEN) [2014-11-18 21:55:41.591] [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08012c7e7]CPU:2 (XEN) [2014-11-18 21:55:41.591] RIP:e008:[82d08014a461] hvm_do_IRQ_dpci+0xbd/0x13c (XEN) [2014-11-18 21:55:41.591] RFLAGS: 00010006 _spin_unlock+0x1f/0x30CONTEXT: hypervisor Duh! Here is another patch on top of the five you have (attached and inline). Hi Konrad, Happy to report it has been running with this additional patch for 2 hours now without any problems. I think you nailed it :-) Could you also do an 'xl debug-keys k' and send that please? Sure: (XEN)
[Xen-devel] [for-xen-4.5 PATCH] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
If we pass in INTx type devices to a guest on an over-subscribed machine - and in an over-worked guest - we can cause the pirq_dpci->softirq_list to become corrupted.

The reason for this is that 'pt_irq_guest_eoi' ends up setting the 'state' to zero. However, the 'state' value (STATE_SCHED, STATE_RUN) is used to communicate between 'raise_softirq_for' and 'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be re-scheduled. We are ignoring the teardown path for simplicity right now.

'pt_irq_guest_eoi' was not adhering to the proper dialogue - it was not using locked cmpxchg or test_bit operations - and ended up setting 'state' to zero. That meant 'raise_softirq_for' was free to schedule it while the 'struct hvm_pirq_dpci' was still on a per-cpu list. The end result was list_del being called twice, and the second call corrupting the per-cpu list.

For this to occur, one of the CPUs must be in the idle loop executing softirqs, the interrupt handler in the guest must not respond to the pending interrupt within 8ms, and we must receive another interrupt for this device on another CPU.

CPU0:                                   CPU1:

timer_softirq_action
 \- pt_irq_time_out
        state = 0;                      do_IRQ
[out of timer code, the                    raise_softirq
 pirq_dpci is on the CPU0 dpci_list]    [adds the pirq_dpci to CPU1
                                         dpci_list as state == 0]

softirq_dpci:                           softirq_dpci:
    list_del
[list entries are poisoned]             list_del <= BOOM

The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was pt_pirq_softirq_reset) to use cmpxchg. We also expand said function to reset '->dom' only on the teardown paths - but not on the timeouts.
Reported-by: Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/drivers/passthrough/io.c | 27 +++++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..2039d31 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -57,7 +57,7 @@ enum {
  * This can be called multiple times, but the softirq is only raised once.
  * That is until the STATE_SCHED state has been cleared. The state can be
  * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
+ * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
  * the softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
@@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 }

 /*
- * Reset the pirq_dpci->dom parameter to NULL.
+ * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
+ * reset pirq_dpci->dom parameter to NULL (used for teardown).
  *
  * This function checks the different states to make sure it can do it
  * at the right time. If it unschedules the 'hvm_dirq_assist' from running
  * it also refcounts (which is what the softirq would have done) properly.
  */
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
+                                   unsigned int clear)
 {
     struct domain *d = pirq_dpci->dom;

@@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
      * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
      * in local variable before it sets STATE_RUN - and therefore will not
      * dereference '->dom' which would crash.
+     *
+     * However, if this is called from 'pt_irq_time_out' we do not want to
+     * clear the '->dom' as we can re-use the 'pirq_dpci' after that and
+     * need '->dom'.
      */
-        pirq_dpci->dom = NULL;
+        if ( clear )
+            pirq_dpci->dom = NULL;
         break;
     }
 }
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
                               &pirq_dpci->flags) )
     {
-        pirq_dpci->state = 0;
+        pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */);
         pirq_dpci->pending = 0;
         pirq_guest_eoi(dpci_pirq(pirq_dpci));
     }
@@ -285,7 +292,7 @@ int pt_irq_create_bind(
              * to be scheduled but we must deal with the one that may be
              * in the queue.
              */
-            pt_pirq_softirq_reset(pirq_dpci);
+            pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
         }
     }
     if ( unlikely(rc) )
@@ -536,9 +543,9 @@ int
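The clear parameter the patch introduces separates the two callers: the EOI/timeout path must keep ->dom because the pirq_dpci is reused for the next interrupt, while the teardown path must drop it. A pared-down sketch of that split (the struct is simplified and a plain store stands in for the locked cmpxchg the real code uses):

```c
#include <assert.h>
#include <stddef.h>

struct sketch_domain { int id; };
struct sketch_pirq { int state; struct sketch_domain *dom; };

/* Models pt_pirq_softirq_cancel(): always clears the scheduling state,
 * but resets ->dom only on the teardown path (clear != 0). */
void sketch_softirq_cancel(struct sketch_pirq *p, unsigned int clear)
{
    p->state = 0;            /* stand-in for the locked cmpxchg */
    if ( clear )
        p->dom = NULL;
}
```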
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote:

I think that's OK: it looks like on your board, for some reason, when UIE is set you get irq 1023 (spurious interrupt) instead of your normal maintenance interrupt.

OK, but I think this should be investigated too. What do you think?

I think it is harmless: my guess is that if we clear UIE before reading GICC_IAR, GICC_IAR returns the spurious interrupt instead of the maintenance interrupt. But it doesn't really matter to us.

But everything should work anyway without issues. This is the same patch as before, but on top of the latest xen-unstable tree. Please confirm if it works.

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..df140b9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;

+    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);

     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@ void gic_inject(void)

     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-    else
-        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }

I confirm - it works fine. Will this be a final fix?

Yep :-)

Many thanks for your help on this!
Regards, Andrii static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: I got this strange log: (XEN) received maintenance interrupt irq=1023 And platform does not hang due to this: +hcr = GICH[GICH_HCR]; +if ( hcr GICH_HCR_UIE ) +{ +GICH[GICH_HCR] = ~GICH_HCR_UIE; +uie_on = 1; +} On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] = ~GICH_HCR_UIE; +GICH[GICH_HCR] = ~GICH_HCR_NPIE; } Yes, exactly I tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during maintenance interrupt (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I got hang - maintenance interrupts are generated continuously. Platform continues printing the same log till reboot. Exactly the same log? As in the one above you just pasted? That is very very suspicious. Yes exactly the same log. 
And looks like it means that LRs are flushed correctly. I am thinking that we are not handling GICH_HCR_UIE correctly and something we do in Xen, maybe writing to an LR register, might trigger a new maintenance interrupt immediately causing an infinite loop. Yes, this is what I'm thinking about. Taking in account all collected debug info it looks like once LRs are overloaded with SGIs - maintenance interrupt occurs. And then it is not handled properly, and occurs again and again - so platform hangs inside its handler. Could you please try this patch? It disable GICH_HCR_UIE immediately on hypervisor entry. Now trying. diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 4d2a92d..6ae8dc4 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -701,6 +701,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) )
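Stefano's guess about irq 1023 matches the GICv2 architecture: reading GICC_IAR when no interrupt is actually pending returns the reserved spurious interrupt ID 1023, which a handler should simply ignore. A tiny model of that acknowledge behaviour (the lookup itself is hypothetical; only the 1023 constant comes from the spec):

```c
#include <assert.h>

#define SKETCH_GIC_SPURIOUS 1023u

/* Models a GICC_IAR read: acknowledge the pending interrupt if there is
 * one, otherwise return the spurious interrupt ID. */
unsigned int sketch_iar_read(int pending_irq /* -1 if none pending */)
{
    return pending_irq < 0 ? SKETCH_GIC_SPURIOUS : (unsigned int)pending_irq;
}
```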
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: I think that's OK: it looks like that on your board for some reasons when UIE is set you get irq 1023 (spurious interrupt) instead of your normal maintenance interrupt. OK, but I think this should be investigated too. What do you think ? I think it is harmless: my guess is that if we clear UIE before reading GICC_IAR, GICC_IAR returns spurious interrupt instead of maintenance interrupt. But it doesn't really matter to us. OK. I think catching this will be a good exercise for someone )) But out of scope for this issue. But everything should work anyway without issues. This is the same patch as before but on top of the lastest xen-unstable tree. Please confirm if it works. diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 70d10d6..df140b9 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -527,8 +529,6 @@ void gic_inject(void) if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 1); -else -gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); } I confirm - it works fine. Will this be a final fix ? Yep :-) Many thanks for your help on this! Thank you Stefano. 
This issue was really critical for us :) Regards, Andrii Regards, Andrii static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: I got this strange log: (XEN) received maintenance interrupt irq=1023 And platform does not hang due to this: +hcr = GICH[GICH_HCR]; +if ( hcr GICH_HCR_UIE ) +{ +GICH[GICH_HCR] = ~GICH_HCR_UIE; +uie_on = 1; +} On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] = ~GICH_HCR_UIE; +GICH[GICH_HCR] = ~GICH_HCR_NPIE; } Yes, exactly I tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during maintenance interrupt (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I got hang - maintenance interrupts are generated continuously. Platform continues printing the same log till reboot. Exactly the same log? As in the one above you just pasted? That is very very suspicious. 
Yes exactly the same log. And looks like it means that LRs are flushed correctly. I am thinking that we are not handling GICH_HCR_UIE correctly and something we do in Xen, maybe writing to an LR register, might trigger a new maintenance interrupt immediately causing an infinite loop. Yes, this is what I'm thinking about. Taking in account all collected debug info it looks like once LRs are overloaded with SGIs - maintenance interrupt occurs. And then it is not handled properly, and occurs again and again - so platform hangs inside its handler. Could you please try this patch? It disable GICH_HCR_UIE immediately on hypervisor entry. Now trying.
[Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry
UIE being set can cause maintenance interrupts to occur when Xen writes to one or more LR registers. The effect is a busy loop around the interrupt handler in Xen (http://marc.info/?l=xen-develm=141597517132682): everything gets stuck. Konrad, this fixes an actual bug, at least on OMAP5. It should have no bad side effects on any other platforms as far as I can tell. It should go in 4.5. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Tested-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 70d10d6..df140b9 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -527,8 +529,6 @@ void gic_inject(void) if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 1); -else -gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); } static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
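The hunks above lost their `&` and `->` operators to HTML stripping in the archive; a best-effort reconstruction of the same patch with the operators restored from context:

```diff
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 70d10d6..df140b9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v)
     if ( is_idle_vcpu(v) )
         return;
 
+    gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
+
     spin_lock_irqsave(&v->arch.vgic.lock, flags);
 
     while ((i = find_next_bit((const unsigned long *) &this_cpu(lr_mask),
@@ -527,8 +529,6 @@ void gic_inject(void)
 
     if ( !list_empty(&current->arch.vgic.lr_pending) && lr_all_full() )
         gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 1);
-    else
-        gic_hw_ops->update_hcr_status(GICH_HCR_UIE, 0);
 }
```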
Re: [Xen-devel] [PATCH 4/4] x86/xen: use the maximum MFN to calculate the required DMA mask
On Wed, 19 Nov 2014, David Vrabel wrote: On a Xen PV guest the DMA addresses and physical addresses are not 1:1 (such as Xen PV guests) and the generic dma_get_required_mask() does not return the correct mask (since it uses max_pfn). Some device drivers (such as mptsas, mpt2sas) use dma_get_required_mask() to set the device's DMA mask to allow them to use only 32-bit DMA addresses in hardware structures. This results in unnecessary use of the SWIOTLB if DMA addresses are more than 32-bits, impacting performance significantly. Provide a get_required_mask op that uses the maximum MFN to calculate the DMA mask. Signed-off-by: David Vrabel david.vra...@citrix.com --- arch/x86/xen/pci-swiotlb-xen.c |1 + drivers/xen/swiotlb-xen.c | 13 + include/xen/swiotlb-xen.h |4 3 files changed, 18 insertions(+) diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c index 0e98e5d..a5d180a 100644 --- a/arch/x86/xen/pci-swiotlb-xen.c +++ b/arch/x86/xen/pci-swiotlb-xen.c @@ -31,6 +31,7 @@ static struct dma_map_ops xen_swiotlb_dma_ops = { .map_page = xen_swiotlb_map_page, .unmap_page = xen_swiotlb_unmap_page, .dma_supported = xen_swiotlb_dma_supported, + .get_required_mask = xen_swiotlb_get_required_mask, }; /* diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index ebd8f21..654587d 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -42,9 +42,11 @@ #include xen/page.h #include xen/xen-ops.h #include xen/hvc-console.h +#include xen/interface/memory.h #include asm/dma-mapping.h #include asm/xen/page-coherent.h +#include asm/xen/hypercall.h #include trace/events/swiotlb.h /* @@ -683,3 +685,14 @@ xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask) return 0; } EXPORT_SYMBOL_GPL(xen_swiotlb_set_dma_mask); + +u64 +xen_swiotlb_get_required_mask(struct device *dev) +{ + unsigned long max_mfn; + + max_mfn = HYPERVISOR_memory_op(XENMEM_maximum_ram_page, NULL); As Jan pointed out, I think you need to change the prototype of 
HYPERVISOR_memory_op to return long. Please do consistently across all relevant archs. + return DMA_BIT_MASK(fls_long(max_mfn - 1) + PAGE_SHIFT); +} +EXPORT_SYMBOL_GPL(xen_swiotlb_get_required_mask); diff --git a/include/xen/swiotlb-xen.h b/include/xen/swiotlb-xen.h index 8b2eb93..640 100644 --- a/include/xen/swiotlb-xen.h +++ b/include/xen/swiotlb-xen.h @@ -58,4 +58,8 @@ xen_swiotlb_dma_supported(struct device *hwdev, u64 mask); extern int xen_swiotlb_set_dma_mask(struct device *dev, u64 dma_mask); + +extern u64 +xen_swiotlb_get_required_mask(struct device *dev); + #endif /* __LINUX_SWIOTLB_XEN_H */ -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
That's right, the maintenance interrupt handler is not called, but it doesn't do anything so we are fine. The important thing is that an interrupt is sent and gic_clear_lrs gets called on hypervisor entry. On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: The only ambiguity left - the maintenance interrupt handler is not called. It was requested for a specific IRQ number, retrieved from the device tree. But when we trigger GICH_HCR_UIE - we get a maintenance interrupt with the spurious number 1023. Regards, Andrii On Wed, Nov 19, 2014 at 7:47 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 7:42 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 7:07 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: I think that's OK: it looks like on your board, for some reason, when UIE is set you get irq 1023 (spurious interrupt) instead of your normal maintenance interrupt. OK, but I think this should be investigated too. What do you think ? I think it is harmless: my guess is that if we clear UIE before reading GICC_IAR, GICC_IAR returns a spurious interrupt instead of a maintenance interrupt. But it doesn't really matter to us. OK. I think catching this will be a good exercise for someone )) But out of scope for this issue. But everything should work anyway without issues. This is the same patch as before but on top of the latest xen-unstable tree. Please confirm if it works. 
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 70d10d6..df140b9 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -527,8 +529,6 @@ void gic_inject(void) if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 1); -else -gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); } I confirm - it works fine. Will this be a final fix ? Yep :-) Many thanks for your help on this! Thank you Stefano. This issue was really critical for us :) Regards, Andrii Regards, Andrii static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: I got this strange log: (XEN) received maintenance interrupt irq=1023 And platform does not hang due to this: +hcr = GICH[GICH_HCR]; +if ( hcr GICH_HCR_UIE ) +{ +GICH[GICH_HCR] = ~GICH_HCR_UIE; +uie_on = 1; +} On Wed, Nov 19, 2014 at 6:50 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:13 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: On Wed, Nov 19, 2014 at 6:01 PM, Andrii Tseglytskyi andrii.tseglyts...@globallogic.com wrote: On Wed, Nov 19, 2014 at 5:41 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, On Wed, Nov 19, 2014 at 4:52 PM, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Andrii Tseglytskyi wrote: Hi Stefano, if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) -GICH[GICH_HCR] |= GICH_HCR_UIE; +GICH[GICH_HCR] |= GICH_HCR_NPIE; else -GICH[GICH_HCR] = ~GICH_HCR_UIE; +GICH[GICH_HCR] = ~GICH_HCR_NPIE; } Yes, exactly I 
tried, hang still occurs with this change We need to figure out why during the hang you still have all the LRs busy even if you are getting maintenance interrupts that should cause them to be cleared. I see that I have free LRs during maintenance interrupt (XEN) gic.c:871:d0v0 maintenance interrupt (XEN) GICH_LRs (vcpu 0) mask=0 (XEN)HW_LR[0]=9a015856 (XEN)HW_LR[1]=0 (XEN)HW_LR[2]=0 (XEN)HW_LR[3]=0 (XEN) Inflight irq=86 lr=0 (XEN) Inflight irq=2 lr=255 (XEN) Pending irq=2 But I see that after I got hang - maintenance interrupts are generated continuously. Platform continues printing the same log till reboot. Exactly the same log? As in the one above you just pasted? That is very very suspicious. Yes exactly the
Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)
On Wed, 19 Nov 2014, Don Slutz wrote: I have posted the patch: Subject: [BUGFIX][PATCH for 2.2 1/1] hw/i386/pc_piix.c: Also pass vmport=off for xenfv machine Date: Wed, 19 Nov 2014 12:30:57 -0500 Message-ID: 1416418257-10166-1-git-send-email-dsl...@verizon.com Which fixes QEMU 2.2 for xenfv. However if you configure xen_platform_pci=0 you will still have this issue. The good news is that xen-4.5 currently does not have QEMU 2.2 and so does not have this issue. Only people (groups like spice?) that want QEMU 2.2.0 with xen 4.5.0 (or older xen versions) will hit this. I have changes to xen 4.6 which will fix the xen_platform_pci=0 case also. In order to get xen 4.5 to fully work with QEMU 2.2.0 (both in hard freeze) the 1st patch from Dr. David Alan Gilbert dgilb...@redhat.com would need to be applied to xen's qemu 2.0.2 (+ changes) so that vmport=off can be added to --machine. And a patch (yet to be written, subset of changes I have pending for 4.6) that adds vmport=off to QEMU args for --machine (it can be done in all cases). What happens if you pass vmport=off via --machine, without David Alan Gilbert's patch in QEMU? -Don Slutz On 11/19/14 10:52, Stefano Stabellini wrote: On Wed, 19 Nov 2014, Fabio Fantoni wrote: Il 19/11/2014 15:56, Don Slutz ha scritto: I think I know what is happening here. But you are pointing at the wrong change. commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4 Is what I am guessing at this time is the issue. I think that xen_enabled() is returning false in pc_machine_initfn. Where as in pc_init1 is is returning true. 
I am thinking that: diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 7bb97a4..3268c29 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = { .desc = Xen Fully-virtualized PC, .init = pc_xen_hvm_init, .max_cpus = HVM_MAX_VCPUS, -.default_machine_opts = accel=xen, +.default_machine_opts = accel=xen,vmport=off, .hot_add_cpu = pc_hot_add_cpu, }; #endif Will fix your issue. I have not tested this yet. Tested now and it solves regression of linux hvm domUs with qemu 2.2, thanks. I think that I'm not the only with this regression and that this patch (or a fix to the cause in vmport) should be applied before qemu 2.2 final. Don, please submit a proper patch with a Signed-off-by. Thanks! - Stefano -Don Slutz On 11/19/14 09:04, Fabio Fantoni wrote: Il 14/11/2014 12:25, Fabio Fantoni ha scritto: dom0 xen-unstable from staging git with x86/hvm: Extend HVM cpuid leaf with vcpu id and x86/hvm: Add per-vcpu evtchn upcalls patches, and qemu 2.2 from spice git (spice/next commit e779fa0a715530311e6f59fc8adb0f6eca914a89): https://github.com/Fantu/Xen/commits/rebase/m2r-staging I tried with qemu tag v2.2.0-rc2 and crash still happen, here the full backtrace of latest test: Program received signal SIGSEGV, Segmentation fault. 
0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 73 eax = env-regs[R_EAX]; (gdb) bt full #0 0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 s = 0x564443a0 cs = 0x0 cpu = 0x0 __func__ = vmport_ioport_read env = 0x8250 command = 0 '\000' eax = 0 #1 0x55655fc4 in memory_region_read_accessor (mr=0x5628, addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410 tmp = 0 #2 0x556562b7 in access_with_adjusted_size (addr=0, value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4, access=0x55655f62 memory_region_read_accessor, mr=0x5628) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480 access_mask = 4294967295 access_size = 4 i = 0 #3 0x556590e9 in memory_region_dispatch_read1 (mr=0x5628, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077 data = 0 #4 0x556591b1 in memory_region_dispatch_read (mr=0x5628, addr=0, pval=0x7fffd9a8, size=4) ---Type return to continue, or q return to quit--- at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099 No locals. #5 0x5565cbbc in io_mem_read (mr=0x5628,
Re: [Xen-devel] Xen 4.5 random freeze question
On 11/19/2014 06:14 PM, Stefano Stabellini wrote: That's right, the maintenance interrupt handler is not called, but it doesn't do anything so we are fine. The important thing is that an interrupt is sent and gic_clear_lrs gets called on hypervisor entry. It would be worth writing this down somewhere, just in case someone decides to add code to the maintenance interrupt handler later. Regards, -- Julien Grall ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry
On Wed, Nov 19, 2014 at 05:44:49PM +, Stefano Stabellini wrote: UIE being set can cause maintenance interrupts to occur when Xen writes to one or more LR registers. The effect is a busy loop around the interrupt handler in Xen (http://marc.info/?l=xen-develm=141597517132682): everything gets stuck. Konrad, this fixes an actual bug, at least on OMAP5. It should have no bad side effects on any other platforms as far as I can tell. It should go in 4.5. Have you checked (aka ran the tests) on the other platforms? Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Tested-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com ^^^ 'Reported-and-Tested-by' diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 70d10d6..df140b9 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -527,8 +529,6 @@ void gic_inject(void) if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 1); -else -gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); } static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On Wed, 19 Nov 2014, Julien Grall wrote: On 11/19/2014 06:14 PM, Stefano Stabellini wrote: That's right, the maintenance interrupt handler is not called, but it doesn't do anything so we are fine. The important thing is that an interrupt is sent and gic_clear_lrs gets called on hypervisor entry. It would be worth writing this down somewhere, just in case someone decides to add code to the maintenance interrupt handler later. Yes, I could add a comment in the handler. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] xen/arm: clear UIE on hypervisor entry
On Wed, 19 Nov 2014, Konrad Rzeszutek Wilk wrote: On Wed, Nov 19, 2014 at 05:44:49PM +, Stefano Stabellini wrote: UIE being set can cause maintenance interrupts to occur when Xen writes to one or more LR registers. The effect is a busy loop around the interrupt handler in Xen (http://marc.info/?l=xen-develm=141597517132682): everything gets stuck. Konrad, this fixes an actual bug, at least on OMAP5. It should have no bad side effects on any other platforms as far as I can tell. It should go in 4.5. Have you checked (aka ran the tests) on the other platforms? Yes, I tested on Midway and it runs fine. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Tested-by: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com ^^^ 'Reported-and-Tested-by' Good point diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 70d10d6..df140b9 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -403,6 +403,8 @@ void gic_clear_lrs(struct vcpu *v) if ( is_idle_vcpu(v) ) return; +gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); + spin_lock_irqsave(v-arch.vgic.lock, flags); while ((i = find_next_bit((const unsigned long *) this_cpu(lr_mask), @@ -527,8 +529,6 @@ void gic_inject(void) if ( !list_empty(current-arch.vgic.lr_pending) lr_all_full() ) gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 1); -else -gic_hw_ops-update_hcr_status(GICH_HCR_UIE, 0); } static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.
On 19/11/2014 18:54, Sander Eikelenboom wrote: Wednesday, November 19, 2014, 6:31:39 PM, you wrote: Hey, This patch should fix the issue that Sander had seen. The full details are in the patch itself. Sander, if you could - please test origin/staging with this patch to make sure it does fix the issue. xen/drivers/passthrough/io.c | 27 +-- Konrad Rzeszutek Wilk (1): dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked. 1 file changed, 17 insertions(+), 10 deletions(-) Hi Konrad, Hmm just tested with a freshly cloned tree .. unfortunately it blew up again. (i must admit i also re-enabled stuff i had disabled in debugging like, cpuidle, cpufreq). (XEN) [2014-11-19 18:41:25.999] [ Xen-4.5.0-rc x86_64 debug=y Not tainted ] (XEN) [2014-11-19 18:41:25.999] CPU:5 (XEN) [2014-11-19 18:41:25.999] RIP:e008:[82d0801490ac] dpci_softirq+0x9c/0x23d (XEN) [2014-11-19 18:41:25.999] RFLAGS: 00010283 CONTEXT: hypervisor (XEN) [2014-11-19 18:41:25.999] rax: 0100100100100100 rbx: 8303bb688d90 rcx: 0001 (XEN) [2014-11-19 18:41:25.999] rdx: 83054ef18000 rsi: 0002 rdi: 83050b29e0b8 (XEN) [2014-11-19 18:41:25.999] rbp: 83054ef1feb0 rsp: 83054ef1fe50 r8: 8303bb688d60 (XEN) [2014-11-19 18:41:25.999] r9: 01d5f62fff63 r10: deadbeef r11: 0246 (XEN) [2014-11-19 18:41:25.999] r12: 8303bb688d38 r13: 83050b29e000 r14: 8303bb688d28 (XEN) [2014-11-19 18:41:25.999] r15: 8303bb688d28 cr0: 8005003b cr4: 06f0 (XEN) [2014-11-19 18:41:25.999] cr3: 00050b2c7000 cr2: ff600400 (XEN) [2014-11-19 18:41:25.999] ds: 002b es: 002b fs: gs: ss: e010 cs: e008 (XEN) [2014-11-19 18:41:25.999] Xen stack trace from rsp=83054ef1fe50: (XEN) [2014-11-19 18:41:25.999]0c23 83050b29e0b8 8303bb688d38 83054ef1fe70 (XEN) [2014-11-19 18:41:25.999]8303bb688d90 8303bb688d90 00fb 82d080300200 (XEN) [2014-11-19 18:41:25.999]82d0802fff80 83054ef18000 0002 (XEN) [2014-11-19 18:41:25.999]83054ef1fee0 82d08012be31 83054ef18000 83009fd2d000 (XEN) [2014-11-19 18:41:25.999] 83054ef28068 83054ef1fef0 82d08012be89 
(XEN) [2014-11-19 18:41:25.999]83054ef1ff10 82d0801633e5 82d08012be89 83009ff8b000 (XEN) [2014-11-19 18:41:25.999]83054ef1fde8 880059bf8000 880059bf8000 (XEN) [2014-11-19 18:41:25.999] 880059bfbeb0 822f3ec0 0246 (XEN) [2014-11-19 18:41:25.999]0001 (XEN) [2014-11-19 18:41:25.999]810013aa 880059bde480 deadbeef deadbeef (XEN) [2014-11-19 18:41:25.999]0100 810013aa e033 0246 (XEN) [2014-11-19 18:41:25.999]880059bfbe98 e02b 1862060042c8beef 224d41480704beef (XEN) [2014-11-19 18:41:25.999]99171042639bbeef 74c88180108cbeef c0dc604c0005 83009ff8b000 (XEN) [2014-11-19 18:41:26.000]0034cebff280 ca836183a4020303 (XEN) [2014-11-19 18:41:26.000] Xen call trace: (XEN) [2014-11-19 18:41:26.000][82d0801490ac] dpci_softirq+0x9c/0x23d (XEN) [2014-11-19 18:41:26.000][82d08012be31] __do_softirq+0x81/0x8c (XEN) [2014-11-19 18:41:26.000][82d08012be89] do_softirq+0x13/0x15 (XEN) [2014-11-19 18:41:26.000][82d0801633e5] idle_loop+0x5e/0x6e (XEN) [2014-11-19 18:41:26.000] (XEN) [2014-11-19 18:41:26.778] (XEN) [2014-11-19 18:41:26.787] (XEN) [2014-11-19 18:41:26.806] Panic on CPU 5: (XEN) [2014-11-19 18:41:26.819] GENERAL PROTECTION FAULT (XEN) [2014-11-19 18:41:26.834] [error_code=] (XEN) [2014-11-19 18:41:26.847] (XEN) [2014-11-19 18:41:26.867] (XEN) [2014-11-19 18:41:26.876] Reboot in five seconds... (XEN) [2014-11-19 18:41:26.891] APIC error on CPU0: 00(08) (XEN) [2014-11-19 18:41:26.906] APIC error on CPU0: 08(08) For the avoidance of any confusion, this is still LIST_POISON1 (see %rax), but now a #GP fault following c/s 404227138 (now with 100% less chance of dereferencing into guest-controlled virtual address space) ~Andrew ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Xen 4.5 random freeze question
On Nov 19, 2014 at 20:32, Stefano Stabellini stefano.stabell...@eu.citrix.com wrote: On Wed, 19 Nov 2014, Julien Grall wrote: On 11/19/2014 06:14 PM, Stefano Stabellini wrote: That's right, the maintenance interrupt handler is not called, but it doesn't do anything so we are fine. The important thing is that an interrupt is sent and gic_clear_lrs gets called on hypervisor entry. It would be worth writing this down somewhere, just in case someone decides to add code to the maintenance interrupt handler later. Yes, I could add a comment in the handler Maybe it wouldn't take a lot of effort to fix it? I am just worrying that we may hide some issue - typically a spurious interrupt is not what is expected. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [xen-4.3-testing test] 31670: regressions - FAIL
flight 31670 xen-4.3-testing real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/31670/

Regressions :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-amd64-xl-qemut-winxpsp3        7 windows-install     fail REGR. vs. 31536

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64        1 build-check(1)      blocked n/a
 test-amd64-i386-rumpuserxen-i386          1 build-check(1)      blocked n/a
 build-amd64-rumpuserxen                   6 xen-build           fail never pass
 build-i386-rumpuserxen                    6 xen-build           fail never pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64      7 debian-hvm-install  fail never pass
 test-amd64-i386-libvirt                   9 guest-start         fail never pass
 test-amd64-i386-xl-qemuu-ovmf-amd64       7 debian-hvm-install  fail never pass
 test-amd64-amd64-libvirt                  9 guest-start         fail never pass
 test-amd64-amd64-xl-pcipt-intel           9 guest-start         fail never pass
 test-armhf-armhf-xl                       5 xen-boot            fail never pass
 test-armhf-armhf-libvirt                  5 xen-boot            fail never pass
 test-amd64-i386-xl-qemut-win7-amd64      14 guest-stop          fail never pass
 test-amd64-i386-xend-winxpsp3            17 leak-check/check    fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1       14 guest-stop          fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop          fail never pass
 test-amd64-i386-xend-qemut-winxpsp3      17 leak-check/check    fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64      14 guest-stop          fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop          fail never pass
 test-amd64-amd64-xl-win7-amd64           14 guest-stop          fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64     14 guest-stop          fail never pass
 test-amd64-i386-xl-win7-amd64            14 guest-stop          fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64     14 guest-stop          fail never pass
 test-amd64-amd64-xl-winxpsp3             14 guest-stop          fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3       14 guest-stop          fail never pass

version targeted for testing:
 xen                  82fa0623454a52c7d1812a9419c4cc09567d243d
baseline version:
 xen                  d6281e354393f1c8a02fac55f4f611b4d4856303

People who touched revisions under 
test:
 Jan Beulich jbeul...@suse.com
 Tim Deegan t...@xen.org

jobs:
 build-amd64                                    pass
 build-armhf                                    pass
 build-i386                                     pass
 build-amd64-libvirt                            pass
 build-armhf-libvirt                            pass
 build-i386-libvirt                             pass
 build-amd64-pvops                              pass
 build-armhf-pvops                              pass
 build-i386-pvops                               pass
 build-amd64-rumpuserxen                        fail
 build-i386-rumpuserxen                         fail
 test-amd64-amd64-xl                            pass
 test-armhf-armhf-xl                            fail
 test-amd64-i386-xl                             pass
 test-amd64-i386-rhel6hvm-amd                   pass
 test-amd64-i386-qemut-rhel6hvm-amd             pass
 test-amd64-i386-qemuu-rhel6hvm-amd             pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64      pass
 test-amd64-i386-xl-qemut-debianhvm-amd64       pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64      pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64       pass
 test-amd64-i386-freebsd10-amd64                pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64           fail
 test-amd64-i386-xl-qemuu-ovmf-amd64            fail
 test-amd64-amd64-rumpuserxen-amd64             blocked
 test-amd64-amd64-xl-qemut-win7-amd64           fail
 test-amd64-i386-xl-qemut-win7-amd64            fail
 test-amd64-amd64-xl-qemuu-win7-amd64           fail
 test-amd64-i386-xl-qemuu-win7-amd64            fail
 test-amd64-amd64-xl-win7-amd64                 fail
 test-amd64-i386-xl-win7-amd64                  fail
 test-amd64-i386-xl-credit2                     pass
 test-amd64-i386-freebsd10-i386                 pass
Re: [Xen-devel] [Qemu-devel] qemu 2.2 crash on linux hvm domU (full backtrace included)
On 11/19/14 13:18, Stefano Stabellini wrote: On Wed, 19 Nov 2014, Don Slutz wrote: I have posted the patch: Subject: [BUGFIX][PATCH for 2.2 1/1] hw/i386/pc_piix.c: Also pass vmport=off for xenfv machine Date: Wed, 19 Nov 2014 12:30:57 -0500 Message-ID: 1416418257-10166-1-git-send-email-dsl...@verizon.com Which fixes QEMU 2.2 for xenfv. However if you configure xen_platform_pci=0 you will still have this issue. The good news is that xen-4.5 currently does not have QEMU 2.2 and so does not have this issue. Only people (groups like spice?) that want QEMU 2.2.0 with xen 4.5.0 (or older xen versions) will hit this. I have changes to xen 4.6 which will fix the xen_platform_pci=0 case also. In order to get xen 4.5 to fully work with QEMU 2.2.0 (both in hard freeze) the 1st patch from Dr. David Alan Gilbert dgilb...@redhat.com would need to be applied to xen's qemu 2.0.2 (+ changes) so that vmport=off can be added to --machine. And a patch (yet to be written, subset of changes I have pending for 4.6) that adds vmport=off to QEMU args for --machine (it can be done in all cases). What happens if you pass vmport=off via --machine, without David Alan Gilbert's patch in QEMU? I am almost (99%) sure that QEMU will complain about a bad arg. gdb says: (gdb) r Starting program: /home/don/qemu/out/master/x86_64-softmmu/qemu-system-x86_64 -M pc -machine accel=xen,vmportport=1 [Thread debugging using libthread_db enabled] Using host libthread_db library /lib64/libthread_db.so.1. qemu-system-x86_64: -machine accel=xen,vmportport=1: Invalid parameter 'vmportport' In which case domU will fail to start. -Don Slutz -Don Slutz On 11/19/14 10:52, Stefano Stabellini wrote: On Wed, 19 Nov 2014, Fabio Fantoni wrote: Il 19/11/2014 15:56, Don Slutz ha scritto: I think I know what is happening here. But you are pointing at the wrong change. commit 9b23cfb76b3a5e9eb5cc899eaf2f46bc46d33ba4 Is what I am guessing at this time is the issue. 
I think that xen_enabled() is returning false in pc_machine_initfn. Where as in pc_init1 is is returning true. I am thinking that: diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c index 7bb97a4..3268c29 100644 --- a/hw/i386/pc_piix.c +++ b/hw/i386/pc_piix.c @@ -914,7 +914,7 @@ static QEMUMachine xenfv_machine = { .desc = Xen Fully-virtualized PC, .init = pc_xen_hvm_init, .max_cpus = HVM_MAX_VCPUS, -.default_machine_opts = accel=xen, +.default_machine_opts = accel=xen,vmport=off, .hot_add_cpu = pc_hot_add_cpu, }; #endif Will fix your issue. I have not tested this yet. Tested now and it solves regression of linux hvm domUs with qemu 2.2, thanks. I think that I'm not the only with this regression and that this patch (or a fix to the cause in vmport) should be applied before qemu 2.2 final. Don, please submit a proper patch with a Signed-off-by. Thanks! - Stefano -Don Slutz On 11/19/14 09:04, Fabio Fantoni wrote: Il 14/11/2014 12:25, Fabio Fantoni ha scritto: dom0 xen-unstable from staging git with x86/hvm: Extend HVM cpuid leaf with vcpu id and x86/hvm: Add per-vcpu evtchn upcalls patches, and qemu 2.2 from spice git (spice/next commit e779fa0a715530311e6f59fc8adb0f6eca914a89): https://github.com/Fantu/Xen/commits/rebase/m2r-staging I tried with qemu tag v2.2.0-rc2 and crash still happen, here the full backtrace of latest test: Program received signal SIGSEGV, Segmentation fault. 
0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 73 eax = env->regs[R_EAX]; (gdb) bt full #0 0x55689b07 in vmport_ioport_read (opaque=0x564443a0, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/hw/misc/vmport.c:73 s = 0x564443a0 cs = 0x0 cpu = 0x0 __func__ = "vmport_ioport_read" env = 0x8250 command = 0 '\000' eax = 0 #1 0x55655fc4 in memory_region_read_accessor (mr=0x5628, addr=0, value=0x7fffd8d0, size=4, shift=0, mask=4294967295) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:410 tmp = 0 #2 0x556562b7 in access_with_adjusted_size (addr=0, value=0x7fffd8d0, size=4, access_size_min=4, access_size_max=4, access=0x55655f62 <memory_region_read_accessor>, mr=0x5628) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:480 access_mask = 4294967295 access_size = 4 i = 0 #3 0x556590e9 in memory_region_dispatch_read1 (mr=0x5628, addr=0, size=4) at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1077 data = 0 #4 0x556591b1 in memory_region_dispatch_read (mr=0x5628, addr=0, pval=0x7fffd9a8, size=4) ---Type <return> to continue, or q <return> to quit--- at /mnt/vm/xen/Xen/tools/qemu-xen-dir/memory.c:1099 No locals. #5 0x5565cbbc in io_mem_read
Re: [Xen-devel] [PATCH V3 2/8] xen: Delay remapping memory of pv-domain
On Fri, Nov 14, 2014 at 06:14:06PM +0100, Juergen Gross wrote: On 11/14/2014 05:47 PM, Konrad Rzeszutek Wilk wrote: On Fri, Nov 14, 2014 at 05:53:19AM +0100, Juergen Gross wrote: On 11/13/2014 08:56 PM, Konrad Rzeszutek Wilk wrote: + mfn_save = virt_to_mfn(buf); + + while (xen_remap_mfn != INVALID_P2M_ENTRY) { So the 'list' is constructed by going forward - that is from low-numbered PFNs to higher numbered ones. But the 'xen_remap_mfn' is going the other way - from the highest PFN to the lowest PFN. Won't that mean we will restore the chunks of memory in the wrong order? That is, we will still restore them in chunk-sized pieces, but the chunks will be in descending order instead of ascending? No, the information about where to put each chunk is contained in the chunk data. I can add a comment explaining this. Right, the MFNs in a chunk are going to be restored in the right order. I was thinking that the chunks (so a set of MFNs) will be restored in the opposite order that they are written to. And oddly enough the chunks are done in 512-3 = 509 MFNs at once? More don't fit on a single page due to the other info needed. So: yes. But you could use two pages - one for the structure and the other for the list of MFNs. That would fix the problem of having only 509 MFNs being contiguous per chunk when restoring. That's no problem (see below). Anyhow the point I had that I am worried about is that we do not restore the MFNs in the same order. We do it in chunk size which is OK (so the 509 MFNs at once) - but the order we traverse the restoration process is the opposite of the save process. Say we have 4MB of contiguous MFNs, so two (err, three) chunks. The first one we iterate is from 0-509, the second is 510-1018, the last is 1019-1023. When we restore (remap) we start with the last 'chunk' so we end up restoring them in 1019-1023, 510-1018, 0-509 order. No. When building up the chunks we save in each chunk where to put it on remap.
So in your example 0-509 should be mapped at dest+0, 510-1018 at dest+510, and 1019-1023 at dest+1019. When remapping we map 1019-1023 to dest+1019, 510-1018 at dest+510 and last 0-509 at dest+0. So we do the mapping in reverse order, but to the correct pfns. Excellent! Could a condensed version of that explanation be put in the code? Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
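The ordering argument above can be sketched in a few lines. This is an illustrative Python model, not the kernel code: each saved chunk records the destination pfn its MFNs belong to, so even though the remap walks the chunk list in reverse, every MFN still lands at the right place.

```python
# Model of the chunked save/remap discussed above (illustrative only).
# One page holds the chunk metadata plus 509 MFNs, hence CHUNK = 509.
CHUNK = 509

def save_chunks(mfns):
    """Split the MFN list into chunks, each tagged with its destination pfn."""
    return [{"dest": start, "mfns": mfns[start:start + CHUNK]}
            for start in range(0, len(mfns), CHUNK)]

def remap(chunks, total):
    """Walk the chunk list in reverse (as the restore path does) and place
    each chunk at the destination recorded in its metadata."""
    mapping = [None] * total
    for chunk in reversed(chunks):
        for i, mfn in enumerate(chunk["mfns"]):
            mapping[chunk["dest"] + i] = mfn
    return mapping
```

With 1024 contiguous MFNs (the "4 MB" example) the chunks are processed backwards, yet `remap(save_chunks(mfns), 1024)` reproduces the original list exactly.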
[Xen-devel] [PATCH V3] Decouple SandyBridge quirk from VTd timeout
Currently the quirk code for SandyBridge uses the VTd timeout value when writing to an IGD register. This is the wrong timeout to use and, at 1000 msec., is also much too large. This patch changes the quirk code to use a timeout that is specific to the IGD device and allows the user control of the timeout. Boolean settings for the boot parameter `snb_igd_quirk' keep their current meaning, enabling or disabling the quirk code with a timeout of 1000 msec. In addition, specifying `snb_igd_quirk=default' will enable the code and set the timeout to the theoretical maximum of 670 msec. For finer control, specifying `snb_igd_quirk=n', where `n' is a decimal number, will enable the code and set the timeout to `n' msec. Signed-off-by: Don Dugger donald.d.dug...@intel.com -- diff -r 9d485e2c8339 xen/drivers/passthrough/vtd/quirks.c --- a/xen/drivers/passthrough/vtd/quirks.c Mon Nov 10 12:03:36 2014 +0000 +++ b/xen/drivers/passthrough/vtd/quirks.c Wed Nov 19 09:49:31 2014 -0700 @@ -50,6 +50,10 @@ #define IS_ILK(id) (id == 0x00408086 || id == 0x00448086 || id == 0x00628086 || id == 0x006A8086) #define IS_CPT(id) (id == 0x01008086 || id == 0x01048086) +#define SNB_IGD_TIMEOUT_LEGACY MILLISECS(1000) +#define SNB_IGD_TIMEOUT MILLISECS( 670) +static u32 snb_igd_timeout = 0; + static u32 __read_mostly ioh_id; static u32 __initdata igd_id; bool_t __read_mostly rwbf_quirk; @@ -158,6 +162,16 @@ * Workaround is to prevent graphics get into RC6 * state when doing VT-d IOTLB operations, do the VT-d * IOTLB operation, and then re-enable RC6 state. + * + * This quirk is enabled with the snb_igd_quirk command + * line parameter. Specifying snb_igd_quirk with no value + * (or any of the standard boolean values) enables this + * quirk and sets the timeout to the legacy timeout of + * 1000 msec. Setting this parameter to the string + * "default" enables this quirk and sets the timeout to + * the theoretical maximum of 670 msec. Setting this + * parameter to a numerical value enables the quirk and + * sets the timeout to that numerical number of msecs. */ static void snb_vtd_ops_preamble(struct iommu* iommu) { @@ -177,7 +191,7 @@ start_time = NOW(); while ( (*(volatile u32 *)(igd_reg_va + 0x22AC) & 0xF) != 0 ) { -if ( NOW() > start_time + DMAR_OPERATION_TIMEOUT ) +if ( NOW() > start_time + snb_igd_timeout ) { dprintk(XENLOG_INFO VTDPREFIX, "snb_vtd_ops_preamble: failed to disable idle handshake\n"); @@ -208,13 +222,10 @@ * call before VT-d translation enable and IOTLB flush operations. */ -static int snb_igd_quirk; -boolean_param("snb_igd_quirk", snb_igd_quirk); - void vtd_ops_preamble_quirk(struct iommu* iommu) { cantiga_vtd_ops_preamble(iommu); -if ( snb_igd_quirk ) +if ( snb_igd_timeout != 0 ) { spin_lock(igd_lock); @@ -228,7 +239,7 @@ */ void vtd_ops_postamble_quirk(struct iommu* iommu) { -if ( snb_igd_quirk ) +if ( snb_igd_timeout != 0 ) { snb_vtd_ops_postamble(iommu); @@ -237,6 +248,42 @@ } } +static void __init parse_snb_timeout(const char *s) +{ + int not; + + switch (*s) { + + case '\0': + snb_igd_timeout = SNB_IGD_TIMEOUT_LEGACY; + break; + + case '0': case '1': case '2': + case '3': case '4': case '5': + case '6': case '7': case '8': + case '9': + snb_igd_timeout = MILLISECS(simple_strtoul(s, &s, 0)); + if ( snb_igd_timeout == MILLISECS(1) ) + snb_igd_timeout = SNB_IGD_TIMEOUT_LEGACY; + break; + + default: + if ( strncmp("default", s, 7) == 0 ) { + snb_igd_timeout = SNB_IGD_TIMEOUT; + break; + } + not = !strncmp("no-", s, 3); + if ( not ) + s += 3; + if ( not ^ parse_bool(s) ) + snb_igd_timeout = SNB_IGD_TIMEOUT_LEGACY; + break; + + } + return; +} +custom_param("snb_igd_quirk", parse_snb_timeout); + /* 5500/5520/X58 Chipset Interrupt remapping errata, for stepping B-3. * Fixed in stepping C-2. */ static void __init tylersburg_intremap_quirk(void) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
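The parameter semantics described in the patch can be summarized with a small model. This is illustrative Python, not the Xen parser; in particular, the set of boolean spellings accepted by Xen's `parse_bool()` is only approximated here.

```python
# Model of the proposed snb_igd_quirk parsing rules (illustrative only).
LEGACY_MS = 1000          # bare `snb_igd_quirk' or a boolean "true" value
THEORETICAL_MAX_MS = 670  # `snb_igd_quirk=default'

def parse_snb_timeout(val):
    """Return the quirk timeout in msec; 0 means the quirk stays disabled."""
    if val == "":
        return LEGACY_MS
    if val[0].isdigit():
        n = int(val)
        return LEGACY_MS if n == 1 else n   # "1" is treated as boolean true
    if val == "default":
        return THEORETICAL_MAX_MS
    negated = val.startswith("no-")
    if negated:
        val = val[3:]
    # Approximation of parse_bool(); Xen accepts a few more spellings.
    truthy = val in ("on", "true", "yes", "enable")
    return LEGACY_MS if truthy != negated else 0
```

So `snb_igd_quirk` alone gives 1000 msec, `snb_igd_quirk=default` gives 670 msec, `snb_igd_quirk=250` gives 250 msec, and a false boolean leaves the quirk disabled.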
Re: [Xen-devel] [PATCH V3 7/8] xen: switch to linear virtual mapped sparse p2m list
On Tue, Nov 11, 2014 at 06:43:45AM +0100, Juergen Gross wrote: At start of the day the Xen hypervisor presents a contiguous mfn list to a pv-domain. In order to support sparse memory this mfn list is accessed via a three level p2m tree built early in the boot process. Whenever the system needs the mfn associated with a pfn this tree is used to find the mfn. Instead of using a software walked tree for accessing a specific mfn list entry this patch is creating a virtual address area for the entire possible mfn list including memory holes. The holes are covered by mapping a pre-defined page consisting only of invalid mfn entries. Access to a mfn entry is possible by just using the virtual base address of the mfn list and the pfn as index into that list. This speeds up the (hot) path of determining the mfn of a pfn. Kernel build on a Dell Latitude E6440 (2 cores, HT) in 64 bit Dom0 showed following improvements: Elapsed time: 32:50 - 32:35 System: 18:07 - 17:47 User:104:00 - 103:30 Tested on 64 bit dom0 and 32 bit domU. 
Signed-off-by: Juergen Gross jgr...@suse.com --- arch/x86/include/asm/xen/page.h | 14 +- arch/x86/xen/mmu.c | 32 +- arch/x86/xen/p2m.c | 732 +--- arch/x86/xen/xen-ops.h | 2 +- 4 files changed, 342 insertions(+), 438 deletions(-) diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h index 07d8a7b..4a227ec 100644 --- a/arch/x86/include/asm/xen/page.h +++ b/arch/x86/include/asm/xen/page.h @@ -72,7 +72,19 @@ extern unsigned long m2p_find_override_pfn(unsigned long mfn, unsigned long pfn) */ static inline unsigned long __pfn_to_mfn(unsigned long pfn) { - return get_phys_to_machine(pfn); + unsigned long mfn; + + if (pfn < xen_p2m_size) + mfn = xen_p2m_addr[pfn]; + else if (unlikely(pfn < xen_max_p2m_pfn)) + return get_phys_to_machine(pfn); + else + return IDENTITY_FRAME(pfn); + + if (unlikely(mfn == INVALID_P2M_ENTRY)) + return get_phys_to_machine(pfn); + + return mfn; } static inline unsigned long pfn_to_mfn(unsigned long pfn) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 31ca515..0b43c45 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1158,20 +1158,16 @@ static void __init xen_cleanhighmap(unsigned long vaddr, * instead of somewhere later and be confusing. */ xen_mc_flush(); } -static void __init xen_pagetable_p2m_copy(void) + +static void __init xen_pagetable_p2m_free(void) { unsigned long size; unsigned long addr; - unsigned long new_mfn_list; - - if (xen_feature(XENFEAT_auto_translated_physmap)) - return; size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long)); - new_mfn_list = xen_revector_p2m_tree(); /* No memory or already called. */ - if (!new_mfn_list || new_mfn_list == xen_start_info->mfn_list) + if ((unsigned long)xen_p2m_addr == xen_start_info->mfn_list) return; /* using __ka address and sticking INVALID_P2M_ENTRY! */ @@ -1189,8 +1185,6 @@ static void __init xen_pagetable_p2m_copy(void) size = PAGE_ALIGN(xen_start_info->nr_pages * sizeof(unsigned long)); memblock_free(__pa(xen_start_info->mfn_list), size); - /* And revector! Bye bye old array */ - xen_start_info->mfn_list = new_mfn_list; /* At this stage, cleanup_highmap has already cleaned __ka space * from _brk_limit way up to the max_pfn_mapped (which is the end of @@ -1214,12 +1208,26 @@ } #endif -static void __init xen_pagetable_init(void) +static void __init xen_pagetable_p2m_setup(void) { - paging_init(); + if (xen_feature(XENFEAT_auto_translated_physmap)) + return; + + xen_vmalloc_p2m_tree(); + #ifdef CONFIG_X86_64 - xen_pagetable_p2m_copy(); + xen_pagetable_p2m_free(); #endif + /* And revector! Bye bye old array */ + xen_start_info->mfn_list = (unsigned long)xen_p2m_addr; +} + +static void __init xen_pagetable_init(void) +{ + paging_init(); + + xen_pagetable_p2m_setup(); + /* Allocate and initialize top and mid mfn levels for p2m structure */ xen_build_mfn_list_list(); diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c index 328875a..7df446d 100644 --- a/arch/x86/xen/p2m.c +++ b/arch/x86/xen/p2m.c @@ -3,21 +3,22 @@ * guests themselves, but it must also access and update the p2m array * during suspend/resume when all the pages are reallocated. * - * The p2m table is logically a flat array, but we implement it as a - * three-level tree to allow the address space to be sparse. + * The logical flat p2m table is mapped to a linear kernel memory area. + * For accesses by Xen a three-level tree linked via
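The new `__pfn_to_mfn()` fast path has a three-way structure that is easy to model. This is a toy Python sketch, not the kernel code: pfns inside the linear list cost one array index; a hole entry or a pfn beyond the list (but below the maximum) falls back to the tree walk (`get_phys_to_machine()`); anything past the maximum is identity-mapped.

```python
# Toy model of the __pfn_to_mfn() lookup order (illustrative only).
INVALID_P2M_ENTRY = -1  # stand-in for the kernel's all-ones sentinel

def pfn_to_mfn(pfn, linear_p2m, max_p2m_pfn, tree_lookup):
    if pfn < len(linear_p2m):
        mfn = linear_p2m[pfn]
        if mfn != INVALID_P2M_ENTRY:
            return mfn                  # hot path: one array access
        return tree_lookup(pfn)         # hole page hit: take the slow path
    if pfn < max_p2m_pfn:
        return tree_lookup(pfn)         # beyond the list: tree walk
    return pfn                          # models IDENTITY_FRAME(pfn)

# A tiny sample address space: a 4-entry linear list with one hole,
# a tree covering the rest up to max_p2m_pfn = 6.
linear = [100, 101, INVALID_P2M_ENTRY, 103]
tree = {2: 202, 4: 204, 5: 205}
```

The point of the patch is that the first branch is the common case, turning the hot p2m lookup into a single indexed load.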
Re: [Xen-devel] [PATCH V3 0/8] xen: Switch to virtual mapped linear p2m list
On Tue, Nov 11, 2014 at 06:43:38AM +0100, Juergen Gross wrote: Paravirtualized kernels running on Xen use a three level tree for translation of guest specific physical addresses to machine global addresses. This p2m tree is used for construction of page table entries, so the p2m tree walk is performance critical. By using a linear virtual mapped p2m list accesses to p2m elements can be sped up while even simplifying code. To achieve this goal some p2m related initializations have to be performed later in the boot process, as the final p2m list can be set up only after basic memory management functions are available. Hey Juergen, I finally finished looking at the patchset. I had some comments and some questions that I hope can make it into the patch so that in six months or so when somebody looks at the code they can understand the subtle pieces. Looking forward to the v4! (Though keep in mind that next week is Thanksgiving week, so I won't be able to look much after Wednesday.) arch/x86/include/asm/pgtable_types.h |1 + arch/x86/include/asm/xen/page.h | 49 +- arch/x86/mm/pageattr.c | 20 + arch/x86/xen/mmu.c | 38 +- arch/x86/xen/p2m.c | 1315 ++ arch/x86/xen/setup.c | 460 ++-- arch/x86/xen/xen-ops.h |6 +- 7 files changed, 854 insertions(+), 1035 deletions(-) And best of all - we are deleting more code! -- 2.1.2 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug
On Mon, Nov 17, 2014 at 12:10:34PM +0000, Wei Liu wrote: The existence check is to make sure a device is not added to a guest multiple times. The PCI device backend path has different rules from vif, disk, etc. For example: /local/domain/0/backend/pci/9/0/dev-1/0000:03:10.1 /local/domain/0/backend/pci/9/0/key-1/0000:03:10.1 /local/domain/0/backend/pci/9/0/dev-2/0000:03:10.2 /local/domain/0/backend/pci/9/0/key-2/0000:03:10.2 The devid for PCI devices is a hardcoded 0. libxl__device_exists only checks up to /local/.../9/0 so it always returns true even if the device is assignable. Remove the invocation of libxl__device_exists. We're sure at this point that the PCI device is assignable (hence no xenstore entry or JSON entry). The check is done beforehand. For an HVM guest it's done by calling xc_test_assign_device and for a PV guest it's done by calling pciback_dev_is_assigned. Reported-by: Li, Liang Z liang.z...@intel.com Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Konrad Wilk konrad.w...@oracle.com --- This patch fixes a regression in 4.5. Ouch! That needs then to be fixed. Is this the version you would want to commit? I did test it - and it looked to do the right thing - though the xen-pciback is stuck in the 7 state. However that is a separate issue that I believe is due to Xen pciback, not your patches. The risk is that I misunderstood the semantics of xc_test_assign_device and pciback_dev_is_assigned and end up adding several entries to the JSON config template. But if the assignable tests are incorrect I think we have a bigger problem to worry about than duplicated entries in the JSON template. It would be good for someone with a PCI hotplug setup to run a quick test. I think Liang confirmed (indirectly) that xc_test_assign_device worked well for him, so I think there won't be multiple JSON template entries for HVM guests. However the PV side still remains to be tested.
--- tools/libxl/libxl_pci.c |8 1 file changed, 8 deletions(-) diff --git a/tools/libxl/libxl_pci.c b/tools/libxl/libxl_pci.c index 9f40100..316643c 100644 --- a/tools/libxl/libxl_pci.c +++ b/tools/libxl/libxl_pci.c @@ -175,14 +175,6 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc, uint32_t domid, libxl_d rc = libxl__xs_transaction_start(gc, &t); if (rc) goto out; -rc = libxl__device_exists(gc, t, device); -if (rc < 0) goto out; -if (rc == 1) { -LOG(ERROR, "device already exists in xenstore"); -rc = ERROR_DEVICE_EXISTS; -goto out; -} - rc = libxl__set_domain_configuration(gc, domid, d_config); if (rc) goto out; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
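Why the removed check always fired for PCI can be shown with a toy illustration. The helper below is hypothetical Python, not the libxl implementation, and the BDFs are made up: because the PCI devid is hardcoded to 0, the existence check's path prefix `.../backend/pci/<domid>/0` matches as soon as any PCI device is assigned, so every later hotplug was (wrongly) rejected.

```python
# Hypothetical sketch of the prefix check (not the libxl code).
def device_exists(xenstore_keys, backend_prefix):
    """True if any xenstore key lives at or under backend_prefix."""
    return any(k == backend_prefix or k.startswith(backend_prefix + "/")
               for k in xenstore_keys)

# One PCI device already assigned to domain 9 (devid is always 0 for PCI):
xs = {
    "/local/domain/0/backend/pci/9/0/dev-1/0000:03:10.1",
    "/local/domain/0/backend/pci/9/0/key-1/0000:03:10.1",
}
```

Attaching a second function still matches the devid-0 prefix and so looks like a duplicate, whereas a vif-style layout with one devid per device would not collide.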
Re: [Xen-devel] [PATCH v9 12/13] swiotlb-xen: pass dev_addr to xen_dma_unmap_page and xen_dma_sync_single_for_cpu
On Wed, Nov 12, 2014 at 11:40:53AM +, Stefano Stabellini wrote: xen_dma_unmap_page and xen_dma_sync_single_for_cpu take a dma_addr_t handle as argument, not a physical address. Ouch. Should this also go on stable tree? Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com Reviewed-by: Catalin Marinas catalin.mari...@arm.com --- drivers/xen/swiotlb-xen.c |6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c index 3725ee4..498b654 100644 --- a/drivers/xen/swiotlb-xen.c +++ b/drivers/xen/swiotlb-xen.c @@ -449,7 +449,7 @@ static void xen_unmap_single(struct device *hwdev, dma_addr_t dev_addr, BUG_ON(dir == DMA_NONE); - xen_dma_unmap_page(hwdev, paddr, size, dir, attrs); + xen_dma_unmap_page(hwdev, dev_addr, size, dir, attrs); /* NOTE: We use dev_addr here, not paddr! */ if (is_xen_swiotlb_buffer(dev_addr)) { @@ -497,14 +497,14 @@ xen_swiotlb_sync_single(struct device *hwdev, dma_addr_t dev_addr, BUG_ON(dir == DMA_NONE); if (target == SYNC_FOR_CPU) - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir); + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir); /* NOTE: We use dev_addr here, not paddr! */ if (is_xen_swiotlb_buffer(dev_addr)) swiotlb_tbl_sync_single(hwdev, paddr, size, dir, target); if (target == SYNC_FOR_DEVICE) - xen_dma_sync_single_for_cpu(hwdev, paddr, size, dir); + xen_dma_sync_single_for_cpu(hwdev, dev_addr, size, dir); if (dir != DMA_FROM_DEVICE) return; -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] set pv guest default video_memkb to 0
On Tue, Nov 18, 2014 at 03:57:08PM -0500, Zhigang Wang wrote: Before this patch, pv guest video_memkb is -1, which is an invalid value. And it will cause the xenstore 'memory/target' calculation to be wrong: memory/target = info->target_memkb - info->video_memkb CC-ing the maintainers. Is this a regression as compared to Xen 4.4 or is this also in Xen 4.4? Thanks. Signed-off-by: Zhigang Wang zhigang.x.w...@oracle.com --- tools/libxl/libxl_create.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index b1ff5ae..1198225 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -357,6 +357,8 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc, break; case LIBXL_DOMAIN_TYPE_PV: libxl_defbool_setdefault(&b_info->u.pv.e820_host, false); +if (b_info->video_memkb == LIBXL_MEMKB_DEFAULT) +b_info->video_memkb = 0; if (b_info->shadow_memkb == LIBXL_MEMKB_DEFAULT) b_info->shadow_memkb = 0; if (b_info->u.pv.slack_memkb == LIBXL_MEMKB_DEFAULT) -- 1.8.3.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] docs/commandline: Fix formatting issues
On Wed, Nov 19, 2014 at 11:22:18AM +0000, Ian Campbell wrote: On Wed, 2014-11-19 at 11:17 +0000, Andrew Cooper wrote: In both of these cases, markdown was interpreting the text as regular text, and reflowing it as a regular paragraph, leading to a single line as output. Reformat them as code blocks inside blockquote blocks, which causes them to keep their precise whitespace layout. Signed-off-by: Andrew Cooper andrew.coop...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com CC: Ian Jackson ian.jack...@eu.citrix.com CC: Wei Liu wei.l...@citrix.com CC: Konrad Rzeszutek Wilk konrad.w...@oracle.com --- Konrad: this is a documentation fix, so requesting a 4.5 ack please. FWIW IMHO documentation fixes in general should have a very low bar to cross until very late in the release cycle... I concur; I updated the release criteria doc so that it will be expedited in the future. --- docs/misc/xen-command-line.markdown | 38 +-- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index f054d4b..e3a5a15 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -475,13 +475,13 @@ defaults of 1 and unlimited respectively are used instead. For example, with `dom0_max_vcpus=4-8`: - Number of - PCPUs | Dom0 VCPUs - 2| 4 - 4| 4 - 6| 6 - 8| 8 - 10| 8 +Number of + PCPUs | Dom0 VCPUs + 2| 4 + 4| 4 + 6| 6 + 8| 8 + 10| 8 ### dom0\_mem `= List of ( min:size | max:size | size )` @@ -684,18 +684,18 @@ supported only when compiled with XSM\_ENABLE=y on x86.
The specified value is a bit mask with the individual bits having the following meaning: -Bit 0 - debug level 0 (unused at present) -Bit 1 - debug level 1 (Control Register logging) -Bit 2 - debug level 2 (VMX logging of MSR restores when context switching) -Bit 3 - debug level 3 (unused at present) -Bit 4 - I/O operation logging -Bit 5 - vMMU logging -Bit 6 - vLAPIC general logging -Bit 7 - vLAPIC timer logging -Bit 8 - vLAPIC interrupt logging -Bit 9 - vIOAPIC logging -Bit 10 - hypercall logging -Bit 11 - MSR operation logging + Bit 0 - debug level 0 (unused at present) + Bit 1 - debug level 1 (Control Register logging) + Bit 2 - debug level 2 (VMX logging of MSR restores when context switching) + Bit 3 - debug level 3 (unused at present) + Bit 4 - I/O operation logging + Bit 5 - vMMU logging + Bit 6 - vLAPIC general logging + Bit 7 - vLAPIC timer logging + Bit 8 - vLAPIC interrupt logging + Bit 9 - vIOAPIC logging + Bit 10 - hypercall logging + Bit 11 - MSR operation logging Recognized in debug builds of the hypervisor only. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] set pv guest default video_memkb to 0
On Wed, Nov 19, 2014 at 04:08:46PM -0500, Konrad Rzeszutek Wilk wrote: On Tue, Nov 18, 2014 at 03:57:08PM -0500, Zhigang Wang wrote: Before this patch, pv guest video_memkb is -1, which is an invalid value. And it will cause the xenstore 'memory/target' calculation to be wrong: memory/target = info->target_memkb - info->video_memkb CC-ing the maintainers. Is this a regression as compared to Xen 4.4 or is this also in Xen 4.4? I don't think this is a regression; it has been broken for quite a while. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for-4.5] libxl: remove existence check for PCI device hotplug
On Wed, Nov 19, 2014 at 09:21:23PM +0000, Wei Liu wrote: On Wed, Nov 19, 2014 at 04:01:54PM -0500, Konrad Rzeszutek Wilk wrote: On Mon, Nov 17, 2014 at 12:10:34PM +0000, Wei Liu wrote: The existence check is to make sure a device is not added to a guest multiple times. The PCI device backend path has different rules from vif, disk, etc. For example: /local/domain/0/backend/pci/9/0/dev-1/0000:03:10.1 /local/domain/0/backend/pci/9/0/key-1/0000:03:10.1 /local/domain/0/backend/pci/9/0/dev-2/0000:03:10.2 /local/domain/0/backend/pci/9/0/key-2/0000:03:10.2 The devid for PCI devices is a hardcoded 0. libxl__device_exists only checks up to /local/.../9/0 so it always returns true even if the device is assignable. Remove the invocation of libxl__device_exists. We're sure at this point that the PCI device is assignable (hence no xenstore entry or JSON entry). The check is done beforehand. For an HVM guest it's done by calling xc_test_assign_device and for a PV guest it's done by calling pciback_dev_is_assigned. Reported-by: Li, Liang Z liang.z...@intel.com Signed-off-by: Wei Liu wei.l...@citrix.com Cc: Ian Campbell ian.campb...@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Konrad Wilk konrad.w...@oracle.com --- This patch fixes a regression in 4.5. Ouch! That needs then to be fixed. Is this the version you would want to commit? I did test it - and it Yes. Then Release-Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com looked to do the right thing - though the xen-pciback is stuck in the 7 state. However that is a separate issue that I believe is due to Xen pciback, not your patches. Thanks for testing. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v0 RFC 0/2] xl/libxl support for PVUSB
On Sun, Nov 16, 2014 at 10:36:28AM +0800, Simon Cao wrote: Hi, I was working on this. But I was busy preparing for some job interviews over the last three months; sorry for this long delay. I will update my progress in a few days. OK, I put your name down for this to be in Xen 4.6. Thanks! Thanks! Bo Cao On Mon, Nov 10, 2014 at 4:37 PM, Chun Yan Liu cy...@suse.com wrote: Is there any progress on this work? I didn't see a new version after this. Anyone know the status? Thanks, Chunyan On 8/11/2014 at 04:23 AM, in message 1407702234-22309-1-git-send-email-caobosi...@gmail.com, Bo Cao caobosi...@gmail.com wrote: Finally I have a workable version of xl/libxl support for PVUSB. Most of its commands work properly now, but there are still some problems to be solved. Please take a look and give me some advice. == What has been implemented? == I have implemented libxl functions for PVUSB in libxl_usb.c. It mainly consists of two parts: usbctrl_add/remove/list and usb_add/remove/list, in which usbctrl denotes a usb controller into which a usb device can be plugged. I don't use ao_dev in libxl_device_usbctrl_add since we don't need to execute a hotplug script for usbctrl, and without ao_dev, adding a default usbctrl for a usb device is easier. For the commands that manipulate usb devices, such as xl usb-attach and xl usb-detach, this patch currently only supports specifying usb devices by their interface in sysfs. Using this interface, we can read usb device information through sysfs and bind/unbind the usb device. (The support for mapping the lsusb bus:addr to the sysfs usb interface will come later.) == What needs to be done next? == There are two main problems to be solved. 1. PVUSB Options in VM Guest's Configuration File The interface in the VM Guest's configuration file to add a usb device is: usb=[interface=1-1]. But the problem now is that after the default usbctrl is added, the state of usbctrl is 2, e.g. XenbusStateInitWait, waiting for xen-usbfront to connect.
The xen-usbfront in the VM Guest isn't loaded. Therefore, sysfs_intf_write will report an error. Does anyone have any clue how to solve this? 2. sysfs_intf_write In the process of xl usb-attach domid intf=1-1, after writing 1-1 to the Xenstore entry, we need to bind the controller of this usb device to the usbback driver so that it can be used by the VM Guest. For example, for usb device 1-1, its controller interface may be 1-1:1.0, and we write this value to /sys/bus/usb/driver/usbback/bind. But some devices have two controllers, for example 1-1:1.0 and 1-1:1.1. I think this means the device has two functions, such as usbhid and usb-storage. So in this case, do we bind both controllers to usbback? There may be some errors or bugs in the code. Feel free to tell me. Cheers, - Simon --- CC: George Dunlap george.dun...@eu.citrix.com CC: Ian Jackson ian.jack...@citrix.com CC: Ian Campbell ian.campb...@citrix.com CC: Pasi Kärkkäinen pa...@iki.fi CC: Lars Kurth lars.ku...@citrix.com ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v1 for-xen-4.5] Fix list corruption in dpci_softirq.
On Wed, Nov 19, 2014 at 08:17:35PM +0100, Sander Eikelenboom wrote: Wednesday, November 19, 2014, 8:01:31 PM, you wrote: On Wed, Nov 19, 2014 at 07:54:39PM +0100, Sander Eikelenboom wrote: Wednesday, November 19, 2014, 6:31:39 PM, you wrote: Hey, This patch should fix the issue that Sander had seen. The full details are in the patch itself. Sander, if you could - please test origin/staging with this patch to make sure it does fix the issue. xen/drivers/passthrough/io.c | 27 +-- Konrad Rzeszutek Wilk (1): dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked. 1 file changed, 17 insertions(+), 10 deletions(-) Hi Konrad, Hmm, just tested with a freshly cloned tree .. unfortunately it blew up again. (I must admit I also re-enabled stuff I had disabled while debugging, like cpuidle and cpufreq.) Argh. Could you also try the first patch, the STATE_ZOMBIE one? Building now .. (Attached and inline) Sander mentioned to me over IRC that with the STATE_ZOMBIE patch things work peachy for him. The patch in combination with the previous one adds two extra paths: 1) in raise_softirq, we delay scheduling of dpci_pirq until STATE_ZOMBIE is cleared. 2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the STATE_ZOMBIE. Let's follow the case without the zombie patch and with the zombie patch: w/o zombie: timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchgs the state to 0. pirq_dpci is still on dpci_list. dpci_softirq while (!list_empty(our_list)) list_del, but has not yet done 'entry->next = LIST_POISON1;' [interrupt happens] raise_softirq checks state, which is zero. Adds pirq_dpci to the dpci_list. [interrupt is done, back to dpci_softirq] finishes the entry->next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. dpci_softirq while (!list_empty) list_del, but ->next already has LIST_POISON1 and we blow up.
w/ zombie: timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchgs the state to STATE_ZOMBIE. pirq_dpci is still on dpci_list. dpci_softirq while (!list_empty(our_list)) list_del, but has not yet done 'entry->next = LIST_POISON1;' [interrupt happens] raise_softirq checks state; it is STATE_ZOMBIE so it returns. [interrupt is done, back to dpci_softirq] finishes the entry->next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist. ends the loop, exits. So it seems that the STATE_ZOMBIE is needed, but for a different reason than Jan initially thought of: From c89a97f695fda245f5fcb16ddb36d3df7f6f28b9 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk konrad.w...@oracle.com Date: Fri, 14 Nov 2014 12:15:26 -0500 Subject: [PATCH] dpci: Add ZOMBIE state to allow the softirq to finish with the dpci_pirq. When we want to cancel an outstanding 'struct hvm_pirq_dpci' we perform a cmpxchg on the state to set it to zero. That is OK on the teardown paths, as it is guaranteed that the do_IRQ action handler has been removed. Hence no more interrupts can be scheduled. But with the introduction of "dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked." we now utilize pt_pirq_softirq_cancel when we want to cancel outstanding operations. However, once we cancel them the do_IRQ is free to schedule them back in - even if said 'struct hvm_pirq_dpci' is still on the dpci_list. The code base before this patch could follow this race: \- timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchgs the state to 0. pirq_dpci is still on dpci_list. \- dpci_softirq while (!list_empty(our_list)) list_del, but has not yet done 'entry->next = LIST_POISON1;' [interrupt happens] raise_softirq checks state, which is zero. Adds pirq_dpci to the dpci_list. [interrupt is done, back to dpci_softirq] finishes the entry->next = LIST_POISON1; .. test STATE_SCHED returns true, so executes the hvm_dirq_assist.
ends the loop, exits. \- dpci_softirq while (!list_empty) list_del, but ->next already has LIST_POISON1 and we blow up. This patch in combination adds two extra paths: 1) in raise_softirq, we delay scheduling of dpci_pirq until STATE_ZOMBIE is cleared. 2) dpci_softirq will pick up the cancelled dpci_pirq and then clear the STATE_ZOMBIE. Using the example above, the code-paths would now be: \- timer_softirq_action pt_irq_time_out calls pt_pirq_softirq_cancel which cmpxchgs the state to STATE_ZOMBIE. pirq_dpci is still on dpci_list. \- dpci_softirq while (!list_empty(our_list))
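The raise/cancel handshake being described can be captured by a toy state machine. This is illustrative Python, not the Xen code (the names mirror the Xen ones, but the locking and cmpxchg details are only modeled): a pirq_dpci may only be queued when its state is 0, the cancel path parks it in STATE_ZOMBIE instead of 0 so it cannot be re-queued while it may still sit on a per-cpu list, and the softirq handler clears the ZOMBIE bit only after it has safely unlinked the entry.

```python
# Toy model of the SCHED/RUN/ZOMBIE handshake (illustrative only).
STATE_SCHED, STATE_RUN, STATE_ZOMBIE = 1, 2, 4

def raise_softirq_for(d, queue):
    """Queue d only if nobody else owns it (models cmpxchg(&state, 0, SCHED))."""
    if d["state"] != 0:
        return False
    d["state"] = STATE_SCHED
    queue.append(d)
    return True

def softirq_cancel(d):
    """Cancel path: park in ZOMBIE, not 0, so re-queueing stays blocked."""
    d["state"] = STATE_ZOMBIE

def dpci_softirq(queue):
    for d in list(queue):
        queue.remove(d)              # list_del(); entry is now off the list
        if d["state"] & STATE_ZOMBIE:
            d["state"] = 0           # only now may it be scheduled again
        elif d["state"] & STATE_SCHED:
            d["state"] = 0           # models running hvm_dirq_assist()
```

In the w/o-zombie trace the cancel set state to 0 while the entry was still queued, so a racing interrupt could queue it a second time; with ZOMBIE the second queueing attempt fails until the softirq has finished the unlink.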
[Xen-devel] [for xen-4.5 PATCH v2] Fix list corruption in dpci_softirq.
Hey,

Attached are two patches that fix the dpci_softirq list corruption that
Sander was observing.

 xen/drivers/passthrough/io.c | 55 +++-
 1 file changed, 39 insertions(+), 16 deletions(-)

Konrad Rzeszutek Wilk (2):
  dpci: Fix list corruption if INTx device is used and an IRQ timeout
    is invoked.
  dpci: Add ZOMBIE state to allow the softirq to finish with the
    dpci_pirq.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [for-xen-4.5 PATCH v2 1/2] dpci: Fix list corruption if INTx device is used and an IRQ timeout is invoked.
If we pass in INTx type devices to a guest on an over-subscribed machine
- and in an over-worked guest - we can cause the pirq_dpci->softirq_list
to become corrupted.

The reason for this is that 'pt_irq_guest_eoi' ends up setting the
'state' to a zero value. However the 'state' value (STATE_SCHED,
STATE_RUN) is used to communicate between 'raise_softirq_for' and
'dpci_softirq' to determine whether the 'struct hvm_pirq_dpci' can be
re-scheduled. We are ignoring the teardown path for simplicity right
now.

The 'pt_irq_guest_eoi' was not adhering to the proper dialogue and was
not using locked cmpxchg or test_bit operations and ended up setting
'state' to zero. That meant 'raise_softirq_for' was free to schedule it
while the 'struct hvm_pirq_dpci' was still on a per-cpu list. The end
result was list_del being called twice and the second call corrupting
the per-cpu list.

For this to occur one of the CPUs must be in the idle loop executing
softirqs, the interrupt handler in the guest must not respond to the
pending interrupt within 8ms, and we must receive another interrupt for
this device on another CPU.

 CPU0:                               CPU1:
 timer_softirq_action
  \- pt_irq_time_out
       state = 0;                    do_IRQ
  [out of timer code, the             raise_softirq
   pirq_dpci is on the CPU0           [adds the pirq_dpci to CPU1
   dpci_list]                          dpci_list as state == 0]

 softirq_dpci:                       softirq_dpci:
  list_del
  [list entries are poisoned]
                                      list_del <= BOOM

The fix is simple - enroll 'pt_irq_guest_eoi' to use the locked
semantics for 'state'. We piggyback on pt_pirq_softirq_cancel (was
pt_pirq_softirq_reset) to use cmpxchg. We also expand said function to
reset the '->dom' only on the teardown paths - but not on the timeouts.
Reported-and-Tested-by: Sander Eikelenboom li...@eikelenboom.it
Signed-off-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com
---
 xen/drivers/passthrough/io.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c
index efc66dc..2039d31 100644
--- a/xen/drivers/passthrough/io.c
+++ b/xen/drivers/passthrough/io.c
@@ -57,7 +57,7 @@ enum {
  * This can be called multiple times, but the softirq is only raised once.
  * That is until the STATE_SCHED state has been cleared. The state can be
  * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'),
- * or by 'pt_pirq_softirq_reset' (which will try to clear the state before
+ * or by 'pt_pirq_softirq_cancel' (which will try to clear the state before
  * the softirq had a chance to run).
  */
 static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci)
@@ -97,13 +97,15 @@ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci)
 }
 
 /*
- * Reset the pirq_dpci->dom parameter to NULL.
+ * Cancels an outstanding pirq_dpci (if scheduled). Also if clear is set,
+ * reset pirq_dpci->dom parameter to NULL (used for teardown).
  *
  * This function checks the different states to make sure it can do it
  * at the right time. If it unschedules the 'hvm_dirq_assist' from running
  * it also refcounts (which is what the softirq would have done) properly.
  */
-static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
+static void pt_pirq_softirq_cancel(struct hvm_pirq_dpci *pirq_dpci,
+                                   unsigned int clear)
 {
     struct domain *d = pirq_dpci->dom;
 
@@ -125,8 +127,13 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci)
          * to a shortcut the 'dpci_softirq' implements. It stashes the 'dom'
          * in local variable before it sets STATE_RUN - and therefore will not
          * dereference '->dom' which would crash.
+         *
+         * However, if this is called from 'pt_irq_time_out' we do not want to
+         * clear the '->dom' as we can re-use the 'pirq_dpci' after that and
+         * need '->dom'.
          */
-        pirq_dpci->dom = NULL;
+        if ( clear )
+            pirq_dpci->dom = NULL;
         break;
     }
 }
@@ -142,7 +149,7 @@ static int pt_irq_guest_eoi(struct domain *d, struct hvm_pirq_dpci *pirq_dpci,
     if ( __test_and_clear_bit(_HVM_IRQ_DPCI_EOI_LATCH_SHIFT,
                               pirq_dpci->flags) )
     {
-        pirq_dpci->state = 0;
+        pt_pirq_softirq_cancel(pirq_dpci, 0 /* keep dom */);
         pirq_dpci->pending = 0;
         pirq_guest_eoi(dpci_pirq(pirq_dpci));
     }
@@ -285,7 +292,7 @@ int pt_irq_create_bind(
              * to be scheduled but we must deal with the one that may be
              * in the queue.
              */
-            pt_pirq_softirq_reset(pirq_dpci);
+            pt_pirq_softirq_cancel(pirq_dpci, 1 /* reset dom */);
         }
     }
     if ( unlikely(rc) )
@@ -536,9
Re: [Xen-devel] [PATCH v9 05/13] arm: introduce is_device_dma_coherent
On Tue, Nov 18, 2014 at 04:49:21PM +, Stefano Stabellini wrote:
 ping?

Sending something which wants my attention _To:_ me is always a good
idea :)

The patch is fine in itself, but I have a niggle about
is_device_dma_coherent() - provided this is only used in architecture
specific code, that should be fine. It could probably do with a comment
to that effect in an attempt to discourage drivers using it (thereby
becoming less portable to other architectures.)

-- 
FTTC broadband for 0.8mile line: currently at 9.5Mbps down 400kbps up
according to speedtest.net.
Re: [Xen-devel] Regression, host crash with 4.5rc1
On 11/17/2014 23:54, Jan Beulich wrote:
 On 17.11.14 at 20:21, sfl...@ihonk.com wrote:
  Okay, I did a bisection and was not able to correlate the above error
  message with the problem I'm seeing. Not saying it's not related, but
  I had plenty of successful test runs in the presence of that error.
  Took me about a week (sometimes it takes as much as 6 hours to
  produce the error), but bisect narrowed it down to this commit:
  http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=9a727a813e9b25003e433b3dc3fa47e621f9e238
  What do you think?

 Thanks for narrowing this, even if this change didn't show any other
 bad effects so far (and it's been widely tested by now), and even if
 problems here would generally be expected to surface independent of
 the use of PCI pass-through. But a hang (rather than a crash) would
 indeed be the most natural result of something being wrong here. To
 double check the result, could you, in an up-to-date tree, simply make
 x86's arch_skip_send_event_check() return 0 unconditionally?

Made this change and the host was happy.

 Plus, without said adjustment, first just disable the MWAIT CPU idle
 driver (mwait-idle=0) and then, if that didn't make a difference, use
 of C states altogether (cpuidle=0). If any of this does make a
 difference, limiting use of C states without fully excluding their use
 may need to be the next step.

Will do this next.

 Another thing - now that serial logging appears to be working for you,
 did you try whether the host, once hung, still reacts to serial input
 (perhaps force input to go to Xen right at boot via the conswitch=
 option)? If so, 'd' debug-key output would likely be the piece of most
 interest.

Here you go.
Performed with a checkout of 9a727a81 (because it was handy), let me
know if you'd rather see the results from 4.5-rc2 or any other Xen
debugging info:

(XEN) 'd' pressed - dumping registers
(XEN)
(XEN) *** Dumping CPU0 guest state (d1v2): ***
(XEN) [ Xen-4.5-unstable x86_64 debug=y Not tainted ]
(XEN) CPU:0
(XEN) RIP:0010:[f8000281e2c1]
(XEN) RFLAGS: 0002 CONTEXT: hvm guest
(XEN) rax: 3acd4939f3e7 rbx: 3acd493a0cce rcx:
(XEN) rdx: 3acd rsi: rdi: 0057
(XEN) rbp: 645c rsp: f880033edf90 r8: f880033edff0
(XEN) r9: r10: f880033ee040 r11: 000342934690
(XEN) r12: f880033ee3c8 r13: 1000 r14:
(XEN) r15: 0058 cr0: 80050031 cr4: 06f8
(XEN) cr3: 66aca000 cr2: f9800268
(XEN) ds: 002b es: 002b fs: 0053 gs: 002b ss: 0018 cs: 0010
(XEN)
(XEN) *** Dumping CPU1 host state: ***
(XEN) [ Xen-4.5-unstable x86_64 debug=y Not tainted ]
(XEN) CPU:1
(XEN) RIP:e008:[82d08012a9a1] _spin_unlock_irq+0x30/0x31
(XEN) RFLAGS: 0246 CONTEXT: hypervisor
(XEN) rax: rbx: 8300a943e000 rcx: 0001
(XEN) rdx: 830c3dc7 rsi: 0004 rdi: 830c3dc7a088
(XEN) rbp: 830c3dc77ec8 rsp: 830c3dc77e40 r8: 830c3dc7a0a0
(XEN) r9: r10: f88002fd82a0 r11: f88002fe2d70
(XEN) r12: 151cc8b48756 r13: 8300a943e000 r14: 830c3dc7a088
(XEN) r15: 01c9c380 cr0: 8005003b cr4: 26f0
(XEN) cr3: 000c18962000 cr2: ff331aa0
(XEN) ds: es: fs: gs: ss: cs: e008
(XEN) Xen stack trace from rsp=830c3dc77e40:
(XEN)    82d080126ec5 82d080321280 830c3dc7a0a0 000100c77e78
(XEN)    830c3dc7a080 82d0801b5277 8300a943e000 f88002fe2d70
(XEN)    8300a943e000 01c9c380 82d0801e0f00 830c3dc77f08
(XEN)    82d0802f8080 82d0802f8000 830c3dc7
(XEN)    0001 830c3dc77ef8 82d08012a1b3 8300a943e000
(XEN)    f88002fe2d70 36d08fbeebe8 000f 830c3dc77f08
(XEN)    82d08012a20b 000f 82d0801e3d2a 0001
(XEN)    000f 36d08fbeebe8 f88002fe2d70 000f
(XEN)    f88002fd8180 f88002fe2d70 f88002fd82a0 34711df61755
(XEN)    f88002fd82a0 0002 f88002fd81c0 0400
(XEN)    f88002fe2eb0 beefbeef f8000298520c
(XEN)    00bfbeef 0046 f88002fe2c20 beef
(XEN)    c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef c2c2c2c2c2c2beef
(XEN)    c2c2c2c20001
(XEN)    8300a943e000 003bbd958e00 c2c2c2c2c2c2c2c2
(XEN) Xen call trace:
(XEN)    [82d08012a9a1] _spin_unlock_irq+0x30/0x31
(XEN)    [82d08012a1b3] __do_softirq+0x81/0x8c
(XEN)    [82d08012a20b] do_softirq+0x13/0x15
(XEN)    [82d0801e3d2a] vmx_asm_do_vmentry+0x2a/0x45
(XEN)
(XEN) *** Dumping CPU1 guest state (d1v5): ***
(XEN) [ Xen-4.5-unstable x86_64 debug=y Not tainted ]
(XEN) CPU:
Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps
On 19.11.14 at 02:26, tiejun.c...@intel.com wrote:
  So without looking up devices[i], how can we call func() for each
  sbdf as you mentioned?

 You've got both rmrr and bdf in the body of for_each_rmrr_device().
 After all - as I said - you just open-coded it.

 Yeah, so change this again,

 int intel_iommu_get_reserved_device_memory(iommu_grdm_t *func, void *ctxt)
 {
     struct acpi_rmrr_unit *rmrr;
     int rc = 0;
     unsigned int i;
     u16 bdf;

     for_each_rmrr_device ( rmrr, bdf, i )
     {
         rc = func(PFN_DOWN(rmrr->base_address),
                   PFN_UP(rmrr->end_address) -
                       PFN_DOWN(rmrr->base_address),
                   PCI_SBDF(rmrr->segment, bdf), ctxt);
         /* Hit this entry so just go next. */
         if ( rc == 1 )
             i = rmrr->scope.devices_cnt;
         else if ( rc < 0 )
             return rc;
     }

     return rc;
 }

Better. Another improvement would be to make it not depend on the
internal workings of for_each_rmrr_device()... And in any case you
should not special-case 1 - just return when rc is negative and skip
the rest of the current RMRR when it's positive. And of course make the
function's final return value predictable.

Jan
Re: [Xen-devel] [v7][RFC][PATCH 06/13] hvmloader/ram: check if guest memory is out of reserved device memory maps
From: Tian, Kevin
Sent: Wednesday, November 19, 2014 4:18 PM

 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, November 12, 2014 5:57 PM

 On 12.11.14 at 10:13, tiejun.c...@intel.com wrote:
  On 2014/11/12 17:02, Jan Beulich wrote:
   On 12.11.14 at 09:45, tiejun.c...@intel.com wrote:
    #2 flags field in each specific device of the new domctl would
    control whether this device needs to check/reserve its own RMRR
    range. But it's not dependent on the current device assignment
    domctl, so the user can use them to control which devices need to
    work as hotplug later, separately.

   And this could be left as a second step, in order for what needs to
   be done now to not get more complicated than necessary.

  Do you mean currently we still rely on the device assignment domctl
  to provide the SBDF? So it looks like nothing should be changed in
  our policy.

 I can't connect your question to what I said. What I tried to tell you

  Something is being misunderstood on my side, then.

 was that I don't currently see a need to make this overly complicated:
 having the option to punch holes for all devices and (by default)
 dealing with just the devices assigned at boot may be sufficient as a
 first step. Yet (repeating just to avoid any misunderstanding) that
 makes things easier only if we decide to require device assignment to
 happen before memory getting populated (since in that case there's

  Here what do you mean by 'if we decide to require device assignment
  to happen before memory getting populated'? Because - quote - at
  present the device assignment always happens after memory population.
  And I also mentioned previously that I double-checked this sequence
  with printk. Or do you already plan or have you decided to change
  this sequence?

 So it is now the 3rd time that I'm telling you that part of your
 decision making as to which route to follow should be to re-consider
 whether the current sequence of operations shouldn't be changed.
 Please also consult with the VT-d maintainers (hint to them:
 participating in this discussion publicly would be really nice) on
 _all_ decisions to be made here.

Yang and I had some discussion here. We understand your point about
avoiding a new interface if we can leverage existing code. However it's
not a trivial effort to move device assignment before populating the
p2m, and there is no other benefit of doing so except for this purpose.
So we'd not suggest that way.

The current option sounds a reasonable one, i.e. passing a list of BDFs
assigned to this VM before populating the p2m, and then having the
hypervisor filter out reserved regions associated with those BDFs. This
way libxc teaches Xen to create reserved regions once, and then later
the filtered info is returned upon query. The limitation of wasted
memory due to conflicts can be mitigated, and we considered that a
further enhancement can be made later in libxc: when populating the
p2m, the reserved regions can be skipped explicitly at the initial p2m
creation phase, and then there would be no waste at all. But that
optimization takes some time and can be built incrementally on the
current patch and interface, post the 4.5 release. For now let's focus
on correctness first.

If you agree, Tiejun will move forward and send another series for 4.5.
So far lots of opens have been closed with your help, but it also means
the original v7 needs a serious update (the latest code is deep in the
discussion list).

Thanks
Kevin