Bug#810379: [Xen-devel] [BUG] pci-passthrough generates "xen:events: Failed to obtain physical IRQ" for some devices
On Mon, Feb 08, 2016 at 06:39:17PM +0100, Marek Marczykowski-Górecki wrote:
> On Wed, Feb 03, 2016 at 10:26:58AM -0500, Konrad Rzeszutek Wilk wrote:
> > On Wed, Feb 03, 2016 at 03:22:30PM +0100, Marek Marczykowski-Górecki wrote:
> > > On Mon, Feb 01, 2016 at 09:50:53AM -0500, Konrad Rzeszutek Wilk wrote:
> > > > > The second bullet looks at first pretty interesting from this PoV,
> > > > > see http://xenbits.xen.org/xsa/advisory-157.html for info on the XSA
> > > > > and the various patches. Konrad is on the CC already so hopefully
> > > > > he has some ideas.
> > > >
> > > > Thanks. I will try to reproduce this with the upstream kernel first as
> > > > those patches are there.
> > >
> > > According to one Qubes OS user report[1], the bug was introduced between
> > > versions which differ only by the XSA-155 patches (including one for
> > > pciback), especially not XSA-157.
> > > Maybe on some code path, some value is not copied back to
> > > pdev->sh_info->op?
> >
> > I found two bugs (attached the draft not-compiled patches). Upstream
> > wise I seem to be tripping over another issue.
> >
> > There is also some more work required in there to fix the MSI-x enable op.
>
> What exactly do you have in mind here? Those four patches in your next
> email? Or something not yet fixed?

I posted it at some point. It was that the MSI-X enable op stashes the
error value in op->value. But 'op->value' is an unsigned int so the value
ends up being 0xfe or such. And the other PV frontends only check for !0 -
and manufacture their own value (-EINVAL). Hence I want to update the
pciif.h .. Oh here is the patch:

Oh man. A year?! Anyhow this can be posted as a cleanup patch separately
from the bug-fixes.

commit 393be47782bca7a24d3e365448d4d3d1a303abfe
Author: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
Date:   Wed Apr 1 17:01:26 2015 -0400

    xen/pcifront/pciback: Update pciif.h with ->err and ->result values.

    The '->err' should contain only the XEN_PCI_ERR_* type values.
    The '->result' may contain -EXX values or any other value
    that the XEN_PCI_OP_* deems appropriate.

    As such update the header and also the implementations.

    Signed-off-by: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>

    Conflicts:
        drivers/xen/xen-pciback/pciback_ops.c

diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b1ffebe..353c8a2 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -297,7 +297,7 @@ static int pci_frontend_enable_msix(struct pci_dev *dev,
 	} else {
 		dev_err(&dev->dev, "enable msix get err %x\n", err);
 	}
-	return err;
+	return err ? -EINVAL : 0;
 }

 static void pci_frontend_disable_msix(struct pci_dev *dev)
diff --git a/drivers/xen/xen-pciback/pciback_ops.c b/drivers/xen/xen-pciback/pciback_ops.c
index fa2b222..4db6c19 100644
--- a/drivers/xen/xen-pciback/pciback_ops.c
+++ b/drivers/xen/xen-pciback/pciback_ops.c
@@ -266,7 +266,7 @@ error:
 		pr_warn_ratelimited("%s: error enabling MSI-X for guest %u: err %d!\n",
 				    pci_name(dev), pdev->xdev->otherend_id,
 				    result);

-	return result > 0 ? 0 : result;
+	return result >= 0 ? 0 : XEN_PCI_ERR_op_failed;
 }
 #endif

diff --git a/include/xen/interface/io/pciif.h b/include/xen/interface/io/pciif.h
index d9922ae..c8b674f 100644
--- a/include/xen/interface/io/pciif.h
+++ b/include/xen/interface/io/pciif.h
@@ -70,7 +70,7 @@ struct xen_pci_op {
 	/* IN: what action to perform: XEN_PCI_OP_* */
 	uint32_t cmd;

-	/* OUT: will contain an error number (if any) from errno.h */
+	/* OUT: will contain an XEN_PCI_ERR_* number. */
 	int32_t err;

 	/* IN: which device to touch */
@@ -82,7 +82,9 @@ struct xen_pci_op {
 	int32_t offset;
 	int32_t size;

-	/* IN/OUT: Contains the result after a READ or the value to WRITE */
+	/* IN/OUT: Contains the result after a READ or the value to WRITE.
+* If the err does not have XEN_PCI_ERR_success, depending on
+* XEN_PCI_OP_* might have the errno value. */
 	uint32_t value;
 	/* IN: Contains extra infor for this operation */
 	uint32_t info;

>
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
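
The signedness problem described in this message (a negative errno stashed in the unsigned 'value' field, with frontends only testing for non-zero) can be illustrated in plain user-space C. This is only a sketch under assumptions: 'struct fake_pci_op', 'backend_store_errno' and 'frontend_check' are hypothetical names invented for the illustration and are not the pciback or pcifront code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct xen_pci_op. */
struct fake_pci_op {
	int32_t  err;    /* intended for XEN_PCI_ERR_*-style codes */
	uint32_t value;  /* unsigned, like the 'value' field in pciif.h */
};

/* Backend side (assumed behaviour): stash a negative errno in 'value'. */
static void backend_store_errno(struct fake_pci_op *op, int result)
{
	/* A negative result wraps around to a large positive number. */
	op->value = (uint32_t)result;
}

/*
 * Frontend side (assumed behaviour): any non-zero value counts as a
 * failure and the frontend reports its own -EINVAL; the errno chosen
 * by the backend is not recovered.
 */
static int frontend_check(const struct fake_pci_op *op)
{
	return op->value ? -EINVAL : 0;
}
```

Under these assumptions the backend's specific error code never reaches the caller, which is the motivation given above for separating the roles of '->err' and the returned value.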
Bug#785187: [Pkg-xen-devel] Bug#785187: xen-hypervisor-4.5-amd64: Option ucode=scan is not working
> according to the documentation the option ucode=scan should tell XEN to
> look for a microcode update in an uncompressed initrd. While I don't use
> the Debian kernel the tools to generate the initrd are part of Debian.
> The command "cpio -i < /boot/initrd.img-4.0.2-Dom0" creates the directory
> structure "kernel/x86/microcode/GenuineIntel.bin", so I think the initrd
> is all right.

Is the initramfs compressed? The scanning code can't deal with a compressed
initramfs - so you have to glue the cpio with the microcode and the
compressed initramfs together (the microcode has to go first).

-- To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
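
To check which case applies, it is enough to look at the first bytes of the initrd image. The following is a minimal sketch, not Xen's scanning code: 'classify_initrd' and the enum are hypothetical names, and it only tests two well-known signatures, the ASCII magic "070701" of an uncompressed newc cpio header and the 0x1f 0x8b prefix of a gzip stream. Other compressors are simply reported as unknown.

```c
#include <stddef.h>
#include <string.h>

enum initrd_kind { INITRD_CPIO_NEWC, INITRD_GZIP, INITRD_UNKNOWN };

/* Classify an initrd image by the signature at its very start. */
static enum initrd_kind classify_initrd(const unsigned char *buf, size_t len)
{
	/* "070701" is the ASCII magic of an uncompressed newc cpio header. */
	if (len >= 6 && memcmp(buf, "070701", 6) == 0)
		return INITRD_CPIO_NEWC;

	/* 0x1f 0x8b are the first two bytes of a gzip stream. */
	if (len >= 2 && buf[0] == 0x1f && buf[1] == 0x8b)
		return INITRD_GZIP;

	return INITRD_UNKNOWN;
}
```

If the image starts with the gzip signature, the microcode cpio was not placed first, which matches the situation described in the reply above.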
Bug#676360: [Xen-devel] [PATCH] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
> > Nice. Andrew, any chance you could test this patch on the affected Xen
> > hypervisors? Was it as easy to reproduce this on a RHEL5 (U1?)
> > hypervisor or is it really only on Linode and Amazon EC2?
>
> Originally, I was able to reproduce the issue easily with a RHEL5 host.
> Now, with this patch it's fixed.

OK, so Tested-by: Andrew Jones.. and from my perspective it looks good - so
Acked-by: Konrad Rzeszutek Wilk konrad.w...@oracle.com

Andrea, any chance you can respin this patch and send it to Linus for 3.5
please?
Bug#676360: [PATCH] thp: avoid atomic64_read in pmd_read_atomic for 32bit PAE
On Thu, Jun 07, 2012 at 11:00:33PM +0200, Andrea Arcangeli wrote:
> In the x86 32bit PAE CONFIG_TRANSPARENT_HUGEPAGE=y case while holding
> the mmap_sem for reading, cmpxchg8b cannot be used to read pmd
> contents under Xen.
>
> So instead of dealing only with consistent pmdvals in
> pmd_none_or_trans_huge_or_clear_bad() (which would be conceptually
> simpler) we let pmd_none_or_trans_huge_or_clear_bad() deal with pmdvals
> where the low 32bit and high 32bit could be inconsistent (to avoid
> having to use cmpxchg8b).

nods

> The only guarantee we get from pmd_read_atomic is that if the low part
> of the pmd was found null, the high part will be null too (so the pmd
> will be considered unstable). And if the low part of the pmd is found
> stable later, then it means the whole pmd was read atomically (because
> after a pmd is stable, neither MADV_DONTNEED nor page faults can alter
> it anymore, and we read the high part after the low part).
>
> In the 32bit PAE x86 case, it is enough to read the low part of the
> pmdval atomically to declare the pmd as stable and that's true for THP
> and no THP, furthermore in the THP case we also have a barrier() that
> will prevent any inconsistent pmdvals to be cached by a later re-read
> of the *pmd.

Nice. Andrew, any chance you could test this patch on the affected Xen
hypervisors? Was it as easy to reproduce this on a RHEL5 (U1?)
hypervisor or is it really only on Linode and Amazon EC2?

> Signed-off-by: Andrea Arcangeli <aarca...@redhat.com>
> ---
>  arch/x86/include/asm/pgtable-3level.h |   30 +-
>  include/asm-generic/pgtable.h         |   10 ++
>  2 files changed, 27 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/include/asm/pgtable-3level.h b/arch/x86/include/asm/pgtable-3level.h
> index 43876f1..cb00ccc 100644
> --- a/arch/x86/include/asm/pgtable-3level.h
> +++ b/arch/x86/include/asm/pgtable-3level.h
> @@ -47,16 +47,26 @@ static inline void native_set_pte(pte_t *ptep, pte_t pte)
>   * they can run pmd_offset_map_lock or pmd_trans_huge or other pmd
>   * operations.
>   *
> - * Without THP if the mmap_sem is hold for reading, the
> - * pmd can only transition from null to not null while pmd_read_atomic runs.
> - * So there's no need of literally reading it atomically.
> + * Without THP if the mmap_sem is hold for reading, the pmd can only
> + * transition from null to not null while pmd_read_atomic runs. So
> + * we can always return atomic pmd values with this function.
>   *
>   * With THP if the mmap_sem is hold for reading, the pmd can become
> - * THP or null or point to a pte (and in turn become stable) at any
> - * time under pmd_read_atomic, so it's mandatory to read it atomically
> - * with cmpxchg8b.
> + * trans_huge or none or point to a pte (and in turn become stable)
> + * at any time under pmd_read_atomic. We could read it really
> + * atomically here with a atomic64_read for the THP enabled case (and
> + * it would be a whole lot simpler), but to avoid using cmpxchg8b we
> + * only return an atomic pmdval if the low part of the pmdval is later
> + * found stable (i.e. pointing to a pte). And we're returning a none
> + * pmdval if the low part of the pmd is none. In some cases the high
> + * and low part of the pmdval returned may not be consistent if THP is
> + * enabled (the low part may point to previously mapped hugepage,
> + * while the high part may point to a more recently mapped hugepage),
> + * but pmd_none_or_trans_huge_or_clear_bad() only needs the low part
> + * of the pmd to be read atomically to decide if the pmd is unstable
> + * or not, with the only exception of when the low part of the pmd is
> + * zero in which case we return a none pmd.
>   */
> -#ifndef CONFIG_TRANSPARENT_HUGEPAGE
>  static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
>  {
>  	pmdval_t ret;
> @@ -74,12 +84,6 @@ static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
>
>  	return (pmd_t) { ret };
>  }
> -#else /* CONFIG_TRANSPARENT_HUGEPAGE */
> -static inline pmd_t pmd_read_atomic(pmd_t *pmdp)
> -{
> -	return (pmd_t) { atomic64_read((atomic64_t *)pmdp) };
> -}
> -#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>
>  static inline void native_set_pte_atomic(pte_t *ptep, pte_t pte)
>  {
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index ae39c4b..0ff87ec 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -484,6 +484,16 @@ static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd)
>  	/*
>  	 * The barrier will stabilize the pmdval in a register or on
>  	 * the stack so that it will stop changing under the code.
> +	 *
> +	 * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE,
> +	 * pmd_read_atomic is allowed to return a not atomic pmdval
> +	 * (for example pointing to an hugepage that has never been
> +	 * mapped in the pmd). The below checks will only care about
> +	 * the low part of the pmd with 32bit PAE x86 anyway, with the
> +	 * exception of pmd_none(). So the important thing
Bug#676360: [Xen-devel] xen: oops at atomic64_read_cx8+0x4
On Thu, Jun 07, 2012 at 12:33:55PM +0200, Andrea Arcangeli wrote:
> On Thu, Jun 07, 2012 at 02:33:33AM -0500, Jonathan Nieder wrote:
> > Sergio Gelato wrote[1]:
> >
> > > That 3.4.1-1~experimental.1 build (3.4-trunk-686-pae #1 SMP Wed Jun 6
> > > 15:11:31 UTC 2012 i686 GNU/Linux) is even less well-behaved under Xen:
> > > I'm getting a kernel OOPS at
> > > EIP: [c1168e54] atomic64_read_cx8+0x4/0xc SS:ESP e021:ca853c6c
> > > The top of the trace message unfortunately scrolled off the console
> > > before I could see it, and the message doesn't have time to make it
> > > to syslog (either local or remote).
> > [...]
> > > Non-Xen boots proceed normally.
> >
> > Yeah, apparently[2] that's caused by
> >
> >   commit 26c191788f18
> >   Author: Andrea Arcangeli aarca...@redhat.com
> >   Date: Tue May 29 15:06:49 2012 -0700
> >
> >       mm: pmd_read_atomic: fix 32bit PAE pmd walk vs pmd_populate SMP race condition
> >
> > which was also included in Debian kernel 3.2.19-1.
> >
> > [1] http://bugs.debian.org/676360
> > [2] https://bugzilla.redhat.com/show_bug.cgi?id=829016#c4
>
> Oops, sorry I didn't imagine atomic64_read on a pmd would trip.

Hmm, so it looks like it used to do this:

    pmd = pmd_offset(pud, addr);
    ..
    pmd_t pmdval = *pmd;

but now you do:

    pmd_t ret = (pmd_val)((u32)*tmp);
    ret |= (*tmp+1) << 32;

which would read the low part first and then the high one next (or is it
the other way around?). The 'pmd_offset' beforehand manufactures the pmd
using the PFN to MFN lookup tree (so that there aren't any hypercalls or
traps).

Hm, with your change, you are still looking at the 'pmd' and its contents,
except that you are reading the low and then the high part. Why that would
trip the hypervisor is not clear to me. Perhaps in the past it only read
the low bits?

If there was a Xen hypervisor log that might give some ideas. Is there any
chance that the Linode folks could send that over?

> Unfortunately to support pagetable walking with mmap_sem hold for
> reading, we need an atomic read on 32bit PAE if
> CONFIG_TRANSPARENT_HUGEPAGE=y.
>
> The only case requiring this is 32bit PAE with
> CONFIG_TRANSPARENT_HUGEPAGE=y at build time. If you set
> CONFIG_TRANSPARENT_HUGEPAGE=n temporarily you should be able to work
> around this as I optimized the code in a way to avoid an expensive
> cmpxchg8b.

Ah, by just skipping the thing if the low bits are zero.

> I guess if Xen can't be updated to handle an atomic64_read on a pmd in
> the guest, we can add a pmd_read paravirt op? Or if we don't want to
> break the paravirt interface a loop like gup_fast with irq disabled
> should also work but looping + local_irq_disable()/enable() sounded
> worse and more complex than a atomic64_read (gup fast already disables
> irqs because it doesn't hold the mmap_sem so it's a different cost

I am not really sure what is afoot. It sounds like the hypervisor didn't
like somebody reading the high and low bits, but isn't the pmdval_t still
64-bit? So I would have thought this would have been triggered? Or is it
that the code on pmd_val never actually read the high bits (before your
addition of the atomic_read)?

> looping there).
>
> AFAIK Xen disables THP during boot, so a check on THP being enabled and
> falling back in the THP=n version of pmd_read_atomic, would also be
> safe, but it's not so nice to do it with a runtime check.

The thing is that I did install a 32-bit PAE guest (a Fedora) on a Fedora
17 dom0. So it looks like reading the high part is fixed on the newer
hypervisors, but not with the older ones. And the older one is Amazon EC2
so some .. hack to work around older hypervisors could be added.
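
The "read the low half, and skip the high half when the low half is zero" idea discussed in this thread can be sketched in user-space C as follows. This is an illustration only, not the kernel's pmd_read_atomic: 'struct pae_entry' and 'read_low_then_high' are hypothetical names, and the read barrier that real code needs between the two loads is only marked by a comment.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit PAE entry stored as two 32-bit halves. */
struct pae_entry {
	volatile uint32_t lo;
	volatile uint32_t hi;
};

/*
 * Read the low half first. If it is zero the entry is treated as "none"
 * and the high half is never read, so a partially populated entry cannot
 * be returned. Otherwise the high half is read afterwards and merged in.
 */
static uint64_t read_low_then_high(const struct pae_entry *e)
{
	uint64_t ret = e->lo;

	if (ret != 0) {
		/* Real code needs a read barrier here to order the two loads. */
		ret |= (uint64_t)e->hi << 32;
	}
	return ret;
}
```

No 64-bit atomic load (and therefore no cmpxchg8b) is involved; the two halves are fetched with ordinary 32-bit reads.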
Bug#642154: [Xen-devel] Re: BUG: unable to handle kernel paging request at ffff8803bb6ad000
On Sat, Oct 08, 2011 at 10:13:14AM +0400, rush wrote:
> OK, I tried it again, but the Oops didn't go away.
>
> .. snip..
> echo 'Loading Xen 4.0-amd64 ...'
> multiboot /boot/xen-4.0-amd64.gz placeholder xsave=0
> .. snip..
>
> Was it right?

Yup. I think.. this is a bit embarrassing.

It took a bit of time for Intel folks to get the xsave part right and I
remember seeing this error about a year ago with xsave on a Dell Optiplex
780. Hence I wonder if the fixes that ultimately went in 4.1.1 did not get
ported over to 4.0 and you are just hitting that.

Can I ask you to do one more thing? Can you upgrade to the xen-4.1.1 in
testing and try with the xsave (or without) and see if it works?

holds his fingers hoping it is the xsave feature
Bug#642154: BUG: unable to handle kernel paging request at ffff8803bb6ad000
> echo 'Loading Xen 4.0-amd64 ...'
> multiboot /boot/xen-4.0-amd64.gz placeholder

Oops. I meant to try it in the hypervisor - so right after placeholder add
xsave=0

> echo 'Loading Linux 3.0.0-1-amd64 ...'
> module /boot/vmlinuz-3.0.0-1-amd64 placeholder root=/dev/mapper/xen-system ro xsave=0 quiet
Bug#642154: [Xen-devel] Re: Bug#642154: BUG: unable to handle kernel paging request at ffff8803bb6ad000
> There have been some similar-looking threads on xen-devel recently but I
> haven't paid attention to the details, list Konrad CC'd.
>
> Full log is at http://bugs.debian.org/642154.

Does xsave=0 make a difference?
Bug#637234: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen
On Wed, Sep 07, 2011 at 02:51:04AM +0100, Ben Hutchings wrote:
> On Mon, 2011-08-29 at 10:08 -0400, Konrad Rzeszutek Wilk wrote:
> [...]
> > Oh, I think I know _exactly_ what bug that is:
> >
> > This git commit: 280802657fb95c52bb5a35d43fea60351883b2af
> > xen/blkback: When writting barriers set the sector number to zero
> >
> > has to be reverted. Specifically:
> >
> > commit 3f963cae3ef35d26fdd899c08797a598c5ca3e9b
> > Author: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
> > Date: Tue Jul 19 16:44:42 2011 -0700
> >
> >     Revert xen/blkback: When writting barriers set the sector number to zero...
> [...]
> > and this one added:
> >
> > 25266338a41470a21e9b3974445be09e0640dda7
> > xen/blkback: don't fail empty barrier requests
> [...]
>
> Which repository are these in?

Jeremy's: git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen.git

> Ben.
Bug#637308: xen-linux-system-2.6.32-5-xen-amd64: with kernel option 'nosmp', dom0 hangup while init PCI-Express Fusion-MPT SAS
> Looking at this again: this problem only really applies to dom0, and the
> new code won't even build in a domU-only kernel config with
> CONFIG_X86_IO_APIC unset. I think we actually need something like:

Ok, that is Ok I think? We don't care about domU for this? Or is it that it
will cause bootup issues _with_ domU's that are built as UP? That is not
the case - as the smp.c won't even be built I am not sure what the concern
here is...
Bug#637308: xen-linux-system-2.6.32-5-xen-amd64: with kernel option 'nosmp', dom0 hangup while init PCI-Express Fusion-MPT SAS
> As I understand it, the kernel won't work in dom0 if the (PV) IOAPIC is
> disabled. CONFIG_XEN_DOM0 depends on CONFIG_X86_IO_APIC and we're now
> trying to catch the case where IOAPIC support is disabled at boot.
>
> However, in domU, IOAPIC support is not required (right?). CONFIG_XEN

Yup.

> does not depend on CONFIG_X86_IO_APIC, so the following configuration is
> possible:
>
> CONFIG_SMP=y
> CONFIG_XEN=y
> # CONFIG_XEN_DOM0 is not set
> # CONFIG_X86_IO_APIC is not set
>
> And with this configuration the test for disabled IOAPIC support would
> fail to compile.

I see what you mean... except I can't get make to do this. Can you send me
the .config where you get the failure please?

Will prep a patch for this, which is just going to guard the usage of
'ioapic_setup' with '#ifdef CONFIG_X86_IO_APIC'
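
The kind of guard mentioned here could look roughly like the following sketch. It is a user-space illustration, not the patch that was eventually prepared: 'ioapic_complaint' is a hypothetical helper, 'skip_ioapic_setup' is declared locally as a stand-in for the kernel flag discussed in this bug, and the message strings are shortened versions of the ones proposed elsewhere in this report.

```c
#include <stddef.h>

#ifdef CONFIG_X86_IO_APIC
/* Local stand-in for the kernel's skip_ioapic_setup flag (assumption). */
static int skip_ioapic_setup;
#endif

/*
 * Return a complaint string when the IOAPIC was disabled at boot,
 * or NULL when there is nothing to report. The flag is only touched
 * when CONFIG_X86_IO_APIC is defined, so a configuration without
 * IOAPIC support still compiles.
 */
static const char *ioapic_complaint(unsigned int max_cpus)
{
#ifdef CONFIG_X86_IO_APIC
	if (skip_ioapic_setup)
		return max_cpus == 0 ?
			"The nosmp parameter is incompatible with Xen" :
			"The noapic parameter is incompatible with Xen";
#else
	(void)max_cpus;	/* nothing to check without IOAPIC support */
#endif
	return NULL;
}
```

Built without CONFIG_X86_IO_APIC the function reduces to returning NULL, which is the property the guard is meant to provide for domU-only configurations.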
Bug#637308: xen-linux-system-2.6.32-5-xen-amd64: with kernel option 'nosmp', dom0 hangup while init PCI-Express Fusion-MPT SAS
On Wed, Aug 31, 2011 at 09:01:40AM +0100, Ian Campbell wrote:
> On Tue, 2011-08-30 at 10:22 -0400, Konrad Rzeszutek Wilk wrote:
> > It might make sense to also use 'xen_raw_printk' as sometimes you
> > don't get to see the panic - you end up with this unhelpful message:
> >
> > (XEN) domain_crash_sync called from entry.S
> > (XEN) Domain 0 (vcpu#0) crashed on cpu#0:
> > .. snip..
> >
> > so something like this:
>
> Fine by me, although I do wonder if maybe we shouldn't be fixing panic()
> itself or our console driver or something, this isn't the first such
> patch I've noticed which doubles up on the panic message.
>
> Is the underlying issue just that earlyprintk isn't on by default?

Yup. earlyprintk=xen would do the same thing.

Added this patch on the 3.1-rcX train with your Acked-by.
Bug#637308: xen-linux-system-2.6.32-5-xen-amd64: with kernel option 'nosmp', dom0 hangup while init PCI-Express Fusion-MPT SAS
On Tue, Aug 30, 2011 at 09:04:30AM +0100, Ian Campbell wrote:
> On Mon, 2011-08-29 at 12:55 +0100, Ben Hutchings wrote:
> > On Mon, 2011-08-29 at 10:07 +0400, Константин Алексеев wrote:
> > > I think this bug may be closed. I posted it to xen devel list and get
> > > answer:
> > > It's really an unsupported configuration. If you want to limit dom0
> > > vcpus then dom0_max_vcpus= on Xen command line is the correct way.
> > >
> > > http://lists.xensource.com/archives/html/xen-devel/2011-08/msg00665.html
> >
> > Maybe we should panic in this case?
>
> It's a bit sad but yes I think that would be better than leaving traps
> for the unwary given that the issue is unlikely to bubble up most Xen
> developers' todo list any time soon
>
> Your use of skip_ioapic_setup clued me into the probable difference
> between nosmp and dom0_max_vcpus=1 -- the disabling of IOAPIC most
> likely matters to Xen. Konrad does that sound right?

Yes.

> > Something like this (untested):
>
> Looks plausible to me.
>
> > diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
> > index e79dbb9..2671b96 100644
> > --- a/arch/x86/xen/smp.c
> > +++ b/arch/x86/xen/smp.c
> > @@ -21,6 +21,7 @@
> >  #include <asm/desc.h>
> >  #include <asm/pgtable.h>
> >  #include <asm/cpu.h>
> > +#include <asm/io_apic.h>
> >
> >  #include <xen/interface/xen.h>
> >  #include <xen/interface/vcpu.h>
> > @@ -207,6 +208,12 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
> >  	unsigned cpu;
> >  	unsigned int i;
> >
> > +	if (skip_ioapic_setup)
> > +		panic((max_cpus == 0) ?
> > +		      "The nosmp parameter is incompatible with Xen; "
> > +		      "use Xen dom0_max_vcpus=1 parameter" :
> > +		      "The noapic parameter is incompatible with Xen");
> > +

It might make sense to also use 'xen_raw_printk' as sometimes you don't get
to see the panic - you end up with this unhelpful message:

(XEN) domain_crash_sync called from entry.S
(XEN) Domain 0 (vcpu#0) crashed on cpu#0:
.. snip..

so something like this:

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index b4533a8..8424dd4 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -32,6 +32,7 @@
 #include <xen/page.h>
 #include <xen/events.h>

+#include <xen/hvc-console.h>
 #include "xen-ops.h"
 #include "mmu.h"

@@ -207,6 +208,15 @@ static void __init xen_smp_prepare_cpus(unsigned int max_cpus)
 	unsigned cpu;
 	unsigned int i;

+	if (skip_ioapic_setup) {
+		char *m = (max_cpus == 0) ?
+			"The nosmp parameter is incompatible with Xen; " \
+			"use Xen dom0_max_vcpus=1 parameter" :
+			"The noapic parameter is incompatible with Xen";
+
+		xen_raw_printk(m);
+		panic(m);
+	}
 	xen_init_lock_cpu(0);

 	smp_store_cpu_info(0);
Bug#637234: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen
On Fri, Aug 26, 2011 at 06:58:34PM -0400, Gedalya wrote:
> > One way to make sure that is not the case is to disable barriers in
> > the guest. Meaning in /etc/fstab have something like this:
> >
> > /dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1
>
> That seems to fix it. It was remounting as read only either during the
> boot process or immediately after, and now it boots up and seems to stay
> up. I'll test later with a DomU that actually has things running.

Yeeey!

> This also fixes the reboot problem I noted earlier, init 6 now reboots
> the DomU rather than destroy it.
>
> > The other question is what version of Dom0 are you running? Is it
> > 2.6.32? 2.6.39?
>
> squeeze, running linux-image-2.6.32-5-xen-amd64 2.6.32-35

Oh, I think I know _exactly_ what bug that is:

This git commit: 280802657fb95c52bb5a35d43fea60351883b2af
xen/blkback: When writting barriers set the sector number to zero

has to be reverted. Specifically:

commit 3f963cae3ef35d26fdd899c08797a598c5ca3e9b
Author: Jeremy Fitzhardinge jeremy.fitzhardi...@citrix.com
Date: Tue Jul 19 16:44:42 2011 -0700

    Revert xen/blkback: When writting barriers set the sector number to zero...

    This reverts commit 280802657fb95c52bb5a35d43fea60351883b2af.

    This patch is reported to cause disk corruption:

    From: Huang2, Wei wei.hua...@amd.com

    We recently found a disk corruption issue with SLES11 SP1 guest.
    Basically the guest disk becomes non-bootable after guest shutdown.
    This is a SLES specific issue as we didn't see on other Linux and
    Windows VMs. Here is the configuration:

    1. Xen: xen-4.1-testing, changeset 23096
    2. Dom0: Jeremy's latest pvops 6d94b75 (June 1)
    3. VM: SLES 11 SP1, installed as physical machine with raw disk format

    Regarding the disk before corruption, "file sles11sp1.img" command read:
    "/root/guests/sles11-sp1/sles11sp1.img: x86 boot sector; partition 1:
    ID=0x82, starthead 1, startsector 63, 4208967 sectors; partition 2:
    ID=0x83, active, starthead 0, startsector 4209030, 16755795 sectors".
    After corruption, it became a data file:
    "/root/guests/sles11-sp1/sles11sp1.img: data".

and this one added:

25266338a41470a21e9b3974445be09e0640dda7
xen/blkback: don't fail empty barrier requests

    The sector number on empty barrier requests may (will?) be -1, which,
    given that it's being treated as unsigned 64-bit quantity, will almost
    always exceed the actual (virtual) disk's size.

    Inspired by Konrad's When writting barriers set the sector number to zero
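
The arithmetic behind that last commit description can be shown with a small user-space sketch. It is not the blkback code: 'request_in_range' and 'accept_request' are hypothetical helpers, and skipping the range check for an empty barrier is only one possible reading of "don't fail empty barrier requests".

```c
#include <stdbool.h>
#include <stdint.h>

/* Does the range [sector, sector + nr) fit on a disk of 'capacity' sectors? */
static bool request_in_range(uint64_t sector, uint64_t nr, uint64_t capacity)
{
	/* Written this way so that sector + nr cannot overflow. */
	return nr <= capacity && sector <= capacity - nr;
}

/*
 * Assumed policy for illustration: an empty barrier transfers no data,
 * so its sector number (possibly -1) is not range-checked.
 */
static bool accept_request(uint64_t sector, uint64_t nr, uint64_t capacity,
			   bool is_barrier)
{
	if (is_barrier && nr == 0)
		return true;
	return request_in_range(sector, nr, capacity);
}
```

A sector number of -1 stored in a uint64_t becomes the largest representable value, so a plain range check rejects it for any realistic disk size even though the request carries no data.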
Bug#637234: [Xen-devel] Re: Bug#637234: linux-image-3.0.0-1-686-pae: I/O errors using ext4 under xen
On Thu, Aug 25, 2011 at 07:47:08AM +0100, Ian Campbell wrote:
> Hi Konrad,
>
> Does this look at all familiar?
>
> There is some more info in the full bug log at
> http://bugs.debian.org/637234 . In particular, contrary to the message
> below, the user subsequently confirmed that the issue appears to be Xen
> specific (doesn't happen on native or vmware) and that it arose between
> 2.6.39-2-686-pae and 3.0.0-1-686-pae.
>
> Could it be related to edf6ef59ec7e xen-blkfront: Introduce
> BLKIF_OP_FLUSH_DISKCACHE support? That looks like the only pertinent
> change between 2.6.39 and 3.0.

It shouldn't - from the look of it:

[0.529412] blkfront: xvdc: barrier: enabled

it looks as if the 'feature-barrier' is used. Not 'feature-flush-cache' -
otherwise you would have seen a message about that.

But then.. 3.0 (and 2.6.39) don't do barriers anymore. However the backend
seems to do it. And my understanding is that the barrier request is a
superset of a flush request so it should work. But maybe that is an
incorrect assumption.

One way to make sure that is not the case is to disable barriers in the
guest. Meaning in /etc/fstab have something like this:

/dev/xvdc /blah ext4 errors=remount-ro,barrier=0 0 1

The other question is what version of Dom0 are you running? Is it 2.6.32?
2.6.39?
Bug#638172: [Xen-devel] Re: Bug#638172: BUG: soft lockup - CPU#0 stuck for 61s! [qemu-dm:3205]
On Mon, Aug 22, 2011 at 10:00:11AM +0100, Ian Campbell wrote:
> @xen-devel: Does this look familiar to anyone? This is (I expect,
> hopefully Giuseppe will confirm) from Debian Squeeze which has a Xen
> 4.0.x with a PVops dom0 kernel based on xen.git from last summer
> (e73f4955a821) with more recent upstream longterm kernels (up to and
> including 2.6.32.41) merged in.
>
> While it does seem to have the switch from level to edge triggered
> interrupt the Debian kernel doesn't appear to have the switch to fasteoi
> for pirqs (0672fb44a111 plus a few followups) -- could that be related
> to this? (I'm not sure if that was a cleanup or a fix)

It was a fix. We had some interrupts getting wedged - but I don't recall
the stack exactly. But there are some follow-ups - like

e5ac0bda96c495321dbad9b57a4b1a93a5a72e7f
7e186bdd0098b34c69fb8067c67340ae610ea499

> Might the tsc unstable message be relevant?

Hm, not sure. I keep on getting those on my guests but life seems to go on.

The interesting thing about the stack trace is that it looks similar to:

http://groups.google.com/group/linux.kernel/browse_thread/thread/39a397566cafc979

which has some fixes

https://patchwork.kernel.org/patch/1091772/

but they may not help.
Bug#601341: Bug#604096: Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates
Thanks. So since the Debian kernel has DRM/TTM from 2.6.33 I assume I want
the NEEDS_IOREMAP (95518271) version.

I'm about to try my backport of devel/ttm.pci-api-v2 which contains:

    drm/ttm: Add ttm_tt_free_page
    ttm: Introduce a placeholder for DMA (bus) addresses.
    ttm: Utilize the dma_addr_t array for pages that are to in DMA32 pool.
    ttm: Expand (*populate) to support an array of DMA addresses.
    radeon/ttm/PCIe: Use dma_addr if TTM has set it.
    nouveau/ttm/PCIe: Use dma_addr if TTM has set it.
    radeon/PCIe: Use the correct index field.

plus:

    9551827190db ttm: Set VM_IO only on pages with TTM_MEMTYPE_FLAG_NEEDS_IOREMAP set.
    c54d5aa10b7a ttm: Change VMA flags if they != to the TTM flags.
    c07fbfd17e61 fbmem: VM_IO set, but not propagated

Looks good.

    d541daf6b956 pvops: make pte_flags() go via pvops

I've only hit that on a machine with a P4 Prescott with AGP. On nothing
else - so it might not be required... If you don't have it you just get a
bunch of WARN.

In addition the Debian kernel already contains

    25021c9 x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
    2eb6682 drm: recompute vma->vm_page_prot after changing vm_flags

pvops: make pte_flags() go via pvops was the only bit of the patches which
were omitted from the Debian kernel (the revert of bcf16b6b4f34) which
didn't already appear to have been replaced by the other patches (ignoring
all the AGP stuff) so I figured I may as well give it a go.

My previous attempt (with all of the above except but make pte_flags() go
via pvops) failed because I botched the backport of radeon/PCIe: Use the
correct index field. and only fixed one of the wrong indexes. FWIW I think
that patch should be folded down into the original patch for upstreaming.

Yeah, good idea. And also actually put my SOB on them.
Bug#601341: Bug#604096: Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates
FWIW I ran a patched kernel up on my home machine (radeon) and it didn't
work. Without KMS the X server failed reasonably gracefully (with some,
presumably spurious, message about the keyboard driver) and with KMS it
switched graphics mode and then hung on a black screen. I'll keep poking
but I'm hampered a bit by my only suitable test machine actually being my
home workstation.

That was not the machine with the AGP card, right? Did you get these
patches in too:

    25021c9 x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
    2eb6682 drm: recompute vma->vm_page_prot after changing vm_flags
    dbbc947 ttm: Set VM_IO only on pages with TTM_MEMTYPE_FLAG_FIXED set.

I suppose I should poke through 2.6.33..2.6.37-rc and see if anything jumps
out for backporting.

Did the series make any waves upstream? What are the chances that it will
go upstream in something roughly like its current form?

I hope so. I am putting the polishing touches on item c) to have it ready
for upstream.

Cool. Hrm, do I need some equivalent of c) in order to have a chance of
this stuff working?

Yes, and those three I mentioned earlier should suffice as a temporary
solution. Or you can go straight ahead and look at devel/p2m-identity
(however, there is a bug in them - ballooning in huge amounts of memory
does not work right).
Bug#601341: Bug#604096: Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates
> > Did you get these patches in too:
> >
> >   25021c9 x86: define arch_vm_get_page_prot to set _PAGE_IOMAP on VM_IO vmas
> >   2eb6682 drm: recompute vma-vm_page_prot after changing vm_flags
> >   dbbc947 ttm: Set VM_IO only on pages with TTM_MEMTYPE_FLAG_FIXED set.
>
> I seem to have 25021c9 and 2eb6682 but not dbbc947.

Good. The first two are essential.

> I was experimenting with (from xen.git)
>   95518271 ttm: Set VM_IO only on pages with TTM_MEMTYPE_FLAG_NEEDS_IOREMAP set.
>   c54d5aa1 ttm: Change VMA flags if they != to the TTM flags.
>   e1687eae fb: propagate VM_IO to VMA.
> Is this a dead-end?

So e1687eae is upstream, the other two become obsolete once devel/p2m-identity is flushed out.

Hmm, did you also include:
  ttm: When TTM_PAGE_FLAG_DMA32 allocate pages under
and in your tree, ah yes - you pulled the devel/ttm.pci-api-v2 which has an updated variant of that.

> dbbc947 and 95518271 seem to have a lot in common.

Yup. The architecture of the ttm code changed from 2.6.34-2.6.37.

.. snip..

> Thanks, I'll try adding dbbc947. Should I ignore 95518271, c54d5aa1 and e1687eae for the time being?

No, please do try those. So:

  ttm: Set VM_IO only on pages with TTM_MEMTYPE_FLAG_NEEDS_IOREMAP set

Without this, you would get these weird errors:

(XEN) mm.c:1747:d0 Bad L1 flags c0
(XEN) mm.c:779:d0 Bad L1 flags c0
(XEN) mm.c:4659:d0 ptwr_emulate: could not get_page_from_l1e()
[ 123.222339] BUG: unable to handle kernel paging request at 8800747382f8
[ 123.222339] IP: [8100e73a] xen_set_pte+0x31/0x36
[ 123.222339] PGD 1002067 PUD 2e4067 PMD 488067 PTE 1074738065
..
[ 123.385710] [8100e7e6] xen_set_pte_at+0xa7/0xb2
[ 123.385710] [8100c59d] ? __raw_callee_save_xen_make_pte+0x11/0x1e
[ 123.385710] [810cd303] vm_insert_mixed+0x86/0xb0
[ 123.385710] [a003d68a] ttm_bo_vm_fault+0x201/0x26c [ttm]

> > Or you can go straight ahead and look at devel/p2m-identity (however, there is a bug in them - ballooning in huge amounts of memory does not work right).
>
> I'd be very wary of taking an infrastructure change of that magnitude into Squeeze in its current frozen state.

Good point. Don't take them.
Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates
> Then I got to
>   eba164ec7e69 radeon/nouveau/ttm/AGP: Use dma_addr if TTM has set it.
> which complained:
>   CC [M] drivers/gpu/drm/ttm/ttm_agp_backend.o
>   drivers/gpu/drm/ttm/ttm_agp_backend.c: In function ‘ttm_agp_populate’:
>   drivers/gpu/drm/ttm/ttm_agp_backend.c:66: error: ‘struct agp_memory’ has no member named ‘dma_addr’
> and indeed the field is missing both in 2.6.32+drm33 and Linus' tree.
>
> Do I need to cherry pick something from another series or is this commit

You can drop that patch. I've rebased the tree to devel/ttm.pci-api-v2, which is exactly like the older one except that it is missing that patch.

> something which should be ignored per our previous discussion about PCIe vs AGP etc? (I'm going with the second option for now)

Yup.

> I'll publish my backport in a git tree once I'm happy with it. I need to tidy it up and correct the "cherry-picked from" comments etc and then actually build something which uses it. I'll make Debian packages available for wider testing once I've done that (with Xmas coming up I don't know when that will actually be).
>
> Did the series make any waves upstream? What are the chances that it will go upstream in something roughly like its current form?

I hope so. I am putting the polishing touches on item c) to have it ready for upstream.
Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates
On Tue, Dec 07, 2010 at 11:49:14AM +, Ian Campbell wrote:
> On Mon, 2010-12-06 at 19:27 -0500, Konrad Rzeszutek Wilk wrote:
> > > a) Fix the GART/AGP backend (so drivers/char/agp/*.c) so they use the PCI API. Only the i915 and higher are using the PCI API and I've some of the older boxes with i860 so can actually test it.
> >
> > I've posted patches to address this (https://lkml.org/lkml/2010/12/6/480) and Dave's question is why anyone cares about AGP in 2010. I was wondering if any folks could comment?
>
> His general principle of fixing the modern stuff first and then working backwards until nobody is complaining any more seems pretty sane to me.

*nods*

> Is the series at https://lkml.org/lkml/2010/12/6/516 sufficient in its own right to make Nouveau and ATI work or is more needed? What about NV?

Both Nouveau and ATI (PCIe) look to work. I did light testing (ATI ES1000, Radeon 3450, Nvidia 65.. something) and will need to do some more aggressive ones.

Oh, and Intel GTT seems to work without any of these patches - but I've only tested it on a machine with 4GB so I need to add more memory to make sure.

> More generally if we were to take the series from https://lkml.org/lkml/2010/12/6/516 but not the series from https://lkml.org/lkml/2010/12/6/480 which sets of cards would we be including/excluding support for?

PCIe = supported
PCI = not supported
AGP = not supported.

> Ian.
>
> --
> Ian Campbell
> Current Noise: Mistress - 38
>
> What's done to children, they will do to society.
Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates
> Dave's concerns seemed mainly to be about AGP bits rather than PCI, are

Yeah, his concerns are valid: why touch it if nobody is using it. And if there truly aren't enough folks interested in AGP support, then I am fine dropping it.

> they independent(-ish)? e.g. is only a subset of the .../480 series

The PCI cards I am talking about are the .. PCI Matrox G400 or the like, so even older than AGP cards.

> needed to enable PCI support?

You know, I might not have actually posted a patch for this. It was one of those DRM scattergather code. *shrugs*
Bug#601341: Bug#602418: #601341, #602418 and #604096 seem to be duplicates
> a) Fix the GART/AGP backend (so drivers/char/agp/*.c) so they use the PCI API. Only the i915 and higher are using the PCI API and I've some of the older boxes with i860 so can actually test it.

I've posted patches to address this (https://lkml.org/lkml/2010/12/6/480) and Dave's question is why anyone cares about AGP in 2010. I was wondering if any folks could comment?
Bug#604096: Bug#602418: #601341, #602418 and #604096 seem to be duplicates
.. snip of back-history..

> > Thanks for the pointers. I agree with Bastian that some of these changes are really quite nasty. Do you and other Xen developers have any plan for how to fix the GART and TTM mapping problems in a cleaner way as Xen dom0 support goes upstream?
>
> I know that there is a plan to get rid of the _PAGE_IOMAP stuff altogether by simply arranging for a 1-1 mapping for the relevant device PFNs in the P2M array. However I'm not sure whether or not this knocks-on into a fix for the GART/TTM stuff.

Unfortunately it won't fit the whole bill. What some of those patches did was introduce a mechanism to use the PCI API to do virt_to_phys. And when I say use, I mean that really loosely. The solution I cobbled together was to bypass any API and just hard-code the phys-bus address lookup. My plan for upstream is to actually work on those drivers (intel-agp.c, agpgart.c) to utilize the PCI API.

> Konrad, do you have an idea how you plan to solve the GART/TTM issues upstream?

Yes, I am working on a set of patches that are cleaner and more upstream-able than the first revision. I hope to have most of a) and b) done in the next two weeks. And there are actually three distinct milestones here:

 a) Fix the GART/AGP backend (so drivers/char/agp/*.c) so they use the PCI API. Only the i915 and higher are using the PCI API and I've some of the older boxes with i860 so can actually test it.

 b) Fix the TTM to use the DMA API.

 c) Lastly, get rid of _PAGE_IOMAP so we don't have to depend on radeon/nouveau/etc to set the proper _PAGE_IOMAP on the PFNs/BARs..

> > If there is no such plan then I would rather disable these drivers than make them work temporarily with a hack.

The shape of the stuff that I am going to propose upstream is more refined and much cleaner. Do you want me to send you an email when I am ready and have done my testing so you can take a look at it?
Bug#596419: Acknowledgement (xen-linux-system-2.6.32-5-xen-amd64: causes a system hangup by the shutdown of the system, aacraid (sw raid) involved in hangup)
> So, it worked if I ran Dom0 in balloon mode by omitting the specification of dom0_mem or, if dom0_mem is specified, then swiotlb=65536 must also be specified.

Wow. That implies that AACRAID uses quite a lot of buffers, and looking at the driver there are a bunch of quirks where it can only do DMA up to 2GB, so that would explain why it relies on SWIOTLB that much. Based on what Ian analyzed it really looks like we just ran out of DMA buffers and the driver didn't retry but just bailed out.

We can narrow down who is using so many buffers by using the attached debug module, which when loaded will print out who is using what buffers if CONFIG_DMA_API_DEBUG=y is set.

But the proper workaround is the one you discovered - either raise the SWIOTLB buffer or raise the memory allocated for Dom0.

> I have noticed one interesting behavior - during the successful suspension of the domains during the shutdown the first one which is being suspended writes three dots very fast, then it stops writing dots for some time, and then after some time a lot of (possibly all remaining) dots are written on the screen very fast. For the next suspensions the suspension works continuously dot-by-dot, smoothly without any delays. It looks like it waits for something during the first suspension (memory allocation?).

That usually means that it is stuck waiting for the disks to write out all the data.

> Generally, it is very surprising to me how the aacraid module works. I am no C or kernel developer but I would expect something like this cannot happen - the module should allocate its necessary memory at the start or, I would understand that some specific read or write operation can fail if the sw raid does not have enough memory to execute it, but I would never expect this to lead to the hangup and freeze of the whole system. The probability of

Well, to be honest, we engineers aren't known for testing all of the failure paths as well as we should. That is why folks like you are quite helpful in finding bugs :-)

/*
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License v2.0 as published by
 * the Free Software Foundation
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 */

#include <linux/module.h>
#include <linux/string.h>
#include <linux/types.h>
#include <linux/init.h>
#include <linux/stat.h>
#include <linux/err.h>
#include <linux/ctype.h>
#include <linux/slab.h>
#include <linux/limits.h>
#include <linux/device.h>
#include <linux/pci.h>
#include <linux/blkdev.h>
#include <linux/mm.h>
#include <linux/fcntl.h>
#include <linux/kmod.h>
#include <linux/major.h>
#include <linux/smp_lock.h>
#include <linux/highmem.h>
#include <linux/blkpg.h>
#include <linux/buffer_head.h>
#include <linux/mpage.h>
#include <linux/mount.h>
#include <linux/uio.h>
#include <linux/namei.h>
#include <asm/uaccess.h>
#include <linux/pagemap.h>
#include <linux/pagevec.h>
#include <linux/dma-debug.h>

#define DUMP_DMA_FUN "0.1"

MODULE_AUTHOR("Konrad Rzeszutek Wilk <kon...@virtualiron>");
MODULE_DESCRIPTION("dump dma");
MODULE_LICENSE("GPL");
MODULE_VERSION(DUMP_DMA_FUN);

/* On load, dump the DMA mappings tracked by the DMA API debug code.
 * Passing NULL dumps the mappings of every device rather than one.
 * Only produces output when the kernel has CONFIG_DMA_API_DEBUG=y. */
static int __init dump_dma_init(void)
{
	debug_dma_dump_mappings(NULL);
	return 0;
}

/* Nothing to undo on unload. */
static void __exit dump_dma_exit(void)
{
}

module_init(dump_dma_init);
module_exit(dump_dma_exit);

# Comment/uncomment the following line to disable/enable debugging
#DEBUG = y

# Add your debugging flag (or not) to CFLAGS
ifeq ($(DEBUG),y)
  DEBFLAGS = -O -g # "-O" is needed to expand inlines
else
  DEBFLAGS = -O2
endif

EXTRA_CFLAGS += $(DEBFLAGS) -I$(LDDINCDIR)

ifneq ($(KERNELRELEASE),)
# call from kernel build system
obj-m := dump_dma.o
else
#KERNELDIR ?= /lib/modules/$(shell uname -r)/build
KERNELDIR ?= /home/konrad/git/neb.64/linux-build
PWD := $(shell pwd)

default:
	$(MAKE) -C $(KERNELDIR) M=$(PWD) LDDINCDIR=$(PWD)/../include modules
endif

clean:
	rm -rf *.o *~ core .depend .*.cmd *.ko *.mod.c .tmp_versions

depend .depend dep:
	$(CC) $(CFLAGS) -M *.c > .depend

ifeq (.depend,$(wildcard .depend))
include .depend
endif
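
For a sense of scale on the swiotlb=65536 workaround discussed in this message: the swiotlb= value counts I/O TLB slabs rather than bytes, and in the Linux swiotlb code each slab is 1 << IO_TLB_SHIFT = 2 KiB. The sketch below just does that arithmetic in plain userspace C. The helper names swiotlb_bytes and swiotlb_mib are made up for illustration and are not kernel functions, and the slab size is assumed to match the kernel actually being run.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: IO_TLB_SHIFT is 11 as in the Linux swiotlb code,
 * so one slab is 2 KiB. Check your own kernel tree before relying on it. */
#define IO_TLB_SHIFT 11

/* Hypothetical helper: bytes of bounce buffer for swiotlb=<nslabs>. */
static unsigned long long swiotlb_bytes(unsigned long nslabs)
{
	return (unsigned long long)nslabs << IO_TLB_SHIFT;
}

/* Hypothetical helper: the same size expressed in MiB. */
static unsigned long long swiotlb_mib(unsigned long nslabs)
{
	return swiotlb_bytes(nslabs) >> 20;
}

#ifdef SWIOTLB_DEMO_MAIN
/* Build with -DSWIOTLB_DEMO_MAIN to get a small standalone demo. */
int main(void)
{
	unsigned long nslabs = 65536; /* value from the workaround above */

	assert(swiotlb_bytes(1) == 2048ULL);
	printf("swiotlb=%lu -> %llu MiB\n", nslabs, swiotlb_mib(nslabs));
	return 0;
}
#endif
```

Under that assumption swiotlb=65536 reserves 128 MiB of bounce buffers, twice the 64 MiB (32768 slabs) that kernels of this era reserve by default, which fits the picture of the driver simply running out of DMA buffers.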