Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
At 16:31 +0100 on 05 May (1430843498), Jan Beulich wrote:
> On 05.05.15 at 17:17, t...@xen.org wrote:
> > At 16:10 +0100 on 05 May (1430842206), Jan Beulich wrote:
> > > From what I can tell (and assuming other code works correctly)
> > > the fact that arch_iommu_populate_page_table() sets
> > > d->need_iommu to -1 first thing should make sure that any
> > > subsequent changes to the p2m get propagated to IOMMU code for
> > > setting up respective mappings.
> >
> > Yes, but might they then be overridden by the previous mapping
> > when this new code calls map_page()?
>
> Ah, I see now.
>
> > This seems like a case where we should be using
> > get_gfn()/put_gfn().
>
> Yes - provided these may be called at all with the page_alloc_lock
> held. IOW - is there lock ordering defined between this one and
> the various mm locks?

Good point. The page_alloc lock nests inside the p2m lock, for PoD
(see page_alloc_mm_pre_lock() in mm-locks.h). So we can't call p2m
operations here.

> Also, if doing so, would I then need to check the result of the
> inverse (p2m) translation after having done get_gfn() to make sure
> this is still the MFN I'm after? If so, and if it ends up being a
> different one, I'd have to retry and presumably somehow limit the
> number of retries...

Yes. Ideally this loop would be iterating over all gfns in the p2m
rather than over all owned MFNs. As long as need_iommu gets set
first, such a loop could safely be paused and restarted without
worrying about concurrent updates. The code could even stay in this
file, though exposing an iterator from the p2m code would be a lot
more efficient.

In the meantime the patch you linked to is an improvement, so it can
have my ack.

Cheers,

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
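The lock-ordering constraint Tim describes can be illustrated with a small standalone model. This is only a sketch of the general idea of ordered locks, not the macros in mm-locks.h; the `sketch_*` names and the numeric levels are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical order levels: a lock with a smaller level must be taken first. */
enum sketch_lock_level {
    SKETCH_LOCK_P2M = 1,
    SKETCH_LOCK_PAGE_ALLOC = 2,   /* nests inside the p2m lock */
};

/* Highest level currently held by this (toy) CPU; 0 means nothing is held. */
struct sketch_cpu {
    int held_level;
};

/* Refuse any acquisition that would go back down the ordering. */
static bool sketch_take(struct sketch_cpu *cpu, enum sketch_lock_level level)
{
    assert(cpu != NULL);

    if ( (int)level <= cpu->held_level )
        return false;             /* would invert the lock order */
    cpu->held_level = level;
    return true;
}

/* p2m first, page_alloc second: follows the order. */
static bool sketch_p2m_then_page_alloc(void)
{
    struct sketch_cpu cpu = { 0 };

    return sketch_take(&cpu, SKETCH_LOCK_P2M) &&
           sketch_take(&cpu, SKETCH_LOCK_PAGE_ALLOC);
}

/* page_alloc already held, then a p2m operation: rejected. */
static bool sketch_page_alloc_then_p2m(void)
{
    struct sketch_cpu cpu = { 0 };

    return sketch_take(&cpu, SKETCH_LOCK_PAGE_ALLOC) &&
           sketch_take(&cpu, SKETCH_LOCK_P2M);
}
```

In this model the second helper fails, which corresponds to the situation in the populate loop: it already holds the page_alloc lock, so a p2m operation such as get_gfn() would have to take a lock that belongs earlier in the order.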
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 05.05.15 at 17:17, t...@xen.org wrote:
> At 16:10 +0100 on 05 May (1430842206), Jan Beulich wrote:
> > From what I can tell (and assuming other code works correctly)
> > the fact that arch_iommu_populate_page_table() sets
> > d->need_iommu to -1 first thing should make sure that any
> > subsequent changes to the p2m get propagated to IOMMU code for
> > setting up respective mappings.
>
> Yes, but might they then be overridden by the previous mapping
> when this new code calls map_page()?

Ah, I see now.

> This seems like a case where we should be using
> get_gfn()/put_gfn().

Yes - provided these may be called at all with the page_alloc_lock
held. IOW - is there lock ordering defined between this one and the
various mm locks?

Also, if doing so, would I then need to check the result of the
inverse (p2m) translation after having done get_gfn() to make sure
this is still the MFN I'm after? If so, and if it ends up being a
different one, I'd have to retry and presumably somehow limit the
number of retries...

Jan
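The "look up, re-check the inverse translation, retry a bounded number of times" idea Jan raises can be sketched as follows. This is a single-threaded toy under stated assumptions: the tables, the `churn` counter that simulates a concurrent update, the retry limit and all `sketch_*` names are hypothetical, and no locking (get_gfn()/put_gfn() or otherwise) is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_NR           4            /* toy table size */
#define SKETCH_INVALID      UINT64_MAX
#define SKETCH_MAX_RETRIES  3            /* arbitrary bound for the sketch */

struct sketch_tables {
    uint64_t m2p[SKETCH_NR];   /* mfn -> gfn */
    uint64_t p2m[SKETCH_NR];   /* gfn -> mfn */
    unsigned int churn;        /* number of simulated concurrent moves left */
};

/* Simulate another CPU moving the page to the next gfn slot. */
static void sketch_move_page(struct sketch_tables *t, uint64_t mfn)
{
    uint64_t old_gfn = t->m2p[mfn];
    uint64_t new_gfn = (old_gfn + 1) % SKETCH_NR;

    t->p2m[old_gfn] = SKETCH_INVALID;
    t->p2m[new_gfn] = mfn;
    t->m2p[mfn] = new_gfn;
}

/* Return 0 and a gfn whose inverse translation still yields mfn,
 * or -EAGAIN once the retry budget is used up. */
static int sketch_lookup_stable_gfn(struct sketch_tables *t, uint64_t mfn,
                                    uint64_t *gfn_out)
{
    unsigned int attempt;

    assert(mfn < SKETCH_NR);

    for ( attempt = 0; attempt < SKETCH_MAX_RETRIES; attempt++ )
    {
        uint64_t gfn = t->m2p[mfn];       /* looked up once per attempt */

        if ( gfn >= SKETCH_NR )           /* also covers SKETCH_INVALID */
            return -ENOENT;

        if ( t->churn )                   /* the "concurrent" update */
        {
            t->churn--;
            sketch_move_page(t, mfn);
        }

        if ( t->p2m[gfn] == mfn )         /* still the MFN we are after? */
        {
            *gfn_out = gfn;
            return 0;
        }
    }
    return -EAGAIN;
}

/* Helper: one page (mfn 1 at gfn 2) that moves 'churn' times. */
static int sketch_try_with_churn(unsigned int churn)
{
    struct sketch_tables t = { .churn = churn };
    uint64_t gfn;
    unsigned int i;

    for ( i = 0; i < SKETCH_NR; i++ )
        t.m2p[i] = t.p2m[i] = SKETCH_INVALID;
    t.m2p[1] = 2;
    t.p2m[2] = 1;

    return sketch_lookup_stable_gfn(&t, 1, &gfn);
}
```

With the page moving on each of the first two attempts the third attempt still succeeds; with three moves the budget is exhausted and the caller gets -EAGAIN, which is the case a real implementation would have to decide how to handle.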
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 16.04.15 at 11:28, t...@xen.org wrote:
> At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
> > I am not certain that it is the correct way to fix the issue, nor
> > that the ioreq server code is the only way to trigger it. There
> > are several ways to shoot a gfn mapping from the guest's physmap.
> >
> > At least we now understand why it happens. I will defer to others
> > CC'd on this thread for their opinions in the matter.
>
> The patch seems like a pretty good check to me, though I'm not
> convinced it's race-free. At the least I'd cache the m2p lookup so
> we use the same value for the checks and the map_page() call.

Did you have a chance to look at the patch Sander meanwhile
successfully tested [1]? I'm trying to understand where you see
possible races here, and hence whether anything else needs to be
done to that patch before formally submitting it. From what I can
tell (and assuming other code works correctly) the fact that
arch_iommu_populate_page_table() sets d->need_iommu to -1 first
thing should make sure that any subsequent changes to the p2m get
propagated to IOMMU code for setting up respective mappings.

Thanks, Jan

[1] http://lists.xenproject.org/archives/html/xen-devel/2015-04/msg02253.html

> And IMO update_paging_mode() ought to log and reject bogus GFNs as
> well.
>
> Cheers,
>
> Tim.
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
At 16:10 +0100 on 05 May (1430842206), Jan Beulich wrote:
> On 16.04.15 at 11:28, t...@xen.org wrote:
> > At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
> > > I am not certain that it is the correct way to fix the issue,
> > > nor that the ioreq server code is the only way to trigger it.
> > > There are several ways to shoot a gfn mapping from the guest's
> > > physmap.
> > >
> > > At least we now understand why it happens. I will defer to
> > > others CC'd on this thread for their opinions in the matter.
> >
> > The patch seems like a pretty good check to me, though I'm not
> > convinced it's race-free. At the least I'd cache the m2p lookup
> > so we use the same value for the checks and the map_page() call.
>
> Did you have a chance to look at the patch Sander meanwhile
> successfully tested [1]?

Just looked at it now.

> I'm trying to understand where you see possible races here, and
> hence whether anything else needs to be done to that patch before
> formally submitting it.

It caches the m2p lookup, which I like, but there's still a race
against concurrent p2m updates.

> From what I can tell (and assuming other code works correctly) the
> fact that arch_iommu_populate_page_table() sets d->need_iommu to -1
> first thing should make sure that any subsequent changes to the p2m
> get propagated to IOMMU code for setting up respective mappings.

Yes, but might they then be overridden by the previous mapping when
this new code calls map_page()? This seems like a case where we
should be using get_gfn()/put_gfn().

Cheers,

Tim.
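Why caching the m2p lookup matters can be shown with a deterministic toy. The lookup below is a hypothetical stand-in that deliberately returns a different answer on its second call, as a real table could if it were updated concurrently; none of the `sketch_*` names exist in Xen. Note that caching only makes the checked value and the mapped value agree. It does not address the separate race against later p2m updates that Tim points out.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_INVALID_GFN UINT64_MAX

/* Hypothetical m2p lookup: valid on the first call, invalid afterwards. */
static unsigned int lookup_calls;
static uint64_t sketch_m2p_lookup(void)
{
    return (lookup_calls++ == 0) ? 0x1234ULL : SKETCH_INVALID_GFN;
}

/* Records what the (hypothetical) map operation was asked to map. */
static uint64_t last_mapped;
static void sketch_map_page(uint64_t gfn)
{
    last_mapped = gfn;
}

/* Racy shape: the value that was checked is not the value that is mapped. */
static void sketch_map_looked_up_twice(void)
{
    if ( sketch_m2p_lookup() != SKETCH_INVALID_GFN )
        sketch_map_page(sketch_m2p_lookup());
}

/* Cached shape: one lookup, reused for both the check and the call. */
static void sketch_map_cached(void)
{
    uint64_t gfn = sketch_m2p_lookup();

    if ( gfn != SKETCH_INVALID_GFN )
        sketch_map_page(gfn);
}

/* Reset the toy state, run one variant, report what got mapped. */
static uint64_t sketch_run(void (*mapper)(void))
{
    assert(mapper != 0);
    lookup_calls = 0;
    last_mapped = 0;
    mapper();
    return last_mapped;
}
```

In the first variant the invalid value slips past the check and reaches the map call; in the cached variant the value that passed the check is the one that gets mapped.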
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 21.04.15 at 10:24, li...@eikelenboom.it wrote:
> Tuesday, April 21, 2015, 10:11:07 AM, you wrote:
> > Interesting - didn't you say that as a side effect of Andrew's
> > patch you saw massive log spam?
>
> If you mean these:
> (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001ed type:4
> [...]
>
> Those were actually due to Konrad's kernel patch that was on the
> devel-4.1 branch that has already been dropped.
> (commit 22d8a8938407cb1342af763e937fdf9ee8daf24a
> 'xen/pciback: Don't disable PCI_COMMAND on PCI device reset.')

Ah, okay. Iirc there was no progress towards a resolution there yet?

> For the rest there is some extra log spam now, since the memory
> maps now are done in very small chunks (the hypercall continuation
> stuff working?):
> (XEN) [2015-04-21 08:04:01.207] memory_map:add: dom20 gfn=ec780 mfn=cc780 nr=40
> [...]
>
> Don't know if that makes much sense anymore (unless specifically
> enabled if you want such detail .. and the whole range with perhaps
> a start and finish message is not enough)

The hypervisor can't really tell whether a re-invocation of said
hypercall is a continuation or a new request. Hence we can only
either drop the message altogether or live with it being spammy on
large regions (it's a XENLOG_G_INFO one anyway, so not enabled by
default, and if enabled usually rate limited).

Jan
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 20.04.15 at 20:50, li...@eikelenboom.it wrote:
> Monday, April 20, 2015, 6:11:42 PM, you wrote:
> > On 16.04.15 at 11:28, t...@xen.org wrote:
> > > At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
> > > > At least we now understand why it happens. I will defer to
> > > > others CC'd on this thread for their opinions in the matter.
> > >
> > > The patch seems like a pretty good check to me, though I'm not
> > > convinced it's race-free. At the least I'd cache the m2p lookup
> > > so we use the same value for the checks and the map_page() call.
> > >
> > > And IMO update_paging_mode() ought to log and reject bogus GFNs
> > > as well.
> >
> > could you give the patch below a try, namely also in the context
> > of seeing again the issue originally fixed by Andrew's initial
> > patch? Please make sure you try a debug build and you have
> > iommu=debug on the Xen command line.
>
> I'm running with it now, have seen no issues so far !

Interesting - didn't you say that as a side effect of Andrew's patch
you saw massive log spam?

Jan
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 16.04.15 at 11:28, t...@xen.org wrote:
> At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
> > At least we now understand why it happens. I will defer to others
> > CC'd on this thread for their opinions in the matter.
>
> The patch seems like a pretty good check to me, though I'm not
> convinced it's race-free. At the least I'd cache the m2p lookup so
> we use the same value for the checks and the map_page() call.
>
> And IMO update_paging_mode() ought to log and reject bogus GFNs as
> well.

Sander,

could you give the patch below a try, namely also in the context of
seeing again the issue originally fixed by Andrew's initial patch?
Please make sure you try a debug build and you have iommu=debug on
the Xen command line.

Jan

--- unstable.orig/xen/drivers/passthrough/amd/iommu_map.c
+++ unstable/xen/drivers/passthrough/amd/iommu_map.c
@@ -557,6 +557,10 @@ static int update_paging_mode(struct dom
     unsigned long old_root_mfn;
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
+    if ( gfn == INVALID_MFN )
+        return -EADDRNOTAVAIL;
+    ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
+
     level = hd->arch.paging_mode;
     old_root = hd->arch.root_table;
     offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));
@@ -729,12 +733,15 @@ int amd_iommu_unmap_page(struct domain *
      * we might need a deeper page table for lager gfn now */
     if ( is_hvm_domain(d) )
     {
-        if ( update_paging_mode(d, gfn) )
+        int rc = update_paging_mode(d, gfn);
+
+        if ( rc )
         {
             spin_unlock(&hd->arch.mapping_lock);
             AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
-            domain_crash(d);
-            return -EFAULT;
+            if ( rc != -EADDRNOTAVAIL )
+                domain_crash(d);
+            return rc;
         }
     }
 
--- unstable.orig/xen/drivers/passthrough/x86/iommu.c
+++ unstable/xen/drivers/passthrough/x86/iommu.c
@@ -59,10 +59,17 @@ int arch_iommu_populate_page_table(struc
         if ( has_hvm_container_domain(d) ||
             (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
         {
-            BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page))));
-            rc = hd->platform_ops->map_page(
-                d, mfn_to_gmfn(d, page_to_mfn(page)), page_to_mfn(page),
-                IOMMUF_readable|IOMMUF_writable);
+            unsigned long mfn = page_to_mfn(page);
+            unsigned long gfn = mfn_to_gmfn(d, mfn);
+
+            if ( gfn != INVALID_MFN )
+            {
+                ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
+                BUG_ON(SHARED_M2P(gfn));
+                rc = hd->platform_ops->map_page(d, gfn, mfn,
+                                                IOMMUF_readable |
+                                                IOMMUF_writable);
+            }
             if ( rc )
             {
                 page_list_add(page, &d->page_list);
--- unstable.orig/xen/drivers/passthrough/vtd/iommu.h
+++ unstable/xen/drivers/passthrough/vtd/iommu.h
@@ -482,7 +482,6 @@ struct qinval_entry {
 #define VTD_PAGE_TABLE_LEVEL_3  3
 #define VTD_PAGE_TABLE_LEVEL_4  4
 
-#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
 #define MAX_IOMMU_REGS 0xc0
 
 extern struct list_head acpi_drhd_units;
--- unstable.orig/xen/include/asm-x86/hvm/iommu.h
+++ unstable/xen/include/asm-x86/hvm/iommu.h
@@ -46,6 +46,8 @@ struct g2m_ioport {
     unsigned int np;
 };
 
+#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
+
 struct arch_hvm_iommu
 {
     u64 pgd_maddr;                 /* io page directory machine address */
--- unstable.orig/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ unstable/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -464,8 +464,6 @@
 #define IOMMU_CONTROL_DISABLED  0
 #define IOMMU_CONTROL_ENABLED   1
 
-#define DEFAULT_DOMAIN_ADDRESS_WIDTH    48
-
 /* interrupt remapping table */
 #define INT_REMAP_ENTRY_REMAPEN_MASK    0x0001
 #define INT_REMAP_ENTRY_REMAPEN_SHIFT   0
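The effect of the two new checks in update_paging_mode() can be tried outside the hypervisor with a small model. This is a sketch under assumptions, not Xen code: it assumes 9 index bits per table level and an all-ones invalid value, the `SKETCH_*` constants and function names are hypothetical, and where the patch uses ASSERT() the sketch returns an error so that the case can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed values for illustration; the real definitions live in Xen headers. */
#define SKETCH_INVALID_MFN          UINT64_MAX
#define SKETCH_ADDRESS_WIDTH        48   /* DEFAULT_DOMAIN_ADDRESS_WIDTH in the patch */
#define SKETCH_PTE_PER_TABLE_SHIFT  9u   /* assumed: 512 entries per table */

/* Levels needed so that gfn shifted down by the lower levels fits one table. */
static unsigned int sketch_levels_needed(uint64_t gfn)
{
    unsigned int level = 1;

    /* The first condition keeps the shift count below 64 (no undefined shift). */
    while ( SKETCH_PTE_PER_TABLE_SHIFT * level < 64 &&
            (gfn >> (SKETCH_PTE_PER_TABLE_SHIFT * level)) )
        level++;
    return level;
}

/* Mirror of the checks added at the top of update_paging_mode(). */
static int sketch_check_gfn(uint64_t gfn)
{
    if ( gfn == SKETCH_INVALID_MFN )
        return -EADDRNOTAVAIL;
    if ( gfn >> SKETCH_ADDRESS_WIDTH )   /* ASSERT() in the patch */
        return -ERANGE;
    assert(sketch_levels_needed(gfn) <= 6);  /* 48 bits / 9 bits per level */
    return 0;
}
```

Under these assumptions the gfn from the memory_map log line elsewhere in the thread (0xec780) needs three levels, while an all-ones value would need eight. That fits the report at the start of the thread that the level was out of bounds when the BUG triggered, and it is the kind of input the new early return rejects.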
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Monday, April 20, 2015, 6:11:42 PM, you wrote:

> On 16.04.15 at 11:28, t...@xen.org wrote:
> > At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
> > > At least we now understand why it happens. I will defer to
> > > others CC'd on this thread for their opinions in the matter.
> >
> > The patch seems like a pretty good check to me, though I'm not
> > convinced it's race-free. At the least I'd cache the m2p lookup
> > so we use the same value for the checks and the map_page() call.
> >
> > And IMO update_paging_mode() ought to log and reject bogus GFNs
> > as well.
>
> Sander,
>
> could you give the patch below a try, namely also in the context of
> seeing again the issue originally fixed by Andrew's initial patch?
> Please make sure you try a debug build and you have iommu=debug on
> the Xen command line.
>
> Jan

Hi Jan,

Should this be applied on top of Andrew's initial patch, or instead
of it?

--
Sander

> [...]
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Monday, April 20, 2015, 6:11:42 PM, you wrote:

> On 16.04.15 at 11:28, t...@xen.org wrote:
> > At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
> > > At least we now understand why it happens. I will defer to
> > > others CC'd on this thread for their opinions in the matter.
> >
> > The patch seems like a pretty good check to me, though I'm not
> > convinced it's race-free. At the least I'd cache the m2p lookup
> > so we use the same value for the checks and the map_page() call.
> >
> > And IMO update_paging_mode() ought to log and reject bogus GFNs
> > as well.
>
> Sander,
>
> could you give the patch below a try, namely also in the context of
> seeing again the issue originally fixed by Andrew's initial patch?
> Please make sure you try a debug build and you have iommu=debug on
> the Xen command line.
>
> Jan

Hi Jan,

I'm running with it now, have seen no issues so far !

--
Sander

> [...]
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
[Trimmed egregious quoting]

At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
> I am not certain that it is the correct way to fix the issue, nor
> that the ioreq server code is the only way to trigger it. There are
> several ways to shoot a gfn mapping from the guest's physmap.
>
> At least we now understand why it happens. I will defer to others
> CC'd on this thread for their opinions in the matter.

The patch seems like a pretty good check to me, though I'm not
convinced it's race-free. At the least I'd cache the m2p lookup so
we use the same value for the checks and the map_page() call.

And IMO update_paging_mode() ought to log and reject bogus GFNs as
well.

Cheers,

Tim.
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On Sat, Apr 11, 2015 at 10:35:13PM +0100, Andrew Cooper wrote:
On 11/04/2015 22:05, Sander Eikelenboom wrote:
Saturday, April 11, 2015, 10:22:16 PM, you wrote:
On 11/04/2015 20:33, Sander Eikelenboom wrote:
Saturday, April 11, 2015, 8:25:52 PM, you wrote:
On 11/04/15 18:42, Sander Eikelenboom wrote:
Saturday, April 11, 2015, 7:35:57 PM, you wrote:
On 11/04/15 18:25, Sander Eikelenboom wrote:
Saturday, April 11, 2015, 6:38:17 PM, you wrote:
On 11/04/15 17:32, Andrew Cooper wrote:
On 11/04/15 17:21, Sander Eikelenboom wrote:
Saturday, April 11, 2015, 4:21:56 PM, you wrote:
On 11/04/15 15:11, Sander Eikelenboom wrote:
Friday, April 10, 2015, 8:55:27 PM, you wrote:
On 10/04/15 11:24, Sander Eikelenboom wrote:

Hi Andrew,

Finally got some time to figure this out .. and i have narrowed it
down to:
git://xenbits.xen.org/staging/qemu-upstream-unstable.git
commit 7665d6ba98e20fb05c420de947c1750fd47e5c07
Xen: Use the ioreq-server API when available

A straight revert of this commit prevents the issue from happening.

The reason i had a hard time figuring this out was:
- I wasn't aware of this earlier, since git pulling the main xen
  tree, doesn't auto update the qemu-* trees.

This has caught me out so many times. It is very non-obvious
behaviour.

- So i happen to get this when i cloned a fresh tree to try to
  figure out the other issue i was seeing.
- After that checking out previous versions of the main xen tree
  didn't resolve this new issue, because the qemu tree doesn't get
  auto updated and is set master.
- Cloning a xen-stable-4.5.0 made it go away .. because that has a
  specific git://xenbits.xen.org/staging/qemu-upstream-unstable.git
  tag which is not master.

*sigh*

This is tested with xen main tree at last commit
3a28f760508fb35c430edac17a9efde5aff6d1d5 (normal xen-unstable, not
the staging branch)

Ok so i have added some extra debug info (see attached diff) and
this is the output when it crashes due to something the commit above
triggered, the level is out of bounds and the pfn looks fishy too.
Complete serial log from both bad and good (specific commit
reverted) are attached.

Just to confirm, you are positively identifying a qemu changeset as
causing this crash? If so, the qemu change has discovered a
pre-existing issue in the toolstack pci-passthrough interface.
Whatever qemu is or isn't doing, it should not be able to cause a
crash like this.

With this in mind, I need to brush up on my AMD-Vi details.

In the meantime, can you run with the following patch to identify
what is going on, domctl wise? I assume it is the assign_device
which is failing, but it will be nice to observe the differences
between the working and failing case, which might offer a hint.

Hrrm with your patch i end up with a fatal page fault in
iommu_do_pci_domctl:

(XEN) [2015-04-11 14:03:31.833] [ Xen-4.6-unstable x86_64 debug=y Tainted:C ]
(XEN) [2015-04-11 14:03:31.857] CPU:5
(XEN) [2015-04-11 14:03:31.868] RIP: e008:[82d08014c52c] iommu_do_pci_domctl+0x2dc/0x740
(XEN) [2015-04-11 14:03:31.894] RFLAGS: 00010256 CONTEXT: hypervisor
(XEN) [2015-04-11 14:03:31.915] rax: 0008 rbx: 0800 rcx: ffebe5ed
(XEN) [2015-04-11 14:03:31.942] rdx: 0800 rsi: rdi: 830256ef7e38
(XEN) [2015-04-11 14:03:31.968] rbp: 830256ef7c98 rsp: 830256ef7c08 r8: deadbeef
(XEN) [2015-04-11 14:03:31.995] r9: deadbeef r10: 82d08024e500 r11: 0282
(XEN) [2015-04-11 14:03:32.022] r12: r13: 0008 r14:
(XEN) [2015-04-11 14:03:32.049] r15: cr0: 80050033 cr4: 06f0
(XEN) [2015-04-11 14:03:32.076] cr3: 0002336a6000 cr2:
(XEN) [2015-04-11 14:03:32.096] ds: es: fs: gs: ss: e010 cs: e008
(XEN) [2015-04-11 14:03:32.121] Xen stack trace from rsp=830256ef7c08:
(XEN) [2015-04-11 14:03:32.141] 830256ef7c78 82d08012c178 830256ef7c28 830256ef7c28
(XEN) [2015-04-11 14:03:32.168] 0010
(XEN) [2015-04-11 14:03:32.195] 06f0 7fe3 830256eb7790 83025cc6d300
(XEN) [2015-04-11 14:03:32.222] 82d080330c60 7fe396bab004 7fe396bab004
(XEN) [2015-04-11 14:03:32.249] 0005 830256ef7ca8 82d08014900b
(XEN) [2015-04-11 14:03:32.276] 830256ef7d98 82d080161f2d 0010
(XEN) [2015-04-11 14:03:32.303] 830256ef7ce8 82d08018b655 830256ef7d48
(XEN) [2015-04-11
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Sunday, April 12, 2015, 5:15:58 PM, you wrote:
Saturday, April 11, 2015, 11:35:13 PM, you wrote:
[...]
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Saturday, April 11, 2015, 11:35:13 PM, you wrote:
On 11/04/2015 22:05, Sander Eikelenboom wrote:
[...]
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 11/04/2015 20:33, Sander Eikelenboom wrote: Saturday, April 11, 2015, 8:25:52 PM, you wrote: On 11/04/15 18:42, Sander Eikelenboom wrote: Saturday, April 11, 2015, 7:35:57 PM, you wrote: On 11/04/15 18:25, Sander Eikelenboom wrote: Saturday, April 11, 2015, 6:38:17 PM, you wrote: On 11/04/15 17:32, Andrew Cooper wrote: On 11/04/15 17:21, Sander Eikelenboom wrote: Saturday, April 11, 2015, 4:21:56 PM, you wrote: On 11/04/15 15:11, Sander Eikelenboom wrote: Friday, April 10, 2015, 8:55:27 PM, you wrote: On 10/04/15 11:24, Sander Eikelenboom wrote:
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Saturday, April 11, 2015, 10:22:16 PM, you wrote: On 11/04/2015 20:33, Sander Eikelenboom wrote: Saturday, April 11, 2015, 8:25:52 PM, you wrote: On 11/04/15 18:42, Sander Eikelenboom wrote: Saturday, April 11, 2015, 7:35:57 PM, you wrote: On 11/04/15 18:25, Sander Eikelenboom wrote: Saturday, April 11, 2015, 6:38:17 PM, you wrote: On 11/04/15 17:32, Andrew Cooper wrote: On 11/04/15 17:21, Sander Eikelenboom wrote: Saturday, April 11, 2015, 4:21:56 PM, you wrote: On 11/04/15 15:11, Sander Eikelenboom wrote: Friday, April 10, 2015, 8:55:27 PM, you wrote: On 10/04/15 11:24, Sander Eikelenboom wrote:
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Saturday, April 11, 2015, 8:25:52 PM, you wrote: On 11/04/15 18:42, Sander Eikelenboom wrote: Saturday, April 11, 2015, 7:35:57 PM, you wrote: On 11/04/15 18:25, Sander Eikelenboom wrote: Saturday, April 11, 2015, 6:38:17 PM, you wrote: On 11/04/15 17:32, Andrew Cooper wrote: On 11/04/15 17:21, Sander Eikelenboom wrote: Saturday, April 11, 2015, 4:21:56 PM, you wrote: On 11/04/15 15:11, Sander Eikelenboom wrote: Friday, April 10, 2015, 8:55:27 PM, you wrote: On 10/04/15 11:24, Sander Eikelenboom wrote:
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 11/04/15 15:11, Sander Eikelenboom wrote: Friday, April 10, 2015, 8:55:27 PM, you wrote: On 10/04/15 11:24, Sander Eikelenboom wrote:
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Saturday, April 11, 2015, 7:35:57 PM, you wrote: On 11/04/15 18:25, Sander Eikelenboom wrote: Saturday, April 11, 2015, 6:38:17 PM, you wrote: On 11/04/15 17:32, Andrew Cooper wrote: On 11/04/15 17:21, Sander Eikelenboom wrote: Saturday, April 11, 2015, 4:21:56 PM, you wrote: On 11/04/15 15:11, Sander Eikelenboom wrote: Friday, April 10, 2015, 8:55:27 PM, you wrote: On 10/04/15 11:24, Sander Eikelenboom wrote:
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 11/04/15 18:25, Sander Eikelenboom wrote: Saturday, April 11, 2015, 6:38:17 PM, you wrote: On 11/04/15 17:32, Andrew Cooper wrote: On 11/04/15 17:21, Sander Eikelenboom wrote: Saturday, April 11, 2015, 4:21:56 PM, you wrote: On 11/04/15 15:11, Sander Eikelenboom wrote: Friday, April 10, 2015, 8:55:27 PM, you wrote: On 10/04/15 11:24, Sander Eikelenboom wrote:
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 11/04/15 17:21, Sander Eikelenboom wrote: Saturday, April 11, 2015, 4:21:56 PM, you wrote: On 11/04/15 15:11, Sander Eikelenboom wrote: Friday, April 10, 2015, 8:55:27 PM, you wrote: On 10/04/15 11:24, Sander Eikelenboom wrote:
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Friday, April 10, 2015, 8:55:27 PM, you wrote:

> On 10/04/15 11:24, Sander Eikelenboom wrote:
>> Hi Andrew,
>>
>> Finally got some time to figure this out .. and i have narrowed it down to:
>> git://xenbits.xen.org/staging/qemu-upstream-unstable.git
>> commit 7665d6ba98e20fb05c420de947c1750fd47e5c07
>> Xen: Use the ioreq-server API when available
>> A straight revert of this commit prevents the issue from happening.
>>
>> The reason i had a hard time figuring this out was:
>> - I wasn't aware of this earlier, since git pulling the main xen tree,
>>   doesn't auto update the qemu-* trees.
>
> This has caught me out so many times. It is very non-obvious behaviour.
>
>> - So i happen to get this when i cloned a fresh tree to try to figure out
>>   the other issue i was seeing.
>> - After that checking out previous versions of the main xen tree didn't
>>   resolve this new issue, because the qemu tree doesn't get auto updated
>>   and is set master.
>> - Cloning a xen-stable-4.5.0 made it go away .. because that has a specific
>>   git://xenbits.xen.org/staging/qemu-upstream-unstable.git tag which is
>>   not master.
>>
>> *sigh*
>>
>> This is tested with xen main tree at last commit
>> 3a28f760508fb35c430edac17a9efde5aff6d1d5
>> (normal xen-unstable, not the staging branch)
>>
>> Ok so i have added some extra debug info (see attached diff) and this is
>> the output when it crashes due to something the commit above triggered,
>> the level is out of bounds and the pfn looks fishy too.
>> Complete serial log from both bad and good (specific commit reverted) are
>> attached.
>
> Just to confirm, you are positively identifying a qemu changeset as
> causing this crash?
>
> If so, the qemu change has discovered a pre-existing issue in the
> toolstack pci-passthrough interface. Whatever qemu is or isn't doing,
> it should not be able to cause a crash like this.
>
> With this in mind, I need to brush up on my AMD-Vi details.
>
> In the meantime, can you run with the following patch to identify what
> is going on, domctl wise? I assume it is the assign_device which is
> failing, but it will be nice to observe the differences between the
> working and failing case, which might offer a hint.

Hrrm with your patch i end up with a fatal page fault in iommu_do_pci_domctl:

(XEN) [2015-04-11 14:03:31.833] [ Xen-4.6-unstable x86_64 debug=y Tainted:C ]
(XEN) [2015-04-11 14:03:31.857] CPU:5
(XEN) [2015-04-11 14:03:31.868] RIP:e008:[82d08014c52c] iommu_do_pci_domctl+0x2dc/0x740
(XEN) [2015-04-11 14:03:31.894] RFLAGS: 00010256 CONTEXT: hypervisor
(XEN) [2015-04-11 14:03:31.915] rax: 0008 rbx: 0800 rcx: ffebe5ed
(XEN) [2015-04-11 14:03:31.942] rdx: 0800 rsi: rdi: 830256ef7e38
(XEN) [2015-04-11 14:03:31.968] rbp: 830256ef7c98 rsp: 830256ef7c08 r8: deadbeef
(XEN) [2015-04-11 14:03:31.995] r9: deadbeef r10: 82d08024e500 r11: 0282
(XEN) [2015-04-11 14:03:32.022] r12: r13: 0008 r14:
(XEN) [2015-04-11 14:03:32.049] r15: cr0: 80050033 cr4: 06f0
(XEN) [2015-04-11 14:03:32.076] cr3: 0002336a6000 cr2:
(XEN) [2015-04-11 14:03:32.096] ds: es: fs: gs: ss: e010 cs: e008
(XEN) [2015-04-11 14:03:32.121] Xen stack trace from rsp=830256ef7c08:
(XEN) [2015-04-11 14:03:32.141]830256ef7c78 82d08012c178 830256ef7c28 830256ef7c28
(XEN) [2015-04-11 14:03:32.168]0010
(XEN) [2015-04-11 14:03:32.195]06f0 7fe3 830256eb7790 83025cc6d300
(XEN) [2015-04-11 14:03:32.222]82d080330c60 7fe396bab004 7fe396bab004
(XEN) [2015-04-11 14:03:32.249] 0005 830256ef7ca8 82d08014900b
(XEN) [2015-04-11 14:03:32.276]830256ef7d98 82d080161f2d 0010
(XEN) [2015-04-11 14:03:32.303] 830256ef7ce8 82d08018b655 830256ef7d48
(XEN) [2015-04-11 14:03:32.330]830256ef7cf8 82d08018b66a 830256ef7d38 82d08012925e
(XEN) [2015-04-11 14:03:32.357]830256efc068 00080001 80022e12c167
(XEN) [2015-04-11 14:03:32.384]0002 830256ef7e38 0008 80022e12c167
(XEN) [2015-04-11 14:03:32.411]0003 830256ef7db8 7fe396780eb0
(XEN) [2015-04-11 14:03:32.439]0202 7fe396bab004
(XEN) [2015-04-11 14:03:32.466] 0005 830256ef7ef8 82d08010497f
(XEN) [2015-04-11 14:03:32.493]0001 0011 80022e12c167 88001f7ecc00
(XEN) [2015-04-11 14:03:32.520]7fe396780eb0 88001c849508 000e0007 8105594a
(XEN) [2015-04-11
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Wednesday, April 1, 2015, 1:38:34 AM, you wrote:

On 31/03/2015 22:11, Sander Eikelenboom wrote:

Hi all, I just tested xen-unstable staging (changeset: git:0522407-dirty) with revert of commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622 (due to an already reported but not yet resolved issue) and built with qemu xen from git://xenbits.xen.org/staging/qemu-upstream-unstable.git (to include the pci command register patch from Jan) and now came across this new splat when starting an HVM with PCI passthrough:

Wow - you are getting all the fun bugs at the moment!

Nothing has changed in the AMD IOMMU driver for a while, but the BUG_ON() is particularly unhelpful at identifying what went wrong. As a first pass triage, can you rerun with

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 495ff5c..f15c324 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -451,8 +451,9 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
     table = hd->arch.root_table;
     level = hd->arch.paging_mode;
-    BUG_ON( table == NULL || level < IOMMU_PAGING_MODE_LEVEL_1 ||
-            level > IOMMU_PAGING_MODE_LEVEL_6 );
+    BUG_ON(table == NULL);
+    BUG_ON(level < IOMMU_PAGING_MODE_LEVEL_1);
+    BUG_ON(level > IOMMU_PAGING_MODE_LEVEL_6);
     next_table_mfn = page_to_mfn(table);

which will help identify which of the conditions is failing. Can you please also provide the full serial log, including iommu=debug?

~Andrew

Hmm this was very weird .. i tried to get back to a previously working config (kernel + xen version) but to no avail .. ran memtest .. removed the cmos battery .. warm boots .. cold boots .. nothing seemed to help, always this same crash when starting the HVM guest with pci passthrough. PV guests with pci passthrough worked fine though ..

Now finally it's working again .. by going back to 4.5.0 release .. and doing a baremetal linux boot in between. No idea what helped .. but it was a very strange day.

But i need the box for the foreseeable future. Will see late next week if i have time (and the guts) to try again :-)

-- Sander

___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Hi all,

I just tested xen-unstable staging (changeset: git:0522407-dirty) with revert of commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622 (due to an already reported but not yet resolved issue) and built with qemu xen from git://xenbits.xen.org/staging/qemu-upstream-unstable.git (to include the pci command register patch from Jan) and now came across this new splat when starting an HVM with PCI passthrough:

(XEN) [2015-03-31 20:58:20.710] io.c:429: d17: bind: m_gsi=37 g_gsi=36 dev=00.00.5 intx=0
(XEN) [2015-03-31 20:58:21.100] Xen BUG at iommu_map.c:455
(XEN) [2015-03-31 20:58:21.100] [ Xen-4.6-unstable x86_64 debug=y Not tainted ]
(XEN) [2015-03-31 20:58:21.100] CPU:0
(XEN) [2015-03-31 20:58:21.100] RIP:e008:[82d080155bb1] iommu_pde_from_gfn+0x38/0x430
(XEN) [2015-03-31 20:58:21.100] RFLAGS: 00010202 CONTEXT: hypervisor
(XEN) [2015-03-31 20:58:21.100] rax: 0008 rbx: 0003 rcx: 82c000802000
(XEN) [2015-03-31 20:58:21.100] rdx: 82e007d56740 rsi: rdi: 8305167dd000
(XEN) [2015-03-31 20:58:21.100] rbp: 82d0802efad8 rsp: 82d0802efa78 r8: 83054eb755b0
(XEN) [2015-03-31 20:58:21.100] r9: 0003 r10: 0200 r11: 82d0802fc0d0
(XEN) [2015-03-31 20:58:21.100] r12: 82e0075527e0 r13: 05e9 r14:
(XEN) [2015-03-31 20:58:21.100] r15: 7d20 cr0: 80050033 cr4: 06f0
(XEN) [2015-03-31 20:58:21.100] cr3: 00051a197000 cr2: 7efdd5ee1d48
(XEN) [2015-03-31 20:58:21.100] ds: es: fs: gs: ss: e010 cs: e008
(XEN) [2015-03-31 20:58:21.100] Xen stack trace from rsp=82d0802efa78:
(XEN) [2015-03-31 20:58:21.100] 8305167dd000 82d0802efb30 8305167dd190
(XEN) [2015-03-31 20:58:21.100] 0286 82e007d56740 82e007552800 0003
(XEN) [2015-03-31 20:58:21.100] 82e0075527e0 05e9 7d20
(XEN) [2015-03-31 20:58:21.100] 82d0802efb98 82d0801560b6 7d2f7fd104e7 0001802351d2
(XEN) [2015-03-31 20:58:21.100] 003aa93f 00020001 8305167dd938
(XEN) [2015-03-31 20:58:21.100] 82004ff8 8305167dd000 0020941c
(XEN) [2015-03-31 20:58:21.100]
(XEN) [2015-03-31 20:58:21.100] 8305167dd938 8305167dd000
(XEN) [2015-03-31 20:58:21.100] 82e0075527e0 05e9 7d20
(XEN) [2015-03-31 20:58:21.100] 82d0802efbf8 82d08015a54d 8305167dd020
(XEN) [2015-03-31 20:58:21.100] 82d0802e8000 003aa93f 82d0802efbf8
(XEN) [2015-03-31 20:58:21.100] 8305167dd000 0800 8305167dd000
(XEN) [2015-03-31 20:58:21.100] 82d0802efc98 82d08014c6c1 82d0802efc78 82d08012c298
(XEN) [2015-03-31 20:58:21.100] 0286 82d0802efc28 0020
(XEN) [2015-03-31 20:58:21.100] 0008 7f6525ed2004
(XEN) [2015-03-31 20:58:21.100] 83054eb1ab60 83055cc6c300 0282 7f6525ed2004
(XEN) [2015-03-31 20:58:21.100] 8305167dd000 7f6525ed2004 8305167dd000 0005
(XEN) [2015-03-31 20:58:21.100] 82d0802efca8 82d08014908b 82d0802efd98 82d080161f2d
(XEN) [2015-03-31 20:58:21.100] 0020 0005 0001
(XEN) [2015-03-31 20:58:21.100] 82d080331bb8 0001 82d0802efde8 82d080120d00
(XEN) [2015-03-31 20:58:21.100] Xen call trace:
(XEN) [2015-03-31 20:58:21.100] [82d080155bb1] iommu_pde_from_gfn+0x38/0x430
(XEN) [2015-03-31 20:58:21.100] [82d0801560b6] amd_iommu_map_page+0x10d/0x4e6
(XEN) [2015-03-31 20:58:21.100] [82d08015a54d] arch_iommu_populate_page_table+0x179/0x4d8
(XEN) [2015-03-31 20:58:21.100] [82d08014c6c1] iommu_do_pci_domctl+0x395/0x604
(XEN) [2015-03-31 20:58:21.100] [82d08014908b] iommu_do_domctl+0x17/0x1a
(XEN) [2015-03-31 20:58:21.100] [82d080161f2d] arch_do_domctl+0x2469/0x26e1
(XEN) [2015-03-31 20:58:21.100] [82d080104a6f] do_domctl+0x1a1f/0x1d60
(XEN) [2015-03-31 20:58:21.100] [82d080234c9b] syscall_enter+0xeb/0x145
(XEN) [2015-03-31 20:58:21.100]
(XEN) [2015-03-31 20:58:22.167]
(XEN) [2015-03-31 20:58:22.176]
(XEN) [2015-03-31 20:58:22.195] Panic on CPU 0:
(XEN) [2015-03-31 20:58:22.208] Xen BUG at iommu_map.c:455
(XEN) [2015-03-31 20:58:22.223]
(XEN) [2015-03-31 20:58:22.243]
(XEN) [2015-03-31 20:58:22.252] Manual reset required ('noreboot' specified)

Haven't tried
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
On 31/03/2015 22:11, Sander Eikelenboom wrote:

Hi all, I just tested xen-unstable staging (changeset: git:0522407-dirty) with revert of commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622 (due to an already reported but not yet resolved issue) and built with qemu xen from git://xenbits.xen.org/staging/qemu-upstream-unstable.git (to include the pci command register patch from Jan) and now came across this new splat when starting an HVM with PCI passthrough:

Wow - you are getting all the fun bugs at the moment!

Nothing has changed in the AMD IOMMU driver for a while, but the BUG_ON() is particularly unhelpful at identifying what went wrong. As a first pass triage, can you rerun with

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 495ff5c..f15c324 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -451,8 +451,9 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
     table = hd->arch.root_table;
     level = hd->arch.paging_mode;
-    BUG_ON( table == NULL || level < IOMMU_PAGING_MODE_LEVEL_1 ||
-            level > IOMMU_PAGING_MODE_LEVEL_6 );
+    BUG_ON(table == NULL);
+    BUG_ON(level < IOMMU_PAGING_MODE_LEVEL_1);
+    BUG_ON(level > IOMMU_PAGING_MODE_LEVEL_6);
     next_table_mfn = page_to_mfn(table);

which will help identify which of the conditions is failing. Can you please also provide the full serial log, including iommu=debug?

~Andrew
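For readers following the triage suggestion, here is a minimal standalone sketch of why splitting the check helps. It is not Xen code: PAGING_LEVEL_MIN, PAGING_LEVEL_MAX, enum walk_check, first_failed_check() and check_walk_start() are hypothetical names made up for illustration, and the bounds 1 and 6 are only assumed from the names of the IOMMU_PAGING_MODE_LEVEL_1 and IOMMU_PAGING_MODE_LEVEL_6 constants. The point is that one combined check can only say that some condition failed, whereas one check per line lets the reported file:line identify the condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for IOMMU_PAGING_MODE_LEVEL_1 / _6 (values assumed). */
#define PAGING_LEVEL_MIN 1u
#define PAGING_LEVEL_MAX 6u

/* Which precondition failed first; CHECK_OK means all of them hold. */
enum walk_check {
    CHECK_OK,
    CHECK_TABLE_NULL,
    CHECK_LEVEL_TOO_LOW,
    CHECK_LEVEL_TOO_HIGH
};

/*
 * Test the three conditions one at a time, in the same order as the
 * split BUG_ON()s in the patch, and report the first one that fails.
 */
enum walk_check first_failed_check(const void *table, unsigned int level)
{
    if (table == NULL)
        return CHECK_TABLE_NULL;
    if (level < PAGING_LEVEL_MIN)
        return CHECK_LEVEL_TOO_LOW;
    if (level > PAGING_LEVEL_MAX)
        return CHECK_LEVEL_TOO_HIGH;
    return CHECK_OK;
}

/*
 * Assertion form: one assert() per condition, so a failure is reported
 * with the source line of the specific condition that did not hold.
 * Note the inverted sense: BUG_ON(cond) fires when cond is true,
 * assert(cond) fires when cond is false.
 */
void check_walk_start(const void *table, unsigned int level)
{
    assert(table != NULL);
    assert(level >= PAGING_LEVEL_MIN);
    assert(level <= PAGING_LEVEL_MAX);
}
```

With the combined form, every failure would be reported at the same source line (iommu_map.c:455 in the log from the original report); with the split form, each condition has its own line, so the line number in the BUG message alone is enough to tell them apart.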
Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455
Wednesday, April 1, 2015, 1:38:34 AM, you wrote:

On 31/03/2015 22:11, Sander Eikelenboom wrote:

Hi all, I just tested xen-unstable staging (changeset: git:0522407-dirty) with revert of commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622 (due to an already reported but not yet resolved issue) and built with qemu xen from git://xenbits.xen.org/staging/qemu-upstream-unstable.git (to include the pci command register patch from Jan) and now came across this new splat when starting an HVM with PCI passthrough:

Wow - you are getting all the fun bugs at the moment!

Hrmm i'm not so sure at the moment .. could also be a stale tree or is it just that it's april 1st .. *sigh* tried to git reset --hard to a known good changeset .. but it still seems to fail, even with cold boot. So sorry for the noise and please ignore for the moment while i'm trying to figure out what is fooling me :-)

-- sander

Nothing has changed in the AMD IOMMU driver for a while, but the BUG_ON() is particularly unhelpful at identifying what went wrong. As a first pass triage, can you rerun with

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 495ff5c..f15c324 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -451,8 +451,9 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
     table = hd->arch.root_table;
     level = hd->arch.paging_mode;
-    BUG_ON( table == NULL || level < IOMMU_PAGING_MODE_LEVEL_1 ||
-            level > IOMMU_PAGING_MODE_LEVEL_6 );
+    BUG_ON(table == NULL);
+    BUG_ON(level < IOMMU_PAGING_MODE_LEVEL_1);
+    BUG_ON(level > IOMMU_PAGING_MODE_LEVEL_6);
     next_table_mfn = page_to_mfn(table);

which will help identify which of the conditions is failing. Can you please also provide the full serial log, including iommu=debug?

~Andrew