Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-05-07 Thread Tim Deegan
At 16:31 +0100 on 05 May (1430843498), Jan Beulich wrote:
> >>> On 05.05.15 at 17:17, t...@xen.org wrote:
> > At 16:10 +0100 on 05 May (1430842206), Jan Beulich wrote:
> >> From what I
> >> can tell (and assuming other code works correctly) the fact that
> >> arch_iommu_populate_page_table() sets d->need_iommu to -1
> >> first thing should make sure that any subsequent changes to the
> >> p2m get propagated to IOMMU code for setting up respective
> >> mappings.
> >
> > Yes, but might they then be overridden by the previous mapping when
> > this new code calls map_page()?
>
> Ah, I see now.
>
> > This seems like a case where we should be using get_gfn()/put_gfn().
>
> Yes - provided these may be called at all with the page_alloc_lock
> held. IOW - is there lock ordering defined between this one and
> the various mm locks?

Good point.  The page_alloc lock nests inside the p2m lock, for PoD
(see page_alloc_mm_pre_lock() in mm-locks.h).  So we can't call p2m
operations here.
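
To make the ordering constraint concrete, here is a minimal sketch of a
level-based ordering check, loosely in the spirit of what mm-locks.h
enforces. The names (lock_ctx, try_take) and the numeric levels are
invented for illustration and are not Xen's; the only fact taken from
this thread is that the page_alloc lock nests inside the p2m lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative levels only (not Xen's real values): a lock may be taken
 * only if its level is strictly higher than every lock already held.
 */
enum lock_level { LVL_NONE = 0, LVL_P2M = 10, LVL_PAGE_ALLOC = 20 };

/* Highest level held by the current (hypothetical) CPU. */
struct lock_ctx { int level; };

/* Record the lock as taken if the ordering allows it; refuse otherwise. */
static bool try_take(struct lock_ctx *ctx, enum lock_level lvl)
{
    assert(ctx != NULL);
    if ( ctx->level >= (int)lvl )
        return false;            /* would invert the lock order */
    ctx->level = lvl;
    return true;
}
```

With these assumed levels, taking p2m and then page_alloc succeeds, while
asking for p2m with page_alloc already held is refused, which is the
situation the populate loop would be in.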

> Also, if doing so, would I then need to check the result of the
> inverse (p2m) translation after having done get_gfn() to make
> sure this is still the MFN I'm after? If so, and if it ends up being
> a different one, I'd have to retry and presumably somehow limit
> the number of retries...

Yes.  Ideally this loop would be iterating over all gfns in the p2m
rather than over all owned MFNs.  As long as need_iommu gets set
first, such a loop could safely be paused and restarted without
worrying about concurrent updates.  The code could even stay in this
file, though exposing an iterator from the p2m code would be a lot
more efficient.
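
A rough sketch of what such a pausable gfn-ordered loop could look like,
using a toy array in place of the p2m. Everything here (toy_p2m,
populate_chunk, the budget parameter, the callback type) is hypothetical
and only illustrates the cursor-and-resume shape, not any existing Xen
interface.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_MFN (~0UL)

/* Toy p2m: the array index is the gfn, the value is the mfn. */
struct toy_p2m { const unsigned long *mfn; unsigned long nr_gfns; };

/* Hypothetical mapping callback; a negative return value is an error. */
typedef int (*map_fn)(unsigned long gfn, unsigned long mfn, void *ctx);

/* Example callback that only counts how many mappings were requested. */
static int count_mappings(unsigned long gfn, unsigned long mfn, void *ctx)
{
    (void)gfn;
    (void)mfn;
    ++*(unsigned long *)ctx;
    return 0;
}

/*
 * Visit at most 'budget' gfns starting at *cursor.
 * Returns 1 if work remains (call again with the same cursor),
 * 0 when the whole range has been visited, or the callback's error.
 */
static int populate_chunk(const struct toy_p2m *p2m, unsigned long *cursor,
                          unsigned long budget, map_fn map, void *ctx)
{
    assert(p2m != NULL && cursor != NULL && map != NULL);

    while ( *cursor < p2m->nr_gfns )
    {
        unsigned long gfn = *cursor;
        unsigned long mfn = p2m->mfn[gfn];

        if ( budget == 0 )
            return 1;            /* pause; the cursor records progress */
        budget--;

        if ( mfn != INVALID_MFN )
        {
            int rc = map(gfn, mfn, ctx);

            if ( rc < 0 )
                return rc;
        }
        ++*cursor;
    }
    return 0;
}
```

The cursor is the only state carried across invocations, so the caller
can stop after any chunk and resume later from the same position.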

In the meantime the patch you linked to is an improvement, so it can
have my ack.

Cheers,

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-05-05 Thread Jan Beulich
>>> On 05.05.15 at 17:17, t...@xen.org wrote:
> At 16:10 +0100 on 05 May (1430842206), Jan Beulich wrote:
>> From what I
>> can tell (and assuming other code works correctly) the fact that
>> arch_iommu_populate_page_table() sets d->need_iommu to -1
>> first thing should make sure that any subsequent changes to the
>> p2m get propagated to IOMMU code for setting up respective
>> mappings.
>
> Yes, but might they then be overridden by the previous mapping when
> this new code calls map_page()?

Ah, I see now.

> This seems like a case where we should be using get_gfn()/put_gfn().

Yes - provided these may be called at all with the page_alloc_lock
held. IOW - is there lock ordering defined between this one and
the various mm locks?

Also, if doing so, would I then need to check the result of the
inverse (p2m) translation after having done get_gfn() to make
sure this is still the MFN I'm after? If so, and if it ends up being
a different one, I'd have to retry and presumably somehow limit
the number of retries...
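
For illustration, the re-check-and-retry idea could be sketched roughly
as below. The callback types, the toy state and stable_gfn_for_mfn() are
all made up for this example; no locking is modelled, so it only shows
the shape of a bounded retry, not how get_gfn()/put_gfn() behave.

```c
#include <assert.h>
#include <stddef.h>

#define INVALID_GFN (~0UL)

/* Hypothetical stand-ins for the m2p and p2m translations. */
typedef unsigned long (*m2p_fn)(unsigned long mfn, void *ctx);
typedef unsigned long (*p2m_fn)(unsigned long gfn, void *ctx);

/* Toy state: one gfn/mfn pair whose p2m entry is wrong 'flips' times. */
struct toy_state { unsigned long gfn, mfn; unsigned int flips; };

static unsigned long toy_m2p(unsigned long mfn, void *ctx)
{
    struct toy_state *s = ctx;

    return mfn == s->mfn ? s->gfn : INVALID_GFN;
}

static unsigned long toy_p2m(unsigned long gfn, void *ctx)
{
    struct toy_state *s = ctx;

    if ( s->flips )
    {
        s->flips--;
        return s->mfn + 1;       /* pretend a concurrent update happened */
    }
    return gfn == s->gfn ? s->mfn : ~0UL;
}

/*
 * Look up the gfn for 'mfn' and confirm that the forward translation
 * still yields the same mfn, retrying at most max_tries times.
 * Returns 0 on success, -1 if the mfn has no gfn, -2 if it never settled.
 */
static int stable_gfn_for_mfn(unsigned long mfn, m2p_fn m2p, p2m_fn p2m,
                              void *ctx, unsigned int max_tries,
                              unsigned long *gfn_out)
{
    unsigned int i;

    assert(m2p != NULL && p2m != NULL && gfn_out != NULL);

    for ( i = 0; i < max_tries; i++ )
    {
        unsigned long gfn = m2p(mfn, ctx);

        if ( gfn == INVALID_GFN )
            return -1;
        if ( p2m(gfn, ctx) == mfn )
        {
            *gfn_out = gfn;
            return 0;
        }
    }
    return -2;
}
```

Whether a retry limit like this would be acceptable in the real loop is
exactly the open question above.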

Jan




Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-05-05 Thread Jan Beulich
>>> On 16.04.15 at 11:28, t...@xen.org wrote:
> At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
>> I am not certain that it is the correct way to fix the issue, nor that
>> the ioreq server code is the only way to trigger it.  There are several
>> ways to shoot a gfn mapping from the guest's physmap.
>>
>> At least we now understand why it happens.  I will defer to others CC'd
>> on this thread for their opinions in the matter.
>
> The patch seems like a pretty good check to me, though I'm not
> convinced it's race-free.  At the least I'd cache the m2p lookup so we
> use the same value for the checks and the map_page() call.

Did you have a chance to look at the patch Sander meanwhile
successfully tested [1]? I'm trying to understand where you see
possible races here, and hence whether anything else needs to
be done to that patch before formally submitting it. From what I
can tell (and assuming other code works correctly) the fact that
arch_iommu_populate_page_table() sets d->need_iommu to -1
first thing should make sure that any subsequent changes to the
p2m get propagated to IOMMU code for setting up respective
mappings.

Thanks, Jan

[1] http://lists.xenproject.org/archives/html/xen-devel/2015-04/msg02253.html

> And IMO update_paging_mode() ought to log and reject bogus GFNs as
> well.
>
> Cheers,
>
> Tim.






Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-05-05 Thread Tim Deegan
At 16:10 +0100 on 05 May (1430842206), Jan Beulich wrote:
> >>> On 16.04.15 at 11:28, t...@xen.org wrote:
> > At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
> >> I am not certain that it is the correct way to fix the issue, nor that
> >> the ioreq server code is the only way to trigger it.  There are several
> >> ways to shoot a gfn mapping from the guest's physmap.
> >>
> >> At least we now understand why it happens.  I will defer to others CC'd
> >> on this thread for their opinions in the matter.
> >
> > The patch seems like a pretty good check to me, though I'm not
> > convinced it's race-free.  At the least I'd cache the m2p lookup so we
> > use the same value for the checks and the map_page() call.
>
> Did you have a chance to look at the patch Sander meanwhile
> successfully tested [1]?

Just looked at it now.

> I'm trying to understand where you see
> possible races here, and hence whether anything else needs to
> be done to that patch before formally submitting it.

It caches the m2p lookup, which I like, but there's still a race
against concurrent p2m updates.

> From what I
> can tell (and assuming other code works correctly) the fact that
> arch_iommu_populate_page_table() sets d->need_iommu to -1
> first thing should make sure that any subsequent changes to the
> p2m get propagated to IOMMU code for setting up respective
> mappings.

Yes, but might they then be overridden by the previous mapping when
this new code calls map_page()?

This seems like a case where we should be using get_gfn()/put_gfn().
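
To spell the interleaving out, here is a single-threaded toy model of
the concern. The structure and helper names (toy_dom, toy_p2m_set,
toy_map_page) are invented for this sketch; it only assumes, as stated
above, that p2m updates are propagated to the IOMMU once need_iommu is
set.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_GFNS 4

/* Toy domain: p2m[] is the truth, iommu[] is what the device sees. */
struct toy_dom {
    unsigned long p2m[NR_GFNS];
    unsigned long iommu[NR_GFNS];
};

/* A p2m update; with need_iommu set it also reaches the IOMMU tables. */
static void toy_p2m_set(struct toy_dom *d, unsigned long gfn,
                        unsigned long mfn)
{
    assert(gfn < NR_GFNS);
    d->p2m[gfn] = mfn;
    d->iommu[gfn] = mfn;
}

/* What the populate loop does with the (gfn, mfn) pair it cached. */
static void toy_map_page(struct toy_dom *d, unsigned long gfn,
                         unsigned long mfn)
{
    assert(gfn < NR_GFNS);
    d->iommu[gfn] = mfn;
}

/* True if the IOMMU view still matches the p2m for every gfn. */
static bool toy_consistent(const struct toy_dom *d)
{
    unsigned long gfn;

    for ( gfn = 0; gfn < NR_GFNS; gfn++ )
        if ( d->iommu[gfn] != d->p2m[gfn] )
            return false;
    return true;
}
```

If the loop caches (gfn 2, mfn 102), a concurrent toy_p2m_set(d, 2, 200)
lands in between, and the loop then calls toy_map_page(d, 2, 102), the
IOMMU entry ends up pointing at the old mfn while the p2m says 200.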

Cheers,

Tim.



Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-21 Thread Jan Beulich
>>> On 21.04.15 at 10:24, li...@eikelenboom.it wrote:
> Tuesday, April 21, 2015, 10:11:07 AM, you wrote:
>> Interesting - didn't you say that as a side effect of Andrew's patch
>> you saw massive log spam?
>
> If you mean these:
>
> (XEN) [2015-04-12 14:55:20.226] p2m.c:884:d0v0 gfn_to_mfn failed! gfn=001ed type:4
> [...]
>
> Those were actually due to Konrad's kernel patch that was on the
> devel-4.1 branch that has already been dropped.
> (commit 22d8a8938407cb1342af763e937fdf9ee8daf24a
>  'xen/pciback: Don't disable PCI_COMMAND on PCI device reset.')

Ah, okay. Iirc there was no progress towards a resolution there yet?

> For the rest there is some extra log spam now, since the memory maps
> now are done in very small chunks (the hypercall continuation stuff
> working?):
> (XEN) [2015-04-21 08:04:01.207] memory_map:add: dom20 gfn=ec780 mfn=cc780 nr=40
> [...]
> Don't know if that makes much sense anymore (unless specifically
> enabled if you want such detail .. and the whole range with perhaps a
> start and finish message is not enough)

The hypervisor can't really tell whether a re-invocation of said
hypercall is a continuation or a new request. Hence we can only
either drop the message altogether or live with it being spammy
on large regions (it's a XENLOG_G_INFO one anyway, so not
enabled by default, and if enabled usually rate limited).

Jan




Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-21 Thread Jan Beulich
>>> On 20.04.15 at 20:50, li...@eikelenboom.it wrote:
> Monday, April 20, 2015, 6:11:42 PM, you wrote:
>>>>> On 16.04.15 at 11:28, t...@xen.org wrote:
>>> At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
>>>> At least we now understand why it happens.  I will defer to others CC'd
>>>> on this thread for their opinions in the matter.
>>>
>>> The patch seems like a pretty good check to me, though I'm not
>>> convinced it's race-free.  At the least I'd cache the m2p lookup so we
>>> use the same value for the checks and the map_page() call.
>>>
>>> And IMO update_paging_mode() ought to log and reject bogus GFNs as
>>> well.
>>
>> could you give the patch below a try, namely also in the context
>> of seeing again the issue originally fixed by Andrew's initial patch?
>> Please make sure you try a debug build and you have
>> iommu=debug on the Xen command line.
>
> I'm running with it now, have seen no issues so far !

Interesting - didn't you say that as a side effect of Andrew's patch
you saw massive log spam?

Jan




Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-20 Thread Jan Beulich
>>> On 16.04.15 at 11:28, t...@xen.org wrote:
> At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
>> At least we now understand why it happens.  I will defer to others CC'd
>> on this thread for their opinions in the matter.
>
> The patch seems like a pretty good check to me, though I'm not
> convinced it's race-free.  At the least I'd cache the m2p lookup so we
> use the same value for the checks and the map_page() call.
>
> And IMO update_paging_mode() ought to log and reject bogus GFNs as
> well.

Sander,

could you give the patch below a try, namely also in the context
of seeing again the issue originally fixed by Andrew's initial patch?
Please make sure you try a debug build and you have
iommu=debug on the Xen command line.

Jan

--- unstable.orig/xen/drivers/passthrough/amd/iommu_map.c
+++ unstable/xen/drivers/passthrough/amd/iommu_map.c
@@ -557,6 +557,10 @@ static int update_paging_mode(struct dom
     unsigned long old_root_mfn;
     struct hvm_iommu *hd = domain_hvm_iommu(d);
 
+    if ( gfn == INVALID_MFN )
+        return -EADDRNOTAVAIL;
+    ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
+
     level = hd->arch.paging_mode;
     old_root = hd->arch.root_table;
     offset = gfn >> (PTE_PER_TABLE_SHIFT * (level - 1));
@@ -729,12 +733,15 @@ int amd_iommu_unmap_page(struct domain *
      * we might need a deeper page table for lager gfn now */
     if ( is_hvm_domain(d) )
     {
-        if ( update_paging_mode(d, gfn) )
+        int rc = update_paging_mode(d, gfn);
+
+        if ( rc )
         {
             spin_unlock(&hd->arch.mapping_lock);
             AMD_IOMMU_DEBUG("Update page mode failed gfn = %lx\n", gfn);
-            domain_crash(d);
-            return -EFAULT;
+            if ( rc != -EADDRNOTAVAIL )
+                domain_crash(d);
+            return rc;
         }
     }
 
--- unstable.orig/xen/drivers/passthrough/x86/iommu.c
+++ unstable/xen/drivers/passthrough/x86/iommu.c
@@ -59,10 +59,17 @@ int arch_iommu_populate_page_table(struc
         if ( has_hvm_container_domain(d) ||
             (page->u.inuse.type_info & PGT_type_mask) == PGT_writable_page )
         {
-            BUG_ON(SHARED_M2P(mfn_to_gmfn(d, page_to_mfn(page))));
-            rc = hd->platform_ops->map_page(
-                d, mfn_to_gmfn(d, page_to_mfn(page)), page_to_mfn(page),
-                IOMMUF_readable|IOMMUF_writable);
+            unsigned long mfn = page_to_mfn(page);
+            unsigned long gfn = mfn_to_gmfn(d, mfn);
+
+            if ( gfn != INVALID_MFN )
+            {
+                ASSERT(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
+                BUG_ON(SHARED_M2P(gfn));
+                rc = hd->platform_ops->map_page(d, gfn, mfn,
+                                                IOMMUF_readable |
+                                                IOMMUF_writable);
+            }
             if ( rc )
             {
                 page_list_add(page, &d->page_list);
--- unstable.orig/xen/drivers/passthrough/vtd/iommu.h
+++ unstable/xen/drivers/passthrough/vtd/iommu.h
@@ -482,7 +482,6 @@ struct qinval_entry {
 #define VTD_PAGE_TABLE_LEVEL_3  3
 #define VTD_PAGE_TABLE_LEVEL_4  4
 
-#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
 #define MAX_IOMMU_REGS 0xc0
 
 extern struct list_head acpi_drhd_units;
--- unstable.orig/xen/include/asm-x86/hvm/iommu.h
+++ unstable/xen/include/asm-x86/hvm/iommu.h
@@ -46,6 +46,8 @@ struct g2m_ioport {
     unsigned int np;
 };
 
+#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48
+
 struct arch_hvm_iommu
 {
     u64 pgd_maddr; /* io page directory machine address */
--- unstable.orig/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
+++ unstable/xen/include/asm-x86/hvm/svm/amd-iommu-defs.h
@@ -464,8 +464,6 @@
 #define IOMMU_CONTROL_DISABLED 0
 #define IOMMU_CONTROL_ENABLED  1
 
-#define DEFAULT_DOMAIN_ADDRESS_WIDTH    48
-
 /* interrupt remapping table */
 #define INT_REMAP_ENTRY_REMAPEN_MASK    0x0001
 #define INT_REMAP_ENTRY_REMAPEN_SHIFT   0
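
For readers who want to poke at the checks outside the hypervisor, the
gfn validation and the caller's crash policy can be mimicked in a small
standalone sketch. check_gfn() and should_crash_domain() are
hypothetical helper names, assert() stands in for Xen's ASSERT(), and
uint64_t is assumed in place of unsigned long so the shift is well
defined everywhere.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define INVALID_MFN                  (~(uint64_t)0)
#define DEFAULT_DOMAIN_ADDRESS_WIDTH 48

/* 0 if gfn may be mapped, -EADDRNOTAVAIL if there is nothing to map. */
static int check_gfn(uint64_t gfn)
{
    if ( gfn == INVALID_MFN )
        return -EADDRNOTAVAIL;
    /* Any other value that does not fit is treated as a bug. */
    assert(!(gfn >> DEFAULT_DOMAIN_ADDRESS_WIDTH));
    return 0;
}

/* Caller-side policy: only unexpected errors crash the domain. */
static bool should_crash_domain(int rc)
{
    return rc != 0 && rc != -EADDRNOTAVAIL;
}
```

The INVALID_MFN test has to come before the width assertion, since
INVALID_MFN itself has all upper bits set.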





Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-20 Thread Sander Eikelenboom

Monday, April 20, 2015, 6:11:42 PM, you wrote:

>>>> On 16.04.15 at 11:28, t...@xen.org wrote:
>> At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
>>> At least we now understand why it happens.  I will defer to others CC'd
>>> on this thread for their opinions in the matter.
>>
>> The patch seems like a pretty good check to me, though I'm not
>> convinced it's race-free.  At the least I'd cache the m2p lookup so we
>> use the same value for the checks and the map_page() call.
>>
>> And IMO update_paging_mode() ought to log and reject bogus GFNs as
>> well.

> Sander,

> could you give the patch below a try, namely also in the context
> of seeing again the issue originally fixed by Andrew's initial patch?
> Please make sure you try a debug build and you have
> iommu=debug on the Xen command line.

> Jan

Hi Jan,

Should this be applied on top of Andrew's initial patch, or instead of ?

--
Sander




Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-20 Thread Sander Eikelenboom

Monday, April 20, 2015, 6:11:42 PM, you wrote:

>>>> On 16.04.15 at 11:28, t...@xen.org wrote:
>> At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
>>> At least we now understand why it happens.  I will defer to others CC'd
>>> on this thread for their opinions in the matter.
>>
>> The patch seems like a pretty good check to me, though I'm not
>> convinced it's race-free.  At the least I'd cache the m2p lookup so we
>> use the same value for the checks and the map_page() call.
>>
>> And IMO update_paging_mode() ought to log and reject bogus GFNs as
>> well.

> Sander,

> could you give the patch below a try, namely also in the context
> of seeing again the issue originally fixed by Andrew's initial patch?
> Please make sure you try a debug build and you have
> iommu=debug on the Xen command line.

> Jan

Hi Jan,

I'm running with it now, have seen no issues so far !

--
Sander



Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-16 Thread Tim Deegan
[Trimmed egregious quoting]

At 22:35 +0100 on 11 Apr (1428791713), Andrew Cooper wrote:
> I am not certain that it is the correct way to fix the issue, nor that
> the ioreq server code is the only way to trigger it.  There are several
> ways to shoot a gfn mapping from the guest's physmap.
>
> At least we now understand why it happens.  I will defer to others CC'd
> on this thread for their opinions in the matter.

The patch seems like a pretty good check to me, though I'm not
convinced it's race-free.  At the least I'd cache the m2p lookup so we
use the same value for the checks and the map_page() call.

And IMO update_paging_mode() ought to log and reject bogus GFNs as
well.

Cheers,

Tim.



Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-15 Thread Konrad Rzeszutek Wilk
On Sat, Apr 11, 2015 at 10:35:13PM +0100, Andrew Cooper wrote:
 On 11/04/2015 22:05, Sander Eikelenboom wrote:
  Saturday, April 11, 2015, 10:22:16 PM, you wrote:
 
  On 11/04/2015 20:33, Sander Eikelenboom wrote:
  Saturday, April 11, 2015, 8:25:52 PM, you wrote:
 
  On 11/04/15 18:42, Sander Eikelenboom wrote:
  Saturday, April 11, 2015, 7:35:57 PM, you wrote:
 
  On 11/04/15 18:25, Sander Eikelenboom wrote:
  Saturday, April 11, 2015, 6:38:17 PM, you wrote:
 
  On 11/04/15 17:32, Andrew Cooper wrote:
  On 11/04/15 17:21, Sander Eikelenboom wrote:
  Saturday, April 11, 2015, 4:21:56 PM, you wrote:
 
  On 11/04/15 15:11, Sander Eikelenboom wrote:
  Friday, April 10, 2015, 8:55:27 PM, you wrote:
 
  On 10/04/15 11:24, Sander Eikelenboom wrote:
  Hi Andrew,
 
  Finally got some time to figure this out .. and i have 
  narrowed it down to:
  git://xenbits.xen.org/staging/qemu-upstream-unstable.git
  commit 7665d6ba98e20fb05c420de947c1750fd47e5c07 Xen: Use the 
  ioreq-server API when available
  A straight revert of this commit prevents the issue from 
  happening.
 
  The reason i had a hard time figuring this out was:
  - I wasn't aware of this earlier, since git pulling the main 
  xen tree, doesn't 
auto update the qemu-* trees.
  This has caught me out so many times.  It is very non-obvious 
  behaviour.
  - So i happen to get this when i cloned a fresh tree to try to 
  figure out the 
other issue i was seeing.
  - After that checking out previous versions of the main xen 
  tree didn't resolve 
this new issue, because the qemu tree doesn't get auto 
  updated and is set 
master.
  - Cloning a xen-stable-4.5.0 made it go away .. because that 
  has a specific 
git://xenbits.xen.org/staging/qemu-upstream-unstable.git tag 
  which is not 
master.
 
  *sigh* 
 
  This is tested with xen main tree at last commit 
  3a28f760508fb35c430edac17a9efde5aff6d1d5
  (normal xen-unstable, not the staging branch)
 
  Ok so i have added some extra debug info (see attached diff) 
  and this is the 
  output when it crashes due to something the commit above 
  triggered, the 
  level is out of bounds and the pfn looks fishy too.
  Complete serial log from both bad and good (specific commit 
  reverted) are 
  attached.
  Just to confirm, you are positively identifying a qemu 
  changeset as
  causing this crash?
  If so, the qemu change has discovered a pre-existing issue in 
  the
  toolstack pci-passthrough interface.  Whatever qemu is or isn't 
  doing,
  it should not be able to cause a crash like this.
  With this in mind, I need to brush up on my AMD-Vi details.
  In the meantime, can you run with the following patch to 
  identify what
  is going on, domctl wise?  I assume it is the assign_device 
  which is
  failing, but it will be nice to observe the differences between 
  the
  working and failing case, which might offer a hint.
  Hrrm with your patch i end up with a fatal page fault in 
  iommu_do_pci_domctl:
 
  (XEN) [2015-04-11 14:03:31.833] [ Xen-4.6-unstable  x86_64  
  debug=y  Tainted:C ]
  (XEN) [2015-04-11 14:03:31.857] CPU:5
  (XEN) [2015-04-11 14:03:31.868] RIP:
  e008:[82d08014c52c] iommu_do_pci_domctl+0x2dc/0x740
  (XEN) [2015-04-11 14:03:31.894] RFLAGS: 00010256   
  CONTEXT: hypervisor
  (XEN) [2015-04-11 14:03:31.915] rax: 0008   rbx: 
  0800   rcx: ffebe5ed
  (XEN) [2015-04-11 14:03:31.942] rdx: 0800   rsi: 
     rdi: 830256ef7e38
  (XEN) [2015-04-11 14:03:31.968] rbp: 830256ef7c98   rsp: 
  830256ef7c08   r8:  deadbeef
  (XEN) [2015-04-11 14:03:31.995] r9:  deadbeef   r10: 
  82d08024e500   r11: 0282
  (XEN) [2015-04-11 14:03:32.022] r12:    r13: 
  0008   r14: 
  (XEN) [2015-04-11 14:03:32.049] r15:    cr0: 
  80050033   cr4: 06f0
  (XEN) [2015-04-11 14:03:32.076] cr3: 0002336a6000   cr2: 
  
  (XEN) [2015-04-11 14:03:32.096] ds:    es:    fs:    
  gs:    ss: e010   cs: e008
  (XEN) [2015-04-11 14:03:32.121] Xen stack trace from 
  rsp=830256ef7c08:
  (XEN) [2015-04-11 14:03:32.141]830256ef7c78 
  82d08012c178 830256ef7c28 830256ef7c28
  (XEN) [2015-04-11 14:03:32.168]0010 
    
  (XEN) [2015-04-11 14:03:32.195]06f0 
  7fe3 830256eb7790 83025cc6d300
  (XEN) [2015-04-11 14:03:32.222]82d080330c60 
  7fe396bab004  7fe396bab004
  (XEN) [2015-04-11 14:03:32.249] 
  0005 830256ef7ca8 82d08014900b
  (XEN) [2015-04-11 14:03:32.276]830256ef7d98 
  82d080161f2d 0010 
  (XEN) [2015-04-11 14:03:32.303] 
  830256ef7ce8 82d08018b655 830256ef7d48
  (XEN) [2015-04-11 

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-12 Thread Sander Eikelenboom

Sunday, April 12, 2015, 5:15:58 PM, you wrote:


 Saturday, April 11, 2015, 11:35:13 PM, you wrote:

 On 11/04/2015 22:05, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 10:22:16 PM, you wrote:

 On 11/04/2015 20:33, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 8:25:52 PM, you wrote:

 On 11/04/15 18:42, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 7:35:57 PM, you wrote:

 On 11/04/15 18:25, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 6:38:17 PM, you wrote:

 On 11/04/15 17:32, Andrew Cooper wrote:
 On 11/04/15 17:21, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 4:21:56 PM, you wrote:

 On 11/04/15 15:11, Sander Eikelenboom wrote:
 Friday, April 10, 2015, 8:55:27 PM, you wrote:

 On 10/04/15 11:24, Sander Eikelenboom wrote:
 Hi Andrew,

 Finally got some time to figure this out .. and i have 
 narrowed it down to:
 git://xenbits.xen.org/staging/qemu-upstream-unstable.git
 commit 7665d6ba98e20fb05c420de947c1750fd47e5c07 Xen: Use the 
 ioreq-server API when available
 A straight revert of this commit prevents the issue from 
 happening.

 The reason i had a hard time figuring this out was:
 - I wasn't aware of this earlier, since git pulling the main 
 xen tree, doesn't 
   auto update the qemu-* trees.
 This has caught me out so many times.  It is very non-obvious 
 behaviour.
 - So i happen to get this when i cloned a fresh tree to try to 
 figure out the 
   other issue i was seeing.
 - After that checking out previous versions of the main xen 
 tree didn't resolve 
   this new issue, because the qemu tree doesn't get auto 
 updated and is set 
   master.
 - Cloning a xen-stable-4.5.0 made it go away .. because that 
 has a specific 
   git://xenbits.xen.org/staging/qemu-upstream-unstable.git tag 
 which is not 
   master.

 *sigh* 

 This is tested with xen main tree at last commit 
 3a28f760508fb35c430edac17a9efde5aff6d1d5
 (normal xen-unstable, not the staging branch)

 Ok so i have added some extra debug info (see attached diff) 
 and this is the 
 output when it crashes due to something the commit above 
 triggered, the 
 level is out of bounds and the pfn looks fishy too.
 Complete serial log from both bad and good (specific commit 
 reverted) are 
 attached.
 Just to confirm, you are positively identifying a qemu 
 changeset as
 causing this crash?
 If so, the qemu change has discovered a pre-existing issue in 
 the
 toolstack pci-passthrough interface.  Whatever qemu is or isn't 
 doing,
 it should not be able to cause a crash like this.
 With this in mind, I need to brush up on my AMD-Vi details.
 In the meantime, can you run with the following patch to 
 identify what
 is going on, domctl wise?  I assume it is the assign_device 
 which is
 failing, but it will be nice to observe the differences between 
 the
 working and failing case, which might offer a hint.
 Hrrm with your patch i end up with a fatal page fault in 
 iommu_do_pci_domctl:

 (XEN) [2015-04-11 14:03:31.833] [ Xen-4.6-unstable  x86_64  
 debug=y  Tainted:C ]
 (XEN) [2015-04-11 14:03:31.857] CPU:5
 (XEN) [2015-04-11 14:03:31.868] RIP:
 e008:[82d08014c52c] iommu_do_pci_domctl+0x2dc/0x740
 (XEN) [2015-04-11 14:03:31.894] RFLAGS: 00010256   
 CONTEXT: hypervisor
 (XEN) [2015-04-11 14:03:31.915] rax: 0008   rbx: 
 0800   rcx: ffebe5ed
 (XEN) [2015-04-11 14:03:31.942] rdx: 0800   rsi: 
    rdi: 830256ef7e38
 (XEN) [2015-04-11 14:03:31.968] rbp: 830256ef7c98   rsp: 
 830256ef7c08   r8:  deadbeef
 (XEN) [2015-04-11 14:03:31.995] r9:  deadbeef   r10: 
 82d08024e500   r11: 0282
 (XEN) [2015-04-11 14:03:32.022] r12:    r13: 
 0008   r14: 
 (XEN) [2015-04-11 14:03:32.049] r15:    cr0: 
 80050033   cr4: 06f0
 (XEN) [2015-04-11 14:03:32.076] cr3: 0002336a6000   cr2: 
 
 (XEN) [2015-04-11 14:03:32.096] ds:    es:    fs:    
 gs:    ss: e010   cs: e008
 (XEN) [2015-04-11 14:03:32.121] Xen stack trace from 
 rsp=830256ef7c08:
 (XEN) [2015-04-11 14:03:32.141]830256ef7c78 
 82d08012c178 830256ef7c28 830256ef7c28
 (XEN) [2015-04-11 14:03:32.168]0010 
   
 (XEN) [2015-04-11 14:03:32.195]06f0 
 7fe3 830256eb7790 83025cc6d300
 (XEN) [2015-04-11 14:03:32.222]82d080330c60 
 7fe396bab004  7fe396bab004
 (XEN) [2015-04-11 14:03:32.249] 
 0005 830256ef7ca8 82d08014900b
 (XEN) [2015-04-11 14:03:32.276]830256ef7d98 
 82d080161f2d 0010 
 (XEN) [2015-04-11 14:03:32.303] 
 830256ef7ce8 82d08018b655 830256ef7d48
 (XEN) [2015-04-11 14:03:32.330]830256ef7cf8 
 82d08018b66a 830256ef7d38 

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-12 Thread Sander Eikelenboom

Saturday, April 11, 2015, 11:35:13 PM, you wrote:

 On 11/04/2015 22:05, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 10:22:16 PM, you wrote:

 On 11/04/2015 20:33, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 8:25:52 PM, you wrote:

 On 11/04/15 18:42, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 7:35:57 PM, you wrote:

 On 11/04/15 18:25, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 6:38:17 PM, you wrote:

 On 11/04/15 17:32, Andrew Cooper wrote:
 On 11/04/15 17:21, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 4:21:56 PM, you wrote:

 On 11/04/15 15:11, Sander Eikelenboom wrote:
 Friday, April 10, 2015, 8:55:27 PM, you wrote:

 On 10/04/15 11:24, Sander Eikelenboom wrote:

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-11 Thread Andrew Cooper
On 11/04/2015 20:33, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 8:25:52 PM, you wrote:

 On 11/04/15 18:42, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 7:35:57 PM, you wrote:

 On 11/04/15 18:25, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 6:38:17 PM, you wrote:

 On 11/04/15 17:32, Andrew Cooper wrote:
 On 11/04/15 17:21, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 4:21:56 PM, you wrote:

 On 11/04/15 15:11, Sander Eikelenboom wrote:
 Friday, April 10, 2015, 8:55:27 PM, you wrote:

 On 10/04/15 11:24, Sander Eikelenboom wrote:

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-11 Thread Sander Eikelenboom

Saturday, April 11, 2015, 10:22:16 PM, you wrote:

 On 11/04/2015 20:33, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 8:25:52 PM, you wrote:

 On 11/04/15 18:42, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 7:35:57 PM, you wrote:

 On 11/04/15 18:25, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 6:38:17 PM, you wrote:

 On 11/04/15 17:32, Andrew Cooper wrote:
 On 11/04/15 17:21, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 4:21:56 PM, you wrote:

 On 11/04/15 15:11, Sander Eikelenboom wrote:
 Friday, April 10, 2015, 8:55:27 PM, you wrote:

 On 10/04/15 11:24, Sander Eikelenboom wrote:

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-11 Thread Sander Eikelenboom

Saturday, April 11, 2015, 8:25:52 PM, you wrote:

 On 11/04/15 18:42, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 7:35:57 PM, you wrote:

 On 11/04/15 18:25, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 6:38:17 PM, you wrote:

 On 11/04/15 17:32, Andrew Cooper wrote:
 On 11/04/15 17:21, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 4:21:56 PM, you wrote:

 On 11/04/15 15:11, Sander Eikelenboom wrote:
 Friday, April 10, 2015, 8:55:27 PM, you wrote:

 On 10/04/15 11:24, Sander Eikelenboom wrote:

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-11 Thread Andrew Cooper
On 11/04/15 15:11, Sander Eikelenboom wrote:
 Friday, April 10, 2015, 8:55:27 PM, you wrote:

 On 10/04/15 11:24, Sander Eikelenboom wrote:
 Hi Andrew,

 Finally got some time to figure this out .. and i have narrowed it down to:
 git://xenbits.xen.org/staging/qemu-upstream-unstable.git
 commit 7665d6ba98e20fb05c420de947c1750fd47e5c07
 Xen: Use the ioreq-server API when available
 A straight revert of this commit prevents the issue from happening.

 The reason i had a hard time figuring this out was:
 - I wasn't aware of this earlier, since git pulling the main xen tree
   doesn't auto update the qemu-* trees.
 This has caught me out so many times.  It is very non-obvious behaviour.
 - So i happen to get this when i cloned a fresh tree to try to figure out
   the other issue i was seeing.
 - After that checking out previous versions of the main xen tree didn't
   resolve this new issue, because the qemu tree doesn't get auto updated
   and is set to master.
 - Cloning a xen-stable-4.5.0 made it go away .. because that has a specific
   git://xenbits.xen.org/staging/qemu-upstream-unstable.git tag which is not
   master.

 *sigh*

 This is tested with xen main tree at last commit
 3a28f760508fb35c430edac17a9efde5aff6d1d5
 (normal xen-unstable, not the staging branch)

 Ok so i have added some extra debug info (see attached diff) and this is the
 output when it crashes due to something the commit above triggered, the
 level is out of bounds and the pfn looks fishy too.
 Complete serial logs from both bad and good (specific commit reverted) are
 attached.
 Just to confirm, you are positively identifying a qemu changeset as
 causing this crash?
 If so, the qemu change has discovered a pre-existing issue in the
 toolstack pci-passthrough interface.  Whatever qemu is or isn't doing,
 it should not be able to cause a crash like this.
 With this in mind, I need to brush up on my AMD-Vi details.
 In the meantime, can you run with the following patch to identify what
 is going on, domctl wise?  I assume it is the assign_device which is
 failing, but it will be nice to observe the differences between the
 working and failing case, which might offer a hint.
 Hrrm with your patch i end up with a fatal page fault in iommu_do_pci_domctl:

 (XEN) [2015-04-11 14:03:31.833] [ Xen-4.6-unstable  x86_64  debug=y  Tainted:C ]
 (XEN) [2015-04-11 14:03:31.857] CPU:5
 (XEN) [2015-04-11 14:03:31.868] RIP:e008:[82d08014c52c] iommu_do_pci_domctl+0x2dc/0x740
 (XEN) [2015-04-11 14:03:31.894] RFLAGS: 00010256   CONTEXT: hypervisor
 (XEN) [2015-04-11 14:03:31.915] rax: 0008   rbx: 0800   rcx: ffebe5ed
 (XEN) [2015-04-11 14:03:31.942] rdx: 0800   rsi:    rdi: 830256ef7e38
 (XEN) [2015-04-11 14:03:31.968] rbp: 830256ef7c98   rsp: 830256ef7c08   r8:  deadbeef
 (XEN) [2015-04-11 14:03:31.995] r9:  deadbeef   r10: 82d08024e500   r11: 0282
 (XEN) [2015-04-11 14:03:32.022] r12:    r13: 0008   r14:
 (XEN) [2015-04-11 14:03:32.049] r15:    cr0: 80050033   cr4: 06f0
 (XEN) [2015-04-11 14:03:32.076] cr3: 0002336a6000   cr2:
 (XEN) [2015-04-11 14:03:32.096] ds:    es:    fs:    gs:    ss: e010   cs: e008
 (XEN) [2015-04-11 14:03:32.121] Xen stack trace from rsp=830256ef7c08:
 (XEN) [2015-04-11 14:03:32.141]830256ef7c78 82d08012c178 830256ef7c28 830256ef7c28
 (XEN) [2015-04-11 14:03:32.168]0010
 (XEN) [2015-04-11 14:03:32.195]06f0 7fe3 830256eb7790 83025cc6d300
 (XEN) [2015-04-11 14:03:32.222]82d080330c60 7fe396bab004  7fe396bab004
 (XEN) [2015-04-11 14:03:32.249] 0005 830256ef7ca8 82d08014900b
 (XEN) [2015-04-11 14:03:32.276]830256ef7d98 82d080161f2d 0010
 (XEN) [2015-04-11 14:03:32.303] 830256ef7ce8 82d08018b655 830256ef7d48
 (XEN) [2015-04-11 14:03:32.330]830256ef7cf8 82d08018b66a 830256ef7d38 82d08012925e
 (XEN) [2015-04-11 14:03:32.357]830256efc068 00080001 80022e12c167
 (XEN) [2015-04-11 14:03:32.384]0002 830256ef7e38 0008 80022e12c167
 (XEN) [2015-04-11 14:03:32.411]0003 830256ef7db8  7fe396780eb0
 (XEN) [2015-04-11 14:03:32.439]0202   7fe396bab004
 (XEN) [2015-04-11 14:03:32.466] 0005 830256ef7ef8 82d08010497f
 (XEN) [2015-04-11 14:03:32.493]0001 0011 80022e12c167 88001f7ecc00
 (XEN) [2015-04-11 14:03:32.520]
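
As an aside for anyone reading the trace above: the RIP line carries the
faulting address, the symbol name, the offset into that symbol and the
symbol's size. Below is a minimal Python sketch that pulls those fields out
of a line in the format shown in this log. The helper name parse_rip and the
regular expression are illustrative assumptions, not part of Xen or its
tooling, and the address is taken exactly as printed here (the archive seems
to have dropped leading digits), so the computed symbol start is only
relative to that printed value.

```python
import re

# Matches a RIP line as it appears in the log above, e.g.
#   "RIP:e008:[82d08014c52c] iommu_do_pci_domctl+0x2dc/0x740"
# Angle brackets around the address are accepted in case they survive.
RIP_RE = re.compile(
    r"RIP:\s*(?P<cs>[0-9a-fA-F]+):\[<?(?P<addr>[0-9a-fA-F]+)>?\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_rip(line):
    """Return symbol, offset, size and symbol start for a RIP line, or None."""
    match = RIP_RE.search(line)
    if match is None:
        return None
    addr = int(match.group("addr"), 16)
    offset = int(match.group("offset"), 16)
    return {
        "symbol": match.group("symbol"),
        "offset": offset,                      # bytes into the function
        "size": int(match.group("size"), 16),  # total size of the function
        "symbol_start": addr - offset,         # relative to the printed address
    }


if __name__ == "__main__":
    sample = ("(XEN) [2015-04-11 14:03:31.868] RIP:e008:[82d08014c52c] "
              "iommu_do_pci_domctl+0x2dc/0x740")
    print(parse_rip(sample))
```

The offset (0x2dc into a 0x740-byte function) is the part that usually
matters when matching the fault against a disassembly of the same build.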

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-11 Thread Sander Eikelenboom

Saturday, April 11, 2015, 7:35:57 PM, you wrote:

 On 11/04/15 18:25, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 6:38:17 PM, you wrote:

 On 11/04/15 17:32, Andrew Cooper wrote:
 On 11/04/15 17:21, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 4:21:56 PM, you wrote:

 On 11/04/15 15:11, Sander Eikelenboom wrote:
 Friday, April 10, 2015, 8:55:27 PM, you wrote:

 On 10/04/15 11:24, Sander Eikelenboom wrote:

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-11 Thread Andrew Cooper
On 11/04/15 18:25, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 6:38:17 PM, you wrote:

 On 11/04/15 17:32, Andrew Cooper wrote:
 On 11/04/15 17:21, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 4:21:56 PM, you wrote:

 On 11/04/15 15:11, Sander Eikelenboom wrote:
 Friday, April 10, 2015, 8:55:27 PM, you wrote:

 On 10/04/15 11:24, Sander Eikelenboom wrote:

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-11 Thread Andrew Cooper
On 11/04/15 17:21, Sander Eikelenboom wrote:
 Saturday, April 11, 2015, 4:21:56 PM, you wrote:

 On 11/04/15 15:11, Sander Eikelenboom wrote:
 Friday, April 10, 2015, 8:55:27 PM, you wrote:

 On 10/04/15 11:24, Sander Eikelenboom wrote:

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-11 Thread Sander Eikelenboom

Friday, April 10, 2015, 8:55:27 PM, you wrote:

 On 10/04/15 11:24, Sander Eikelenboom wrote:
 Hi Andrew,

 Finally got some time to figure this out .. and i have narrowed it down to:
 git://xenbits.xen.org/staging/qemu-upstream-unstable.git
 commit 7665d6ba98e20fb05c420de947c1750fd47e5c07 Xen: Use the ioreq-server 
 API when available
 A straight revert of this commit prevents the issue from happening.

 The reason i had a hard time figuring this out was:
 - I wasn't aware of this earlier, since git pulling the main xen tree doesn't
   auto update the qemu-* trees.

 This has caught me out so many times.  It is very non-obvious behaviour.

 - So i happen to get this when i cloned a fresh tree to try to figure out the
   other issue i was seeing.
 - After that checking out previous versions of the main xen tree didn't resolve
   this new issue, because the qemu tree doesn't get auto updated and is set to
   master.
 - Cloning a xen-stable-4.5.0 made it go away .. because that has a specific
   git://xenbits.xen.org/staging/qemu-upstream-unstable.git tag which is not
   master.

 *sigh* 

 This is tested with xen main tree at last commit 
 3a28f760508fb35c430edac17a9efde5aff6d1d5
 (normal xen-unstable, not the staging branch)

 Ok so i have added some extra debug info (see attached diff) and this is the 
 output when it crashes due to something the commit above triggered, the 
 level is out of bounds and the pfn looks fishy too.
 Complete serial log from both bad and good (specific commit reverted) are 
 attached.

 Just to confirm, you are positively identifying a qemu changeset as
 causing this crash?

 If so, the qemu change has discovered a pre-existing issue in the
 toolstack pci-passthrough interface.  Whatever qemu is or isn't doing,
 it should not be able to cause a crash like this.

 With this in mind, I need to brush up on my AMD-Vi details.

 In the meantime, can you run with the following patch to identify what
 is going on, domctl wise?  I assume it is the assign_device which is
 failing, but it will be nice to observe the differences between the
 working and failing case, which might offer a hint.

Hrrm with your patch i end up with a fatal page fault in iommu_do_pci_domctl:

(XEN) [2015-04-11 14:03:31.833] [ Xen-4.6-unstable  x86_64  debug=y  
Tainted:C ]
(XEN) [2015-04-11 14:03:31.857] CPU:5
(XEN) [2015-04-11 14:03:31.868] RIP:e008:[82d08014c52c] 
iommu_do_pci_domctl+0x2dc/0x740
(XEN) [2015-04-11 14:03:31.894] RFLAGS: 00010256   CONTEXT: hypervisor
(XEN) [2015-04-11 14:03:31.915] rax: 0008   rbx: 0800   
rcx: ffebe5ed
(XEN) [2015-04-11 14:03:31.942] rdx: 0800   rsi:    
rdi: 830256ef7e38
(XEN) [2015-04-11 14:03:31.968] rbp: 830256ef7c98   rsp: 830256ef7c08   
r8:  deadbeef
(XEN) [2015-04-11 14:03:31.995] r9:  deadbeef   r10: 82d08024e500   
r11: 0282
(XEN) [2015-04-11 14:03:32.022] r12:    r13: 0008   
r14: 
(XEN) [2015-04-11 14:03:32.049] r15:    cr0: 80050033   
cr4: 06f0
(XEN) [2015-04-11 14:03:32.076] cr3: 0002336a6000   cr2: 
(XEN) [2015-04-11 14:03:32.096] ds:    es:    fs:    gs:    ss: 
e010   cs: e008
(XEN) [2015-04-11 14:03:32.121] Xen stack trace from rsp=830256ef7c08:
(XEN) [2015-04-11 14:03:32.141]830256ef7c78 82d08012c178 
830256ef7c28 830256ef7c28
(XEN) [2015-04-11 14:03:32.168]0010  
 
(XEN) [2015-04-11 14:03:32.195]06f0 7fe3 
830256eb7790 83025cc6d300
(XEN) [2015-04-11 14:03:32.222]82d080330c60 7fe396bab004 
 7fe396bab004
(XEN) [2015-04-11 14:03:32.249] 0005 
830256ef7ca8 82d08014900b
(XEN) [2015-04-11 14:03:32.276]830256ef7d98 82d080161f2d 
0010 
(XEN) [2015-04-11 14:03:32.303] 830256ef7ce8 
82d08018b655 830256ef7d48
(XEN) [2015-04-11 14:03:32.330]830256ef7cf8 82d08018b66a 
830256ef7d38 82d08012925e
(XEN) [2015-04-11 14:03:32.357]830256efc068 00080001 
80022e12c167 
(XEN) [2015-04-11 14:03:32.384]0002 830256ef7e38 
0008 80022e12c167
(XEN) [2015-04-11 14:03:32.411]0003 830256ef7db8 
 7fe396780eb0
(XEN) [2015-04-11 14:03:32.439]0202  
 7fe396bab004
(XEN) [2015-04-11 14:03:32.466] 0005 
830256ef7ef8 82d08010497f
(XEN) [2015-04-11 14:03:32.493]0001 0011 
80022e12c167 88001f7ecc00
(XEN) [2015-04-11 14:03:32.520]7fe396780eb0 88001c849508 
000e0007 8105594a
(XEN) [2015-04-11 

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-04-01 Thread Sander Eikelenboom

Wednesday, April 1, 2015, 1:38:34 AM, you wrote:

 On 31/03/2015 22:11, Sander Eikelenboom wrote:
 Hi all,

 I just tested xen-unstable staging (changeset: git:0522407-dirty) 

 with revert of commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622
 (due to an already reported but not yet resolved issue)

 and build with qemu xen from 
 git://xenbits.xen.org/staging/qemu-upstream-unstable.git
 (to include the pci command register patch from Jan)


 and now came across this new splat when starting an HVM with PCI passthrough:

 Wow - you are getting all the fun bugs at the moment!

 Nothing has changed in the AMD IOMMU driver for a while, but the
 BUG_ON() is particularly unhelpful at identifying what went wrong.

 As a first pass triage, can you rerun with

 diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
 index 495ff5c..f15c324 100644
 --- a/xen/drivers/passthrough/amd/iommu_map.c
 +++ b/xen/drivers/passthrough/amd/iommu_map.c
 @@ -451,8 +451,9 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
      table = hd->arch.root_table;
      level = hd->arch.paging_mode;

 -    BUG_ON( table == NULL || level < IOMMU_PAGING_MODE_LEVEL_1 ||
 -            level > IOMMU_PAGING_MODE_LEVEL_6 );
 +    BUG_ON(table == NULL);
 +    BUG_ON(level < IOMMU_PAGING_MODE_LEVEL_1);
 +    BUG_ON(level > IOMMU_PAGING_MODE_LEVEL_6);

      next_table_mfn = page_to_mfn(table);

 which will help identify which of the conditions is failing.

 Can you please also provide the full serial log, including iommu=debug?

 ~Andrew


Hmm this was very weird .. i tried to get back to a previously working config 
(kernel + xen version) but to no avail .. ran memtest .. removed the cmos 
battery .. warm boots .. cold boots .. nothing seemed to help, always this same
crash when starting the HVM guest with pci passthrough. PV guests with pci 
passthrough worked fine though .. 

Now finally it's working again .. by going back to 4.5.0 release .. and doing a 
baremetal linux boot in between. No idea what helped .. but it was a very 
strange day. But i need the box for the foreseeable future. 

Will see late next week if i have time (and the guts) to try again :-)

--
Sander




[Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-03-31 Thread Sander Eikelenboom
Hi all,

I just tested xen-unstable staging (changeset: git:0522407-dirty) 

with revert of commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622
(due to an already reported but not yet resolved issue)

and build with qemu xen from 
git://xenbits.xen.org/staging/qemu-upstream-unstable.git
(to include the pci command register patch from Jan)


and now came across this new splat when starting an HVM with PCI passthrough:

(XEN) [2015-03-31 20:58:20.710] io.c:429: d17: bind: m_gsi=37 g_gsi=36 
dev=00.00.5 intx=0
(XEN) [2015-03-31 20:58:21.100] Xen BUG at iommu_map.c:455
(XEN) [2015-03-31 20:58:21.100] [ Xen-4.6-unstable  x86_64  debug=y  Not 
tainted ]
(XEN) [2015-03-31 20:58:21.100] CPU:0
(XEN) [2015-03-31 20:58:21.100] RIP:e008:[82d080155bb1] 
iommu_pde_from_gfn+0x38/0x430
(XEN) [2015-03-31 20:58:21.100] RFLAGS: 00010202   CONTEXT: hypervisor
(XEN) [2015-03-31 20:58:21.100] rax: 0008   rbx: 0003   
rcx: 82c000802000
(XEN) [2015-03-31 20:58:21.100] rdx: 82e007d56740   rsi:    
rdi: 8305167dd000
(XEN) [2015-03-31 20:58:21.100] rbp: 82d0802efad8   rsp: 82d0802efa78   
r8:  83054eb755b0
(XEN) [2015-03-31 20:58:21.100] r9:  0003   r10: 0200   
r11: 82d0802fc0d0
(XEN) [2015-03-31 20:58:21.100] r12: 82e0075527e0   r13: 05e9   
r14: 
(XEN) [2015-03-31 20:58:21.100] r15: 7d20   cr0: 80050033   
cr4: 06f0
(XEN) [2015-03-31 20:58:21.100] cr3: 00051a197000   cr2: 7efdd5ee1d48
(XEN) [2015-03-31 20:58:21.100] ds:    es:    fs:    gs:    ss: 
e010   cs: e008
(XEN) [2015-03-31 20:58:21.100] Xen stack trace from rsp=82d0802efa78:
(XEN) [2015-03-31 20:58:21.100]8305167dd000 82d0802efb30 
 8305167dd190
(XEN) [2015-03-31 20:58:21.100]0286 82e007d56740 
82e007552800 0003
(XEN) [2015-03-31 20:58:21.100]82e0075527e0 05e9 
 7d20
(XEN) [2015-03-31 20:58:21.100]82d0802efb98 82d0801560b6 
7d2f7fd104e7 0001802351d2
(XEN) [2015-03-31 20:58:21.100]003aa93f  
00020001 8305167dd938
(XEN) [2015-03-31 20:58:21.100]82004ff8 8305167dd000 
0020941c 
(XEN) [2015-03-31 20:58:21.100]  
 
(XEN) [2015-03-31 20:58:21.100]  
8305167dd938 8305167dd000
(XEN) [2015-03-31 20:58:21.100]82e0075527e0 05e9 
 7d20
(XEN) [2015-03-31 20:58:21.100]82d0802efbf8 82d08015a54d 
 8305167dd020
(XEN) [2015-03-31 20:58:21.100]82d0802e8000 003aa93f 
82d0802efbf8 
(XEN) [2015-03-31 20:58:21.100]8305167dd000 0800 
8305167dd000 
(XEN) [2015-03-31 20:58:21.100]82d0802efc98 82d08014c6c1 
82d0802efc78 82d08012c298
(XEN) [2015-03-31 20:58:21.100]0286 82d0802efc28 
0020 
(XEN) [2015-03-31 20:58:21.100]  
0008 7f6525ed2004
(XEN) [2015-03-31 20:58:21.100]83054eb1ab60 83055cc6c300 
0282 7f6525ed2004
(XEN) [2015-03-31 20:58:21.100]8305167dd000 7f6525ed2004 
8305167dd000 0005
(XEN) [2015-03-31 20:58:21.100]82d0802efca8 82d08014908b 
82d0802efd98 82d080161f2d
(XEN) [2015-03-31 20:58:21.100]0020  
0005 0001
(XEN) [2015-03-31 20:58:21.100]82d080331bb8 0001 
82d0802efde8 82d080120d00
(XEN) [2015-03-31 20:58:21.100] Xen call trace:
(XEN) [2015-03-31 20:58:21.100][82d080155bb1] 
iommu_pde_from_gfn+0x38/0x430
(XEN) [2015-03-31 20:58:21.100][82d0801560b6] 
amd_iommu_map_page+0x10d/0x4e6
(XEN) [2015-03-31 20:58:21.100][82d08015a54d] 
arch_iommu_populate_page_table+0x179/0x4d8
(XEN) [2015-03-31 20:58:21.100][82d08014c6c1] 
iommu_do_pci_domctl+0x395/0x604
(XEN) [2015-03-31 20:58:21.100][82d08014908b] 
iommu_do_domctl+0x17/0x1a
(XEN) [2015-03-31 20:58:21.100][82d080161f2d] 
arch_do_domctl+0x2469/0x26e1
(XEN) [2015-03-31 20:58:21.100][82d080104a6f] do_domctl+0x1a1f/0x1d60
(XEN) [2015-03-31 20:58:21.100][82d080234c9b] syscall_enter+0xeb/0x145
(XEN) [2015-03-31 20:58:21.100] 
(XEN) [2015-03-31 20:58:22.167] 
(XEN) [2015-03-31 20:58:22.176] 
(XEN) [2015-03-31 20:58:22.195] Panic on CPU 0:
(XEN) [2015-03-31 20:58:22.208] Xen BUG at iommu_map.c:455
(XEN) [2015-03-31 20:58:22.223] 
(XEN) [2015-03-31 20:58:22.243] 
(XEN) [2015-03-31 20:58:22.252] Manual reset required ('noreboot' specified)


Haven't tried 

Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-03-31 Thread Andrew Cooper
On 31/03/2015 22:11, Sander Eikelenboom wrote:
 Hi all,

 I just tested xen-unstable staging (changeset: git:0522407-dirty) 

 with revert of commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622
 (due to an already reported but not yet resolved issue)

 and build with qemu xen from 
 git://xenbits.xen.org/staging/qemu-upstream-unstable.git
 (to include the pci command register patch from Jan)


 and now came across this new splat when starting an HVM with PCI passthrough:

Wow - you are getting all the fun bugs at the moment!

Nothing has changed in the AMD IOMMU driver for a while, but the
BUG_ON() is particularly unhelpful at identifying what went wrong.

As a first pass triage, can you rerun with

diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
index 495ff5c..f15c324 100644
--- a/xen/drivers/passthrough/amd/iommu_map.c
+++ b/xen/drivers/passthrough/amd/iommu_map.c
@@ -451,8 +451,9 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
     table = hd->arch.root_table;
     level = hd->arch.paging_mode;

-    BUG_ON( table == NULL || level < IOMMU_PAGING_MODE_LEVEL_1 ||
-            level > IOMMU_PAGING_MODE_LEVEL_6 );
+    BUG_ON(table == NULL);
+    BUG_ON(level < IOMMU_PAGING_MODE_LEVEL_1);
+    BUG_ON(level > IOMMU_PAGING_MODE_LEVEL_6);

     next_table_mfn = page_to_mfn(table);

which will help identify which of the conditions is failing.
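
As a rough illustration of why the split helps: a compound BUG_ON() reports
only one file:line for all three conditions, while separate checks each get
their own. The standalone sketch below mimics that idea outside the
hypervisor. The function name and the constant values are hypothetical
stand-ins, not the Xen definitions.

```c
#include <stddef.h>

/* Assumed stand-ins for IOMMU_PAGING_MODE_LEVEL_1/_6; values are illustrative. */
enum { PAGING_MODE_LEVEL_1 = 1, PAGING_MODE_LEVEL_6 = 6 };

/*
 * Hypothetical helper: returns NULL when table and level look sane,
 * otherwise a string naming the first check that failed. This is the
 * same information the three separate BUG_ON()s expose via distinct
 * line numbers in the panic message.
 */
static const char *first_failed_check(const void *table, unsigned int level)
{
    if (table == NULL)
        return "table == NULL";
    if (level < PAGING_MODE_LEVEL_1)
        return "level < LEVEL_1";
    if (level > PAGING_MODE_LEVEL_6)
        return "level > LEVEL_6";
    return NULL;
}
```

With a non-NULL table and a level of 0 this would name the lower-bound check;
with a level of 7 it would name the upper-bound check.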

Can you please also provide the full serial log, including iommu=debug?

~Andrew



Re: [Xen-devel] Xen-unstable-staging: Xen BUG at iommu_map.c:455

2015-03-31 Thread Sander Eikelenboom

Wednesday, April 1, 2015, 1:38:34 AM, you wrote:

 On 31/03/2015 22:11, Sander Eikelenboom wrote:
 Hi all,

 I just tested xen-unstable staging (changeset: git:0522407-dirty) 

 with revert of commit 1aeb1156fa43fe2cd2b5003995b20466cd19a622
 (due to an already reported but not yet resolved issue)

 and build with qemu xen from 
 git://xenbits.xen.org/staging/qemu-upstream-unstable.git
 (to include the pci command register patch from Jan)


 and now came across this new splat when starting an HVM with PCI passthrough:

 Wow - you are getting all the fun bugs at the moment!

Hrmm i'm not so sure at the moment .. could also be a stale tree or is it just
that it's april 1st ..
*sigh* 
tried to git reset --hard to a known good changeset .. but it still seems
to fail, even with cold boot. 

So sorry for the noise and please ignore for the moment while i'm trying to
figure out what is fooling me :-)

--
sander



 Nothing has changed in the AMD IOMMU driver for a while, but the
 BUG_ON() is particularly unhelpful at identifying what went wrong.

 As a first pass triage, can you rerun with

 diff --git a/xen/drivers/passthrough/amd/iommu_map.c b/xen/drivers/passthrough/amd/iommu_map.c
 index 495ff5c..f15c324 100644
 --- a/xen/drivers/passthrough/amd/iommu_map.c
 +++ b/xen/drivers/passthrough/amd/iommu_map.c
 @@ -451,8 +451,9 @@ static int iommu_pde_from_gfn(struct domain *d, unsigned long pfn,
      table = hd->arch.root_table;
      level = hd->arch.paging_mode;

 -    BUG_ON( table == NULL || level < IOMMU_PAGING_MODE_LEVEL_1 ||
 -            level > IOMMU_PAGING_MODE_LEVEL_6 );
 +    BUG_ON(table == NULL);
 +    BUG_ON(level < IOMMU_PAGING_MODE_LEVEL_1);
 +    BUG_ON(level > IOMMU_PAGING_MODE_LEVEL_6);

      next_table_mfn = page_to_mfn(table);

 which will help identify which of the conditions is failing.

 Can you please also provide the full serial log, including iommu=debug?

 ~Andrew


