RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Zhang, Xiantao
Hi Isaku and all,
The attached patch should fix the weird issue.  In upstream we also found 
some other weird issues: for example, dom0 can't boot on some platforms, and 
dom0 may behave differently with different initrds.  After debugging, I found 
that the cause should be an incorrect setup of the pirq_needs_eoi page.  Two 
main issues turned up during debugging: 
1.  The two related hypercalls are not enabled in the correct way, so dom0 and 
the hypervisor don't agree on which pirqs need EOI. 
2.  The page is not really linked into the bss section, even though it must be, 
so the kernel treats it as memory cache and uses it in many ways, which finally 
leads to various issues. 
Thanks 
Xiantao
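
[Editor's sketch] The idea of a shared, page-sized bitmap with one bit per pirq can be illustrated roughly as follows. This is a minimal user-space model, not the kernel or hypervisor code: the names with a sketch prefix, the page size and the pirq count are all assumptions made for illustration. One side marks which pirqs need EOI, the other side tests the bit, and the array is aligned so that it fills exactly one page and shares it with nothing else.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical sizes for illustration only; not the values used by Xen. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_NR_PIRQS  256u
#define SKETCH_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* One bit per pirq, padded to a full page and aligned on a page boundary
 * so that nothing else lives in the same page. */
static unsigned long pirq_needs_eoi_sketch[SKETCH_PAGE_SIZE / sizeof(unsigned long)]
    __attribute__((aligned(SKETCH_PAGE_SIZE)));

/* Writer side (the hypervisor in the real system): mark a pirq. */
static void sketch_mark_needs_eoi(unsigned int pirq)
{
    assert(pirq < SKETCH_NR_PIRQS);
    pirq_needs_eoi_sketch[pirq / SKETCH_WORD_BITS] |= 1UL << (pirq % SKETCH_WORD_BITS);
}

/* Reader side (dom0 in the real system): is an EOI required for this pirq? */
static int sketch_needs_eoi(unsigned int pirq)
{
    assert(pirq < SKETCH_NR_PIRQS);
    return (int)((pirq_needs_eoi_sketch[pirq / SKETCH_WORD_BITS]
                  >> (pirq % SKETCH_WORD_BITS)) & 1UL);
}

/* The page can only be handed over by frame number if it is page aligned. */
static int sketch_is_page_aligned(void)
{
    return ((uintptr_t)pirq_needs_eoi_sketch % SKETCH_PAGE_SIZE) == 0;
}
```

If both sides do not use the same page, or the page is reused for something else, the bits read by one side no longer mean what the other side wrote, which matches the kind of disagreement described above.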



You, Yongkang wrote:
 I tried 2048M (and other values), but I wasn't able to reproduce it.
 Hmm, does it reproduce with dom0_mem=2048M on all boxes which you
 tested?
 
 Isaku/All,
 
 This issue is really hard to locate. I now suspect that it is
 related to the build process, because the issue also goes away
 when the build method changes.
 
 1. It doesn't happen on all machines, but it can be stably reproduced
    on our nightly test machine with the same binary.
 2. When the system crashes, dom0_mem is set to 2048M. With other
    memory sizes the issue disappears too.
 3. It seems to have appeared between dom0 changesets 743~753, as it
    works if we use an old Dom0 kernel build + new Xen, and the old
    nightly testing doesn't have the issue.
 4. When I tried regression testing between 743~753, I found that
    different build methods can give crashing and non-crashing results.
 
 In our default build process, as stubdomain is not supported on
 IA64, we removed install-stubdom and dist-stubdom from the install:
 and dist: lines in the main Makefile. That change was made more than 2
 months ago. The real compile command is make -j3 xyz_file. And the
 crash is related to the build process.
 
 When I do regression testing, sometimes I don't change the Makefile but
 still use make -j3. Then the crash is gone. 
 
 I am not sure whether my suspicion is right, as it still needs more
 testing. Compiling Dom0 is not as easy as compiling Xen; it is costly.
 I will try to do more, but maybe not so quickly, as many other things
 need to be done at the same time. If the default compilation is okay,
 do you think it is worth investigating further?
 
 Any suggestion will be much appreciated.
 
 Best Regards,
 Yongkang You
 
 On Tuesday, December 16, 2008 10:22 AM, Isaku Yamahata wrote:
 
 On Tue, Dec 09, 2008 at 05:56:25PM +0800, You, Yongkang wrote:
 On Monday, December 08, 2008 2:10 PM, Isaku Yamahata wrote:
 
 On Mon, Dec 08, 2008 at 01:52:38PM +0800, Zhang, Jingke wrote:
 Isaku Yamahata wrote:
 On Mon, Dec 08, 2008 at 11:31:15AM +0800, Zhang, Jingke wrote:
 Hi Isaku,
 We collected the detailed information from the serial port again; please
 see below. Two comments to add:
 
 Thank you.
 
 
 1. We can be sure the Cset#18832 works well on the same
 tiger4 machine. But we did not do regression test between 18832
 and this 18860. 
 2. It is strange that on another Tiger4 box, dom0 will NOT
 crash. Do you have any idea from the serial log? Thanks!
 
 I haven't hit this crash. And Kuwamura-san's test suggests that
 he hasn't hit it either. Kuwamura-san, is that correct?
 Hmm... does it depend on the hw configuration?
 I'm inclined to suspect an interrupt masking/unmasking race.
 Event channel issues? But that's only my very vague guess.
 
 The difference between 18832 and 18860 is the merge of
 xen-unstable into xen-ia64-unstable. Looking at the log, I suspect
 linux-2.6.18-xen instead of xen.
 Could you provide the linux c/s which correspond to 18832 and
 18860?
 
 
 Hi Isaku,
 Yes, some of our machines do not crash. I am afraid there may
 be some potential issue. For testing 18832 we used linux#742,
 while 18860 uses linux#753. Thanks!
 
 Thank you. From a rough look, those changesets don't seem to be
 the culprit. I agree with you that this may indicate some potential
 bugs...
 
 Hi All,
 
 This bug is stably reproduced if dom0_mem=2048M is given in the
 append option. If dom0_mem is set to 1024M or 4096M, the
 crash doesn't happen. 
 
 We tried #18869 Xen + #742 Dom0, and the system is okay. So the problem
 might be in the Linux tree between #742~#753.
 
 I tried 2048M (and other values), but I wasn't able to reproduce it.
 Hmm, does it reproduce with dom0_mem=2048M on all boxes which you
 tested? 
 
 thanks,
 
 ___
 Xen-ia64-devel mailing list
 Xen-ia64-devel@lists.xensource.com
 http://lists.xensource.com/xen-ia64-devel



fix_pirq_eoi_page.patch

Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Isaku Yamahata
Hi. Good catch. Some comments.
I attached two patches to fix it; could you try them?

- bss.page_aligned.
  Where is the section used?
  grep didn't tell me. Surely x86 uses .bss.page_aligned in
  linux/arch/[i386, x86_64]/kernel/head[-xen].S,
  but no files under linux/arch/ia64/ use it.

- ia64_fast_eoi.
  I suppose ia64_fast_eoi is used as an optimization in place of
  PHYSDEVOP_eoi. I'm not sure how much improvement it provides, though.
  Anyway, the ia64_fast_eoi hypercall implementation should also be updated,
  which I overlooked when I added PHYSDEVOP_pirq_eoi_gmfn support.

thanks,


RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Zhang, Xiantao
Isaku Yamahata wrote:
 Hi. Good catch. Some comments.
 I attached two patches to fix, could you try them?
 
 - bss.page_aligned.
   Where is the section used?
   grep didn't tell me. Surely x86 uses .bss.page_aligned in
   linux/arch/[i386, x86_64]/kernel/head[-xen].S,
   but no files under linux/arch/ia64/ use it.

You may need to check drivers/xen/core/evtchn.c; the code is as follows :-)
Xiantao

static int pirq_eoi_does_unmask;
static DECLARE_BITMAP(pirq_needs_eoi, ALIGN(NR_PIRQS, PAGE_SIZE * 8))
__attribute__ ((__section__(".bss.page_aligned"), 
__aligned__(PAGE_SIZE)));
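
[Editor's sketch] The size argument in the declaration above rounds the number of bits up to a whole number of pages (a page holds PAGE_SIZE * 8 bits), so the bitmap never shares its last page with other data. The arithmetic can be checked with a small stand-alone model. SKETCH_ALIGN and SKETCH_PAGE_SIZE are stand-ins defined here for illustration, and the 4096-byte page is an assumption, not the ia64 configuration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed page size for the example only. */
#define SKETCH_PAGE_SIZE 4096UL

/* Round x up to a multiple of a, where a is a power of two.
 * Same idea as the kernel's ALIGN() macro. */
#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Bytes occupied by a bitmap of nr_pirqs bits once the bit count is
 * rounded up to whole pages. */
static size_t sketch_bitmap_bytes(unsigned long nr_pirqs)
{
    assert(nr_pirqs > 0);
    unsigned long bits = SKETCH_ALIGN(nr_pirqs, SKETCH_PAGE_SIZE * 8);
    return (size_t)(bits / 8);
}
```

With these assumed values, any pirq count from 1 to 32768 occupies exactly one page, and 32769 spills into a second page.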



 - ia64_fast_eoi.
   I suppose ia64_fast_eoi is used for optimization instead of
   PHYSDEVOP_eoi. I'm not sure how much improvement it provides,
   though. Anyway ia64_fast_eoi hypercall implementation should also
   be updated which I overlooked when I added PHYSDEVOP_pirq_eoi_gmfn
 support. 
 
 thanks,
 

Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Isaku Yamahata
On Mon, Jan 05, 2009 at 12:29:55PM +0800, Zhang, Xiantao wrote:
 Isaku Yamahata wrote:
  Hi. Good catch. Some comments.
  I attached two patches to fix, could you try them?
  
  - bss.page_aligned.
    Where is the section used?
    grep didn't tell me. Surely x86 uses .bss.page_aligned in
    linux/arch/[i386, x86_64]/kernel/head[-xen].S,
    but no files under linux/arch/ia64/ use it.
 
 You may need to check drivers/xen/core/evtchn.c; the code is as follows :-)
 Xiantao
 
 static int pirq_eoi_does_unmask;
 static DECLARE_BITMAP(pirq_needs_eoi, ALIGN(NR_PIRQS, PAGE_SIZE * 8))
 __attribute__ ((__section__(".bss.page_aligned"), 
 __aligned__(PAGE_SIZE)));
 

Ah, that line was deleted by changeset 760:0d10be086a78.

-- 
yamahata



RE: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Zhang, Xiantao

Isaku Yamahata wrote:
 Ah, that line was deleted by changeset 760:0d10be086a78.

Oh, I hadn't noticed that check-in because my codebase is old. It introduced many 
odd issues for us.   Okay, it is also good to remove it. :)
As for adopting the fast eoi path, that is okay with me.  Please check them in.  
Xiantao




Re: [Xen-ia64-devel] [Test Report] Xen/IPF Unstable CS#18860 Status --- Dom0 Crash

2009-01-04 Thread Isaku Yamahata
On Mon, Jan 05, 2009 at 01:06:23PM +0800, Zhang, Xiantao wrote:
 Oh, I hadn't noticed that check-in because my codebase is old. It introduced many 
 odd issues for us.   Okay, it is also good to remove it. :)
 As for adopting the fast eoi path, that is okay with me.  Please check them in.  

Applied, thanks.
-- 
yamahata



Re: [Xen-ia64-devel] A patch to fix mis-setting ed bit for itlb entry.

2009-01-04 Thread Isaku Yamahata
applied, thanks.

On Sun, Jan 04, 2009 at 03:22:06PM +0800, Zhang, Xiantao wrote:
 Hi, Isaku 
 While debugging a Windows BSOD issue, we found that it is caused by 
 mis-setting the pte's ED bit for an itlb entry.  The hash vTLB is a unified tlb 
 and doesn't differentiate itc and dtc in its implementation, so the itlb_miss 
 handler may reference a dtlb entry in the hash vTLB.  That can cause problems, 
 because the dtlb entry's ED bit may differ from the itlb setting.  Since the 
 case is very rare, just purge the corresponding entry from the hash vTLB once 
 it is found, and let the guest OS determine how to set the ED bit for the itlb 
 mapping. 
 Xiantao
 
 Signed-off-by : Xiantao Zhang xiantao.zh...@intel.com
 
 diff -r e97216802360 xen/arch/ia64/vmx/vtlb.c
 --- a/xen/arch/ia64/vmx/vtlb.c  Fri Dec 12 10:43:39 2008 +0900
 +++ b/xen/arch/ia64/vmx/vtlb.c  Sun Jan 04 10:43:19 2009 +0800
 @@ -678,11 +678,20 @@ thash_data_t *vtlb_lookup(VCPU *v, u64 v
          cch = vtlb_thash(hcb->pta, va, vrr.rrval, &tag);
          do {
              if (cch->etag == tag && cch->ps == ps)
 -                return cch;
 +                goto found;
              cch = cch->next;
          } while(cch);
      }
      return NULL;
 +found:
 +    if (is_data == ISIDE_TLB && !cch->ed) {
 +        /*The case is very rare, and it may lead to incorrect setting
 +          for itlb's ed bit! Purge it from hash vTLB and let guest os
 +          determin the ed bit of the itlb entry.*/
 +        vtlb_purge(v, va, ps);
 +        cch = NULL;
 +    }
 +    return cch;
  }
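
[Editor's sketch] The behaviour the patch introduces can be modelled outside Xen roughly as below: a lookup that walks a hash chain, and on an instruction-side hit whose entry has the ED bit clear, drops the entry and reports a miss so the guest can reinsert it with its own ED setting. Everything here is a simplified assumption for illustration: struct sketch_entry is not thash_data_t, and clearing a valid flag merely stands in for vtlb_purge().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry; the real structure has many more fields. */
struct sketch_entry {
    uint64_t tag;
    unsigned ps;                 /* page size (log2) */
    unsigned ed;                 /* ED bit recorded when the entry was inserted */
    int valid;
    struct sketch_entry *next;
};

enum sketch_side { SKETCH_DSIDE = 0, SKETCH_ISIDE = 1 };

/* Unified lookup: data-side and instruction-side misses search the same chain. */
static struct sketch_entry *sketch_lookup(struct sketch_entry *head, uint64_t tag,
                                          unsigned ps, enum sketch_side side)
{
    assert(ps < 64);
    for (struct sketch_entry *e = head; e != NULL; e = e->next) {
        if (!e->valid || e->tag != tag || e->ps != ps)
            continue;
        if (side == SKETCH_ISIDE && !e->ed) {
            /* Rare case: the entry's ED bit may not suit an itlb mapping.
             * Drop it (stand-in for a purge) and report a miss. */
            e->valid = 0;
            return NULL;
        }
        return e;
    }
    return NULL;
}
```

In this model a data-side lookup still returns the entry unchanged, while an instruction-side lookup of an entry with ED clear purges it and misses.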


-- 
yamahata



Re: [Xen-ia64-devel] [PATCH] Fix some IPF Xen VT-d bugs

2009-01-04 Thread Isaku Yamahata
Hi. Sorry for the delayed reply.

On Thu, Dec 25, 2008 at 10:14:09PM +0800, Cui, Dexuan wrote:
 Isaku Yamahata wrote:
  On Wed, Dec 24, 2008 at 01:11:03PM +0800, Cui, Dexuan wrote:
  Isaku Yamahata wrote:
  diff -r 008b68ff6095 xen/arch/ia64/xen/domain.c
  --- a/xen/arch/ia64/xen/domain.c Tue Nov 18 10:33:55 2008 +0900
  +++ b/xen/arch/ia64/xen/domain.c Mon Dec 15 18:41:52 2008 +0800
  @@ -602,10 +602,8 @@ int arch_domain_create(struct domain *d,
       if ((d->arch.mm.pgd = pgd_alloc(&d->arch.mm)) == NULL)
           goto fail_nomem;
   
  -    if ( iommu_enabled && (is_hvm_domain(d) || need_iommu(d)) ){
  -        if(iommu_domain_init(d) != 0)
  -            goto fail_iommu;
  -    }
  +    if(iommu_domain_init(d) != 0)
  +        goto fail_iommu;
   
       /*
        * grant_table_create() can't fully initialize grant table for domain
  
  Please don't drop is_hvm_domain(d) check.
  At this moment ia64 doesn't support iommu for PV domain because
  Oh, thanks for the reminder. Here I neglected this.
  
  Do you mean this:
      if ( is_hvm_domain(d) )
          if(iommu_domain_init(d) != 0)
              goto fail_iommu;
  This is also not OK, since we must ensure iommu_domain_init() is
  invoked for Dom0 -- we need the function invoked to enable DMA
  remapping.  
  
  So how about changing the logic to:
      if ( (d->domain_id == 0) || is_hvm_domain(d) )
          if(iommu_domain_init(d) != 0)
              goto fail_iommu;
  
  If you agree with this, I'll post a new patch.
  
  Do you mean the if ( d->domain_id == 0 ) clause in
  the function intel_iommu_domain_init()?
 Yes. 
 
  Is iommu map/unmap for dom0 necessary?
intel_iommu_domain_init() maps all the pages except the ones xen uses
to dom0. I suppose this is what you want.
 Yes.
 When Dom0 boots up, we assign all the devices to it, so it needs the 1:1 VT-d 
 pagetables mapping.
 
However, pages are later mapped/unmapped even for dom0 because
 I suppose you mean the balloon driver and the grant table operations. Correct?

That's right.


need_iommu(dom0) returns true due to iommu_domain_init(dom0).
Since dom0 is PV, iommu mapping/unmapping causes a race on ia64.
 In the cases of balloon and granttable, the iommu mapping/unmapping would 
 cause a race on IA64?
 Sorry, I know little about the lockless p2m table now. I'm trying to understand 
 more.

Yes. That is why the first ia64 VT-d patches don't enable VT-d
for PV domains by not calling iommu_domain_init().
In the x86 case p2m_lock/unlock() avoids the race, but ia64 doesn't have such a
lock.
At this moment, only HVM domains would be supported.
The issue is the dom0 case. I suppose it can be supported by mapping
all the pages except xen pages at boot time and then not doing iommu
mapping/unmapping, because those pages are already mapped to dom0
by intel_iommu_domain_init().


Only setting up iommu tables at the dom0 creation is necessary,
 Could you please explain more about this? I can't get the point.
 
all if ( iommu_enabled && (is_hvm_domain(d) || need_iommu(d)) )
would be if ( iommu_enabled && is_hvm_domain(d) && need_iommu(d) )
 Am I missing something?
 #define need_iommu(d)    ((d)->need_iommu && !(d)->is_hvm)
 So,
 iommu_enabled && is_hvm_domain(d) && need_iommu(d)
 is undoubtedly false. :-)

Ah sorry. I missed d->is_hvm. Please forget this sentence.
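
[Editor's sketch] The point made in this exchange can be checked mechanically. With need_iommu() defined as quoted, the conjunction with is_hvm can never be true, whereas the "dom0 or HVM" test proposed earlier in the thread selects exactly dom0 and HVM domains. The struct and function names below are hypothetical stand-ins, not Xen's types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced domain descriptor for illustration. */
struct sketch_domain {
    int  domain_id;
    bool is_hvm;
    bool need_iommu;   /* raw flag, as tested by the macro quoted in the thread */
};

/* Models the quoted macro: flag set and domain is not HVM. */
static bool sketch_need_iommu(const struct sketch_domain *d)
{
    assert(d != NULL);
    return d->need_iommu && !d->is_hvm;
}

/* The condition that was withdrawn: it requires is_hvm and !is_hvm at once. */
static bool sketch_withdrawn_cond(bool iommu_enabled, const struct sketch_domain *d)
{
    return iommu_enabled && d->is_hvm && sketch_need_iommu(d);
}

/* The condition proposed earlier in the thread: dom0 or an HVM domain. */
static bool sketch_should_init_iommu(const struct sketch_domain *d)
{
    assert(d != NULL);
    return d->domain_id == 0 || d->is_hvm;
}
```

Enumerating all flag combinations shows that the withdrawn condition is always false, which is why it was dropped.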


  intel_iommu_domain_init() and dom0 memory size
calc_dom0_size() in xen/arch/ia64/domain.c calculates the default dom0
memory size. You should take the memory for the iommu page table
into account because the memory size for the iommu page table wouldn't
be negligible.
probably iommu_pages = (max phys addr) / PTRS_PER_PTE_4K + (some
spare) where PTRS_PER_PTE_4K = (1 << (PAGE_SHIFT_4K - 3))
 Now, in intel_iommu_domain_init(), with respect to iommu mapping, Xen maps 
 all the pages for Dom0 except for the pages used by Xen itself.
 Do you mean xen should only map the pages actually owned by Dom0?  -- for 
 instance, you're saying xen should not map the iommu page tables? -- since in 
 Dom0 normally drivers don't touch iommu pagetables at all, it looks like the 
 current code  is OK?

No. I meant that calc_dom0_size() should be updated.
It calculates the maximum memory size which can be passed to dom0 safely.
Without dom0_mem_size, the Xen VMM tries to give dom0 the maximum memory size,
which is a common use case.

On the other hand, it isn't uncommon for an ia64 machine to have several
hundred gigabytes, so the memory size for the VT-d table would reach tens or
hundreds of megabytes, which is not negligible compared to the xen heap size.
The memory for the VT-d table should be taken into account
in calc_dom0_size().
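
[Editor's sketch] The estimate suggested earlier in this message can be worked through with a small helper. This is only a back-of-the-envelope model under stated assumptions: "max phys addr" is read as a byte address and converted to 4K frames, only the leaf level of the page table is counted, and the function and macro names are invented for illustration rather than taken from Xen.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT_4K   12
/* 8-byte entries in a 4K table page: 1 << (12 - 3) = 512 entries. */
#define SKETCH_PTRS_PER_PTE_4K (1ULL << (SKETCH_PAGE_SHIFT_4K - 3))

/* Number of 4K pages needed for leaf-level iommu page tables covering
 * [0, max_phys_addr), plus a caller-chosen spare. */
static uint64_t sketch_iommu_leaf_pages(uint64_t max_phys_addr, uint64_t spare_pages)
{
    assert(max_phys_addr > 0);
    uint64_t frames = max_phys_addr >> SKETCH_PAGE_SHIFT_4K;
    uint64_t leaf = (frames + SKETCH_PTRS_PER_PTE_4K - 1) / SKETCH_PTRS_PER_PTE_4K;
    return leaf + spare_pages;
}
```

Under these assumptions a machine with 256 GiB of physical address space needs 131072 leaf table pages, that is 512 MiB, which is consistent with the "hundreds of megabytes" figure above.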


  intel_iommu_domain_init() and sparse memory.
To be honest, I'm not sure how much it matters in practice.
On ia64, memory might be located sparsely. So the iommu page table
should also be sparse instead of [0, max_page] - (xen page).
You want to use efi_memmap_walk() instead of a for loop.
 Thanks for pointing this out!
 So my