RE: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
Isaku Yamahata wrote: On Thu, Oct 23, 2008 at 03:10:55PM +0800, Xu, Anthony wrote: The new one, It looks almost okay. The last one. diff -r 02c8733e2d91 xen/arch/ia64/vmx/viosapic.c --- a/xen/arch/ia64/vmx/viosapic.c Wed Oct 22 17:20:15 2008 +0900 +++ b/xen/arch/ia64/vmx/viosapic.c Thu Oct 23 14:48:09 2008 +0800 @@ -121,6 +121,13 @@ redir_num, vector); return; } +if ( iommu_enabled ) +{ +spin_unlock(viosapic-lock); +hvm_dpci_eoi(current-domain, redir_num, viosapic-redirtbl[redir_num]); +spin_lock(viosapic-lock); +} + service_iosapic(viosapic); spin_unlock(viosapic-lock); } Is this unlock/lock sequence okay? I'm asking simply because I'm not sure. viosapic-irr and isr are protected by the lock. And viosapic_update_EOI() updates them atomically. The above unlock/lock seems to break its atomicity. I think it is Okay, One atomical operation in viosapic_update_EOI is divided into two atomical operations. If you get spin_lock again, when returning from hvm_dpci_eoi. There are many code segments in linux kernel. And viosapic-irr and isr is still protected by lock. Anthony I'm not sure it's okay or not. To make sure, it is required to take closer look at viosapic.c. thanks, ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
Re: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
Applied, thanks. On Fri, Oct 24, 2008 at 11:02:21AM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Thu, Oct 23, 2008 at 03:10:55PM +0800, Xu, Anthony wrote: The new one, It looks almost okay. The last one. diff -r 02c8733e2d91 xen/arch/ia64/vmx/viosapic.c --- a/xen/arch/ia64/vmx/viosapic.c Wed Oct 22 17:20:15 2008 +0900 +++ b/xen/arch/ia64/vmx/viosapic.c Thu Oct 23 14:48:09 2008 +0800 @@ -121,6 +121,13 @@ redir_num, vector); return; } +if ( iommu_enabled ) +{ +spin_unlock(viosapic-lock); +hvm_dpci_eoi(current-domain, redir_num, viosapic-redirtbl[redir_num]); +spin_lock(viosapic-lock); +} + service_iosapic(viosapic); spin_unlock(viosapic-lock); } Is this unlock/lock sequence okay? I'm asking simply because I'm not sure. viosapic-irr and isr are protected by the lock. And viosapic_update_EOI() updates them atomically. The above unlock/lock seems to break its atomicity. I think it is Okay, One atomical operation in viosapic_update_EOI is divided into two atomical operations. If you get spin_lock again, when returning from hvm_dpci_eoi. There are many code segments in linux kernel. And viosapic-irr and isr is still protected by lock. Anthony I'm not sure it's okay or not. To make sure, it is required to take closer look at viosapic.c. thanks, ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel -- yamahata ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
Re: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
On Wed, Oct 22, 2008 at 02:11:52PM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 01:56:05PM +0800, Xu, Anthony wrote: Yes, it is not SMP-safe there is lock for p2m. Modifying p2m is not a frequent operation, why not add a lock for it? It is frequent to read p2m table. So lockless approach was adopted for scalability. It doesn't make sense to lock around only writer side. If only add write lock for p2m, is there any bad impact/senario? Can you explain more details? Generally lock should protect both readers and writers. So locking around only writers doesn't make sense. -- yamahata ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 02:11:52PM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 01:56:05PM +0800, Xu, Anthony wrote: Yes, it is not SMP-safe there is lock for p2m. Modifying p2m is not a frequent operation, why not add a lock for it? It is frequent to read p2m table. So lockless approach was adopted for scalability. It doesn't make sense to lock around only writer side. If only add write lock for p2m, is there any bad impact/senario? Can you explain more details? Generally lock should protect both readers and writers. So locking around only writers doesn't make sense. So you can use read/write lock, multiple reader one writer. Because write is very rare, it will not impact performance, but it makes code in mm.c clear and easy to modify. I think that's why read/write lock exist. Anthony ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
Re: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
On Wed, Oct 22, 2008 at 02:30:50PM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 02:11:52PM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 01:56:05PM +0800, Xu, Anthony wrote: Yes, it is not SMP-safe there is lock for p2m. Modifying p2m is not a frequent operation, why not add a lock for it? It is frequent to read p2m table. So lockless approach was adopted for scalability. It doesn't make sense to lock around only writer side. If only add write lock for p2m, is there any bad impact/senario? Can you explain more details? Generally lock should protect both readers and writers. So locking around only writers doesn't make sense. So you can use read/write lock, multiple reader one writer. Because write is very rare, it will not impact performance, but it makes code in mm.c clear and easy to modify. I think that's why read/write lock exist. Yes, that's quite right. It's another discussion to go for reader/writer lock or to keep the current lockless approach. thanks, -- yamahata ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 02:30:50PM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 02:11:52PM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 01:56:05PM +0800, Xu, Anthony wrote: Yes, it is not SMP-safe there is lock for p2m. Modifying p2m is not a frequent operation, why not add a lock for it? It is frequent to read p2m table. So lockless approach was adopted for scalability. It doesn't make sense to lock around only writer side. If only add write lock for p2m, is there any bad impact/senario? Can you explain more details? Generally lock should protect both readers and writers. So locking around only writers doesn't make sense. So you can use read/write lock, multiple reader one writer. Because write is very rare, it will not impact performance, but it makes code in mm.c clear and easy to modify. I think that's why read/write lock exist. Yes, that's quite right. It's another discussion to go for reader/writer lock or to keep the current lockless approach. If we want to keep the current lockless approach, we need to use ptep_cmpxchg_rel as less as possible Currently there are four functions using ptep_cmpxchg_rel. assign_domain_page_cmpxchg_rel zap_domain_page_one replace_grant_host_mapping __assign_domain_page And the four functions have similar function. I suggest implementing below code segment as a core function __replace_domain_page(){ old_pte = ptep_xchg_rel(); If(old_pte is INVALID) { } Elseif( old_pte is _PAGE_IO) { } Else if (old_pte is assigned_MMIO_page) { } Else if ( old_pte is normal page) { put_page( old_page) } . We need to fully consider the old_pte type. Other functions are simple wrapper of this core fucntion __replace_domain_page. This way mm.c may be more readable. Thanks, Anthony thanks, ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
Re: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
On Wed, Oct 22, 2008 at 02:50:40PM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 02:30:50PM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 02:11:52PM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 01:56:05PM +0800, Xu, Anthony wrote: Yes, it is not SMP-safe there is lock for p2m. Modifying p2m is not a frequent operation, why not add a lock for it? It is frequent to read p2m table. So lockless approach was adopted for scalability. It doesn't make sense to lock around only writer side. If only add write lock for p2m, is there any bad impact/senario? Can you explain more details? Generally lock should protect both readers and writers. So locking around only writers doesn't make sense. So you can use read/write lock, multiple reader one writer. Because write is very rare, it will not impact performance, but it makes code in mm.c clear and easy to modify. I think that's why read/write lock exist. Yes, that's quite right. It's another discussion to go for reader/writer lock or to keep the current lockless approach. If we want to keep the current lockless approach, we need to use ptep_cmpxchg_rel as less as possible Currently there are four functions using ptep_cmpxchg_rel. assign_domain_page_cmpxchg_rel zap_domain_page_one replace_grant_host_mapping __assign_domain_page And the four functions have similar function. I suggest implementing below code segment as a core function __replace_domain_page(){ old_pte = ptep_xchg_rel(); If(old_pte is INVALID) { } Elseif( old_pte is _PAGE_IO) { } Else if (old_pte is assigned_MMIO_page) { } Else if ( old_pte is normal page) { put_page( old_page) } . We need to fully consider the old_pte type. Other functions are simple wrapper of this core fucntion __replace_domain_page. This way mm.c may be more readable. Sounds great. Refactoring mm.c is good itself. I have to admit that mm.c is unnecessarily complicated. -- yamahata ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
The new one, -if (phy_pte.ma != VA_MATTR_NATPAGE) +/* if a device is assigned to a domain through VTD, the MMIO for this + * device needs to retain to UC attribute + */ +if (phy_pte.ma == VA_MATTR_WC) phy_pte.ma = VA_MATTR_WB; Doesn't this break the intention of the c/s 15134:466f71b1e831?a To be honest, I'm not sure. Kyouya or Akio, do you have any comments? This section is not included, need kyouya or akio confirmation. Patches about mm.c is not inculded, I'll send out a separate patch. Thanks, Anthony Isaku Yamahata wrote: [I added Kyouya and Akio to CC for comments on the hunk in vtlb.c] On Tue, Oct 21, 2008 at 07:38:21PM +0800, Xu, Anthony wrote: other small patches for VTD Signed-off-by: Anthony Xu [EMAIL PROTECTED] This patch is based on #Cset 18673 of xen-devel Thanks, Anthony The patch touches the common codes for both PV and HVM domain without any hvm domain check. Before you told the target was only hvm domain (at this moment). Hmm, what is the exact target the series of patch. (Off course, eventually you may want to support all the cases.) Some comments below. other small patches for VTD Signed-off-by: Anthony Xu [EMAIL PROTECTED] diff -r e1ed3e5cd001 xen/arch/ia64/linux-xen/acpi.c --- a/xen/arch/ia64/linux-xen/acpi.c Tue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/linux-xen/acpi.c Tue Oct 21 19:13:45 2008 +0800 @@ -797,6 +797,10 @@ if (acpi_table_parse(ACPI_SIG_FADT, acpi_parse_fadt)) printk(KERN_ERR PREFIX Can't find FADT\n); +#ifdef XEN + acpi_dmar_init(); +#endif + #ifdef CONFIG_SMP if (available_cpus == 0) { printk(KERN_INFO ACPI: Found 0 CPUS; assuming 1\n); diff -r e1ed3e5cd001 xen/arch/ia64/vmx/viosapic.c --- a/xen/arch/ia64/vmx/viosapic.cTue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/vmx/viosapic.cTue Oct 21 19:13:45 2008 +0800 @@ -121,6 +121,13 @@ redir_num, vector); return; } +if ( iommu_enabled ) +{ +spin_unlock(viosapic-lock); +hvm_dpci_eoi(current-domain, redir_num, viosapic-redirtbl[redir_num]); +spin_lock(viosapic-lock); +} + service_iosapic(viosapic); spin_unlock(viosapic-lock); } diff -r e1ed3e5cd001 xen/arch/ia64/vmx/vmx_fault.c --- a/xen/arch/ia64/vmx/vmx_fault.c Tue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/vmx/vmx_fault.c Tue Oct 21 19:13:45 2008 +0800 @@ -54,6 +54,7 @@ #include asm/shadow.h #include asm/sioemu.h #include public/arch-ia64/sioemu.h +#include xen/hvm/irq.h /* reset all PSR field to 0, except up,mfl,mfh,pk,dt,rt,mc,it */ #define INITIAL_PSR_VALUE_AT_INTERRUPTION 0x001808028034 @@ -306,6 +307,7 @@ viosapic_set_irq(d, callback_irq, 0); } } +hvm_dirq_assist(v); } rmb(); diff -r e1ed3e5cd001 xen/arch/ia64/vmx/vtlb.c --- a/xen/arch/ia64/vmx/vtlb.cTue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/vmx/vtlb.cTue Oct 21 19:13:45 2008 +0800 @@ -522,7 +522,10 @@ * which is required by vga acceleration since qemu maps shared * vram buffer with WB. */ -if (phy_pte.ma != VA_MATTR_NATPAGE) +/* if a device is assigned to a domain through VTD, the MMIO for this + * device needs to retain to UC attribute + */ +if (phy_pte.ma == VA_MATTR_WC) phy_pte.ma = VA_MATTR_WB; Doesn't this break the intention of the c/s 15134:466f71b1e831?a To be honest, I'm not sure. Kyouya or Akio, do you have any comments? maddr = ((maddr _PAGE_PPN_MASK) PAGE_MASK) | (paddr ~PAGE_MASK); diff -r e1ed3e5cd001 xen/arch/ia64/xen/domain.c --- a/xen/arch/ia64/xen/domain.c Tue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/xen/domain.c Tue Oct 21 19:13:45 2008 +0800 @@ -569,6 +569,7 @@ if (is_idle_domain(d)) return 0; + INIT_LIST_HEAD(d-arch.pdev_list); foreign_p2m_init(d); #ifdef CONFIG_XEN_IA64_PERVCPU_VHPT d-arch.has_pervcpu_vhpt = opt_pervcpu_vhpt; This hunk shoundn't be included in this patch. @@ -601,6 +602,9 @@ if ((d-arch.mm.pgd = pgd_alloc(d-arch.mm)) == NULL) goto fail_nomem; + if(iommu_domain_init(d) != 0) + goto fail_iommu; + /* * grant_table_create() can't fully initialize grant table for domain * because it is called before arch_domain_create(). @@ -617,6 +621,8 @@ dprintk(XENLOG_DEBUG, arch_domain_create: domain=%p\n, d); return 0; +fail_iommu: + iommu_domain_destroy(d); fail_nomem: tlb_track_destroy(d); fail_nomem1: @@ -635,6 +641,10 @@ if (d-shared_info != NULL) free_xenheap_pages(d-shared_info, get_order_from_shift(XSI_SHIFT)); + + pci_release_devices(d); + if( !is_idle_domain(d)) + iommu_domain_destroy(d); tlb_track_destroy(d); diff
RE: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
Hi Anthony, Xu, Anthony writes: -if (phy_pte.ma != VA_MATTR_NATPAGE) +/* if a device is assigned to a domain through VTD, the MMIO for this + * device needs to retain to UC attribute + */ +if (phy_pte.ma == VA_MATTR_WC) phy_pte.ma = VA_MATTR_WB; Doesn't this break the intention of the c/s 15134:466f71b1e831?a To be honest, I'm not sure. Kyouya or Akio, do you have any comments? This section is not included, need kyouya or akio confirmation. I applied only this hunk and tried invoking HVMs (windows2003 and RHEL5.1). In both of them, the system silently rebooted. The following Machine Check occurred. BUS Check Info [0] Transaction size: 0, Internal Bus Error:, Type: 1 (Partial read), Severity: 0, Hierarchy: 0, Status information: 0 (Unknown/unclassified) CPUID Regs: 0x49656e69756e6547 0x6c65746e 0 0x2504 I have no idea what's going on, though my machine might be ill-conditioned. Have you tested it? -- Kouya ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
Re: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
On Wed, Oct 22, 2008 at 05:35:36PM +0800, Xu, Anthony wrote: The new one, -if (phy_pte.ma != VA_MATTR_NATPAGE) +/* if a device is assigned to a domain through VTD, the MMIO for this + * device needs to retain to UC attribute + */ +if (phy_pte.ma == VA_MATTR_WC) phy_pte.ma = VA_MATTR_WB; Doesn't this break the intention of the c/s 15134:466f71b1e831?a To be honest, I'm not sure. Kyouya or Akio, do you have any comments? This section is not included, need kyouya or akio confirmation. Patches about mm.c is not inculded, I'll send out a separate patch. Sounds good. the stuff in mm.c seems very tough. However the following patch still touches mm.c. Did you forget to remove it accidently? I had given it consideration a bit. I suppose the lockless implementation is possible. In fact tlb insert case is handled by retry using the p2m entry as change indicator. See ia64_do_page_fault(), vcpu_itc_i() and vcpu_itc_d(). iommu case could be handled similary. diff -r 02c8733e2d91 xen/arch/ia64/vmx/viosapic.c --- a/xen/arch/ia64/vmx/viosapic.cWed Oct 22 17:20:15 2008 +0900 +++ b/xen/arch/ia64/vmx/viosapic.cWed Oct 22 17:08:32 2008 +0800 @@ -121,6 +121,13 @@ redir_num, vector); return; } +if ( iommu_enabled ) +{ +spin_unlock(viosapic-lock); +hvm_dpci_eoi(current-domain, redir_num, viosapic-redirtbl[redir_num]); +spin_lock(viosapic-lock); +} + service_iosapic(viosapic); spin_unlock(viosapic-lock); } viosapic-isr and irr must handled atomically. So unlocking and locking again breaks the requirement. (I haven't looked the viosapic code very closely, though. So I may be wrong.) diff -r 02c8733e2d91 xen/arch/ia64/xen/mm.c --- a/xen/arch/ia64/xen/mm.c Wed Oct 22 17:20:15 2008 +0900 +++ b/xen/arch/ia64/xen/mm.c Wed Oct 22 17:08:32 2008 +0800 @@ -1427,6 +1427,8 @@ if (mfn == INVALID_MFN) { // clear pte old_pte = ptep_get_and_clear(mm, mpaddr, pte); +if(!pte_mem(old_pte)) +return; mfn = pte_pfn(old_pte); } else { unsigned long old_arflags; @@ -1463,6 +1465,13 @@ perfc_incr(zap_domain_page_one); if(!mfn_valid(mfn)) return; + +{ +int i, j; +j = 1 (PAGE_SHIFT-PAGE_SHIFT_4K); +for(i = 0 ; i j; i++) +iommu_unmap_page(d, (mpaddrPAGE_SHIFT)*j + i); +} page = mfn_to_page(mfn); BUG_ON((page-count_info PGC_count_mask) == 0); @@ -2844,10 +2853,16 @@ __guest_physmap_add_page(struct domain *d, unsigned long gpfn, unsigned long mfn) { +int i, j; + set_gpfn_from_mfn(mfn, gpfn); smp_mb(); assign_domain_page_replace(d, gpfn PAGE_SHIFT, mfn, ASSIGN_writable | ASSIGN_pgc_allocated); +j = 1 (PAGE_SHIFT-PAGE_SHIFT_4K); +for(i = 0 ; i j; i++) +iommu_map_page(d, gpfn*j + i, mfn*j + i); + } int The same loop logic. Introducing a helper function would help? -- yamahata ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
Re: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
On Thu, Oct 23, 2008 at 10:03:55AM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 05:35:36PM +0800, Xu, Anthony wrote: The new one, -if (phy_pte.ma != VA_MATTR_NATPAGE) +/* if a device is assigned to a domain through VTD, the MMIO for this + * device needs to retain to UC attribute + */ +if (phy_pte.ma == VA_MATTR_WC) phy_pte.ma = VA_MATTR_WB; Doesn't this break the intention of the c/s 15134:466f71b1e831?a To be honest, I'm not sure. Kyouya or Akio, do you have any comments? This section is not included, need kyouya or akio confirmation. Patches about mm.c is not inculded, I'll send out a separate patch. Sounds good. the stuff in mm.c seems very tough. However the following patch still touches mm.c. Did you forget to remove it accidently? I didn't remove all mm.c small patches, I just removed the difficult part, which is related to atomic operation Please, check in this patch first, I had tested it by booting linux guest. There is a race. guest_physmap_{add, remove}_page() can be called for PV domain simultaneously. The p2m table and the iommu table are updated without any lock. So they can be inconsistent with each other. thanks, diff -r 02c8733e2d91 xen/arch/ia64/vmx/viosapic.c --- a/xen/arch/ia64/vmx/viosapic.cWed Oct 22 17:20:15 2008 +0900 +++ b/xen/arch/ia64/vmx/viosapic.cWed Oct 22 17:08:32 2008 +0800 @@ -121,6 +121,13 @@ redir_num, vector); return; } +if ( iommu_enabled ) +{ +spin_unlock(viosapic-lock); +hvm_dpci_eoi(current-domain, redir_num, viosapic-redirtbl[redir_num]); +spin_lock(viosapic-lock); +} + service_iosapic(viosapic); spin_unlock(viosapic-lock); } viosapic-isr and irr must handled atomically. So unlocking and locking again breaks the requirement. (I haven't looked the viosapic code very closely, though. So I may be wrong.) diff -r 02c8733e2d91 xen/arch/ia64/xen/mm.c --- a/xen/arch/ia64/xen/mm.c Wed Oct 22 17:20:15 2008 +0900 +++ b/xen/arch/ia64/xen/mm.c Wed Oct 22 17:08:32 2008 +0800 @@ -1427,6 +1427,8 @@ if (mfn == INVALID_MFN) { // clear pte old_pte = ptep_get_and_clear(mm, mpaddr, pte); +if(!pte_mem(old_pte)) +return; mfn = pte_pfn(old_pte); } else { unsigned long old_arflags; @@ -1463,6 +1465,13 @@ perfc_incr(zap_domain_page_one); if(!mfn_valid(mfn)) return; + +{ +int i, j; +j = 1 (PAGE_SHIFT-PAGE_SHIFT_4K); +for(i = 0 ; i j; i++) +iommu_unmap_page(d, (mpaddrPAGE_SHIFT)*j + i); +} page = mfn_to_page(mfn); BUG_ON((page-count_info PGC_count_mask) == 0); @@ -2844,10 +2853,16 @@ __guest_physmap_add_page(struct domain *d, unsigned long gpfn, unsigned long mfn) { +int i, j; + set_gpfn_from_mfn(mfn, gpfn); smp_mb(); assign_domain_page_replace(d, gpfn PAGE_SHIFT, mfn, ASSIGN_writable | ASSIGN_pgc_allocated); +j = 1 (PAGE_SHIFT-PAGE_SHIFT_4K); +for(i = 0 ; i j; i++) +iommu_map_page(d, gpfn*j + i, mfn*j + i); + } int The same loop logic. Introducing a helper function would help? -- yamahata ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
Isaku Yamahata wrote: On Thu, Oct 23, 2008 at 10:03:55AM +0800, Xu, Anthony wrote: Isaku Yamahata wrote: On Wed, Oct 22, 2008 at 05:35:36PM +0800, Xu, Anthony wrote: The new one, -if (phy_pte.ma != VA_MATTR_NATPAGE) +/* if a device is assigned to a domain through VTD, the MMIO for this + * device needs to retain to UC attribute + */ +if (phy_pte.ma == VA_MATTR_WC) phy_pte.ma = VA_MATTR_WB; Doesn't this break the intention of the c/s 15134:466f71b1e831?a To be honest, I'm not sure. Kyouya or Akio, do you have any comments? This section is not included, need kyouya or akio confirmation. Patches about mm.c is not inculded, I'll send out a separate patch. Sounds good. the stuff in mm.c seems very tough. However the following patch still touches mm.c. Did you forget to remove it accidently? I didn't remove all mm.c small patches, I just removed the difficult part, which is related to atomic operation Please, check in this patch first, I had tested it by booting linux guest. There is a race. guest_physmap_{add, remove}_page() can be called for PV domain simultaneously. The p2m table and the iommu table are updated without any lock. So they can be inconsistent with each other. Iommu_un/map_page does nothing for PV domain. Why is there a race? Anthony int iommu_map_page(struct domain *d, unsigned long gfn, unsigned long mfn) { struct hvm_iommu *hd = domain_hvm_iommu(d); if ( !iommu_enabled || !hd-platform_ops ) return 0; diff -r 02c8733e2d91 xen/arch/ia64/vmx/viosapic.c --- a/xen/arch/ia64/vmx/viosapic.cWed Oct 22 17:20:15 2008 +0900 +++ b/xen/arch/ia64/vmx/viosapic.cWed Oct 22 17:08:32 2008 +0800 @@ -121,6 +121,13 @@ redir_num, vector); return; } +if ( iommu_enabled ) +{ +spin_unlock(viosapic-lock); +hvm_dpci_eoi(current-domain, redir_num, viosapic-redirtbl[redir_num]); + spin_lock(viosapic-lock); +} + service_iosapic(viosapic); spin_unlock(viosapic-lock); } viosapic-isr and irr must handled atomically. So unlocking and locking again breaks the requirement. (I haven't looked the viosapic code very closely, though. So I may be wrong.) diff -r 02c8733e2d91 xen/arch/ia64/xen/mm.c --- a/xen/arch/ia64/xen/mm.c Wed Oct 22 17:20:15 2008 +0900 +++ b/xen/arch/ia64/xen/mm.c Wed Oct 22 17:08:32 2008 +0800 @@ -1427,6 +1427,8 @@ if (mfn == INVALID_MFN) { // clear pte old_pte = ptep_get_and_clear(mm, mpaddr, pte); +if(!pte_mem(old_pte)) +return; mfn = pte_pfn(old_pte); } else { unsigned long old_arflags; @@ -1463,6 +1465,13 @@ perfc_incr(zap_domain_page_one); if(!mfn_valid(mfn)) return; + +{ +int i, j; +j = 1 (PAGE_SHIFT-PAGE_SHIFT_4K); +for(i = 0 ; i j; i++) +iommu_unmap_page(d, (mpaddrPAGE_SHIFT)*j + i); + } page = mfn_to_page(mfn); BUG_ON((page-count_info PGC_count_mask) == 0); @@ -2844,10 +2853,16 @@ __guest_physmap_add_page(struct domain *d, unsigned long gpfn, unsigned long mfn) { +int i, j; + set_gpfn_from_mfn(mfn, gpfn); smp_mb(); assign_domain_page_replace(d, gpfn PAGE_SHIFT, mfn, ASSIGN_writable | ASSIGN_pgc_allocated); +j = 1 (PAGE_SHIFT-PAGE_SHIFT_4K); + for(i = 0 ; i j; i++) +iommu_map_page(d, gpfn*j + i, mfn*j + i); + } int The same loop logic. Introducing a helper function would help? ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
Re: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
On Thu, Oct 23, 2008 at 10:31:20AM +0800, Xu, Anthony wrote: Iommu_un/map_page does nothing for PV domain. Why is there a race? Oh sorry. I'm not very familiar with iommu code (yet). My understanding is iommu_domain_init() is called unconditionaly by arch_domain_create() so that platform_ops is initialized. iommu_(un)map_page() checks iommu_enabled and whether platform_ops is initialized. Thus iommu_(un)map_page() manipulates iommu page table even for PV domain. Clearly I'm missing something important. Could you enlighten me? thanks, -- yamahata ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
Isaku Yamahata wrote: On Thu, Oct 23, 2008 at 10:31:20AM +0800, Xu, Anthony wrote: Iommu_un/map_page does nothing for PV domain. Why is there a race? Oh sorry. I'm not very familiar with iommu code (yet). My understanding is iommu_domain_init() is called unconditionaly by arch_domain_create() so that platform_ops is initialized. iommu_(un)map_page() checks iommu_enabled and whether platform_ops is initialized. Thus iommu_(un)map_page() manipulates iommu page table even for PV domain. Clearly I'm missing something important. Could you enlighten me? Acctually I missed something. First, PV domain can use VTD in x86 side. So before calling iommu_(un)map_page, Xen needs to check if it is needed. if ( iommu_enabled (is_hvm_domain(d) || need_iommu(d)) ) { I'll update the patch Thanks, Anthony ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
Xu, Anthony wrote: Isaku Yamahata wrote: On Thu, Oct 23, 2008 at 10:31:20AM +0800, Xu, Anthony wrote: Iommu_un/map_page does nothing for PV domain. Why is there a race? Oh sorry. I'm not very familiar with iommu code (yet). My understanding is iommu_domain_init() is called unconditionaly by arch_domain_create() so that platform_ops is initialized. iommu_(un)map_page() checks iommu_enabled and whether platform_ops is initialized. Thus iommu_(un)map_page() manipulates iommu page table even for PV domain. Clearly I'm missing something important. Could you enlighten me? Acctually I missed something. First, PV domain can use VTD in x86 side. So before calling iommu_(un)map_page, Xen needs to check if it is needed. if ( iommu_enabled (is_hvm_domain(d) || need_iommu(d)) ) { I'll update the patch There is an issue in VTD, That's even if no device is assigned to hvm guest, VTD page table is still setup. Intel is working on that, to make sure setting up VTD page table only when needed. Anthony Thanks, Anthony ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
RE: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
Hi shimura I only tested in old version 18134. I can reproduce it. And I find another way to handle it, Will send out patch soon. Thanks, anthony Kouya Shimura wrote: Hi Anthony, Xu, Anthony writes: -if (phy_pte.ma != VA_MATTR_NATPAGE) +/* if a device is assigned to a domain through VTD, the MMIO for this + * device needs to retain to UC attribute + */ +if (phy_pte.ma == VA_MATTR_WC) phy_pte.ma = VA_MATTR_WB; Doesn't this break the intention of the c/s 15134:466f71b1e831?a To be honest, I'm not sure. Kyouya or Akio, do you have any comments? This section is not included, need kyouya or akio confirmation. I applied only this hunk and tried invoking HVMs (windows2003 and RHEL5.1). In both of them, the system silently rebooted. The following Machine Check occurred. BUS Check Info [0] Transaction size: 0, Internal Bus Error:, Type: 1 (Partial read), Severity: 0, Hierarchy: 0, Status information: 0 (Unknown/unclassified) CPUID Regs: 0x49656e69756e6547 0x6c65746e 0 0x2504 I have no idea what's going on, though my machine might be ill-conditioned. Have you tested it? ___ Xen-ia64-devel mailing list Xen-ia64-devel@lists.xensource.com http://lists.xensource.com/xen-ia64-devel
Re: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
[I added Kyouya and Akio to CC for comments on the hunk in vtlb.c] On Tue, Oct 21, 2008 at 07:38:21PM +0800, Xu, Anthony wrote: other small patches for VTD Signed-off-by: Anthony Xu [EMAIL PROTECTED] This patch is based on #Cset 18673 of xen-devel Thanks, Anthony The patch touches the common codes for both PV and HVM domain without any hvm domain check. Before you told the target was only hvm domain (at this moment). Hmm, what is the exact target the series of patch. (Off course, eventually you may want to support all the cases.) Some comments below. other small patches for VTD Signed-off-by: Anthony Xu [EMAIL PROTECTED] diff -r e1ed3e5cd001 xen/arch/ia64/linux-xen/acpi.c --- a/xen/arch/ia64/linux-xen/acpi.c Tue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/linux-xen/acpi.c Tue Oct 21 19:13:45 2008 +0800 @@ -797,6 +797,10 @@ if (acpi_table_parse(ACPI_SIG_FADT, acpi_parse_fadt)) printk(KERN_ERR PREFIX Can't find FADT\n); +#ifdef XEN + acpi_dmar_init(); +#endif + #ifdef CONFIG_SMP if (available_cpus == 0) { printk(KERN_INFO ACPI: Found 0 CPUS; assuming 1\n); diff -r e1ed3e5cd001 xen/arch/ia64/vmx/viosapic.c --- a/xen/arch/ia64/vmx/viosapic.cTue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/vmx/viosapic.cTue Oct 21 19:13:45 2008 +0800 @@ -121,6 +121,13 @@ redir_num, vector); return; } +if ( iommu_enabled ) +{ +spin_unlock(viosapic-lock); +hvm_dpci_eoi(current-domain, redir_num, viosapic-redirtbl[redir_num]); +spin_lock(viosapic-lock); +} + service_iosapic(viosapic); spin_unlock(viosapic-lock); } diff -r e1ed3e5cd001 xen/arch/ia64/vmx/vmx_fault.c --- a/xen/arch/ia64/vmx/vmx_fault.c Tue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/vmx/vmx_fault.c Tue Oct 21 19:13:45 2008 +0800 @@ -54,6 +54,7 @@ #include asm/shadow.h #include asm/sioemu.h #include public/arch-ia64/sioemu.h +#include xen/hvm/irq.h /* reset all PSR field to 0, except up,mfl,mfh,pk,dt,rt,mc,it */ #define INITIAL_PSR_VALUE_AT_INTERRUPTION 0x001808028034 @@ -306,6 +307,7 @@ viosapic_set_irq(d, callback_irq, 0); } } +hvm_dirq_assist(v); } rmb(); diff -r e1ed3e5cd001 xen/arch/ia64/vmx/vtlb.c --- a/xen/arch/ia64/vmx/vtlb.cTue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/vmx/vtlb.cTue Oct 21 19:13:45 2008 +0800 @@ -522,7 +522,10 @@ * which is required by vga acceleration since qemu maps shared * vram buffer with WB. */ -if (phy_pte.ma != VA_MATTR_NATPAGE) +/* if a device is assigned to a domain through VTD, the MMIO for this + * device needs to retain to UC attribute + */ +if (phy_pte.ma == VA_MATTR_WC) phy_pte.ma = VA_MATTR_WB; Doesn't this break the intention of the c/s 15134:466f71b1e831?a To be honest, I'm not sure. Kyouya or Akio, do you have any comments? maddr = ((maddr _PAGE_PPN_MASK) PAGE_MASK) | (paddr ~PAGE_MASK); diff -r e1ed3e5cd001 xen/arch/ia64/xen/domain.c --- a/xen/arch/ia64/xen/domain.c Tue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/xen/domain.c Tue Oct 21 19:13:45 2008 +0800 @@ -569,6 +569,7 @@ if (is_idle_domain(d)) return 0; + INIT_LIST_HEAD(d-arch.pdev_list); foreign_p2m_init(d); #ifdef CONFIG_XEN_IA64_PERVCPU_VHPT d-arch.has_pervcpu_vhpt = opt_pervcpu_vhpt; This hunk shoundn't be included in this patch. @@ -601,6 +602,9 @@ if ((d-arch.mm.pgd = pgd_alloc(d-arch.mm)) == NULL) goto fail_nomem; + if(iommu_domain_init(d) != 0) + goto fail_iommu; + /* * grant_table_create() can't fully initialize grant table for domain * because it is called before arch_domain_create(). @@ -617,6 +621,8 @@ dprintk(XENLOG_DEBUG, arch_domain_create: domain=%p\n, d); return 0; +fail_iommu: + iommu_domain_destroy(d); fail_nomem: tlb_track_destroy(d); fail_nomem1: @@ -635,6 +641,10 @@ if (d-shared_info != NULL) free_xenheap_pages(d-shared_info, get_order_from_shift(XSI_SHIFT)); + + pci_release_devices(d); + if( !is_idle_domain(d)) + iommu_domain_destroy(d); tlb_track_destroy(d); diff -r e1ed3e5cd001 xen/arch/ia64/xen/irq.c --- a/xen/arch/ia64/xen/irq.c Tue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/xen/irq.c Tue Oct 21 19:13:45 2008 +0800 @@ -312,12 +312,25 @@ struct domain *guest[IRQ_MAX_GUESTS]; } irq_guest_action_t; +static struct timer irq_guest_eoi_timer[NR_IRQS]; +static void irq_guest_eoi_timer_fn(void *data) +{ + irq_desc_t *desc = data; + unsigned vector = desc - irq_desc; + unsigned long flags; + + spin_lock_irqsave(desc-lock, flags); + desc-status =
RE: [Xen-ia64-devel][PATCH][VTD] small patches for VTD
Isaku Yamahata wrote: Doesn't this break the intention of the c/s 15134:466f71b1e831?a To be honest, I'm not sure. Kyouya or Akio, do you have any comments? maddr = ((maddr _PAGE_PPN_MASK) PAGE_MASK) | (paddr ~PAGE_MASK); diff -r e1ed3e5cd001 xen/arch/ia64/xen/domain.c --- a/xen/arch/ia64/xen/domain.c Tue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/xen/domain.c Tue Oct 21 19:13:45 2008 +0800 @@ -569,6 +569,7 @@ if (is_idle_domain(d)) return 0; + INIT_LIST_HEAD(d-arch.pdev_list); foreign_p2m_init(d); #ifdef CONFIG_XEN_IA64_PERVCPU_VHPT d-arch.has_pervcpu_vhpt = opt_pervcpu_vhpt; This hunk shoundn't be included in this patch. Agree @@ -601,6 +602,9 @@ if ((d-arch.mm.pgd = pgd_alloc(d-arch.mm)) == NULL) goto fail_nomem; + if(iommu_domain_init(d) != 0) + goto fail_iommu; + /* * grant_table_create() can't fully initialize grant table for domain * because it is called before arch_domain_create(). @@ -617,6 +621,8 @@ dprintk(XENLOG_DEBUG, arch_domain_create: domain=%p\n, d); return 0; +fail_iommu: + iommu_domain_destroy(d); fail_nomem: tlb_track_destroy(d); fail_nomem1: @@ -635,6 +641,10 @@ if (d-shared_info != NULL) free_xenheap_pages(d-shared_info, get_order_from_shift(XSI_SHIFT)); + + pci_release_devices(d); + if( !is_idle_domain(d)) + iommu_domain_destroy(d); tlb_track_destroy(d); diff -r e1ed3e5cd001 xen/arch/ia64/xen/irq.c --- a/xen/arch/ia64/xen/irq.c Tue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/xen/irq.c Tue Oct 21 19:13:45 2008 +0800 @@ -312,12 +312,25 @@ struct domain *guest[IRQ_MAX_GUESTS]; } irq_guest_action_t; +static struct timer irq_guest_eoi_timer[NR_IRQS]; +static void irq_guest_eoi_timer_fn(void *data) +{ + irq_desc_t *desc = data; + unsigned vector = desc - irq_desc; + unsigned long flags; + + spin_lock_irqsave(desc-lock, flags); + desc-status = ~IRQ_INPROGRESS; + desc-handler-enable(vector); + spin_unlock_irqrestore(desc-lock, flags); +} + void __do_IRQ_guest(int irq) { irq_desc_t *desc = irq_desc[irq]; irq_guest_action_t *action = (irq_guest_action_t *)desc-action; struct domain *d; -int i; +int i, already_pending = 0; for ( i = 0; i action-nr_guests; i++ ) { @@ -325,8 +338,29 @@ if ( (action-ack_type != ACKTYPE_NONE) !test_and_set_bit(irq, d-pirq_mask) ) action-in_flight++; -send_guest_pirq(d, irq); -} + if ( hvm_do_IRQ_dpci(d, irq) ) + { + if ( action-ack_type == ACKTYPE_NONE ) + { + already_pending += !!(desc-status IRQ_INPROGRESS); + desc-status |= IRQ_INPROGRESS; /* cleared during hvm eoi */ + } + } + else if ( send_guest_pirq(d, irq) + (action-ack_type == ACKTYPE_NONE) ) + { + already_pending++; + } + } + + if ( already_pending == action-nr_guests ) + { + desc-handler-disable(irq); + stop_timer(irq_guest_eoi_timer[irq]); + init_timer(irq_guest_eoi_timer[irq], + irq_guest_eoi_timer_fn, desc, smp_processor_id()); + set_timer(irq_guest_eoi_timer[irq], NOW() + MILLISECS(1)); + } } int pirq_acktype(int irq) Is those hunk for interrupt storm avoidance as 17963:1db0b09b290e? If so, please split it out and send it as another patch. It is different, 17963 is used to prevent interrupt storm of host MSI, xen/ia64 don't support MSI so far. This code segment is used to handle interrupt sharing between multiple domain. diff -r e1ed3e5cd001 xen/arch/ia64/xen/mm.c --- a/xen/arch/ia64/xen/mm.c Tue Oct 21 18:55:22 2008 +0800 +++ b/xen/arch/ia64/xen/mm.c Tue Oct 21 19:13:45 2008 +0800 @@ -189,6 +189,10 @@ static void __xencomm_mark_dirty(struct domain *d, unsigned long addr, unsigned int len); + +static void +zap_domain_page_one(struct domain *d, unsigned long mpaddr, +int clear_PGC_allocate, unsigned long mfn); extern unsigned long ia64_iobase; @@ -908,20 +912,21 @@ unsigned long flags) { volatile pte_t *pte; -pte_t old_pte; pte_t new_pte; pte_t ret_pte; unsigned long prot = flags_to_prot(flags); pte = lookup_alloc_domain_pte(d, mpaddr); +new_pte = pfn_pte(physaddr PAGE_SHIFT, __pgprot(prot)); +if(pte_val(new_pte) == pte_val(*pte)) +return 0; -old_pte = __pte(0); -new_pte = pfn_pte(physaddr PAGE_SHIFT, __pgprot(prot)); -ret_pte = ptep_cmpxchg_rel(d-arch.mm, mpaddr, pte, old_pte, new_pte);