Re: [PATCH] intel-iommu: Fix array overflow
On Thu, Oct 25, 2007 at 09:31:02AM +0200, Takashi Iwai wrote: > At Wed, 24 Oct 2007 16:30:37 -0700, > Mark Gross wrote: > > > > On Tue, Oct 23, 2007 at 10:57:51AM +0200, Takashi Iwai wrote: > > > Fix possible array overflow: > > > > > > drivers/pci/intel-iommu.c: In function ‘dmar_get_fault_reason’: > > > drivers/pci/intel-iommu.c:753: warning: array subscript is above array > > > bounds > > > drivers/pci/intel-iommu.c: In function ‘iommu_page_fault’: > > > drivers/pci/intel-iommu.c:753: warning: array subscript is above array > > > bounds > > > > > > Signed-off-by: Takashi Iwai <[EMAIL PROTECTED]> > > > > > > --- > > > drivers/pci/intel-iommu.c |4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c > > > index b3d7031..e4b0a0d 100644 > > > --- a/drivers/pci/intel-iommu.c > > > +++ b/drivers/pci/intel-iommu.c > > > @@ -749,8 +749,8 @@ static char *fault_reason_strings[] = > > > > > > char *dmar_get_fault_reason(u8 fault_reason) > > > { > > > - if (fault_reason > MAX_FAULT_REASON_IDX) > > > - return fault_reason_strings[MAX_FAULT_REASON_IDX]; > > > + if (fault_reason >= MAX_FAULT_REASON_IDX) > > > + return fault_reason_strings[MAX_FAULT_REASON_IDX - 1]; > > > > This looks like what the code meant to implement. > > I think not. The size of fault_reason_strings[] is > MAX_FAULT_REASON_IDX, not + 1. So gcc warning is correct. > Maybe the main problem is that the constant name is confusing... Yup, GCC warning is correct. the size of fault_reason_strings[] is MAX_FAULT_REASON_IDX and hence the max array that can be referenced is [MAX_FAULT_REASON_IDX -1], hence the fix by Takashi is correct. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] intel-iommu: Fix array overflow
On Thu, Oct 25, 2007 at 09:31:02AM +0200, Takashi Iwai wrote: At Wed, 24 Oct 2007 16:30:37 -0700, Mark Gross wrote: On Tue, Oct 23, 2007 at 10:57:51AM +0200, Takashi Iwai wrote: Fix possible array overflow: drivers/pci/intel-iommu.c: In function ‘dmar_get_fault_reason’: drivers/pci/intel-iommu.c:753: warning: array subscript is above array bounds drivers/pci/intel-iommu.c: In function ‘iommu_page_fault’: drivers/pci/intel-iommu.c:753: warning: array subscript is above array bounds Signed-off-by: Takashi Iwai [EMAIL PROTECTED] --- drivers/pci/intel-iommu.c |4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c index b3d7031..e4b0a0d 100644 --- a/drivers/pci/intel-iommu.c +++ b/drivers/pci/intel-iommu.c @@ -749,8 +749,8 @@ static char *fault_reason_strings[] = char *dmar_get_fault_reason(u8 fault_reason) { - if (fault_reason MAX_FAULT_REASON_IDX) - return fault_reason_strings[MAX_FAULT_REASON_IDX]; + if (fault_reason = MAX_FAULT_REASON_IDX) + return fault_reason_strings[MAX_FAULT_REASON_IDX - 1]; This looks like what the code meant to implement. I think not. The size of fault_reason_strings[] is MAX_FAULT_REASON_IDX, not + 1. So gcc warning is correct. Maybe the main problem is that the constant name is confusing... Yup, GCC warning is correct. the size of fault_reason_strings[] is MAX_FAULT_REASON_IDX and hence the max array that can be referenced is [MAX_FAULT_REASON_IDX -1], hence the fix by Takashi is correct. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH]: Genericizing iova.[ch]
On Tue, Oct 23, 2007 at 01:01:56AM -0700, David Miller wrote: > > I would like to potentially move the sparc64 IOMMU code over to using > the nice new drivers/pci/iova.[ch] code for free area management.. Welcome!! > > In order to do that we have to detach the IOMMU page size assumptions > which only really need to exist in the intel-iommu.[ch] code. > > This patch attempts to implement that. Your patch looks good to me. Andrew, Please queue this patch in your mm tree. > > Signed-off-by: David S. Miller <[EMAIL PROTECTED]> Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> > > diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c > index b3d7031..7f2aec4 100644 > --- a/drivers/pci/intel-iommu.c > +++ b/drivers/pci/intel-iommu.c > @@ -1089,7 +1089,7 @@ static void dmar_init_reserved_ranges(void) > int i; > u64 addr, size; > > - init_iova_domain(_iova_list); > + init_iova_domain(_iova_list, DMA_32BIT_PFN); > > /* IOAPIC ranges shouldn't be accessed by DMA */ > iova = reserve_iova(_iova_list, IOVA_PFN(IOAPIC_RANGE_START), > @@ -1143,7 +1143,7 @@ static int domain_init(struct dmar_domain *domain, int > guest_width) > int adjust_width, agaw; > unsigned long sagaw; > > - init_iova_domain(>iovad); > + init_iova_domain(>iovad, DMA_32BIT_PFN); > spin_lock_init(>mapping_lock); > > domain_reserve_special_ranges(domain); > diff --git a/drivers/pci/intel-iommu.h b/drivers/pci/intel-iommu.h > index ee88dd2..af8560c 100644 > --- a/drivers/pci/intel-iommu.h > +++ b/drivers/pci/intel-iommu.h > @@ -27,6 +27,19 @@ > #include > > /* > + * We need a fixed PAGE_SIZE of 4K irrespective of > + * arch PAGE_SIZE for IOMMU page tables. > + */ > +#define PAGE_SHIFT_4K(12) > +#define PAGE_SIZE_4K (1UL << PAGE_SHIFT_4K) > +#define PAGE_MASK_4K (((u64)-1) << PAGE_SHIFT_4K) > +#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K) > + > +#define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT_4K) > +#define DMA_32BIT_PFNIOVA_PFN(DMA_32BIT_MASK) > +#define DMA_64BIT_PFNIOVA_PFN(DMA_64BIT_MASK) > + > +/* > * Intel IOMMU register specification per version 1.0 public spec. > */ > > diff --git a/drivers/pci/iova.c b/drivers/pci/iova.c > index a84571c..8de7ab6 100644 > --- a/drivers/pci/iova.c > +++ b/drivers/pci/iova.c > @@ -9,19 +9,19 @@ > #include "iova.h" > > void > -init_iova_domain(struct iova_domain *iovad) > +init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit) > { > spin_lock_init(>iova_alloc_lock); > spin_lock_init(>iova_rbtree_lock); > iovad->rbroot = RB_ROOT; > iovad->cached32_node = NULL; > - > + iovad->dma_32bit_pfn = pfn_32bit; > } > > static struct rb_node * > __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn) > { > - if ((*limit_pfn != DMA_32BIT_PFN) || > + if ((*limit_pfn != iovad->dma_32bit_pfn) || > (iovad->cached32_node == NULL)) > return rb_last(>rbroot); > else { > @@ -37,7 +37,7 @@ static void > __cached_rbnode_insert_update(struct iova_domain *iovad, > unsigned long limit_pfn, struct iova *new) > { > - if (limit_pfn != DMA_32BIT_PFN) > + if (limit_pfn != iovad->dma_32bit_pfn) > return; > iovad->cached32_node = >node; > } > diff --git a/drivers/pci/iova.h b/drivers/pci/iova.h > index ae3028d..d521b5b 100644 > --- a/drivers/pci/iova.h > +++ b/drivers/pci/iova.h > @@ -15,22 +15,9 @@ > #include > #include > > -/* > - * We need a fixed PAGE_SIZE of 4K irrespective of > - * arch PAGE_SIZE for IOMMU page tables. > - */ > -#define PAGE_SHIFT_4K(12) > -#define PAGE_SIZE_4K (1UL << PAGE_SHIFT_4K) > -#define PAGE_MASK_4K (((u64)-1) << PAGE_SHIFT_4K) > -#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K) > - > /* IO virtual address start page frame number */ > #define IOVA_START_PFN (1) > > -#define IOVA_PFN(addr) ((addr) >> PAGE_SHIFT_4K) > -#define DMA_32BIT_PFNIOVA_PFN(DMA_32BIT_MASK) > -#define DMA_64BIT_PFNIOVA_PFN(DMA_64BIT_MASK) > - > /* iova structure */ > struct iova { > struct rb_node node; > @@ -44,6 +31,7 @@ struct iova_domain { > spinlock_t iova_rbtree_lock; /* Lock to protect update of rbtree */ > struct rb_root rbroot; /* iova domain rbtree root */ > struct rb_node *cached32_node; /* Save last alloced node */ > + unsigned long dma_32bit_pfn; > }; > > struct iova *alloc_iova_mem(void); > @@ -56,7 +44,7 @@ struct iova *alloc_iova(struct iova_domain *iovad, unsigned > long size, > struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo, > unsigned long pfn_hi); > void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to); > -void init_iova_domain(struct iova_domain
Re: [PATCH]: Genericizing iova.[ch]
On Tue, Oct 23, 2007 at 01:01:56AM -0700, David Miller wrote: I would like to potentially move the sparc64 IOMMU code over to using the nice new drivers/pci/iova.[ch] code for free area management.. Welcome!! In order to do that we have to detach the IOMMU page size assumptions which only really need to exist in the intel-iommu.[ch] code. This patch attempts to implement that. Your patch looks good to me. Andrew, Please queue this patch in your mm tree. Signed-off-by: David S. Miller [EMAIL PROTECTED] Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c index b3d7031..7f2aec4 100644 --- a/drivers/pci/intel-iommu.c +++ b/drivers/pci/intel-iommu.c @@ -1089,7 +1089,7 @@ static void dmar_init_reserved_ranges(void) int i; u64 addr, size; - init_iova_domain(reserved_iova_list); + init_iova_domain(reserved_iova_list, DMA_32BIT_PFN); /* IOAPIC ranges shouldn't be accessed by DMA */ iova = reserve_iova(reserved_iova_list, IOVA_PFN(IOAPIC_RANGE_START), @@ -1143,7 +1143,7 @@ static int domain_init(struct dmar_domain *domain, int guest_width) int adjust_width, agaw; unsigned long sagaw; - init_iova_domain(domain-iovad); + init_iova_domain(domain-iovad, DMA_32BIT_PFN); spin_lock_init(domain-mapping_lock); domain_reserve_special_ranges(domain); diff --git a/drivers/pci/intel-iommu.h b/drivers/pci/intel-iommu.h index ee88dd2..af8560c 100644 --- a/drivers/pci/intel-iommu.h +++ b/drivers/pci/intel-iommu.h @@ -27,6 +27,19 @@ #include linux/io.h /* + * We need a fixed PAGE_SIZE of 4K irrespective of + * arch PAGE_SIZE for IOMMU page tables. + */ +#define PAGE_SHIFT_4K(12) +#define PAGE_SIZE_4K (1UL PAGE_SHIFT_4K) +#define PAGE_MASK_4K (((u64)-1) PAGE_SHIFT_4K) +#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) PAGE_MASK_4K) + +#define IOVA_PFN(addr) ((addr) PAGE_SHIFT_4K) +#define DMA_32BIT_PFNIOVA_PFN(DMA_32BIT_MASK) +#define DMA_64BIT_PFNIOVA_PFN(DMA_64BIT_MASK) + +/* * Intel IOMMU register specification per version 1.0 public spec. */ diff --git a/drivers/pci/iova.c b/drivers/pci/iova.c index a84571c..8de7ab6 100644 --- a/drivers/pci/iova.c +++ b/drivers/pci/iova.c @@ -9,19 +9,19 @@ #include iova.h void -init_iova_domain(struct iova_domain *iovad) +init_iova_domain(struct iova_domain *iovad, unsigned long pfn_32bit) { spin_lock_init(iovad-iova_alloc_lock); spin_lock_init(iovad-iova_rbtree_lock); iovad-rbroot = RB_ROOT; iovad-cached32_node = NULL; - + iovad-dma_32bit_pfn = pfn_32bit; } static struct rb_node * __get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn) { - if ((*limit_pfn != DMA_32BIT_PFN) || + if ((*limit_pfn != iovad-dma_32bit_pfn) || (iovad-cached32_node == NULL)) return rb_last(iovad-rbroot); else { @@ -37,7 +37,7 @@ static void __cached_rbnode_insert_update(struct iova_domain *iovad, unsigned long limit_pfn, struct iova *new) { - if (limit_pfn != DMA_32BIT_PFN) + if (limit_pfn != iovad-dma_32bit_pfn) return; iovad-cached32_node = new-node; } diff --git a/drivers/pci/iova.h b/drivers/pci/iova.h index ae3028d..d521b5b 100644 --- a/drivers/pci/iova.h +++ b/drivers/pci/iova.h @@ -15,22 +15,9 @@ #include linux/rbtree.h #include linux/dma-mapping.h -/* - * We need a fixed PAGE_SIZE of 4K irrespective of - * arch PAGE_SIZE for IOMMU page tables. - */ -#define PAGE_SHIFT_4K(12) -#define PAGE_SIZE_4K (1UL PAGE_SHIFT_4K) -#define PAGE_MASK_4K (((u64)-1) PAGE_SHIFT_4K) -#define PAGE_ALIGN_4K(addr) (((addr) + PAGE_SIZE_4K - 1) PAGE_MASK_4K) - /* IO virtual address start page frame number */ #define IOVA_START_PFN (1) -#define IOVA_PFN(addr) ((addr) PAGE_SHIFT_4K) -#define DMA_32BIT_PFNIOVA_PFN(DMA_32BIT_MASK) -#define DMA_64BIT_PFNIOVA_PFN(DMA_64BIT_MASK) - /* iova structure */ struct iova { struct rb_node node; @@ -44,6 +31,7 @@ struct iova_domain { spinlock_t iova_rbtree_lock; /* Lock to protect update of rbtree */ struct rb_root rbroot; /* iova domain rbtree root */ struct rb_node *cached32_node; /* Save last alloced node */ + unsigned long dma_32bit_pfn; }; struct iova *alloc_iova_mem(void); @@ -56,7 +44,7 @@ struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size, struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo, unsigned long pfn_hi); void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to); -void init_iova_domain(struct iova_domain *iovad); +void init_iova_domain(struct iova_domain *iovad,
Re: [PATCH] intel-iommu: fix sg_page()
On Tue, Oct 23, 2007 at 01:59:26PM +0900, FUJITA Tomonori wrote: > drivers/pci/intel-iommu.c: In function 'intel_unmap_sg': > drivers/pci/intel-iommu.c:1987: error: 'struct scatterlist' has no member > named 'page' > drivers/pci/intel-iommu.c: In function 'intel_nontranslate_map_sg': > drivers/pci/intel-iommu.c:2013: error: 'struct scatterlist' has no member > named 'page' > drivers/pci/intel-iommu.c:2014: error: 'struct scatterlist' has no member > named 'page' > drivers/pci/intel-iommu.c: In function 'intel_map_sg': > drivers/pci/intel-iommu.c:2044: error: 'struct scatterlist' has no member > named 'page' > drivers/pci/intel-iommu.c:2068: error: 'struct scatterlist' has no member > named 'page' > > Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]> Acked by: Anil S Keshavamurthy <[EMAIL PROTECTED]> > --- > drivers/pci/intel-iommu.c | 11 +-- > 1 files changed, 5 insertions(+), 6 deletions(-) > > diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c > index b3d7031..0f6e528 100644 > --- a/drivers/pci/intel-iommu.c > +++ b/drivers/pci/intel-iommu.c > @@ -1962,7 +1962,6 @@ static void intel_free_coherent(struct device *hwdev, > size_t size, > free_pages((unsigned long)vaddr, order); > } > > -#define SG_ENT_VIRT_ADDRESS(sg) (page_address((sg)->page) + > (sg)->offset) > static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sglist, > int nelems, int dir) > { > @@ -1984,7 +1983,7 @@ static void intel_unmap_sg(struct device *hwdev, struct > scatterlist *sglist, > if (!iova) > return; > for_each_sg(sglist, sg, nelems, i) { > - addr = SG_ENT_VIRT_ADDRESS(sg); > + addr = sg_virt(sg); > size += aligned_size((u64)addr, sg->length); > } > > @@ -2010,8 +2009,8 @@ static int intel_nontranslate_map_sg(struct device > *hddev, > struct scatterlist *sg; > > for_each_sg(sglist, sg, nelems, i) { > - BUG_ON(!sg->page); > - sg->dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(sg)); > + BUG_ON(!sg_page(sg)); > + sg->dma_address = virt_to_bus(sg_virt(sg)); > sg->dma_length = sg->length; > } > return nelems; > @@ -2041,7 +2040,7 @@ static int intel_map_sg(struct device *hwdev, struct > scatterlist *sglist, > return 0; > > for_each_sg(sglist, sg, nelems, i) { > - addr = SG_ENT_VIRT_ADDRESS(sg); > + addr = sg_virt(sg); > addr = (void *)virt_to_phys(addr); > size += aligned_size((u64)addr, sg->length); > } > @@ -2065,7 +2064,7 @@ static int intel_map_sg(struct device *hwdev, struct > scatterlist *sglist, > start_addr = iova->pfn_lo << PAGE_SHIFT_4K; > offset = 0; > for_each_sg(sglist, sg, nelems, i) { > - addr = SG_ENT_VIRT_ADDRESS(sg); > + addr = sg_virt(sg); > addr = (void *)virt_to_phys(addr); > size = aligned_size((u64)addr, sg->length); > ret = domain_page_mapping(domain, start_addr + offset, > -- > 1.5.2.4 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] intel-iommu: fix sg_page()
On Tue, Oct 23, 2007 at 01:59:26PM +0900, FUJITA Tomonori wrote: drivers/pci/intel-iommu.c: In function 'intel_unmap_sg': drivers/pci/intel-iommu.c:1987: error: 'struct scatterlist' has no member named 'page' drivers/pci/intel-iommu.c: In function 'intel_nontranslate_map_sg': drivers/pci/intel-iommu.c:2013: error: 'struct scatterlist' has no member named 'page' drivers/pci/intel-iommu.c:2014: error: 'struct scatterlist' has no member named 'page' drivers/pci/intel-iommu.c: In function 'intel_map_sg': drivers/pci/intel-iommu.c:2044: error: 'struct scatterlist' has no member named 'page' drivers/pci/intel-iommu.c:2068: error: 'struct scatterlist' has no member named 'page' Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED] Acked by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/intel-iommu.c | 11 +-- 1 files changed, 5 insertions(+), 6 deletions(-) diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c index b3d7031..0f6e528 100644 --- a/drivers/pci/intel-iommu.c +++ b/drivers/pci/intel-iommu.c @@ -1962,7 +1962,6 @@ static void intel_free_coherent(struct device *hwdev, size_t size, free_pages((unsigned long)vaddr, order); } -#define SG_ENT_VIRT_ADDRESS(sg) (page_address((sg)-page) + (sg)-offset) static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sglist, int nelems, int dir) { @@ -1984,7 +1983,7 @@ static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sglist, if (!iova) return; for_each_sg(sglist, sg, nelems, i) { - addr = SG_ENT_VIRT_ADDRESS(sg); + addr = sg_virt(sg); size += aligned_size((u64)addr, sg-length); } @@ -2010,8 +2009,8 @@ static int intel_nontranslate_map_sg(struct device *hddev, struct scatterlist *sg; for_each_sg(sglist, sg, nelems, i) { - BUG_ON(!sg-page); - sg-dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(sg)); + BUG_ON(!sg_page(sg)); + sg-dma_address = virt_to_bus(sg_virt(sg)); sg-dma_length = sg-length; } return nelems; @@ -2041,7 +2040,7 @@ static int intel_map_sg(struct device *hwdev, struct scatterlist *sglist, return 0; for_each_sg(sglist, sg, nelems, i) { - addr = SG_ENT_VIRT_ADDRESS(sg); + addr = sg_virt(sg); addr = (void *)virt_to_phys(addr); size += aligned_size((u64)addr, sg-length); } @@ -2065,7 +2064,7 @@ static int intel_map_sg(struct device *hwdev, struct scatterlist *sglist, start_addr = iova-pfn_lo PAGE_SHIFT_4K; offset = 0; for_each_sg(sglist, sg, nelems, i) { - addr = SG_ENT_VIRT_ADDRESS(sg); + addr = sg_virt(sg); addr = (void *)virt_to_phys(addr); size = aligned_size((u64)addr, sg-length); ret = domain_page_mapping(domain, start_addr + offset, -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch take 2][Intel-IOMMU] Fix for IOMMU early crash
On Thu, Oct 04, 2007 at 01:39:39PM +1000, Benjamin Herrenschmidt wrote: > > > > Why don't you use the new struct dev_archdata mechanism ? That's what I > > > use on powerpc to provide optional iommu linkage to any device in the > > > system. > > Good one. I will certainly try out your idea and will update the list > > tomorrow. > > The advantage is that it allows to completely isolate the iommu code > from any dependency to PCI, which means you can implement DMA ops > support for various platform devices or other fancy things. Maybe not > the most useful in x86-land, but still ;-) Andrew, Please delete my previous patch and add the below patch to your MM queue. I guess the patch name is "intel-iommu-fix-for-iommu-early-crash.patch". Thanks, Anil -- Subject: [Intel-IOMMU] Fix for IOMMU early crash pci_dev's->sysdata is highly overloaded and currently IOMMU is broken due to IOMMU code depending on this field. This patch introduces new field in pci_dev's dev.archdata struct to hold IOMMU specific per device IOMMU private data. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- drivers/pci/intel-iommu.c | 22 +++--- include/asm-x86_64/device.h |3 +++ 2 files changed, 14 insertions(+), 11 deletions(-) Index: 2.6-mm/drivers/pci/intel-iommu.c === --- 2.6-mm.orig/drivers/pci/intel-iommu.c 2007-10-04 11:35:09.0 -0700 +++ 2.6-mm/drivers/pci/intel-iommu.c2007-10-04 11:47:47.0 -0700 @@ -1348,7 +1348,7 @@ list_del(>link); list_del(>global); if (info->dev) - info->dev->sysdata = NULL; + info->dev->dev.archdata.iommu = NULL; spin_unlock_irqrestore(_domain_lock, flags); detach_domain_for_dev(info->domain, info->bus, info->devfn); @@ -1361,7 +1361,7 @@ /* * find_domain - * Note: we use struct pci_dev->sysdata stores the info + * Note: we use struct pci_dev->dev.archdata.iommu stores the info */ struct dmar_domain * find_domain(struct pci_dev *pdev) @@ -1369,7 +1369,7 @@ struct device_domain_info *info; /* No lock here, assumes no domain exit in normal case */ - info = pdev->sysdata; + info = pdev->dev.archdata.iommu; if (info) return info->domain; return NULL; @@ -1519,7 +1519,7 @@ } list_add(>link, >devices); list_add(>global, _domain_list); - pdev->sysdata = info; + pdev->dev.archdata.iommu = info; spin_unlock_irqrestore(_domain_lock, flags); return domain; error: @@ -1579,7 +1579,7 @@ static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr, struct pci_dev *pdev) { - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO) return 0; return iommu_prepare_identity_map(pdev, rmrr->base_address, rmrr->end_address + 1); @@ -1595,7 +1595,7 @@ int ret; for_each_pci_dev(pdev) { - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO || + if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO || !IS_GFX_DEVICE(pdev)) continue; printk(KERN_INFO "IOMMU: gfx device %s 1-1 mapping\n", @@ -1836,7 +1836,7 @@ int prot = 0; BUG_ON(dir == DMA_NONE); - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO) return virt_to_bus(addr); domain = get_valid_domain_for_dev(pdev); @@ -1900,7 +1900,7 @@ unsigned long start_addr; struct iova *iova; - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); BUG_ON(!domain); @@ -1974,7 +1974,7 @@ size_t size = 0; void *addr; - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); @@ -2032,7 +2032,7 @@ unsigned long start_addr; BUG_ON(dir == DMA_NONE); - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO) return intel_nontranslate_map_sg(hwdev, sg, nelems, dir); domain = get_valid_domain_for_dev(pdev); @@ -2234,7 +2234,7 @@ for (i = 0; i < drhd->devices_cnt; i++) { if (!drhd->devices[i]) continue; - drhd->devices[i]->sysdata = DUMMY_DEVICE_DOMAIN_INFO; + drhd->devices[i]->dev.archdata.iommu =
Re: [patch take 2][Intel-IOMMU] Fix for IOMMU early crash
On Thu, Oct 04, 2007 at 01:39:39PM +1000, Benjamin Herrenschmidt wrote: Why don't you use the new struct dev_archdata mechanism ? That's what I use on powerpc to provide optional iommu linkage to any device in the system. Good one. I will certainly try out your idea and will update the list tomorrow. The advantage is that it allows to completely isolate the iommu code from any dependency to PCI, which means you can implement DMA ops support for various platform devices or other fancy things. Maybe not the most useful in x86-land, but still ;-) Andrew, Please delete my previous patch and add the below patch to your MM queue. I guess the patch name is intel-iommu-fix-for-iommu-early-crash.patch. Thanks, Anil -- Subject: [Intel-IOMMU] Fix for IOMMU early crash pci_dev's-sysdata is highly overloaded and currently IOMMU is broken due to IOMMU code depending on this field. This patch introduces new field in pci_dev's dev.archdata struct to hold IOMMU specific per device IOMMU private data. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/intel-iommu.c | 22 +++--- include/asm-x86_64/device.h |3 +++ 2 files changed, 14 insertions(+), 11 deletions(-) Index: 2.6-mm/drivers/pci/intel-iommu.c === --- 2.6-mm.orig/drivers/pci/intel-iommu.c 2007-10-04 11:35:09.0 -0700 +++ 2.6-mm/drivers/pci/intel-iommu.c2007-10-04 11:47:47.0 -0700 @@ -1348,7 +1348,7 @@ list_del(info-link); list_del(info-global); if (info-dev) - info-dev-sysdata = NULL; + info-dev-dev.archdata.iommu = NULL; spin_unlock_irqrestore(device_domain_lock, flags); detach_domain_for_dev(info-domain, info-bus, info-devfn); @@ -1361,7 +1361,7 @@ /* * find_domain - * Note: we use struct pci_dev-sysdata stores the info + * Note: we use struct pci_dev-dev.archdata.iommu stores the info */ struct dmar_domain * find_domain(struct pci_dev *pdev) @@ -1369,7 +1369,7 @@ struct device_domain_info *info; /* No lock here, assumes no domain exit in normal case */ - info = pdev-sysdata; + info = pdev-dev.archdata.iommu; if (info) return info-domain; return NULL; @@ -1519,7 +1519,7 @@ } list_add(info-link, domain-devices); list_add(info-global, device_domain_list); - pdev-sysdata = info; + pdev-dev.archdata.iommu = info; spin_unlock_irqrestore(device_domain_lock, flags); return domain; error: @@ -1579,7 +1579,7 @@ static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr, struct pci_dev *pdev) { - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO) return 0; return iommu_prepare_identity_map(pdev, rmrr-base_address, rmrr-end_address + 1); @@ -1595,7 +1595,7 @@ int ret; for_each_pci_dev(pdev) { - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO || + if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO || !IS_GFX_DEVICE(pdev)) continue; printk(KERN_INFO IOMMU: gfx device %s 1-1 mapping\n, @@ -1836,7 +1836,7 @@ int prot = 0; BUG_ON(dir == DMA_NONE); - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO) return virt_to_bus(addr); domain = get_valid_domain_for_dev(pdev); @@ -1900,7 +1900,7 @@ unsigned long start_addr; struct iova *iova; - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); BUG_ON(!domain); @@ -1974,7 +1974,7 @@ size_t size = 0; void *addr; - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); @@ -2032,7 +2032,7 @@ unsigned long start_addr; BUG_ON(dir == DMA_NONE); - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-dev.archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO) return intel_nontranslate_map_sg(hwdev, sg, nelems, dir); domain = get_valid_domain_for_dev(pdev); @@ -2234,7 +2234,7 @@ for (i = 0; i drhd-devices_cnt; i++) { if (!drhd-devices[i]) continue; - drhd-devices[i]-sysdata = DUMMY_DEVICE_DOMAIN_INFO; + drhd-devices[i]-dev.archdata.iommu = DUMMY_DEVICE_DOMAIN_INFO;
Re: [patch take 2][Intel-IOMMU] Fix for IOMMU early crash
On Thu, Oct 04, 2007 at 11:19:33AM +1000, Benjamin Herrenschmidt wrote: > > Index: 2.6-mm/include/linux/pci.h > > === > > --- 2.6-mm.orig/include/linux/pci.h 2007-10-03 13:48:20.0 -0700 > > +++ 2.6-mm/include/linux/pci.h 2007-10-03 13:49:08.0 -0700 > > @@ -195,6 +195,7 @@ > > #ifdef CONFIG_PCI_MSI > > struct list_head msi_list; > > #endif > > + void*iommu_private; /* hook for IOMMU specific extension */ > > }; > > I'm not fan of this. That would imply that iommu stuff is specific to > PCI or that sort of thing. > > Why don't you use the new struct dev_archdata mechanism ? That's what I > use on powerpc to provide optional iommu linkage to any device in the > system. Good one. I will certainly try out your idea and will update the list tomorrow. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] intel-iommu sg chaining support
On Wed, Oct 03, 2007 at 02:12:03PM -0700, Andrew Morton wrote: > On Mon, 1 Oct 2007 09:12:56 -0700 > "Keshavamurthy, Anil S" <[EMAIL PROTECTED]> wrote: > > > On Sat, Sep 29, 2007 at 05:16:38AM -0700, FUJITA Tomonori wrote: > > > > > >x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't > > >support sg chaining, we will get data corruption. > > >Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]> > > > > Acked-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> > > > > Am I correct in believing that this patch is needed only when the > chaining patches which are presently in git-block are combined with the > intel-iommu work which is presently in -mm? Yes, that is correct. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch take 2][Intel-IOMMU] Fix for IOMMU early crash
Subject: [patch][Intel-IOMMU] Fix for IOMMU early crash pci_dev's->sysdata is highly overloaded and currently IOMMU is broken due to IOMMU code depending on this field. This patch introduces new field in pci_dev's struct to hold IOMMU specific per device IOMMU private data. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- drivers/pci/intel-iommu.c | 22 +++--- include/linux/pci.h |1 + 2 files changed, 12 insertions(+), 11 deletions(-) Index: 2.6-mm/drivers/pci/intel-iommu.c === --- 2.6-mm.orig/drivers/pci/intel-iommu.c 2007-10-03 13:48:18.0 -0700 +++ 2.6-mm/drivers/pci/intel-iommu.c2007-10-03 13:48:41.0 -0700 @@ -1348,7 +1348,7 @@ list_del(>link); list_del(>global); if (info->dev) - info->dev->sysdata = NULL; + info->dev->iommu_private = NULL; spin_unlock_irqrestore(_domain_lock, flags); detach_domain_for_dev(info->domain, info->bus, info->devfn); @@ -1361,7 +1361,7 @@ /* * find_domain - * Note: we use struct pci_dev->sysdata stores the info + * Note: we use struct pci_dev->iommu_private stores the info */ struct dmar_domain * find_domain(struct pci_dev *pdev) @@ -1369,7 +1369,7 @@ struct device_domain_info *info; /* No lock here, assumes no domain exit in normal case */ - info = pdev->sysdata; + info = pdev->iommu_private; if (info) return info->domain; return NULL; @@ -1519,7 +1519,7 @@ } list_add(>link, >devices); list_add(>global, _domain_list); - pdev->sysdata = info; + pdev->iommu_private = info; spin_unlock_irqrestore(_domain_lock, flags); return domain; error: @@ -1579,7 +1579,7 @@ static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr, struct pci_dev *pdev) { - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO) return 0; return iommu_prepare_identity_map(pdev, rmrr->base_address, rmrr->end_address + 1); @@ -1595,7 +1595,7 @@ int ret; for_each_pci_dev(pdev) { - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO || + if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO || !IS_GFX_DEVICE(pdev)) continue; printk(KERN_INFO "IOMMU: gfx device %s 1-1 mapping\n", @@ -1836,7 +1836,7 @@ int prot = 0; BUG_ON(dir == DMA_NONE); - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO) return virt_to_bus(addr); domain = get_valid_domain_for_dev(pdev); @@ -1900,7 +1900,7 @@ unsigned long start_addr; struct iova *iova; - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); BUG_ON(!domain); @@ -1974,7 +1974,7 @@ size_t size = 0; void *addr; - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); @@ -2032,7 +2032,7 @@ unsigned long start_addr; BUG_ON(dir == DMA_NONE); - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->iommu_private == DUMMY_DEVICE_DOMAIN_INFO) return intel_nontranslate_map_sg(hwdev, sg, nelems, dir); domain = get_valid_domain_for_dev(pdev); @@ -2234,7 +2234,7 @@ for (i = 0; i < drhd->devices_cnt; i++) { if (!drhd->devices[i]) continue; - drhd->devices[i]->sysdata = DUMMY_DEVICE_DOMAIN_INFO; + drhd->devices[i]->iommu_private = DUMMY_DEVICE_DOMAIN_INFO; } } } Index: 2.6-mm/include/linux/pci.h === --- 2.6-mm.orig/include/linux/pci.h 2007-10-03 13:48:20.0 -0700 +++ 2.6-mm/include/linux/pci.h 2007-10-03 13:49:08.0 -0700 @@ -195,6 +195,7 @@ #ifdef CONFIG_PCI_MSI struct list_head msi_list; #endif + void*iommu_private; /* hook for IOMMU specific extension */ }; extern struct pci_dev *alloc_pci_dev(void); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch take 2][Intel-IOMMU] Fix for IOMMU early crash
Subject: [patch][Intel-IOMMU] Fix for IOMMU early crash pci_dev's-sysdata is highly overloaded and currently IOMMU is broken due to IOMMU code depending on this field. This patch introduces new field in pci_dev's struct to hold IOMMU specific per device IOMMU private data. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/intel-iommu.c | 22 +++--- include/linux/pci.h |1 + 2 files changed, 12 insertions(+), 11 deletions(-) Index: 2.6-mm/drivers/pci/intel-iommu.c === --- 2.6-mm.orig/drivers/pci/intel-iommu.c 2007-10-03 13:48:18.0 -0700 +++ 2.6-mm/drivers/pci/intel-iommu.c2007-10-03 13:48:41.0 -0700 @@ -1348,7 +1348,7 @@ list_del(info-link); list_del(info-global); if (info-dev) - info-dev-sysdata = NULL; + info-dev-iommu_private = NULL; spin_unlock_irqrestore(device_domain_lock, flags); detach_domain_for_dev(info-domain, info-bus, info-devfn); @@ -1361,7 +1361,7 @@ /* * find_domain - * Note: we use struct pci_dev-sysdata stores the info + * Note: we use struct pci_dev-iommu_private stores the info */ struct dmar_domain * find_domain(struct pci_dev *pdev) @@ -1369,7 +1369,7 @@ struct device_domain_info *info; /* No lock here, assumes no domain exit in normal case */ - info = pdev-sysdata; + info = pdev-iommu_private; if (info) return info-domain; return NULL; @@ -1519,7 +1519,7 @@ } list_add(info-link, domain-devices); list_add(info-global, device_domain_list); - pdev-sysdata = info; + pdev-iommu_private = info; spin_unlock_irqrestore(device_domain_lock, flags); return domain; error: @@ -1579,7 +1579,7 @@ static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr, struct pci_dev *pdev) { - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO) return 0; return iommu_prepare_identity_map(pdev, rmrr-base_address, rmrr-end_address + 1); @@ -1595,7 +1595,7 @@ int ret; for_each_pci_dev(pdev) { - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO || + if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO || !IS_GFX_DEVICE(pdev)) continue; printk(KERN_INFO IOMMU: gfx device %s 1-1 mapping\n, @@ -1836,7 +1836,7 @@ int prot = 0; BUG_ON(dir == DMA_NONE); - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO) return virt_to_bus(addr); domain = get_valid_domain_for_dev(pdev); @@ -1900,7 +1900,7 @@ unsigned long start_addr; struct iova *iova; - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); BUG_ON(!domain); @@ -1974,7 +1974,7 @@ size_t size = 0; void *addr; - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); @@ -2032,7 +2032,7 @@ unsigned long start_addr; BUG_ON(dir == DMA_NONE); - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-iommu_private == DUMMY_DEVICE_DOMAIN_INFO) return intel_nontranslate_map_sg(hwdev, sg, nelems, dir); domain = get_valid_domain_for_dev(pdev); @@ -2234,7 +2234,7 @@ for (i = 0; i drhd-devices_cnt; i++) { if (!drhd-devices[i]) continue; - drhd-devices[i]-sysdata = DUMMY_DEVICE_DOMAIN_INFO; + drhd-devices[i]-iommu_private = DUMMY_DEVICE_DOMAIN_INFO; } } } Index: 2.6-mm/include/linux/pci.h === --- 2.6-mm.orig/include/linux/pci.h 2007-10-03 13:48:20.0 -0700 +++ 2.6-mm/include/linux/pci.h 2007-10-03 13:49:08.0 -0700 @@ -195,6 +195,7 @@ #ifdef CONFIG_PCI_MSI struct list_head msi_list; #endif + void*iommu_private; /* hook for IOMMU specific extension */ }; extern struct pci_dev *alloc_pci_dev(void); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] intel-iommu sg chaining support
On Wed, Oct 03, 2007 at 02:12:03PM -0700, Andrew Morton wrote: On Mon, 1 Oct 2007 09:12:56 -0700 Keshavamurthy, Anil S [EMAIL PROTECTED] wrote: On Sat, Sep 29, 2007 at 05:16:38AM -0700, FUJITA Tomonori wrote: x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't support sg chaining, we will get data corruption. Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED] Acked-by: Anil S Keshavamurthy [EMAIL PROTECTED] Am I correct in believing that this patch is needed only when the chaining patches which are presently in git-block are combined with the intel-iommu work which is presently in -mm? Yes, that is correct. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch take 2][Intel-IOMMU] Fix for IOMMU early crash
On Thu, Oct 04, 2007 at 11:19:33AM +1000, Benjamin Herrenschmidt wrote: Index: 2.6-mm/include/linux/pci.h === --- 2.6-mm.orig/include/linux/pci.h 2007-10-03 13:48:20.0 -0700 +++ 2.6-mm/include/linux/pci.h 2007-10-03 13:49:08.0 -0700 @@ -195,6 +195,7 @@ #ifdef CONFIG_PCI_MSI struct list_head msi_list; #endif + void*iommu_private; /* hook for IOMMU specific extension */ }; I'm not fan of this. That would imply that iommu stuff is specific to PCI or that sort of thing. Why don't you use the new struct dev_archdata mechanism ? That's what I use on powerpc to provide optional iommu linkage to any device in the system. Good one. I will certainly try out your idea and will update the list tomorrow. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] intel-iommu sg chaining support
On Sat, Sep 29, 2007 at 05:16:38AM -0700, FUJITA Tomonori wrote: > >x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't >support sg chaining, we will get data corruption. >Signed-off-by: FUJITA Tomonori <[EMAIL PROTECTED]> Acked-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> >--- > drivers/pci/intel-iommu.c | 32 > 1 files changed, 16 insertions(+), 16 deletions(-) >diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c >index dab329f..4668995 100644 >--- a/drivers/pci/intel-iommu.c >+++ b/drivers/pci/intel-iommu.c >@@ -1963,7 +1963,7 @@ static void intel_free_coherent(struct device *hwdev, >size_t size, > } > #define SG_ENT_VIRT_ADDRESS(sg) (page_address((sg)->page) + >(sg)->offset) >-static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sg, >+static void intel_unmap_sg(struct device *hwdev, struct scatterlist >*sglist, >int nelems, int dir) > { >int i; >@@ -1973,16 +1973,17 @@ static void intel_unmap_sg(struct device *hwdev, >struct scatterlist *sg, >struct iova *iova; >size_t size = 0; >void *addr; >+ struct scatterlist *sg; >if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) >return; >domain = find_domain(pdev); >- iova = find_iova(>iovad, IOVA_PFN(sg[0].dma_address)); >+ iova = find_iova(>iovad, IOVA_PFN(sglist[0].dma_address)); >if (!iova) >return; >- for (i = 0; i < nelems; i++, sg++) { >+ for_each_sg(sglist, sg, nelems, i) { >addr = SG_ENT_VIRT_ADDRESS(sg); >size += aligned_size((u64)addr, sg->length); >} >@@ -2003,20 +2004,20 @@ static void intel_unmap_sg(struct device *hwdev, >struct scatterlist *sg, > } > static int intel_nontranslate_map_sg(struct device *hddev, >- struct scatterlist *sg, int nelems, int dir) >+ struct scatterlist *sglist, int nelems, int dir) > { >int i; >+ struct scatterlist *sg; >- for (i = 0; i < nelems; i++) { >- struct scatterlist *s = [i]; >- BUG_ON(!s->page); >- s->dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(s)); >- s->dma_length = s->length; >+ for_each_sg(sglist, sg, nelems, i) { >+ BUG_ON(!sg->page); >+ sg->dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(sg)); >+ sg->dma_length = sg->length; >} >return nelems; > } >-static int intel_map_sg(struct device *hwdev, struct scatterlist *sg, >+static int intel_map_sg(struct device *hwdev, struct scatterlist *sglist, >int nelems, int dir) > { >void *addr; >@@ -2028,18 +2029,18 @@ static int intel_map_sg(struct device *hwdev, > struct >scatterlist *sg, >size_t offset = 0; >struct iova *iova = NULL; >int ret; >- struct scatterlist *orig_sg = sg; >+ struct scatterlist *sg; >unsigned long start_addr; >BUG_ON(dir == DMA_NONE); >if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) >- return intel_nontranslate_map_sg(hwdev, sg, nelems, dir); >+ return intel_nontranslate_map_sg(hwdev, sglist, nelems, >dir); >domain = get_valid_domain_for_dev(pdev); >if (!domain) >return 0; >- for (i = 0; i < nelems; i++, sg++) { >+ for_each_sg(sglist, sg, nelems, i) { >addr = SG_ENT_VIRT_ADDRESS(sg); >addr = (void *)virt_to_phys(addr); >size += aligned_size((u64)addr, sg->length); >@@ -2047,7 +2048,7 @@ static int intel_map_sg(struct device *hwdev, struct >scatterlist *sg, >iova = __intel_alloc_iova(hwdev, domain, size); >if (!iova) { >- orig_sg->dma_length = 0; >+ sglist->dma_length = 0; >return 0; >} >@@ -2063,8 +2064,7 @@ static int intel_map_sg(struct device *hwdev, struct >scatterlist *sg, >start_addr = iova->pfn_lo << PAGE_SHIFT_4K; >offset = 0; >- sg = orig_sg; >- for (i = 0; i < nelems; i++, sg++) { >+ for_each_sg(sglist, sg, nelems, i) { >addr = SG_ENT_VIRT_ADDRESS(sg); >addr = (void *)virt_to_phys(addr); >size = aligned_size((u64)addr, sg->length); >-- >1.5.2.4 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] intel-iommu sg chaining support
On Sat, Sep 29, 2007 at 05:16:38AM -0700, FUJITA Tomonori wrote: x86_64 defines ARCH_HAS_SG_CHAIN. So if IOMMU implementations don't support sg chaining, we will get data corruption. Signed-off-by: FUJITA Tomonori [EMAIL PROTECTED] Acked-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/intel-iommu.c | 32 1 files changed, 16 insertions(+), 16 deletions(-) diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c index dab329f..4668995 100644 --- a/drivers/pci/intel-iommu.c +++ b/drivers/pci/intel-iommu.c @@ -1963,7 +1963,7 @@ static void intel_free_coherent(struct device *hwdev, size_t size, } #define SG_ENT_VIRT_ADDRESS(sg) (page_address((sg)-page) + (sg)-offset) -static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sg, +static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sglist, int nelems, int dir) { int i; @@ -1973,16 +1973,17 @@ static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sg, struct iova *iova; size_t size = 0; void *addr; + struct scatterlist *sg; if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); - iova = find_iova(domain-iovad, IOVA_PFN(sg[0].dma_address)); + iova = find_iova(domain-iovad, IOVA_PFN(sglist[0].dma_address)); if (!iova) return; - for (i = 0; i nelems; i++, sg++) { + for_each_sg(sglist, sg, nelems, i) { addr = SG_ENT_VIRT_ADDRESS(sg); size += aligned_size((u64)addr, sg-length); } @@ -2003,20 +2004,20 @@ static void intel_unmap_sg(struct device *hwdev, struct scatterlist *sg, } static int intel_nontranslate_map_sg(struct device *hddev, - struct scatterlist *sg, int nelems, int dir) + struct scatterlist *sglist, int nelems, int dir) { int i; + struct scatterlist *sg; - for (i = 0; i nelems; i++) { - struct scatterlist *s = sg[i]; - BUG_ON(!s-page); - s-dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(s)); - s-dma_length = s-length; + for_each_sg(sglist, sg, nelems, i) { + BUG_ON(!sg-page); + sg-dma_address = virt_to_bus(SG_ENT_VIRT_ADDRESS(sg)); + sg-dma_length = sg-length; } return nelems; } -static int intel_map_sg(struct device *hwdev, struct scatterlist *sg, +static int intel_map_sg(struct device *hwdev, struct scatterlist *sglist, int nelems, int dir) { void *addr; @@ -2028,18 +2029,18 @@ static int intel_map_sg(struct device *hwdev, struct scatterlist *sg, size_t offset = 0; struct iova *iova = NULL; int ret; - struct scatterlist *orig_sg = sg; + struct scatterlist *sg; unsigned long start_addr; BUG_ON(dir == DMA_NONE); if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) - return intel_nontranslate_map_sg(hwdev, sg, nelems, dir); + return intel_nontranslate_map_sg(hwdev, sglist, nelems, dir); domain = get_valid_domain_for_dev(pdev); if (!domain) return 0; - for (i = 0; i nelems; i++, sg++) { + for_each_sg(sglist, sg, nelems, i) { addr = SG_ENT_VIRT_ADDRESS(sg); addr = (void *)virt_to_phys(addr); size += aligned_size((u64)addr, sg-length); @@ -2047,7 +2048,7 @@ static int intel_map_sg(struct device *hwdev, struct scatterlist *sg, iova = __intel_alloc_iova(hwdev, domain, size); if (!iova) { - orig_sg-dma_length = 0; + sglist-dma_length = 0; return 0; } @@ -2063,8 +2064,7 @@ static int intel_map_sg(struct device *hwdev, struct scatterlist *sg, start_addr = iova-pfn_lo PAGE_SHIFT_4K; offset = 0; - sg = orig_sg; - for (i = 0; i nelems; i++, sg++) { + for_each_sg(sglist, sg, nelems, i) { addr = SG_ENT_VIRT_ADDRESS(sg); addr = (void *)virt_to_phys(addr); size = aligned_size((u64)addr, sg-length); -- 1.5.2.4 - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: KPROBES: Instrumenting a function's call site
On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote: > On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote: > > Hello, > > I am trying to use kprobes to measure the latency of a function by > > instrumenting its call site. Basically, I find the call instruction, > > and insert a kprobe with a pre-handler and post-handler at that point. > > The pre-handler measures the latency (reads the TSC counter). The > > post-handler measures the latency again, and subtracts the value that > > was read in the pre-handler to compute the total latency of the called > > function. > > This sounds ok... So what you are really measuring is the latency of just that single instruction where you have inserted the probe i.e. because your pre-handler is called just before the probed instruction is executed and your post-handler is called right after you probed instruction is single-stepped. > > > So to measure the latency of foo(), I basically want kprobes to do this: > > pre_handler(); > > foo(); When you insert a probe, you are inserting probe on an instruction boundary and not at function level. > > post_handler(); Hence the above looks like pre-handler() Probed-instruction; // most likely the first instruction in the foo(); post-hanlder() rest-of-foo() > > > > The problem is that the latencies that I am getting are consistently low > > (~10,000 cycles). When I manually instrument the functions, the latency > > is about 20,000,000 cycles. Clearly something is not right here. As I mentioned above what you are seeing is the latency of just the probed instruction and hence it is very very low compared to the latency of the function foo(). > You could try a a couple of approaches for starters. I agree with Ananth, you can try the below approaches for your measurements. > > a. As you mention above, a kprobe on the function invocation and the > other on the instruction following the call; both need just pre_handlers. > > b. > - Insert a kprobe and a kretprobe on foo() > - The kprobe needs to have only a pre_handler that'll measure the latency > - A similar handler for the kretprobe handler can measure the latency > again and their difference will give you foo()'s latency. > > though will require you to do some housekeeping in case foo() is > reentrant to track which return instance corresponds to which call. > > Ananth > > PS: There was a thought of providing a facility to run a handler at > function entry even when just a kretprobe is used. Maybe we need to > relook at that; it'd have been useful in this case. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: KPROBES: Instrumenting a function's call site
On Wed, Sep 26, 2007 at 10:09:33AM +0530, Ananth N Mavinakayanahalli wrote: On Tue, Sep 25, 2007 at 06:12:38PM -0400, Avishay Traeger wrote: Hello, I am trying to use kprobes to measure the latency of a function by instrumenting its call site. Basically, I find the call instruction, and insert a kprobe with a pre-handler and post-handler at that point. The pre-handler measures the latency (reads the TSC counter). The post-handler measures the latency again, and subtracts the value that was read in the pre-handler to compute the total latency of the called function. This sounds ok... So what you are really measuring is the latency of just that single instruction where you have inserted the probe i.e. because your pre-handler is called just before the probed instruction is executed and your post-handler is called right after you probed instruction is single-stepped. So to measure the latency of foo(), I basically want kprobes to do this: pre_handler(); foo(); When you insert a probe, you are inserting probe on an instruction boundary and not at function level. post_handler(); Hence the above looks like pre-handler() Probed-instruction; // most likely the first instruction in the foo(); post-hanlder() rest-of-foo() The problem is that the latencies that I am getting are consistently low (~10,000 cycles). When I manually instrument the functions, the latency is about 20,000,000 cycles. Clearly something is not right here. As I mentioned above what you are seeing is the latency of just the probed instruction and hence it is very very low compared to the latency of the function foo(). You could try a a couple of approaches for starters. I agree with Ananth, you can try the below approaches for your measurements. a. As you mention above, a kprobe on the function invocation and the other on the instruction following the call; both need just pre_handlers. b. - Insert a kprobe and a kretprobe on foo() - The kprobe needs to have only a pre_handler that'll measure the latency - A similar handler for the kretprobe handler can measure the latency again and their difference will give you foo()'s latency. b though will require you to do some housekeeping in case foo() is reentrant to track which return instance corresponds to which call. Ananth PS: There was a thought of providing a facility to run a handler at function entry even when just a kretprobe is used. Maybe we need to relook at that; it'd have been useful in this case. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch][Intel-IOMMU] Fix for IOMMU early crash
On Sat, Sep 15, 2007 at 02:30:40AM +1000, Paul Mackerras wrote: > Keshavamurthy, Anil S writes: > > > Can I expect the ppc64 code changes from you? > > Once I get your, I will merge with mine and post it again. > > Sure, but it will be next week, since I am travelling this week. Any progress? -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch][Intel-IOMMU] Fix for IOMMU early crash
On Sat, Sep 15, 2007 at 02:30:40AM +1000, Paul Mackerras wrote: Keshavamurthy, Anil S writes: Can I expect the ppc64 code changes from you? Once I get your, I will merge with mine and post it again. Sure, but it will be next week, since I am travelling this week. Any progress? -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Fix BIOS-e820 end address
On Fri, Sep 14, 2007 at 02:31:59PM -0700, Siddha, Suresh B wrote: > On Fri, Sep 14, 2007 at 02:00:02PM -0700, Jeremy Fitzhardinge wrote: > > Keshavamurthy, Anil S wrote: > > > Subject: [patch] Fix BIOS-e820 end address > > > > > > --snip of boot message-- > > > BIOS-provided physical RAM map: > > > BIOS-e820: - 000a (usable) > > > BIOS-e820: 000f - 0010 (reserved) > > > BIOS-e820: 0010 - 7fe8cc00 (usable) > > > end snip--- > > > > > > As you see from above the address 0010 is both > > > shown as reserved and usable which is confusing. > > > > > > > I think this is consistent with many other kernel interfaces (such as > > /proc/X/maps) where the end address is taken to be exclusive: [0xf, > > 0x10). > > Andrew, Please disregard this patch. As Jermy, Jan pointed out, this > will cause more confusions. Thanks. I agree, we can discard my patch. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch] Fix BIOS-e820 end address
On Fri, Sep 14, 2007 at 02:31:59PM -0700, Siddha, Suresh B wrote: On Fri, Sep 14, 2007 at 02:00:02PM -0700, Jeremy Fitzhardinge wrote: Keshavamurthy, Anil S wrote: Subject: [patch] Fix BIOS-e820 end address --snip of boot message-- BIOS-provided physical RAM map: BIOS-e820: - 000a (usable) BIOS-e820: 000f - 0010 (reserved) BIOS-e820: 0010 - 7fe8cc00 (usable) end snip--- As you see from above the address 0010 is both shown as reserved and usable which is confusing. I think this is consistent with many other kernel interfaces (such as /proc/X/maps) where the end address is taken to be exclusive: [0xf, 0x10). Andrew, Please disregard this patch. As Jermy, Jan pointed out, this will cause more confusions. Thanks. I agree, we can discard my patch. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch][Intel-IOMMU] Fix for IOMMU early crash
On Wed, Sep 12, 2007 at 05:48:52AM +1000, Paul Mackerras wrote: > Keshavamurthy, Anil S writes: > > > Subject: Fix IOMMU early crash > > > > This patch avoids copying pci_bus's->sysdata to > > pci_dev's->sysdata as one can easily obtain > > the same through pci_dev->bus->sysdata. > > At the moment this will cause ppc64 to crash, since we rely on > pci_dev->sysdata pointing to some node in the firmware device tree, > either the device's node or the node for a parent bus. > > We could change the ppc64 code to use pci_dev->bus->sysdata in the > case when pci_dev->sysdata is NULL, which would fix the problem. I > think that change should be incorporated as part of this patch so that > we don't break git bisection. Can I expect the ppc64 code changes from you? Once I get your, I will merge with mine and post it again. > > In other words I don't want to see this patch applied as it stands. Yup, I agree with you. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch][Intel-IOMMU] Fix for IOMMU early crash
On Wed, Sep 12, 2007 at 05:48:52AM +1000, Paul Mackerras wrote: > Keshavamurthy, Anil S writes: > > > Subject: Fix IOMMU early crash > > > > This patch avoids copying pci_bus's->sysdata to > > pci_dev's->sysdata as one can easily obtain > > the same through pci_dev->bus->sysdata. > > At the moment this will cause ppc64 to crash, since we rely on > pci_dev->sysdata pointing to some node in the firmware device tree, > either the device's node or the node for a parent bus. > > We could change the ppc64 code to use pci_dev->bus->sysdata in the > case when pci_dev->sysdata is NULL, which would fix the problem. I > think that change should be incorporated as part of this patch so that > we don't break git bisection. Why do you want to check if pci_dev->sysdata is NULL then use pci_dev->bus->sysdata else pci_dev->sysdata? If you change this to always use pci_dev->bus->sysdata, then you don;t have to depend on my patch and your patch can get in independent of mine. > > In other words I don't want to see this patch applied as it stands. Is it possible to post your patch anytime soon? Or feel free to combine mine with yours and post it with your signed-off-by. Thanks, Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch][Intel-IOMMU] Fix for IOMMU early crash
Subject: Fix IOMMU early crash This patch avoids copying pci_bus's->sysdata to pci_dev's->sysdata as one can easily obtain the same through pci_dev->bus->sysdata. Now with some recent pci_sysdata patches, the bus's->sysdata gets populated way early and a value of non-NULL gets copied from bus's->sysdata to pci_dev's->sysdata which causes IOMMU to crash way early in the boot time as IOMMU depends on this field for its per device IOMMU context. Hence this patch not only makes pci_dev's->sysdata useful and but also fixes the IOMMU early crash. This patch needs to be applied before IOMMU patches, so that git bisect will not fail on IOMMU code :-) Tested on x86_64. Please apply. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- drivers/pci/hotplug/fakephp.c |1 - drivers/pci/probe.c |1 - 2 files changed, 2 deletions(-) Index: work/drivers/pci/hotplug/fakephp.c === --- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-11 10:29:30.0 -0700 +++ work/drivers/pci/hotplug/fakephp.c 2007-09-11 10:35:22.0 -0700 @@ -243,7 +243,6 @@ return; dev->bus = (struct pci_bus*)bus; - dev->sysdata = bus->sysdata; for (devfn = 0; devfn < 0x100; devfn += 8) { dev->devfn = devfn; pci_rescan_slot(dev); Index: work/drivers/pci/probe.c === --- work.orig/drivers/pci/probe.c 2007-09-11 10:29:30.0 -0700 +++ work/drivers/pci/probe.c2007-09-11 10:35:22.0 -0700 @@ -994,7 +994,6 @@ return NULL; dev->bus = bus; - dev->sysdata = bus->sysdata; dev->dev.parent = bus->bridge; dev->dev.bus = _bus_type; dev->devfn = devfn; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch][Intel-IOMMU] Fix for IOMMU early crash
Subject: Fix IOMMU early crash This patch avoids copying pci_bus's-sysdata to pci_dev's-sysdata as one can easily obtain the same through pci_dev-bus-sysdata. Now with some recent pci_sysdata patches, the bus's-sysdata gets populated way early and a value of non-NULL gets copied from bus's-sysdata to pci_dev's-sysdata which causes IOMMU to crash way early in the boot time as IOMMU depends on this field for its per device IOMMU context. Hence this patch not only makes pci_dev's-sysdata useful and but also fixes the IOMMU early crash. This patch needs to be applied before IOMMU patches, so that git bisect will not fail on IOMMU code :-) Tested on x86_64. Please apply. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/hotplug/fakephp.c |1 - drivers/pci/probe.c |1 - 2 files changed, 2 deletions(-) Index: work/drivers/pci/hotplug/fakephp.c === --- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-11 10:29:30.0 -0700 +++ work/drivers/pci/hotplug/fakephp.c 2007-09-11 10:35:22.0 -0700 @@ -243,7 +243,6 @@ return; dev-bus = (struct pci_bus*)bus; - dev-sysdata = bus-sysdata; for (devfn = 0; devfn 0x100; devfn += 8) { dev-devfn = devfn; pci_rescan_slot(dev); Index: work/drivers/pci/probe.c === --- work.orig/drivers/pci/probe.c 2007-09-11 10:29:30.0 -0700 +++ work/drivers/pci/probe.c2007-09-11 10:35:22.0 -0700 @@ -994,7 +994,6 @@ return NULL; dev-bus = bus; - dev-sysdata = bus-sysdata; dev-dev.parent = bus-bridge; dev-dev.bus = pci_bus_type; dev-devfn = devfn; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch][Intel-IOMMU] Fix for IOMMU early crash
On Wed, Sep 12, 2007 at 05:48:52AM +1000, Paul Mackerras wrote: Keshavamurthy, Anil S writes: Subject: Fix IOMMU early crash This patch avoids copying pci_bus's-sysdata to pci_dev's-sysdata as one can easily obtain the same through pci_dev-bus-sysdata. At the moment this will cause ppc64 to crash, since we rely on pci_dev-sysdata pointing to some node in the firmware device tree, either the device's node or the node for a parent bus. We could change the ppc64 code to use pci_dev-bus-sysdata in the case when pci_dev-sysdata is NULL, which would fix the problem. I think that change should be incorporated as part of this patch so that we don't break git bisection. Why do you want to check if pci_dev-sysdata is NULL then use pci_dev-bus-sysdata else pci_dev-sysdata? If you change this to always use pci_dev-bus-sysdata, then you don;t have to depend on my patch and your patch can get in independent of mine. In other words I don't want to see this patch applied as it stands. Is it possible to post your patch anytime soon? Or feel free to combine mine with yours and post it with your signed-off-by. Thanks, Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch][Intel-IOMMU] Fix for IOMMU early crash
On Wed, Sep 12, 2007 at 05:48:52AM +1000, Paul Mackerras wrote: Keshavamurthy, Anil S writes: Subject: Fix IOMMU early crash This patch avoids copying pci_bus's-sysdata to pci_dev's-sysdata as one can easily obtain the same through pci_dev-bus-sysdata. At the moment this will cause ppc64 to crash, since we rely on pci_dev-sysdata pointing to some node in the firmware device tree, either the device's node or the node for a parent bus. We could change the ppc64 code to use pci_dev-bus-sysdata in the case when pci_dev-sysdata is NULL, which would fix the problem. I think that change should be incorporated as part of this patch so that we don't break git bisection. Can I expect the ppc64 code changes from you? Once I get your, I will merge with mine and post it again. In other words I don't want to see this patch applied as it stands. Yup, I agree with you. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Mon, Sep 10, 2007 at 11:25:43PM +0300, Muli Ben-Yehuda wrote: > On Tue, Sep 11, 2007 at 10:42:31AM -0700, Keshavamurthy, Anil S wrote: > > > Yes, I agree that pci_dev->sysdata can;t be removed. Even we (IOMMU) > > were dependent on this field but somehow this field is being > > overwritten to point to pci_bus's->sysdata and hence IOMMU was > > failing. Earlier it was overwritten to NULL and hence we were not > > failing but now it is overwritten to non-NULL and hence we fail. > > Do you know which commit caused that change? > > > My therory is that we don;t need to copy pci_bus's->sysdata to > > pci_dev's->sysdata. Below patch solves my problem. > > Any objection to below patch? > > I will give it a spin to verify it works for me, but in general I am > wary of making such changes unless we can verify (read: audit) that > they have no adverse side effects *on all architectures*. Thanks Muli for your help here. I tested on x86_64 and saw no issues. Looking at the code, pci_dev's->sysdata becomes useless if the intent here is to keep a copy of it's bus's->sysdata as the same can be obtained from pci_dev->bus->sysdata. Thanks, Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Mon, Sep 10, 2007 at 03:37:48AM +1000, Paul Mackerras wrote: > Keshavamurthy, Anil S writes: > > > Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash > > > > Populating pci_bus->sysdata way early in the pci discovery phase > > sets NON-NULL value to pci_dev->sysdata which breaks the assumption > > in the Intel IOMMU driver and crashes the system. > > > > > > In the drivers/pci/probe.c, pci_dev->sysdata gets a copy of > > its pci_bus->sysdata which is not required as > > the same can be obtained from pci_dev->bus->sysdata. More over > > the left hand assignment of pci_dev->sysdata is never being used, > > Wrong. You needed to grep a bit more widely... Ah..Thanks for pointing this out. sorry I had checked only i386 and x86_64. > > > so their is no point is setting > > pci_dev->sysdata = pci_bus->sysdata; > > > > This patch removes sysdata from pci_dev struct and creates a new > > field called sys_data which is exclusively used > > by IOMMU driver to keep its per device context pointer. > > This will break powerpc, because we use the pci_dev->sysdata field to > point to a firmware device tree node. Please figure out another way > to solve your problem. Yes, I agree that pci_dev->sysdata can;t be removed. Even we (IOMMU) were dependent on this field but somehow this field is being overwritten to point to pci_bus's->sysdata and hence IOMMU was failing. Earlier it was overwritten to NULL and hence we were not failing but now it is overwritten to non-NULL and hence we fail. My therory is that we don;t need to copy pci_bus's->sysdata to pci_dev's->sysdata. Below patch solves my problem. Any objection to below patch? --- drivers/pci/hotplug/fakephp.c |1 - drivers/pci/probe.c |1 - 2 files changed, 2 deletions(-) Index: work/drivers/pci/hotplug/fakephp.c === --- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-11 10:29:30.0 -0700 +++ work/drivers/pci/hotplug/fakephp.c 2007-09-11 10:35:22.0 -0700 @@ -243,7 +243,6 @@ return; dev->bus = (struct pci_bus*)bus; - dev->sysdata = bus->sysdata; for (devfn = 0; devfn < 0x100; devfn += 8) { dev->devfn = devfn; pci_rescan_slot(dev); Index: work/drivers/pci/probe.c === --- work.orig/drivers/pci/probe.c 2007-09-11 10:29:30.0 -0700 +++ work/drivers/pci/probe.c2007-09-11 10:35:22.0 -0700 @@ -994,7 +994,6 @@ return NULL; dev->bus = bus; - dev->sysdata = bus->sysdata; dev->dev.parent = bus->bridge; dev->dev.bus = _bus_type; dev->devfn = devfn; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Sun, Sep 09, 2007 at 08:51:40PM +0300, Muli Ben-Yehuda wrote: > On Mon, Sep 10, 2007 at 08:43:59AM -0700, Keshavamurthy, Anil S wrote: > > On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote: > > > On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote: > > > > > > > Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash > > > > > > This patch feels like a huge hack. See below. > > > > You seem to be jumping to conclusion without going in detail. The > > pci_dev struct contains pointer to sysdata, which in turn points to > > the copy of its parent's bus sysdata. So technically speaking we > > can eliminate sysdata pointer from pci_dev struct which is what one > > portion of this patch does. > > ... provided nothing relies on this relationship or the existence of > the pci_dev's sysdata. Have you audited every architecture's use of > the sysdata pointers? > > > > > This patch removes sysdata from pci_dev struct and creates a new > > > > field called sys_data which is exclusively used by IOMMU driver to > > > > keep its per device context pointer. > > > > > > Hmpf, why is this needed? with the pci_sysdata work that recently went > > > into mainline we have a void *iommu member in pci_sysdata which should > > > be all that's needed. Please elaborate if it's not enough for your > > > needs. > > > I looked at your patch and it was not suitable because I need to > > store iommu private pointer in pci_dev > > Could you elaborate on why you need this? I'm assuming it's for the > per-device IOMMU page tables? Yes, it is for per-device IOMMU domain information which internally holds info about the page tables. > > > and not in the pci_bus. So I have added a new member sys_data in the > > pci_dev struct. I can change the name from sys_dev to iomu_priv to > > clear the confusion. Do let me know. > > Well, you should be able to just use the pci_dev's ->sysdata (that's > what it's there for after all!) but you might need to make it point to > a structure if it's shared, the same way we did with the bus's > ->sysdata. How about adding a new field as I don't care about KABI for mainline? >I agree that just having it point to the bus's ->sysdata is > not very useful *but* there may be code in the kernel that relies on > it (Calgary did until very recently...) so it would have to be audited > first. I still wonder copying bus's->sysdata to pci_dev's-> sysdata is any useful? -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Sun, Sep 09, 2007 at 08:51:40PM +0300, Muli Ben-Yehuda wrote: On Mon, Sep 10, 2007 at 08:43:59AM -0700, Keshavamurthy, Anil S wrote: On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote: On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote: Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash This patch feels like a huge hack. See below. You seem to be jumping to conclusion without going in detail. The pci_dev struct contains pointer to sysdata, which in turn points to the copy of its parent's bus sysdata. So technically speaking we can eliminate sysdata pointer from pci_dev struct which is what one portion of this patch does. ... provided nothing relies on this relationship or the existence of the pci_dev's sysdata. Have you audited every architecture's use of the sysdata pointers? This patch removes sysdata from pci_dev struct and creates a new field called sys_data which is exclusively used by IOMMU driver to keep its per device context pointer. Hmpf, why is this needed? with the pci_sysdata work that recently went into mainline we have a void *iommu member in pci_sysdata which should be all that's needed. Please elaborate if it's not enough for your needs. I looked at your patch and it was not suitable because I need to store iommu private pointer in pci_dev Could you elaborate on why you need this? I'm assuming it's for the per-device IOMMU page tables? Yes, it is for per-device IOMMU domain information which internally holds info about the page tables. and not in the pci_bus. So I have added a new member sys_data in the pci_dev struct. I can change the name from sys_dev to iomu_priv to clear the confusion. Do let me know. Well, you should be able to just use the pci_dev's -sysdata (that's what it's there for after all!) but you might need to make it point to a structure if it's shared, the same way we did with the bus's -sysdata. How about adding a new field as I don't care about KABI for mainline? I agree that just having it point to the bus's -sysdata is not very useful *but* there may be code in the kernel that relies on it (Calgary did until very recently...) so it would have to be audited first. I still wonder copying bus's-sysdata to pci_dev's- sysdata is any useful? -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Mon, Sep 10, 2007 at 03:37:48AM +1000, Paul Mackerras wrote: Keshavamurthy, Anil S writes: Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash Populating pci_bus-sysdata way early in the pci discovery phase sets NON-NULL value to pci_dev-sysdata which breaks the assumption in the Intel IOMMU driver and crashes the system. In the drivers/pci/probe.c, pci_dev-sysdata gets a copy of its pci_bus-sysdata which is not required as the same can be obtained from pci_dev-bus-sysdata. More over the left hand assignment of pci_dev-sysdata is never being used, Wrong. You needed to grep a bit more widely... Ah..Thanks for pointing this out. sorry I had checked only i386 and x86_64. so their is no point is setting pci_dev-sysdata = pci_bus-sysdata; This patch removes sysdata from pci_dev struct and creates a new field called sys_data which is exclusively used by IOMMU driver to keep its per device context pointer. This will break powerpc, because we use the pci_dev-sysdata field to point to a firmware device tree node. Please figure out another way to solve your problem. Yes, I agree that pci_dev-sysdata can;t be removed. Even we (IOMMU) were dependent on this field but somehow this field is being overwritten to point to pci_bus's-sysdata and hence IOMMU was failing. Earlier it was overwritten to NULL and hence we were not failing but now it is overwritten to non-NULL and hence we fail. My therory is that we don;t need to copy pci_bus's-sysdata to pci_dev's-sysdata. Below patch solves my problem. Any objection to below patch? --- drivers/pci/hotplug/fakephp.c |1 - drivers/pci/probe.c |1 - 2 files changed, 2 deletions(-) Index: work/drivers/pci/hotplug/fakephp.c === --- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-11 10:29:30.0 -0700 +++ work/drivers/pci/hotplug/fakephp.c 2007-09-11 10:35:22.0 -0700 @@ -243,7 +243,6 @@ return; dev-bus = (struct pci_bus*)bus; - dev-sysdata = bus-sysdata; for (devfn = 0; devfn 0x100; devfn += 8) { dev-devfn = devfn; pci_rescan_slot(dev); Index: work/drivers/pci/probe.c === --- work.orig/drivers/pci/probe.c 2007-09-11 10:29:30.0 -0700 +++ work/drivers/pci/probe.c2007-09-11 10:35:22.0 -0700 @@ -994,7 +994,6 @@ return NULL; dev-bus = bus; - dev-sysdata = bus-sysdata; dev-dev.parent = bus-bridge; dev-dev.bus = pci_bus_type; dev-devfn = devfn; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Mon, Sep 10, 2007 at 11:25:43PM +0300, Muli Ben-Yehuda wrote: On Tue, Sep 11, 2007 at 10:42:31AM -0700, Keshavamurthy, Anil S wrote: Yes, I agree that pci_dev-sysdata can;t be removed. Even we (IOMMU) were dependent on this field but somehow this field is being overwritten to point to pci_bus's-sysdata and hence IOMMU was failing. Earlier it was overwritten to NULL and hence we were not failing but now it is overwritten to non-NULL and hence we fail. Do you know which commit caused that change? My therory is that we don;t need to copy pci_bus's-sysdata to pci_dev's-sysdata. Below patch solves my problem. Any objection to below patch? I will give it a spin to verify it works for me, but in general I am wary of making such changes unless we can verify (read: audit) that they have no adverse side effects *on all architectures*. Thanks Muli for your help here. I tested on x86_64 and saw no issues. Looking at the code, pci_dev's-sysdata becomes useless if the intent here is to keep a copy of it's bus's-sysdata as the same can be obtained from pci_dev-bus-sysdata. Thanks, Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote: > On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote: > > > Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash > > This patch feels like a huge hack. See below. You seem to be jumping to conclusion without going in detail. The pci_dev struct contains pointer to sysdata, which in turn points to the copy of its parent's bus sysdata. So technically speaking we can eliminate sysdata pointer from pci_dev struct which is what one portion of this patch does. > > > This patch removes sysdata from pci_dev struct and creates a new > > field called sys_data which is exclusively used by IOMMU driver to > > keep its per device context pointer. > > Hmpf, why is this needed? with the pci_sysdata work that recently went > into mainline we have a void *iommu member in pci_sysdata which should > be all that's needed. Please elaborate if it's not enough for your > needs. I looked at your patch and it was not suitable because I need to store iommu private pointer in pci_dev and not in the pci_bus. So I have added a new member sys_data in the pci_dev struct. I can change the name from sys_dev to iomu_priv to clear the confusion. Do let me know. -Anil > > Thanks, > Muli - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC][Intel-IOMMU] Fix for IOMMU early crash
On Sun, Sep 09, 2007 at 02:16:19PM +0300, Muli Ben-Yehuda wrote: On Sat, Sep 08, 2007 at 01:05:24PM -0700, Keshavamurthy, Anil S wrote: Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash This patch feels like a huge hack. See below. You seem to be jumping to conclusion without going in detail. The pci_dev struct contains pointer to sysdata, which in turn points to the copy of its parent's bus sysdata. So technically speaking we can eliminate sysdata pointer from pci_dev struct which is what one portion of this patch does. This patch removes sysdata from pci_dev struct and creates a new field called sys_data which is exclusively used by IOMMU driver to keep its per device context pointer. Hmpf, why is this needed? with the pci_sysdata work that recently went into mainline we have a void *iommu member in pci_sysdata which should be all that's needed. Please elaborate if it's not enough for your needs. I looked at your patch and it was not suitable because I need to store iommu private pointer in pci_dev and not in the pci_bus. So I have added a new member sys_data in the pci_dev struct. I can change the name from sys_dev to iomu_priv to clear the confusion. Do let me know. -Anil Thanks, Muli - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch] Fix BIOS-e820 end address
Subject: [patch] Fix BIOS-e820 end address --snip of boot message-- BIOS-provided physical RAM map: BIOS-e820: - 000a (usable) BIOS-e820: 000f - 0010 (reserved) BIOS-e820: 0010 - 7fe8cc00 (usable) end snip--- As you see from above the address 0010 is both shown as reserved and usable which is confusing. This patch fixes the BIOS-e820 end address. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- arch/i386/kernel/e820.c |2 +- arch/x86_64/kernel/e820.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) Index: work/arch/i386/kernel/e820.c === --- work.orig/arch/i386/kernel/e820.c 2007-09-08 12:00:33.0 -0700 +++ work/arch/i386/kernel/e820.c2007-09-08 13:39:12.0 -0700 @@ -753,7 +753,7 @@ for (i = 0; i < e820.nr_map; i++) { printk(" %s: %016Lx - %016Lx ", who, e820.map[i].addr, - e820.map[i].addr + e820.map[i].size); + e820.map[i].addr + e820.map[i].size - 1); switch (e820.map[i].type) { case E820_RAM: printk("(usable)\n"); break; Index: work/arch/x86_64/kernel/e820.c === --- work.orig/arch/x86_64/kernel/e820.c 2007-09-08 12:00:46.0 -0700 +++ work/arch/x86_64/kernel/e820.c 2007-09-08 13:38:57.0 -0700 @@ -368,7 +368,7 @@ for (i = 0; i < e820.nr_map; i++) { printk(KERN_INFO " %s: %016Lx - %016Lx ", who, (unsigned long long) e820.map[i].addr, - (unsigned long long) (e820.map[i].addr + e820.map[i].size)); + (unsigned long long) (e820.map[i].addr + e820.map[i].size - 1)); switch (e820.map[i].type) { case E820_RAM: printk("(usable)\n"); break; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[RFC][Intel-IOMMU] Fix for IOMMU early crash
Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash Populating pci_bus->sysdata way early in the pci discovery phase sets NON-NULL value to pci_dev->sysdata which breaks the assumption in the Intel IOMMU driver and crashes the system. In the drivers/pci/probe.c, pci_dev->sysdata gets a copy of its pci_bus->sysdata which is not required as the same can be obtained from pci_dev->bus->sysdata. More over the left hand assignment of pci_dev->sysdata is never being used, so their is no point is setting pci_dev->sysdata = pci_bus->sysdata; This patch removes sysdata from pci_dev struct and creates a new field called sys_data which is exclusively used by IOMMU driver to keep its per device context pointer. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- drivers/pci/hotplug/fakephp.c |1 - drivers/pci/intel-iommu.c | 22 +++--- drivers/pci/probe.c |1 - include/linux/pci.h |2 +- 4 files changed, 12 insertions(+), 14 deletions(-) Index: work/drivers/pci/hotplug/fakephp.c === --- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-08 12:00:20.0 -0700 +++ work/drivers/pci/hotplug/fakephp.c 2007-09-08 12:07:19.0 -0700 @@ -243,7 +243,6 @@ return; dev->bus = (struct pci_bus*)bus; - dev->sysdata = bus->sysdata; for (devfn = 0; devfn < 0x100; devfn += 8) { dev->devfn = devfn; pci_rescan_slot(dev); Index: work/drivers/pci/intel-iommu.c === --- work.orig/drivers/pci/intel-iommu.c 2007-09-08 12:00:47.0 -0700 +++ work/drivers/pci/intel-iommu.c 2007-09-08 12:08:20.0 -0700 @@ -1348,7 +1348,7 @@ list_del(>link); list_del(>global); if (info->dev) - info->dev->sysdata = NULL; + info->dev->sys_data = NULL; spin_unlock_irqrestore(_domain_lock, flags); detach_domain_for_dev(info->domain, info->bus, info->devfn); @@ -1361,7 +1361,7 @@ /* * find_domain - * Note: we use struct pci_dev->sysdata stores the info + * Note: we use struct pci_dev->sys_data stores the info */ struct dmar_domain * find_domain(struct pci_dev *pdev) @@ -1369,7 +1369,7 @@ struct device_domain_info *info; /* No lock here, assumes no domain exit in normal case */ - info = pdev->sysdata; + info = pdev->sys_data; if (info) return info->domain; return NULL; @@ -1519,7 +1519,7 @@ } list_add(>link, >devices); list_add(>global, _domain_list); - pdev->sysdata = info; + pdev->sys_data = info; spin_unlock_irqrestore(_domain_lock, flags); return domain; error: @@ -1579,7 +1579,7 @@ static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr, struct pci_dev *pdev) { - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO) return 0; return iommu_prepare_identity_map(pdev, rmrr->base_address, rmrr->end_address + 1); @@ -1595,7 +1595,7 @@ int ret; for_each_pci_dev(pdev) { - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO || + if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO || !IS_GFX_DEVICE(pdev)) continue; printk(KERN_INFO "IOMMU: gfx device %s 1-1 mapping\n", @@ -1836,7 +1836,7 @@ int prot = 0; BUG_ON(dir == DMA_NONE); - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO) return virt_to_bus(addr); domain = get_valid_domain_for_dev(pdev); @@ -1900,7 +1900,7 @@ unsigned long start_addr; struct iova *iova; - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); BUG_ON(!domain); @@ -1974,7 +1974,7 @@ size_t size = 0; void *addr; - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); @@ -2032,7 +2032,7 @@ unsigned long start_addr; BUG_ON(dir == DMA_NONE); - if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev->sys_data == DUMMY_DEVICE_DOMAIN_INFO) return intel_nontranslate_map_sg(hwdev, sg, nelems, dir); domain = get_valid_domain_for_dev(pdev); @@ -2234,7 +2234,7 @@ for (i = 0; i < drhd->devices_cnt; i++) { if (!drhd->devices[i]) continue; - drhd->devices[i]->sysdata =
[RFC][Intel-IOMMU] Fix for IOMMU early crash
Subject: [RFC][Intel-IOMMU] Fix for IOMMU early crash Populating pci_bus-sysdata way early in the pci discovery phase sets NON-NULL value to pci_dev-sysdata which breaks the assumption in the Intel IOMMU driver and crashes the system. In the drivers/pci/probe.c, pci_dev-sysdata gets a copy of its pci_bus-sysdata which is not required as the same can be obtained from pci_dev-bus-sysdata. More over the left hand assignment of pci_dev-sysdata is never being used, so their is no point is setting pci_dev-sysdata = pci_bus-sysdata; This patch removes sysdata from pci_dev struct and creates a new field called sys_data which is exclusively used by IOMMU driver to keep its per device context pointer. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/hotplug/fakephp.c |1 - drivers/pci/intel-iommu.c | 22 +++--- drivers/pci/probe.c |1 - include/linux/pci.h |2 +- 4 files changed, 12 insertions(+), 14 deletions(-) Index: work/drivers/pci/hotplug/fakephp.c === --- work.orig/drivers/pci/hotplug/fakephp.c 2007-09-08 12:00:20.0 -0700 +++ work/drivers/pci/hotplug/fakephp.c 2007-09-08 12:07:19.0 -0700 @@ -243,7 +243,6 @@ return; dev-bus = (struct pci_bus*)bus; - dev-sysdata = bus-sysdata; for (devfn = 0; devfn 0x100; devfn += 8) { dev-devfn = devfn; pci_rescan_slot(dev); Index: work/drivers/pci/intel-iommu.c === --- work.orig/drivers/pci/intel-iommu.c 2007-09-08 12:00:47.0 -0700 +++ work/drivers/pci/intel-iommu.c 2007-09-08 12:08:20.0 -0700 @@ -1348,7 +1348,7 @@ list_del(info-link); list_del(info-global); if (info-dev) - info-dev-sysdata = NULL; + info-dev-sys_data = NULL; spin_unlock_irqrestore(device_domain_lock, flags); detach_domain_for_dev(info-domain, info-bus, info-devfn); @@ -1361,7 +1361,7 @@ /* * find_domain - * Note: we use struct pci_dev-sysdata stores the info + * Note: we use struct pci_dev-sys_data stores the info */ struct dmar_domain * find_domain(struct pci_dev *pdev) @@ -1369,7 +1369,7 @@ struct device_domain_info *info; /* No lock here, assumes no domain exit in normal case */ - info = pdev-sysdata; + info = pdev-sys_data; if (info) return info-domain; return NULL; @@ -1519,7 +1519,7 @@ } list_add(info-link, domain-devices); list_add(info-global, device_domain_list); - pdev-sysdata = info; + pdev-sys_data = info; spin_unlock_irqrestore(device_domain_lock, flags); return domain; error: @@ -1579,7 +1579,7 @@ static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr, struct pci_dev *pdev) { - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO) return 0; return iommu_prepare_identity_map(pdev, rmrr-base_address, rmrr-end_address + 1); @@ -1595,7 +1595,7 @@ int ret; for_each_pci_dev(pdev) { - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO || + if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO || !IS_GFX_DEVICE(pdev)) continue; printk(KERN_INFO IOMMU: gfx device %s 1-1 mapping\n, @@ -1836,7 +1836,7 @@ int prot = 0; BUG_ON(dir == DMA_NONE); - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO) return virt_to_bus(addr); domain = get_valid_domain_for_dev(pdev); @@ -1900,7 +1900,7 @@ unsigned long start_addr; struct iova *iova; - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); BUG_ON(!domain); @@ -1974,7 +1974,7 @@ size_t size = 0; void *addr; - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO) return; domain = find_domain(pdev); @@ -2032,7 +2032,7 @@ unsigned long start_addr; BUG_ON(dir == DMA_NONE); - if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO) + if (pdev-sys_data == DUMMY_DEVICE_DOMAIN_INFO) return intel_nontranslate_map_sg(hwdev, sg, nelems, dir); domain = get_valid_domain_for_dev(pdev); @@ -2234,7 +2234,7 @@ for (i = 0; i drhd-devices_cnt; i++) { if (!drhd-devices[i]) continue; - drhd-devices[i]-sysdata =
[patch] Fix BIOS-e820 end address
Subject: [patch] Fix BIOS-e820 end address --snip of boot message-- BIOS-provided physical RAM map: BIOS-e820: - 000a (usable) BIOS-e820: 000f - 0010 (reserved) BIOS-e820: 0010 - 7fe8cc00 (usable) end snip--- As you see from above the address 0010 is both shown as reserved and usable which is confusing. This patch fixes the BIOS-e820 end address. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- arch/i386/kernel/e820.c |2 +- arch/x86_64/kernel/e820.c |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) Index: work/arch/i386/kernel/e820.c === --- work.orig/arch/i386/kernel/e820.c 2007-09-08 12:00:33.0 -0700 +++ work/arch/i386/kernel/e820.c2007-09-08 13:39:12.0 -0700 @@ -753,7 +753,7 @@ for (i = 0; i e820.nr_map; i++) { printk( %s: %016Lx - %016Lx , who, e820.map[i].addr, - e820.map[i].addr + e820.map[i].size); + e820.map[i].addr + e820.map[i].size - 1); switch (e820.map[i].type) { case E820_RAM: printk((usable)\n); break; Index: work/arch/x86_64/kernel/e820.c === --- work.orig/arch/x86_64/kernel/e820.c 2007-09-08 12:00:46.0 -0700 +++ work/arch/x86_64/kernel/e820.c 2007-09-08 13:38:57.0 -0700 @@ -368,7 +368,7 @@ for (i = 0; i e820.nr_map; i++) { printk(KERN_INFO %s: %016Lx - %016Lx , who, (unsigned long long) e820.map[i].addr, - (unsigned long long) (e820.map[i].addr + e820.map[i].size)); + (unsigned long long) (e820.map[i].addr + e820.map[i].size - 1)); switch (e820.map[i].type) { case E820_RAM: printk((usable)\n); break; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch -mm][Intel-IOMMU] Optimize sg map/unmap calls
On Wed, Aug 01, 2007 at 04:45:54PM -0700, Andrew Morton wrote: > On Wed, 1 Aug 2007 13:06:23 -0700 > "Keshavamurthy, Anil S" <[EMAIL PROTECTED]> wrote: > > > +/* Computes the padding size required, to make the > > + * the start address naturally aligned on its size > > + */ > > +static int > > +iova_get_pad_size(int size, unsigned int limit_pfn) > > +{ > > + unsigned int pad_size = 0; > > + unsigned int order = ilog2(size); > > + > > + if (order) > > + pad_size = (limit_pfn + 1) % (1 << order); > > + > > + return pad_size; > > +} > > This isn't obviously doing the right thing for non-power-of-2 inputs. > ilog2() rounds down... Andrew, The call chain to iova_get_pad_size() is like this alloc_iova()--->__alloc_iova_range()--->iova_get_pad_size(). Inside the alloc_iova() we are rounding the size to __roundup_pow_of_two(size) iff the caller of alloc_iova() request by setting size_aligned bool. And in every call to iova_get_pad_size() we check the size_aligned bool before calling iova_get_pad_size. Hence I don;t see any issues. If you want I can insert a BUG_ON() statement inside the above iova_get_pad_size() function. Please do let me know. thanks, Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [patch -mm][Intel-IOMMU] Optimize sg map/unmap calls
On Wed, Aug 01, 2007 at 04:45:54PM -0700, Andrew Morton wrote: On Wed, 1 Aug 2007 13:06:23 -0700 Keshavamurthy, Anil S [EMAIL PROTECTED] wrote: +/* Computes the padding size required, to make the + * the start address naturally aligned on its size + */ +static int +iova_get_pad_size(int size, unsigned int limit_pfn) +{ + unsigned int pad_size = 0; + unsigned int order = ilog2(size); + + if (order) + pad_size = (limit_pfn + 1) % (1 order); + + return pad_size; +} This isn't obviously doing the right thing for non-power-of-2 inputs. ilog2() rounds down... Andrew, The call chain to iova_get_pad_size() is like this alloc_iova()---__alloc_iova_range()---iova_get_pad_size(). Inside the alloc_iova() we are rounding the size to __roundup_pow_of_two(size) iff the caller of alloc_iova() request by setting size_aligned bool. And in every call to iova_get_pad_size() we check the size_aligned bool before calling iova_get_pad_size. Hence I don;t see any issues. If you want I can insert a BUG_ON() statement inside the above iova_get_pad_size() function. Please do let me know. thanks, Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[patch -mm][Intel-IOMMU] Optimize sg map/unmap calls
This patch adds PageSelectiveInvalidation support replacing existing DomainSelectiveInvalidation for intel_{map/unmap}_sg() calls and also enables to mapping one big contiguous DMA virtual address which is mapped to discontiguous physical address for SG map/unmap calls. "Doamin selective invalidations" wipes out the IOMMU address translation cache based on domain ID where as "Page selective invalidations" wipes out the IOMMU address translation cache for that address mask range which is more cache friendly when compared to Domain selective invalidations. Here is how it is done. 1) changes to iova.c alloc_iova() now takes a bool size_aligned argument, which when when set, returns the io virtual address that is naturally aligned to 2 ^ x, where x is the order of the size requested. Returning this io vitual address which is naturally aligned helps iommu to do the "page selective invalidations" which is IOMMU cache friendly over "domain selective invalidations". 2) Changes to driver/pci/intel-iommu.c Clean up intel_{map/unmap}_{single/sg} () calls so that s/g map/unamp calls is no more dependent on intel_{map/unmap}_single() intel_map_sg() now computes the total DMA virtual address required and allocates the size aligned total DMA virtual address and maps the discontiguous physical address to the allocated contiguous DMA virtual address. In the intel_unmap_sg() case since the DMA virtual address is contiguous and size_aligned, PageSelectiveInvalidation is used replacing earlier DomainSelectiveInvalidations. Andrew, Please add this patch to you mm queue. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- drivers/pci/intel-iommu.c | 325 +- drivers/pci/iova.c| 63 +++- drivers/pci/iova.h|3 3 files changed, 231 insertions(+), 160 deletions(-) Index: work/drivers/pci/intel-iommu.c === --- work.orig/drivers/pci/intel-iommu.c 2007-08-01 11:18:12.0 -0700 +++ work/drivers/pci/intel-iommu.c 2007-08-01 11:18:31.0 -0700 @@ -665,24 +665,10 @@ non_present_entry_flush); } -static int iommu_get_alignment(u64 base, unsigned int size) -{ - int t = 0; - u64 end; - - end = base + size - 1; - while (base != end) { - t++; - base >>= 1; - end >>= 1; - } - return t; -} - static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did, u64 addr, unsigned int pages, int non_present_entry_flush) { - unsigned int align; + unsigned int mask; BUG_ON(addr & (~PAGE_MASK_4K)); BUG_ON(pages == 0); @@ -696,16 +682,13 @@ * PSI requires page size to be 2 ^ x, and the base address is naturally * aligned to the size */ - align = iommu_get_alignment(addr >> PAGE_SHIFT_4K, pages); + mask = ilog2(__roundup_pow_of_two(pages)); /* Fallback to domain selective flush if size is too big */ - if (align > cap_max_amask_val(iommu->cap)) + if (mask > cap_max_amask_val(iommu->cap)) return iommu_flush_iotlb_dsi(iommu, did, non_present_entry_flush); - addr >>= PAGE_SHIFT_4K + align; - addr <<= PAGE_SHIFT_4K + align; - - return __iommu_flush_iotlb(iommu, did, addr, align, + return __iommu_flush_iotlb(iommu, did, addr, mask, DMA_TLB_PSI_FLUSH, non_present_entry_flush); } @@ -1772,78 +1755,103 @@ } struct iova * -iommu_alloc_iova(struct dmar_domain *domain, void *host_addr, size_t size, - u64 start, u64 end) +iommu_alloc_iova(struct dmar_domain *domain, size_t size, u64 end) { - u64 start_addr; struct iova *piova; /* Make sure it's in range */ - if ((start > DOMAIN_MAX_ADDR(domain->gaw)) || end < start) - return NULL; - end = min_t(u64, DOMAIN_MAX_ADDR(domain->gaw), end); - start_addr = PAGE_ALIGN_4K(start); - size = aligned_size((u64)host_addr, size); - if (!size || (start_addr + size > end)) + if (!size || (IOVA_START_ADDR + size > end)) return NULL; piova = alloc_iova(>iovad, - size >> PAGE_SHIFT_4K, IOVA_PFN(end)); - + size >> PAGE_SHIFT_4K, IOVA_PFN(end), 1); return piova; } -static dma_addr_t __intel_map_single(struct device *dev, void *addr, - size_t size, int dir, u64 *flush_addr, unsigned int *flush_size) +static struct iova * +__intel_alloc_iova(struct device *dev, struct dmar_domain *domain, + size_t size) { - struct dmar_domain *domain; struct pci_dev *pdev = to_pci_dev(dev); - int ret; - int prot = 0; struct iova *iova = NULL; - u64 start_addr; - - addr = (void *)virt_to_phys(addr); - - domain = get_domain_for_dev(pdev, -
[patch -mm][Intel-IOMMU] Optimize sg map/unmap calls
This patch adds PageSelectiveInvalidation support replacing existing DomainSelectiveInvalidation for intel_{map/unmap}_sg() calls and also enables to mapping one big contiguous DMA virtual address which is mapped to discontiguous physical address for SG map/unmap calls. Doamin selective invalidations wipes out the IOMMU address translation cache based on domain ID where as Page selective invalidations wipes out the IOMMU address translation cache for that address mask range which is more cache friendly when compared to Domain selective invalidations. Here is how it is done. 1) changes to iova.c alloc_iova() now takes a bool size_aligned argument, which when when set, returns the io virtual address that is naturally aligned to 2 ^ x, where x is the order of the size requested. Returning this io vitual address which is naturally aligned helps iommu to do the page selective invalidations which is IOMMU cache friendly over domain selective invalidations. 2) Changes to driver/pci/intel-iommu.c Clean up intel_{map/unmap}_{single/sg} () calls so that s/g map/unamp calls is no more dependent on intel_{map/unmap}_single() intel_map_sg() now computes the total DMA virtual address required and allocates the size aligned total DMA virtual address and maps the discontiguous physical address to the allocated contiguous DMA virtual address. In the intel_unmap_sg() case since the DMA virtual address is contiguous and size_aligned, PageSelectiveInvalidation is used replacing earlier DomainSelectiveInvalidations. Andrew, Please add this patch to you mm queue. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/intel-iommu.c | 325 +- drivers/pci/iova.c| 63 +++- drivers/pci/iova.h|3 3 files changed, 231 insertions(+), 160 deletions(-) Index: work/drivers/pci/intel-iommu.c === --- work.orig/drivers/pci/intel-iommu.c 2007-08-01 11:18:12.0 -0700 +++ work/drivers/pci/intel-iommu.c 2007-08-01 11:18:31.0 -0700 @@ -665,24 +665,10 @@ non_present_entry_flush); } -static int iommu_get_alignment(u64 base, unsigned int size) -{ - int t = 0; - u64 end; - - end = base + size - 1; - while (base != end) { - t++; - base = 1; - end = 1; - } - return t; -} - static int iommu_flush_iotlb_psi(struct intel_iommu *iommu, u16 did, u64 addr, unsigned int pages, int non_present_entry_flush) { - unsigned int align; + unsigned int mask; BUG_ON(addr (~PAGE_MASK_4K)); BUG_ON(pages == 0); @@ -696,16 +682,13 @@ * PSI requires page size to be 2 ^ x, and the base address is naturally * aligned to the size */ - align = iommu_get_alignment(addr PAGE_SHIFT_4K, pages); + mask = ilog2(__roundup_pow_of_two(pages)); /* Fallback to domain selective flush if size is too big */ - if (align cap_max_amask_val(iommu-cap)) + if (mask cap_max_amask_val(iommu-cap)) return iommu_flush_iotlb_dsi(iommu, did, non_present_entry_flush); - addr = PAGE_SHIFT_4K + align; - addr = PAGE_SHIFT_4K + align; - - return __iommu_flush_iotlb(iommu, did, addr, align, + return __iommu_flush_iotlb(iommu, did, addr, mask, DMA_TLB_PSI_FLUSH, non_present_entry_flush); } @@ -1772,78 +1755,103 @@ } struct iova * -iommu_alloc_iova(struct dmar_domain *domain, void *host_addr, size_t size, - u64 start, u64 end) +iommu_alloc_iova(struct dmar_domain *domain, size_t size, u64 end) { - u64 start_addr; struct iova *piova; /* Make sure it's in range */ - if ((start DOMAIN_MAX_ADDR(domain-gaw)) || end start) - return NULL; - end = min_t(u64, DOMAIN_MAX_ADDR(domain-gaw), end); - start_addr = PAGE_ALIGN_4K(start); - size = aligned_size((u64)host_addr, size); - if (!size || (start_addr + size end)) + if (!size || (IOVA_START_ADDR + size end)) return NULL; piova = alloc_iova(domain-iovad, - size PAGE_SHIFT_4K, IOVA_PFN(end)); - + size PAGE_SHIFT_4K, IOVA_PFN(end), 1); return piova; } -static dma_addr_t __intel_map_single(struct device *dev, void *addr, - size_t size, int dir, u64 *flush_addr, unsigned int *flush_size) +static struct iova * +__intel_alloc_iova(struct device *dev, struct dmar_domain *domain, + size_t size) { - struct dmar_domain *domain; struct pci_dev *pdev = to_pci_dev(dev); - int ret; - int prot = 0; struct iova *iova = NULL; - u64 start_addr; - - addr = (void *)virt_to_phys(addr); - - domain = get_domain_for_dev(pdev, -
RE: OOPS at dmar_table_init (2.6.22-rc6-mm1 kernel)
(Sorry for the top post) This bug is fixed and the fix should appear in the next MM release. See this link for the patch http://marc.info/?l=linux-kernel=118313130109808=2 -Anil -Original Message- From: Randy Dunlap [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 10, 2007 9:50 AM To: Pavel Emelianov Cc: Raj, Ashok; Li, Shaohua; Keshavamurthy, Anil S; Linux Kernel Mailing List; Andrew Morton Subject: Re: OOPS at dmar_table_init (2.6.22-rc6-mm1 kernel) On Tue, 10 Jul 2007 17:57:25 +0400 Pavel Emelianov wrote: > Hi. > > While working with Andrew's kernel I faced an OOPS on x86_64 > machine. Unfortunately kernel logs are not appearing in serial > console by the time oops happens, but I have some info on the > screen: > > OOPs is at dmar_table_init() here: > 80532873 : > 80532873: 41 55 push %r13 > 80532875: 41 54 push %r12 > 80532877: 55 push %rbp > 80532878: 53 push %rbx > 80532879: 51 push %rcx > 8053287a: 4c 8b 2d 07 21 01 00mov 73991(%rip),%r13# 80544988 > 80532881: 41 0f b6 45 24 movzbl 0x24(%r13),%eax <<<<<<<<<<< OOPS, %r13 = NULL > 80532886: 84 c0 test %al,%al > 80532888: 75 11 jne 8053289b > 8053288a: 48 c7 c7 b0 3d 45 80mov $0x80453db0,%rdi > 80532891: e8 c2 4a cf ff callq 80227358 > 80532896: e9 f2 01 00 00 jmpq 80532a8d > > Looks like dmar_tbl is NULL. > BUG is 100% reproducible. Is there any other info that can be useful? You could test the patch from http://lkml.org/lkml/2007/6/29/174 (see "Get diff 1"). --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: OOPS at dmar_table_init (2.6.22-rc6-mm1 kernel)
(Sorry for the top post) This bug is fixed and the fix should appear in the next MM release. See this link for the patch http://marc.info/?l=linux-kernelm=118313130109808w=2 -Anil -Original Message- From: Randy Dunlap [mailto:[EMAIL PROTECTED] Sent: Tuesday, July 10, 2007 9:50 AM To: Pavel Emelianov Cc: Raj, Ashok; Li, Shaohua; Keshavamurthy, Anil S; Linux Kernel Mailing List; Andrew Morton Subject: Re: OOPS at dmar_table_init (2.6.22-rc6-mm1 kernel) On Tue, 10 Jul 2007 17:57:25 +0400 Pavel Emelianov wrote: Hi. While working with Andrew's kernel I faced an OOPS on x86_64 machine. Unfortunately kernel logs are not appearing in serial console by the time oops happens, but I have some info on the screen: OOPs is at dmar_table_init() here: 80532873 dmar_table_init: 80532873: 41 55 push %r13 80532875: 41 54 push %r12 80532877: 55 push %rbp 80532878: 53 push %rbx 80532879: 51 push %rcx 8053287a: 4c 8b 2d 07 21 01 00mov 73991(%rip),%r13# 80544988 dmar_tbl 80532881: 41 0f b6 45 24 movzbl 0x24(%r13),%eax OOPS, %r13 = NULL 80532886: 84 c0 test %al,%al 80532888: 75 11 jne 8053289b dmar_table_init+0x28 8053288a: 48 c7 c7 b0 3d 45 80mov $0x80453db0,%rdi 80532891: e8 c2 4a cf ff callq 80227358 printk 80532896: e9 f2 01 00 00 jmpq 80532a8d dmar_table_init+0x21a Looks like dmar_tbl is NULL. BUG is 100% reproducible. Is there any other info that can be useful? You could test the patch from http://lkml.org/lkml/2007/6/29/174 (see Get diff 1). --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [-mm patch] fix duplicate CONFIG_DMAR Makefile line
On Sun, Jul 01, 2007 at 10:22:43PM +0200, Adrian Bunk wrote: > On Thu, Jun 28, 2007 at 03:43:21AM -0700, Andrew Morton wrote: > >... > > Changes since 2.6.22-rc4-mm2: > >... > > +intel-iommu-intel-iommu-driver.patch > >... > > Intel IOMMU support > >... > > > Contrary to popular belief, two identical Makefile lines don't bring any > advantages. ;-) :). Thanks for the fix. > > Signed-off-by: Adrian Bunk <[EMAIL PROTECTED]> Acked-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> > > --- > --- linux-2.6.22-rc6-mm1/drivers/pci/Makefile.old 2007-06-30 > 03:33:54.0 +0200 > +++ linux-2.6.22-rc6-mm1/drivers/pci/Makefile 2007-06-30 03:34:04.0 > +0200 > @@ -23,9 +23,6 @@ > # Build Intel IOMMU support > obj-$(CONFIG_DMAR) += dmar.o iova.o intel-iommu.o > > -#Build Intel-IOMMU support > -obj-$(CONFIG_DMAR) += iova.o dmar.o intel-iommu.o > - > # > # Some architectures use the generic PCI setup functions > # - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [-mm patch] fix duplicate CONFIG_DMAR Makefile line
On Sun, Jul 01, 2007 at 10:22:43PM +0200, Adrian Bunk wrote: On Thu, Jun 28, 2007 at 03:43:21AM -0700, Andrew Morton wrote: ... Changes since 2.6.22-rc4-mm2: ... +intel-iommu-intel-iommu-driver.patch ... Intel IOMMU support ... Contrary to popular belief, two identical Makefile lines don't bring any advantages. ;-) :). Thanks for the fix. Signed-off-by: Adrian Bunk [EMAIL PROTECTED] Acked-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- --- linux-2.6.22-rc6-mm1/drivers/pci/Makefile.old 2007-06-30 03:33:54.0 +0200 +++ linux-2.6.22-rc6-mm1/drivers/pci/Makefile 2007-06-30 03:34:04.0 +0200 @@ -23,9 +23,6 @@ # Build Intel IOMMU support obj-$(CONFIG_DMAR) += dmar.o iova.o intel-iommu.o -#Build Intel-IOMMU support -obj-$(CONFIG_DMAR) += iova.o dmar.o intel-iommu.o - # # Some architectures use the generic PCI setup functions # - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64
On Fri, Jun 29, 2007 at 12:23:43PM -0400, Muli Ben-Yehuda wrote: > On Fri, Jun 29, 2007 at 08:28:58AM -0700, Keshavamurthy, Anil S wrote: > > > +++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-29 07:46:25.0 > > -0700 > > @@ -260,6 +260,8 @@ > > int ret = 0; > > > > dmar = (struct acpi_table_dmar *)dmar_tbl; > > + if (!dmar) > > + return -ENODEV; > > > > if (!dmar->width) { > > printk (KERN_WARNING PREFIX "Zero: Invalid DMAR haw\n"); > > @@ -301,7 +303,7 @@ > > > > parse_dmar_table(); > > if (list_empty(_drhd_units)) { > > - printk(KERN_ERR PREFIX "No DMAR devices found\n"); > > + printk(KERN_INFO PREFIX "No DMAR devices found\n"); > > return -ENODEV; > > } > > return 0; > > The convention is to print a KERN_DEBUG message if hardware is not > found when probing it, otherwise the boot messages become cluttered > with lots of "$FOO not found". Since this is IOMMU is built into the kernel and it is good idea to report that the device is not present. The above is printed only once and is consistent with other IOMMU implementation. Atleast it is useful when people report bugs we can makeout whether IOMMU is being detected or not. Here is what I see on my box. [..] "PCI-GART: No AMD northbridge found." [..] Calgary: detecting Calgary via BIOS EBDA area Calgary: Unable to locate Rio Grande table in EBDA - bailing! [..] As you can see I don;t have either GART or Calgary on my box. -Thanks, Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64
On Thu, Jun 28, 2007 at 06:14:27PM -0700, Li, Shaohua wrote: > > > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22- > >rc6/2.6.22-rc6-mm1/ > >> > >>> +intel-iommu-dmar-detection-and-parsing-logic.patch [..] > > > >I took a picture of it, looks like the backtrace is: > > > >NULL pointer dereference at 024 > >EIP:dmar_table_init+0x11 > >intel_iommu_init+0x30 > >pci_iommu_init+0xe > >kernel_init+0x16e > > > >Presumably something is NULL in dmar_table_init that wasn't expected to > >be.. I would guess it likely crashes on any system without an Intel > >IOMMU in it. Yup, that is correct. > How about something like below? > > > int __init dmar_table_init(void) > { > + if (!dmar_tbl) > + return -ENODEV; > parse_dmar_table(); why not check for NULL in the function where it touched? Also when there are no DMAR devices we need the below printk on the console. > if (list_empty(_drhd_units)) { > printk(KERN_ERR PREFIX "No DMAR devices found\n"); > return -ENODEV; > } > return 0; > } Here is the revised patch of the above. Andrew, please add this fix to +intel-iommu-dmar-detection-and-parsing-logic.patch Check for dmar_tbl pointer as this can be NULL on systems with no Intel VT-d support. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- drivers/pci/dmar.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6.22-rc4-mm2/drivers/pci/dmar.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/dmar.c2007-06-29 07:43:43.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-29 07:46:25.0 -0700 @@ -260,6 +260,8 @@ int ret = 0; dmar = (struct acpi_table_dmar *)dmar_tbl; + if (!dmar) + return -ENODEV; if (!dmar->width) { printk (KERN_WARNING PREFIX "Zero: Invalid DMAR haw\n"); @@ -301,7 +303,7 @@ parse_dmar_table(); if (list_empty(_drhd_units)) { - printk(KERN_ERR PREFIX "No DMAR devices found\n"); + printk(KERN_INFO PREFIX "No DMAR devices found\n"); return -ENODEV; } return 0; - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64
On Thu, Jun 28, 2007 at 06:14:27PM -0700, Li, Shaohua wrote: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22- rc6/2.6.22-rc6-mm1/ +intel-iommu-dmar-detection-and-parsing-logic.patch [..] I took a picture of it, looks like the backtrace is: NULL pointer dereference at 024 EIP:dmar_table_init+0x11 intel_iommu_init+0x30 pci_iommu_init+0xe kernel_init+0x16e Presumably something is NULL in dmar_table_init that wasn't expected to be.. I would guess it likely crashes on any system without an Intel IOMMU in it. Yup, that is correct. How about something like below? int __init dmar_table_init(void) { + if (!dmar_tbl) + return -ENODEV; parse_dmar_table(); why not check for NULL in the function where it touched? Also when there are no DMAR devices we need the below printk on the console. if (list_empty(dmar_drhd_units)) { printk(KERN_ERR PREFIX No DMAR devices found\n); return -ENODEV; } return 0; } Here is the revised patch of the above. Andrew, please add this fix to +intel-iommu-dmar-detection-and-parsing-logic.patch Check for dmar_tbl pointer as this can be NULL on systems with no Intel VT-d support. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/dmar.c |4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) Index: linux-2.6.22-rc4-mm2/drivers/pci/dmar.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/dmar.c2007-06-29 07:43:43.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-29 07:46:25.0 -0700 @@ -260,6 +260,8 @@ int ret = 0; dmar = (struct acpi_table_dmar *)dmar_tbl; + if (!dmar) + return -ENODEV; if (!dmar-width) { printk (KERN_WARNING PREFIX Zero: Invalid DMAR haw\n); @@ -301,7 +303,7 @@ parse_dmar_table(); if (list_empty(dmar_drhd_units)) { - printk(KERN_ERR PREFIX No DMAR devices found\n); + printk(KERN_INFO PREFIX No DMAR devices found\n); return -ENODEV; } return 0; - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: 2.6.22-rc6-mm1 Intel DMAR crash on AMD x86_64
On Fri, Jun 29, 2007 at 12:23:43PM -0400, Muli Ben-Yehuda wrote: On Fri, Jun 29, 2007 at 08:28:58AM -0700, Keshavamurthy, Anil S wrote: +++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-29 07:46:25.0 -0700 @@ -260,6 +260,8 @@ int ret = 0; dmar = (struct acpi_table_dmar *)dmar_tbl; + if (!dmar) + return -ENODEV; if (!dmar-width) { printk (KERN_WARNING PREFIX Zero: Invalid DMAR haw\n); @@ -301,7 +303,7 @@ parse_dmar_table(); if (list_empty(dmar_drhd_units)) { - printk(KERN_ERR PREFIX No DMAR devices found\n); + printk(KERN_INFO PREFIX No DMAR devices found\n); return -ENODEV; } return 0; The convention is to print a KERN_DEBUG message if hardware is not found when probing it, otherwise the boot messages become cluttered with lots of $FOO not found. Since this is IOMMU is built into the kernel and it is good idea to report that the device is not present. The above is printed only once and is consistent with other IOMMU implementation. Atleast it is useful when people report bugs we can makeout whether IOMMU is being detected or not. Here is what I see on my box. [..] PCI-GART: No AMD northbridge found. [..] Calgary: detecting Calgary via BIOS EBDA area Calgary: Unable to locate Rio Grande table in EBDA - bailing! [..] As you can see I don;t have either GART or Calgary on my box. -Thanks, Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 10/10] Iommu floppy workaround
On Tue, Jun 26, 2007 at 12:37:55PM +0200, Andi Kleen wrote: > > > > Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig > > > === > > > --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-18 > > > 15:45:08.0 -0700 > > > +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig 2007-06-18 > > > 15:45:09.0 -0700 > > > @@ -752,6 +752,16 @@ > > >all the OS visible memory. Hence the driver can continue > > >to use physical addresses for DMA. > > > > > > +config DMAR_FLPY_WA > > > > FLOPPY is spelled "FLOPPY"! > > Also this shouldn't be a user visible config. The floppy driver should just > do this transparently when loaded and undo when unloaded. Yup, I agree. Here goes the patch to make it user invisible config option. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- arch/x86_64/Kconfig |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-26 12:04:42.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-26 12:06:01.0 -0700 @@ -753,7 +753,7 @@ to use physical addresses for DMA. config DMAR_FLOPPY_WA - bool "Support for Floppy disk workaround" + bool depends on DMAR default y help - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 05/10] Intel IOMMU driver
On Mon, Jun 25, 2007 at 11:25:11PM -0700, Andrew Morton wrote: > On Tue, 19 Jun 2007 14:37:06 -0700 "Keshavamurthy, Anil S" <[EMAIL > PROTECTED]> wrote: > > > None of these actually _need_ to be macros and it would be better to > implement them in C. That way things are more self-documenting, more > pleasant to read, more likely to get commented and you'll fix the > two bugs wherein the argument to a macro is evaluated more than once. Agree. Will send a patch when I get back from OLS as any changes to this needs thorugh testing. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 05/10] Intel IOMMU driver
On Mon, Jun 25, 2007 at 11:32:49PM -0700, Andrew Morton wrote: > On Tue, 19 Jun 2007 16:32:23 -0700 (PDT) Christoph Lameter <[EMAIL > PROTECTED]> wrote: > > > On Tue, 19 Jun 2007, Keshavamurthy, Anil S wrote: > > > > > +static inline void *alloc_pgtable_page(void) > > > +{ > > > + return (void *)get_zeroed_page(GFP_ATOMIC); > > > +} > > > > Need to pass gfp_t parameter. Repeates a couple of times. > > ... > > Is it not possible here to drop the lock and do the alloc with GFP_KERNEL > > and deal with the resulting race? That is done in other parts of the > > kernel. > > ... > > This may be able to become a GFP_KERNEL alloc since interrupts are enabled > > at this point? > > ... > > GFP_KERNEL alloc possible? > > > > Yeah, if there are any callsites at all at which we know that we can > perform a sleeping allocation, Christoph's suggestions should be adopted. > Because even a bare GFP_NOIO is heaps more robust than GFP_ATOMIC, and it > will also reload the free-pages reserves, making subsequent GFP_ATOMIC > allocations more likely to succeed. Yup, will do as part of making this code work for IA64, which is my next item in my todo list. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 10/10] Iommu floppy workaround
On Mon, Jun 25, 2007 at 11:42:22PM -0700, Andrew Morton wrote: > On Tue, 19 Jun 2007 14:37:11 -0700 "Keshavamurthy, Anil S" <[EMAIL > PROTECTED]> wrote: > > Bit weird that this was implemented in the header like that. Sorry, it is my mistake as I understood thus from your previous code review comment. > > How about this? (Also contains rather a lot of obvious style fixes) Yup, looks good. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 04/10] IOVA allocation and management routines
On Mon, Jun 25, 2007 at 11:07:47PM -0700, Andrew Morton wrote: > On Tue, 19 Jun 2007 14:37:05 -0700 "Keshavamurthy, Anil S" <[EMAIL > PROTECTED]> wrote: > > All the inlines in this code are pretty pointless: all those functions have > a single callsite so the compiler inlines them anyway. If we later add > more callsites for these functions, they're too big to be inlined. > > inline is usually wrong: don't do it! Yup, I agree and will follow in future. > > + > > +/** > > + * find_iova - find's an iova for a given pfn > > + * @iovad - iova domain in question. > > + * pfn - page frame number > > + * This function finds and returns an iova belonging to the > > + * given doamin which matches the given pfn. > > + */ > > +struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn) > > +{ > > + unsigned long flags; > > + struct rb_node *node; > > + > > + spin_lock_irqsave(>iova_rbtree_lock, flags); > > + node = iovad->rbroot.rb_node; > > + while (node) { > > + struct iova *iova = container_of(node, struct iova, node); > > + > > + /* If pfn falls within iova's range, return iova */ > > + if ((pfn >= iova->pfn_lo) && (pfn <= iova->pfn_hi)) { > > + spin_unlock_irqrestore(>iova_rbtree_lock, flags); > > + return iova; > > + } > > + > > + if (pfn < iova->pfn_lo) > > + node = node->rb_left; > > + else if (pfn > iova->pfn_lo) > > + node = node->rb_right; > > + } > > + > > + spin_unlock_irqrestore(>iova_rbtree_lock, flags); > > + return NULL; > > +} > > So we take the lock, look up an item, then drop the lock then return the > item we just found. We took no refcount on it and we didn't do anything to > keep this object alive. > > Is that a bug, or does the (afacit undocumented) lifecycle management of > these things take care of it in some manner? If yes, please reply via an > add-a-comment patch. Nope, this is not a bug. Adding a comment patch which explains the same. > > > > +/** > > + * __free_iova - frees the given iova > > + * @iovad: iova domain in question. > > + * @iova: iova in question. > > + * Frees the given iova belonging to the giving domain > > + */ > > +void > > +__free_iova(struct iova_domain *iovad, struct iova *iova) > > +{ > > + unsigned long flags; > > + > > + if (iova) { > > + spin_lock_irqsave(>iova_rbtree_lock, flags); > > + __cached_rbnode_delete_update(iovad, iova); > > + rb_erase(>node, >rbroot); > > + spin_unlock_irqrestore(>iova_rbtree_lock, flags); > > + free_iova_mem(iova); > > + } > > +} > > Can this really be called with NULL? If so, under what circumstances? > (This reader couldn't work it out from a brief look at the code, so perhaps > others will not be able to either. Perhaps a comment is needed) It was getting called from only one place free_iova(). Below patch address your concern. > > > +/** > > + * free_iova - finds and frees the iova for a given pfn > > + * @iovad: - iova domain in question. > > + * @pfn: - pfn that is allocated previously > > + * This functions finds an iova for a given pfn and then > > + * frees the iova from that domain. > > + */ > > +void > > +free_iova(struct iova_domain *iovad, unsigned long pfn) > > +{ > > + struct iova *iova = find_iova(iovad, pfn); > > + __free_iova(iovad, iova); > > + > > +} > > + > > +/** > > + * put_iova_domain - destroys the iova doamin > > + * @iovad: - iova domain in question. > > + * All the iova's in that domain are destroyed. > > + */ > > +void put_iova_domain(struct iova_domain *iovad) > > +{ > > + struct rb_node *node; > > + unsigned long flags; > > + > > + spin_lock_irqsave(>iova_rbtree_lock, flags); > > + node = rb_first(>rbroot); > > + while (node) { > > + struct iova *iova = container_of(node, struct iova, node); > > + rb_erase(node, >rbroot); > > + free_iova_mem(iova); > > + node = rb_first(>rbroot); > > + } > > + spin_unlock_irqrestore(>iova_rbtree_lock, flags); > > +} > > Right, so I suspect what's happening here is that all iova's remain valid > until their entire domain is destroyed, yes? Nope. IOVA are valid only for the duration of DMA
Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2
On Tue, Jun 26, 2007 at 11:11:25AM -0400, Muli Ben-Yehuda wrote: > On Tue, Jun 26, 2007 at 08:03:59AM -0700, Arjan van de Ven wrote: > > Muli Ben-Yehuda wrote: > > >How much? we have numbers (to be presented at OLS later this week) > > >that show that on bare-metal an IOMMU can cost as much as 15%-30% more > > >CPU utilization for an IO intensive workload (netperf). It will be > > >interesting to see comparable numbers for VT-d. > > > > for VT-d it is a LOT less. I'll let anil give you his data :) > > Looking forward to it. Note that this is on a large SMP machine with > Gigabit ethernet, with netperf TCP stream. Comparing numbers for other > benchmarks on other machines is ... less than useful, but the numbers > themeselves are interesting. Our initial benchmark results showed we had around 3% extra CPU utilization overhead when compared to native(i.e without IOMMU). Again, our benchmark was on small SMP machine and we used iperf and a 1G ethernet cards. Going forward we will do more benchmark tests and will share the results. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 02/10] PCI generic helper function
On Mon, Jun 25, 2007 at 10:49:37PM -0700, Andrew Morton wrote: > On Tue, 19 Jun 2007 14:37:03 -0700 "Keshavamurthy, Anil S" <[EMAIL > PROTECTED]> wrote: > > > +struct pci_dev * > > +pci_find_upstream_pcie_bridge(struct pci_dev *pdev) > > You didn't need a newline there, but that's what the rest of that file > does. Hu hum. > > > +{ > > + struct pci_dev *tmp = NULL; > > + > > + if (pdev->is_pcie) > > + return NULL; > > + while (1) { > > + if (!pdev->bus->self) > > + break; > > + pdev = pdev->bus->self; > > + /* a p2p bridge */ > > + if (!pdev->is_pcie) { > > + tmp = pdev; > > + continue; > > + } > > + /* PCI device should connect to a PCIE bridge */ > > + BUG_ON(pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE); > > I assume that if this bug triggers, we've found some broken hardware? > > Going BUG seems like a pretty rude reaction to this, especially when it > would be so easy to drop a warning and then recover. > > > How's about this? Looks good, thanks. > > --- a/drivers/pci/search.c~intel-iommu-pci-generic-helper-function-fix > +++ a/drivers/pci/search.c > @@ -38,7 +38,11 @@ pci_find_upstream_pcie_bridge(struct pci > continue; > } > /* PCI device should connect to a PCIE bridge */ > - BUG_ON(pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE); > + if (pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE) { > + /* Busted hardware? */ > + WARN_ON_ONCE(1); > + return NULL; > + } > return pdev; > } > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 02/10] PCI generic helper function
On Mon, Jun 25, 2007 at 10:49:37PM -0700, Andrew Morton wrote: On Tue, 19 Jun 2007 14:37:03 -0700 Keshavamurthy, Anil S [EMAIL PROTECTED] wrote: +struct pci_dev * +pci_find_upstream_pcie_bridge(struct pci_dev *pdev) You didn't need a newline there, but that's what the rest of that file does. Hu hum. +{ + struct pci_dev *tmp = NULL; + + if (pdev-is_pcie) + return NULL; + while (1) { + if (!pdev-bus-self) + break; + pdev = pdev-bus-self; + /* a p2p bridge */ + if (!pdev-is_pcie) { + tmp = pdev; + continue; + } + /* PCI device should connect to a PCIE bridge */ + BUG_ON(pdev-pcie_type != PCI_EXP_TYPE_PCI_BRIDGE); I assume that if this bug triggers, we've found some broken hardware? Going BUG seems like a pretty rude reaction to this, especially when it would be so easy to drop a warning and then recover. How's about this? Looks good, thanks. --- a/drivers/pci/search.c~intel-iommu-pci-generic-helper-function-fix +++ a/drivers/pci/search.c @@ -38,7 +38,11 @@ pci_find_upstream_pcie_bridge(struct pci continue; } /* PCI device should connect to a PCIE bridge */ - BUG_ON(pdev-pcie_type != PCI_EXP_TYPE_PCI_BRIDGE); + if (pdev-pcie_type != PCI_EXP_TYPE_PCI_BRIDGE) { + /* Busted hardware? */ + WARN_ON_ONCE(1); + return NULL; + } return pdev; } - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 00/10] Intel IOMMU support, take #2
On Tue, Jun 26, 2007 at 11:11:25AM -0400, Muli Ben-Yehuda wrote: On Tue, Jun 26, 2007 at 08:03:59AM -0700, Arjan van de Ven wrote: Muli Ben-Yehuda wrote: How much? we have numbers (to be presented at OLS later this week) that show that on bare-metal an IOMMU can cost as much as 15%-30% more CPU utilization for an IO intensive workload (netperf). It will be interesting to see comparable numbers for VT-d. for VT-d it is a LOT less. I'll let anil give you his data :) Looking forward to it. Note that this is on a large SMP machine with Gigabit ethernet, with netperf TCP stream. Comparing numbers for other benchmarks on other machines is ... less than useful, but the numbers themeselves are interesting. Our initial benchmark results showed we had around 3% extra CPU utilization overhead when compared to native(i.e without IOMMU). Again, our benchmark was on small SMP machine and we used iperf and a 1G ethernet cards. Going forward we will do more benchmark tests and will share the results. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 04/10] IOVA allocation and management routines
On Mon, Jun 25, 2007 at 11:07:47PM -0700, Andrew Morton wrote: On Tue, 19 Jun 2007 14:37:05 -0700 Keshavamurthy, Anil S [EMAIL PROTECTED] wrote: All the inlines in this code are pretty pointless: all those functions have a single callsite so the compiler inlines them anyway. If we later add more callsites for these functions, they're too big to be inlined. inline is usually wrong: don't do it! Yup, I agree and will follow in future. + +/** + * find_iova - find's an iova for a given pfn + * @iovad - iova domain in question. + * pfn - page frame number + * This function finds and returns an iova belonging to the + * given doamin which matches the given pfn. + */ +struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn) +{ + unsigned long flags; + struct rb_node *node; + + spin_lock_irqsave(iovad-iova_rbtree_lock, flags); + node = iovad-rbroot.rb_node; + while (node) { + struct iova *iova = container_of(node, struct iova, node); + + /* If pfn falls within iova's range, return iova */ + if ((pfn = iova-pfn_lo) (pfn = iova-pfn_hi)) { + spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags); + return iova; + } + + if (pfn iova-pfn_lo) + node = node-rb_left; + else if (pfn iova-pfn_lo) + node = node-rb_right; + } + + spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags); + return NULL; +} So we take the lock, look up an item, then drop the lock then return the item we just found. We took no refcount on it and we didn't do anything to keep this object alive. Is that a bug, or does the (afacit undocumented) lifecycle management of these things take care of it in some manner? If yes, please reply via an add-a-comment patch. Nope, this is not a bug. Adding a comment patch which explains the same. +/** + * __free_iova - frees the given iova + * @iovad: iova domain in question. + * @iova: iova in question. + * Frees the given iova belonging to the giving domain + */ +void +__free_iova(struct iova_domain *iovad, struct iova *iova) +{ + unsigned long flags; + + if (iova) { + spin_lock_irqsave(iovad-iova_rbtree_lock, flags); + __cached_rbnode_delete_update(iovad, iova); + rb_erase(iova-node, iovad-rbroot); + spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags); + free_iova_mem(iova); + } +} Can this really be called with NULL? If so, under what circumstances? (This reader couldn't work it out from a brief look at the code, so perhaps others will not be able to either. Perhaps a comment is needed) It was getting called from only one place free_iova(). Below patch address your concern. +/** + * free_iova - finds and frees the iova for a given pfn + * @iovad: - iova domain in question. + * @pfn: - pfn that is allocated previously + * This functions finds an iova for a given pfn and then + * frees the iova from that domain. + */ +void +free_iova(struct iova_domain *iovad, unsigned long pfn) +{ + struct iova *iova = find_iova(iovad, pfn); + __free_iova(iovad, iova); + +} + +/** + * put_iova_domain - destroys the iova doamin + * @iovad: - iova domain in question. + * All the iova's in that domain are destroyed. + */ +void put_iova_domain(struct iova_domain *iovad) +{ + struct rb_node *node; + unsigned long flags; + + spin_lock_irqsave(iovad-iova_rbtree_lock, flags); + node = rb_first(iovad-rbroot); + while (node) { + struct iova *iova = container_of(node, struct iova, node); + rb_erase(node, iovad-rbroot); + free_iova_mem(iova); + node = rb_first(iovad-rbroot); + } + spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags); +} Right, so I suspect what's happening here is that all iova's remain valid until their entire domain is destroyed, yes? Nope. IOVA are valid only for the duration of DMA MAP and DMA UNMAP calls. In case of Intel-iommu driver, the iova's are valid only for the duration of __intel_map_singl() and __intel_unmap_single() calls. What is the upper bound to the memory consumpotion here, and what provides it? As explained above, iova are freed and reused again during the DMA map calls. Again, some code comments about these design issues are appropriate. +/* + * We need a fixed PAGE_SIZE of 4K irrespective of + * arch PAGE_SIZE for IOMMU page tables. + */ +#define PAGE_SHIFT_4K (12) +#define PAGE_SIZE_4K (1UL PAGE_SHIFT_4K) +#define PAGE_MASK_4K (((u64)-1) PAGE_SHIFT_4K) +#define PAGE_ALIGN_4K(addr)(((addr) + PAGE_SIZE_4K - 1) PAGE_MASK_4K) Am still wondering why we cannot use PAGE_SIZE, PAGE_SHIFT, etc here. VT-d hardware(a.k.a Intel IOMMU
Re: [Intel IOMMU 10/10] Iommu floppy workaround
On Mon, Jun 25, 2007 at 11:42:22PM -0700, Andrew Morton wrote: On Tue, 19 Jun 2007 14:37:11 -0700 Keshavamurthy, Anil S [EMAIL PROTECTED] wrote: Bit weird that this was implemented in the header like that. Sorry, it is my mistake as I understood thus from your previous code review comment. How about this? (Also contains rather a lot of obvious style fixes) Yup, looks good. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 05/10] Intel IOMMU driver
On Mon, Jun 25, 2007 at 11:32:49PM -0700, Andrew Morton wrote: On Tue, 19 Jun 2007 16:32:23 -0700 (PDT) Christoph Lameter [EMAIL PROTECTED] wrote: On Tue, 19 Jun 2007, Keshavamurthy, Anil S wrote: +static inline void *alloc_pgtable_page(void) +{ + return (void *)get_zeroed_page(GFP_ATOMIC); +} Need to pass gfp_t parameter. Repeates a couple of times. ... Is it not possible here to drop the lock and do the alloc with GFP_KERNEL and deal with the resulting race? That is done in other parts of the kernel. ... This may be able to become a GFP_KERNEL alloc since interrupts are enabled at this point? ... GFP_KERNEL alloc possible? Yeah, if there are any callsites at all at which we know that we can perform a sleeping allocation, Christoph's suggestions should be adopted. Because even a bare GFP_NOIO is heaps more robust than GFP_ATOMIC, and it will also reload the free-pages reserves, making subsequent GFP_ATOMIC allocations more likely to succeed. Yup, will do as part of making this code work for IA64, which is my next item in my todo list. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 05/10] Intel IOMMU driver
On Mon, Jun 25, 2007 at 11:25:11PM -0700, Andrew Morton wrote: On Tue, 19 Jun 2007 14:37:06 -0700 Keshavamurthy, Anil S [EMAIL PROTECTED] wrote: None of these actually _need_ to be macros and it would be better to implement them in C. That way things are more self-documenting, more pleasant to read, more likely to get commented and you'll fix the two bugs wherein the argument to a macro is evaluated more than once. Agree. Will send a patch when I get back from OLS as any changes to this needs thorugh testing. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 10/10] Iommu floppy workaround
On Tue, Jun 26, 2007 at 12:37:55PM +0200, Andi Kleen wrote: Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig 2007-06-18 15:45:09.0 -0700 @@ -752,6 +752,16 @@ all the OS visible memory. Hence the driver can continue to use physical addresses for DMA. +config DMAR_FLPY_WA FLOPPY is spelled FLOPPY! Also this shouldn't be a user visible config. The floppy driver should just do this transparently when loaded and undo when unloaded. Yup, I agree. Here goes the patch to make it user invisible config option. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- arch/x86_64/Kconfig |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-26 12:04:42.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-26 12:06:01.0 -0700 @@ -753,7 +753,7 @@ to use physical addresses for DMA. config DMAR_FLOPPY_WA - bool Support for Floppy disk workaround + bool depends on DMAR default y help - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls
On Thu, Jun 21, 2007 at 09:13:11AM +0200, Peter Zijlstra wrote: > On Wed, 2007-06-20 at 23:37 -0700, Keshavamurthy, Anil S wrote: > > On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote: > > > On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote: > > > > Peter Zijlstra wrote: > > > Also, the other iommu code you pointed me to, was happy to fail, it did > > > not attempt to use the emergency reserves. > > > > Is the same behavior acceptable here? > > I would say it is. Failure is a part of life. > > If you have a (small) mempool with 16 pages or so, that should give you > plenty megabytes of io-space to get out of a tight spot. That is, you > can queue many pages with that. If it is depleted you know you have at > least that many pages outstanding. So failing will just delay the next > pages. > > Throughput is not a key issue when that low on memory, a guarantee of > progress is. Andrew, Can you please queue all the other patches except this one for your next MM release? (Yes, We can safely drop this patch without any issues in applying rest of the patches). -thanks, Anil Keshavamurthy - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls
On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote: > On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote: > > Peter Zijlstra wrote: > Also, the other iommu code you pointed me to, was happy to fail, it did > not attempt to use the emergency reserves. Is the same behavior acceptable here? -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls
On Wed, Jun 20, 2007 at 11:11:05PM -0700, Arjan van de Ven wrote: > Peter Zijlstra wrote: > >What I'm saying is that if you do use the reserves, you should ensure > >the use is bounded. I'm not seeing anything like that. > > each mapping takes at most 3 pages With 3 pages(3 level page table), IOMMU can map at most 2MB and each additional last level page helps map another 2MB. Again, the IOMMU driver re-uses the virtual address instead of giving contiguous virtual address which helps to keep the page table growth and helps reuse the same page table entries, in that sense we are bounded, again we are not sure how much IO will be in flight on a perticular system so as to reserve that many pages for IOMMU page table setup. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls
On Wed, Jun 20, 2007 at 11:11:05PM -0700, Arjan van de Ven wrote: Peter Zijlstra wrote: What I'm saying is that if you do use the reserves, you should ensure the use is bounded. I'm not seeing anything like that. each mapping takes at most 3 pages With 3 pages(3 level page table), IOMMU can map at most 2MB and each additional last level page helps map another 2MB. Again, the IOMMU driver re-uses the virtual address instead of giving contiguous virtual address which helps to keep the page table growth and helps reuse the same page table entries, in that sense we are bounded, again we are not sure how much IO will be in flight on a perticular system so as to reserve that many pages for IOMMU page table setup. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls
On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote: On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote: Peter Zijlstra wrote: Also, the other iommu code you pointed me to, was happy to fail, it did not attempt to use the emergency reserves. Is the same behavior acceptable here? -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls
On Thu, Jun 21, 2007 at 09:13:11AM +0200, Peter Zijlstra wrote: On Wed, 2007-06-20 at 23:37 -0700, Keshavamurthy, Anil S wrote: On Thu, Jun 21, 2007 at 08:29:34AM +0200, Peter Zijlstra wrote: On Wed, 2007-06-20 at 23:11 -0700, Arjan van de Ven wrote: Peter Zijlstra wrote: Also, the other iommu code you pointed me to, was happy to fail, it did not attempt to use the emergency reserves. Is the same behavior acceptable here? I would say it is. Failure is a part of life. If you have a (small) mempool with 16 pages or so, that should give you plenty megabytes of io-space to get out of a tight spot. That is, you can queue many pages with that. If it is depleted you know you have at least that many pages outstanding. So failing will just delay the next pages. Throughput is not a key issue when that low on memory, a guarantee of progress is. Andrew, Can you please queue all the other patches except this one for your next MM release? (Yes, We can safely drop this patch without any issues in applying rest of the patches). -thanks, Anil Keshavamurthy - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls
On Wed, Jun 20, 2007 at 10:08:51PM +0200, Peter Zijlstra wrote: > On Wed, 2007-06-20 at 12:14 -0700, Arjan van de Ven wrote: > > Peter Zijlstra wrote: > > > So a reclaim context (kswapd and direct reclaim) set PF_MEMALLOC to > > > ensure they themselves will not block on a memory allocation. And it is > > > understood that these code paths have a bounded memory footprint. > > > > > > that's a too simplistic view though; what happens is that kswapd will > > queue the IO, but the irq context will then take the IO from the queue > > and do the DMA mapping... which needs the memory. As Arjan is saying, that a reclaim context sets PF_MEMALLOC flag and submits the IO, but the controller driver decides to queue the IO, and later in the interrupt context it de-queues and calls the IOMMU driver for mapping the DMA physical address and in this DMA map api call we may need the memory to satisfy the DMA map api call. Hence PF_MEMALLOC set by the reclaim context should work from interrupt context too, if it is not then that needs to be fixed. > > Right, but who stops some unrelated interrupt handler from completely > depleting memory? The DMA map API's exposed by IOMMU are called by the storage/network controller driver and it is in this IOMMU driver we are talking about allocating memory. So we have no clue how much I/O the controller above us is capable of submitting. And all we do is the mapping of DMA phyical address and provide the caller with the virtual DMA address and it is in this process we may need some memory. > > What I'm saying is that there should be some coupling between the > reclaim context and the irq context doing work on its behalf. Nope this info is not available when upper level drivers calls the standard DMA map api's. > > For instance, you know how many pages are in the queue, and which queue. > So you could preallocate enough memory to handle that many pages from > irq context and couple that reserve to the queue object. Then irq > context can use that memory to do the work. As I have said earlier, we are not the storage or network controller driver and we have no idea how much the IO the controller is capable of. Again this patch is the best effort for not to fail the DMA map api, if it fails due to lack of memory, we have no choice but to return failure to upper level driver. But today, upper level drivers are not capable of handling failures from DMA map api's hence this best effort for not to fail the DMA map api call. thanks, -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls
On Wed, Jun 20, 2007 at 10:08:51PM +0200, Peter Zijlstra wrote: On Wed, 2007-06-20 at 12:14 -0700, Arjan van de Ven wrote: Peter Zijlstra wrote: So a reclaim context (kswapd and direct reclaim) set PF_MEMALLOC to ensure they themselves will not block on a memory allocation. And it is understood that these code paths have a bounded memory footprint. that's a too simplistic view though; what happens is that kswapd will queue the IO, but the irq context will then take the IO from the queue and do the DMA mapping... which needs the memory. As Arjan is saying, that a reclaim context sets PF_MEMALLOC flag and submits the IO, but the controller driver decides to queue the IO, and later in the interrupt context it de-queues and calls the IOMMU driver for mapping the DMA physical address and in this DMA map api call we may need the memory to satisfy the DMA map api call. Hence PF_MEMALLOC set by the reclaim context should work from interrupt context too, if it is not then that needs to be fixed. Right, but who stops some unrelated interrupt handler from completely depleting memory? The DMA map API's exposed by IOMMU are called by the storage/network controller driver and it is in this IOMMU driver we are talking about allocating memory. So we have no clue how much I/O the controller above us is capable of submitting. And all we do is the mapping of DMA phyical address and provide the caller with the virtual DMA address and it is in this process we may need some memory. What I'm saying is that there should be some coupling between the reclaim context and the irq context doing work on its behalf. Nope this info is not available when upper level drivers calls the standard DMA map api's. For instance, you know how many pages are in the queue, and which queue. So you could preallocate enough memory to handle that many pages from irq context and couple that reserve to the queue object. Then irq context can use that memory to do the work. As I have said earlier, we are not the storage or network controller driver and we have no idea how much the IO the controller is capable of. Again this patch is the best effort for not to fail the DMA map api, if it fails due to lack of memory, we have no choice but to return failure to upper level driver. But today, upper level drivers are not capable of handling failures from DMA map api's hence this best effort for not to fail the DMA map api call. thanks, -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel IOMMU 05/10] Intel IOMMU driver
On Tue, Jun 19, 2007 at 04:32:23PM -0700, Christoph Lameter wrote: > On Tue, 19 Jun 2007, Keshavamurthy, Anil S wrote: > > > +static inline void *alloc_pgtable_page(void) > > +{ > > + return (void *)get_zeroed_page(GFP_ATOMIC); > > +} > > Need to pass gfp_t parameter. Repeates a couple of times. > > > + addr &= (((u64)1) << addr_width) - 1; > > + parent = domain->pgd; > > + > > + spin_lock_irqsave(>mapping_lock, flags); > > + while (level > 0) { > > + void *tmp_page; > > + > > + offset = address_level_offset(addr, level); > > + pte = [offset]; > > + if (level == 1) > > + break; > > + > > + if (!dma_pte_present(*pte)) { > > + tmp_page = alloc_pgtable_page(); > > Is it not possible here to drop the lock and do the alloc with GFP_KERNEL > and deal with the resulting race? That is done in other parts of the > kernel. > > > +/* iommu handling */ > > +static int iommu_alloc_root_entry(struct intel_iommu *iommu) > > +{ > > + struct root_entry *root; > > + unsigned long flags; > > + > > + root = (struct root_entry *)alloc_pgtable_page(); > > This may be able to become a GFP_KERNEL alloc since interrupts are enabled > at this point? Memory allocated during driver init is very less and not much benefit with the suggested changes I think. Please correct me If I am wrong. The biggest benifit will be when we can figure out GPF_ flags during runtime when DMA map api's are called. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 10/10] Iommu floppy workaround
This config option (DMAR_FLPY_WA) sets up 1:1 mapping for the floppy device so that the floppy device which does not use DMA api's will continue to work. Once the floppy driver starts using DMA api's this config option can be turn off or this patch can be yanked out of kernel at that time. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- arch/x86_64/Kconfig | 10 ++ drivers/pci/intel-iommu.c | 22 ++ drivers/pci/intel-iommu.h |7 +++ 3 files changed, 39 insertions(+) Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-18 15:45:09.0 -0700 @@ -752,6 +752,16 @@ all the OS visible memory. Hence the driver can continue to use physical addresses for DMA. +config DMAR_FLPY_WA + bool "Support for Floppy disk workaround" + depends on DMAR + default y + help +Floppy disk drivers are know to by pass dma api calls +their by failing to work when IOMMU is enabled. This +work around will setup a 1 to 1 mappings for the first +16M to make floppy(isa device) work. + source "drivers/pci/pcie/Kconfig" source "drivers/pci/Kconfig" Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-18 15:45:09.0 -0700 @@ -1631,6 +1631,26 @@ } #endif +#ifdef CONFIG_DMAR_FLPY_WA +static inline void iommu_prepare_isa(void) +{ + struct pci_dev *pdev = NULL; + int ret; + + pdev = pci_get_class (PCI_CLASS_BRIDGE_ISA << 8, NULL); + if (!pdev) + return; + + printk (KERN_INFO "IOMMU: Prepare 0-16M unity mapping for LPC\n"); + ret = iommu_prepare_identity_map(pdev, 0, 16*1024*1024); + + if (ret) + printk ("IOMMU: Failed to create 0-64M identity map, \ + Floppy might not work\n"); + +} +#endif + int __init init_dmars(void) { struct dmar_drhd_unit *drhd; @@ -1696,6 +1716,8 @@ iommu_prepare_gfx_mapping(); + iommu_prepare_isa(); + /* * for each drhd * enable fault log Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.h === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.h 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.h 2007-06-18 15:45:09.0 -0700 @@ -322,4 +322,11 @@ } #endif /* !CONFIG_DMAR_GFX_WA */ +#ifndef CONFIG_DMAR_FLPY_WA +static inline void iommu_prepare_isa(void) +{ + return; +} +#endif /* !CONFIG_DMAR_FLPY_WA */ + #endif -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 08/10] DMAR fault handling support
MSI interrupt handler registrations and fault handling support for Intel-IOMMU hadrware. This patch enables the MSI interrupts for the DMA remapping units and in the interrupt handler read the fault cause and outputs the same on to the console. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- Documentation/Intel-IOMMU.txt | 17 +++ arch/x86_64/kernel/io_apic.c | 59 drivers/pci/intel-iommu.c | 194 ++ include/linux/dmar.h | 12 ++ 4 files changed, 281 insertions(+), 1 deletion(-) Index: linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt === --- linux-2.6.22-rc4-mm2.orig/Documentation/Intel-IOMMU.txt 2007-06-18 15:45:46.0 -0700 +++ linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt 2007-06-19 13:05:03.0 -0700 @@ -63,6 +63,15 @@ The same is true for peer to peer transactions. Hence we reserve the address from PCI MMIO ranges so they are not allocated for IOVA addresses. + +Fault reporting +--- +When errors are reported, the DMA engine signals via an interrupt. The fault +reason and device that caused it with fault reason is printed on console. + +See below for sample. + + Boot Message Sample --- @@ -85,6 +94,14 @@ PCI-DMA: Using DMAR IOMMU +Fault reporting +--- + +DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000 +DMAR:[fault reason 05] PTE Write access is not set +DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000 +DMAR:[fault reason 05] PTE Write access is not set + TBD Index: linux-2.6.22-rc4-mm2/arch/x86_64/kernel/io_apic.c === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/kernel/io_apic.c 2007-06-18 15:45:38.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/kernel/io_apic.c 2007-06-18 15:45:46.0 -0700 @@ -31,6 +31,7 @@ #include #include #include +#include #ifdef CONFIG_ACPI #include #endif @@ -2026,8 +2027,64 @@ destroy_irq(irq); } -#endif /* CONFIG_PCI_MSI */ +#ifdef CONFIG_DMAR +#ifdef CONFIG_SMP +static void dmar_msi_set_affinity(unsigned int irq, cpumask_t mask) +{ + struct irq_cfg *cfg = irq_cfg + irq; + struct msi_msg msg; + unsigned int dest; + cpumask_t tmp; + + cpus_and(tmp, mask, cpu_online_map); + if (cpus_empty(tmp)) + return; + + if (assign_irq_vector(irq, mask)) + return; + + cpus_and(tmp, cfg->domain, mask); + dest = cpu_mask_to_apicid(tmp); + + dmar_msi_read(irq, ); + + msg.data &= ~MSI_DATA_VECTOR_MASK; + msg.data |= MSI_DATA_VECTOR(cfg->vector); + msg.address_lo &= ~MSI_ADDR_DEST_ID_MASK; + msg.address_lo |= MSI_ADDR_DEST_ID(dest); + + dmar_msi_write(irq, ); + irq_desc[irq].affinity = mask; +} +#endif /* CONFIG_SMP */ + +struct irq_chip dmar_msi_type = { + .name = "DMAR_MSI", + .unmask = dmar_msi_unmask, + .mask = dmar_msi_mask, + .ack = ack_apic_edge, +#ifdef CONFIG_SMP + .set_affinity = dmar_msi_set_affinity, +#endif + .retrigger = ioapic_retrigger_irq, +}; + +int arch_setup_dmar_msi(unsigned int irq) +{ + int ret; + struct msi_msg msg; + + ret = msi_compose_msg(NULL, irq, ); + if (ret < 0) + return ret; + dmar_msi_write(irq, ); + set_irq_chip_and_handler_name(irq, _msi_type, handle_edge_irq, + "edge"); + return 0; +} +#endif +#endif /* CONFIG_PCI_MSI */ /* * Hypertransport interrupt support */ Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:46.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-19 13:05:03.0 -0700 @@ -742,6 +742,196 @@ return 0; } +/* iommu interrupt handling. Most stuff are MSI-like. */ + +static char *fault_reason_strings[] = +{ + "Software", + "Present bit in root entry is clear", + "Present bit in context entry is clear", + "Invalid context entry", + "Access beyond MGAW", + "PTE Write access is not set", + "PTE Read access is not set", + "Next page table ptr is invalid", + "Root table address invalid", + "Context table ptr is invalid", + "non-zero reserved fields in RTP", + "non-zero reserved fields in CTP", + "non-zero reserved fields in PTE", + "Unknown" +}; +#define MAX_FAULT_REASON_IDX ARRAY_SIZE(fault_reason_strings) + +char *dmar_get_fault_reason(u8 fault_reason) +{ + if (fault_reason > MAX_FAULT_REASON_IDX) + return fault_reason_strings[MAX_FAULT_REASON_IDX]; + else + return fault_reason_strings[fault_reason]; +} + +void
[Intel IOMMU 04/10] IOVA allocation and management routines
This code implements a generic IOVA allocation and management. As per Dave's suggestion we are now allocating IO virtual address from Higher DMA limit address rather than lower end address and this eliminated the need to preserve the IO virtual address for multiple devices sharing the same domain virtual address. Also this code uses red black trees to store the allocated and reserved iova nodes. This showed a good performance improvements over previous linear linked list. Changes from previous posting: 1) Fixed mostly coding style issues Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- drivers/pci/iova.c | 351 + drivers/pci/iova.h | 62 + 2 files changed, 413 insertions(+) Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.22-rc4-mm2/drivers/pci/iova.c 2007-06-18 15:45:46.0 -0700 @@ -0,0 +1,351 @@ +/* + * Copyright (c) 2006, Intel Corporation. + * + * This file is released under the GPLv2. + * + * Copyright (C) 2006 Anil S Keshavamurthy <[EMAIL PROTECTED]> + */ + +#include "iova.h" + +void +init_iova_domain(struct iova_domain *iovad) +{ + spin_lock_init(>iova_alloc_lock); + spin_lock_init(>iova_rbtree_lock); + iovad->rbroot = RB_ROOT; + iovad->cached32_node = NULL; + +} + +static struct rb_node * +__get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn) +{ + if ((*limit_pfn != DMA_32BIT_PFN) || + (iovad->cached32_node == NULL)) + return rb_last(>rbroot); + else { + struct rb_node *prev_node = rb_prev(iovad->cached32_node); + struct iova *curr_iova = + container_of(iovad->cached32_node, struct iova, node); + *limit_pfn = curr_iova->pfn_lo - 1; + return prev_node; + } +} + +static inline void +__cached_rbnode_insert_update(struct iova_domain *iovad, + unsigned long limit_pfn, struct iova *new) +{ + if (limit_pfn != DMA_32BIT_PFN) + return; + iovad->cached32_node = >node; +} + +static inline void +__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free) +{ + struct iova *cached_iova; + struct rb_node *curr; + + if (!iovad->cached32_node) + return; + curr = iovad->cached32_node; + cached_iova = container_of(curr, struct iova, node); + + if (free->pfn_lo >= cached_iova->pfn_lo) + iovad->cached32_node = rb_next(>node); +} + +static inline int __alloc_iova_range(struct iova_domain *iovad, + unsigned long size, unsigned long limit_pfn, struct iova *new) +{ + struct rb_node *curr = NULL; + unsigned long flags; + unsigned long saved_pfn; + + /* Walk the tree backwards */ + spin_lock_irqsave(>iova_rbtree_lock, flags); + saved_pfn = limit_pfn; + curr = __get_cached_rbnode(iovad, _pfn); + while (curr) { + struct iova *curr_iova = container_of(curr, struct iova, node); + if (limit_pfn < curr_iova->pfn_lo) + goto move_left; + if (limit_pfn < curr_iova->pfn_hi) + goto adjust_limit_pfn; + if ((curr_iova->pfn_hi + size) <= limit_pfn) + break; /* found a free slot */ +adjust_limit_pfn: + limit_pfn = curr_iova->pfn_lo - 1; +move_left: + curr = rb_prev(curr); + } + + if ((!curr) && !(IOVA_START_PFN + size <= limit_pfn)) { + spin_unlock_irqrestore(>iova_rbtree_lock, flags); + return -ENOMEM; + } + new->pfn_hi = limit_pfn; + new->pfn_lo = limit_pfn - size + 1; + + spin_unlock_irqrestore(>iova_rbtree_lock, flags); + return 0; +} + +static void +iova_insert_rbtree(struct rb_root *root, struct iova *iova) +{ + struct rb_node **new = &(root->rb_node), *parent = NULL; + /* Figure out where to put new node */ + while (*new) { + struct iova *this = container_of(*new, struct iova, node); + parent = *new; + + if (iova->pfn_lo < this->pfn_lo) + new = &((*new)->rb_left); + else if (iova->pfn_lo > this->pfn_lo) + new = &((*new)->rb_right); + else + BUG(); /* this should not happen */ + } + /* Add new node and rebalance tree. */ + rb_link_node(>node, parent, new); + rb_insert_color(>node, root); +} + +/** + * alloc_iova - allocates an iova + * @iovad - iova domain in question + * @size - size of page frames to allocate + * @limit_pfn - max limit address + * This function allocates an iova in the range limit_pfn to IOVA_START_PFN + * looking from limit_pfn instead from IOVA_START_PFN. + */
[Intel IOMMU 09/10] Iommu Gfx workaround
When we fix all the opensource gfx drivers to use the DMA api's, at that time we can yank this config options out. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- Documentation/Intel-IOMMU.txt |5 + arch/x86_64/Kconfig | 11 +++ arch/x86_64/kernel/e820.c | 19 +++ drivers/pci/intel-iommu.c | 33 + drivers/pci/intel-iommu.h |7 +++ 5 files changed, 75 insertions(+) Index: linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt === --- linux-2.6.22-rc4-mm2.orig/Documentation/Intel-IOMMU.txt 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt 2007-06-18 15:45:08.0 -0700 @@ -57,6 +57,11 @@ If you encounter issues with graphics devices, you can try adding option intel_iommu=igfx_off to turn off the integrated graphics engine. +If it happens to be a PCI device included in the INCLUDE_ALL Engine, +then try enabling CONFIG_DMAR_GFX_WA to setup a 1-1 map. We hear +graphics drivers may be in process of using DMA api's in the near +future and at that time this option can be yanked out. + Some exceptions to IOVA --- Interrupt ranges are not address translated, (0xfee0 - 0xfeef). Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-18 15:45:07.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-18 15:45:08.0 -0700 @@ -741,6 +741,17 @@ and includes pci device scope covered by these DMA remapping device. +config DMAR_GFX_WA + bool "Support for Graphics workaround" + depends on DMAR + default y + help +Current Graphics drivers tend to use physical address +for DMA and avoid using DMA api's. Setting this config +option permits the IOMMU driver to set a unity map for +all the OS visible memory. Hence the driver can continue +to use physical addresses for DMA. + source "drivers/pci/pcie/Kconfig" source "drivers/pci/Kconfig" Index: linux-2.6.22-rc4-mm2/arch/x86_64/kernel/e820.c === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/kernel/e820.c 2007-06-18 15:44:44.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/kernel/e820.c 2007-06-18 15:45:08.0 -0700 @@ -723,3 +723,22 @@ printk(KERN_INFO "Allocating PCI resources starting at %lx (gap: %lx:%lx)\n", pci_mem_start, gapstart, gapsize); } + +int __init arch_get_ram_range(int slot, u64 *addr, u64 *size) +{ + int i; + + if (slot < 0 || slot >= e820.nr_map) + return -1; + for (i = slot; i < e820.nr_map; i++) { + if (e820.map[i].type != E820_RAM) + continue; + break; + } + if (i == e820.nr_map || e820.map[i].addr > (max_pfn << PAGE_SHIFT)) + return -1; + *addr = e820.map[i].addr; + *size = min_t(u64, e820.map[i].size + e820.map[i].addr, + max_pfn << PAGE_SHIFT) - *addr; + return i + 1; +} Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-18 15:45:08.0 -0700 @@ -1601,6 +1601,36 @@ rmrr->end_address + 1); } +#ifdef CONFIG_DMAR_GFX_WA +extern int arch_get_ram_range(int slot, u64 *addr, u64 *size); +static void __init iommu_prepare_gfx_mapping(void) +{ + struct pci_dev *pdev = NULL; + u64 base, size; + int slot; + int ret; + + for_each_pci_dev(pdev) { + if (pdev->sysdata == DUMMY_DEVICE_DOMAIN_INFO || + !IS_GFX_DEVICE(pdev)) + continue; + printk(KERN_INFO "IOMMU: gfx device %s 1-1 mapping\n", + pci_name(pdev)); + slot = arch_get_ram_range(0, , ); + while (slot >= 0) { + ret = iommu_prepare_identity_map(pdev, + base, base + size); + if (ret) + goto error; + slot = arch_get_ram_range(slot, , ); + } + continue; +error: + printk(KERN_ERR "IOMMU: mapping reserved region failed\n"); + } +} +#endif + int __init init_dmars(void) { struct dmar_drhd_unit *drhd; @@ -1664,6 +1694,8 @@ } } + iommu_prepare_gfx_mapping(); + /* * for each drhd * enable fault log @@ -2175,3 +2207,4 @@ dma_ops =
[Intel IOMMU 07/10] Intel iommu cmdline option - forcedac
Introduce intel_iommu=forcedac commandline option. This option is helpful to verify the pci device capability of handling physical dma'able address greater than 4G. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- Documentation/kernel-parameters.txt |7 +++ drivers/pci/intel-iommu.c |7 ++- 2 files changed, 13 insertions(+), 1 deletion(-) Index: linux-2.6.22-rc4-mm2/Documentation/kernel-parameters.txt === --- linux-2.6.22-rc4-mm2.orig/Documentation/kernel-parameters.txt 2007-06-18 15:45:46.0 -0700 +++ linux-2.6.22-rc4-mm2/Documentation/kernel-parameters.txt2007-06-18 15:45:46.0 -0700 @@ -788,6 +788,13 @@ bypassed by not enabling DMAR with this option. In this case, gfx device will use physical address for DMA. + forcedac [x86_64] + With this option iommu will not optimize to look + for io virtual address below 32 bit forcing dual + address cycle on pci bus for cards supporting greater + than 32 bit addressing. The default is to look + for translation below 32 bit and if not available + then look in the higher range. io7=[HW] IO7 for Marvel based alpha systems See comment before marvel_specify_io7 in Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:46.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-19 13:09:06.0 -0700 @@ -53,6 +53,7 @@ static int dmar_disabled; static int __initdata dmar_map_gfx = 1; +static int dmar_forcedac; #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1)) static DEFINE_SPINLOCK(device_domain_lock); @@ -70,6 +71,10 @@ dmar_map_gfx = 0; printk(KERN_INFO "Intel-IOMMU: disable GFX device mapping\n"); + } else if (!strncmp(str, "forcedac", 8)) { + printk (KERN_INFO + "Intel-IOMMU: Forcing DAC for PCI devices\n"); + dmar_forcedac = 1; } str += strcspn(str, ","); @@ -1557,7 +1562,7 @@ start_addr = IOVA_START_ADDR; - if (pdev->dma_mask <= DMA_32BIT_MASK) { + if ((pdev->dma_mask <= DMA_32BIT_MASK) || (dmar_forcedac)) { iova = iommu_alloc_iova(domain, addr, size, start_addr, pdev->dma_mask); } else { -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 03/10] clflush_cache_range now takes size param
Introduce the size param for clflush_cache_range(). Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- arch/x86_64/mm/pageattr.c |6 +++--- include/asm-x86_64/cacheflush.h |1 + 2 files changed, 4 insertions(+), 3 deletions(-) Index: linux-2.6.22-rc4-mm2/arch/x86_64/mm/pageattr.c === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/mm/pageattr.c 2007-06-18 15:45:39.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/mm/pageattr.c 2007-06-18 15:45:46.0 -0700 @@ -61,10 +61,10 @@ return base; } -static void cache_flush_page(void *adr) +void clflush_cache_range(void *adr, int size) { int i; - for (i = 0; i < PAGE_SIZE; i += boot_cpu_data.x86_clflush_size) + for (i = 0; i < size; i += boot_cpu_data.x86_clflush_size) asm volatile("clflush (%0)" :: "r" (adr + i)); } @@ -80,7 +80,7 @@ list_for_each_entry(pg, l, lru) { void *adr = page_address(pg); if (cpu_has_clflush) - cache_flush_page(adr); + clflush_cache_range(adr, PAGE_SIZE); } __flush_tlb_all(); } Index: linux-2.6.22-rc4-mm2/include/asm-x86_64/cacheflush.h === --- linux-2.6.22-rc4-mm2.orig/include/asm-x86_64/cacheflush.h 2007-06-18 15:45:39.0 -0700 +++ linux-2.6.22-rc4-mm2/include/asm-x86_64/cacheflush.h2007-06-18 15:45:46.0 -0700 @@ -27,6 +27,7 @@ void global_flush_tlb(void); int change_page_attr(struct page *page, int numpages, pgprot_t prot); int change_page_attr_addr(unsigned long addr, int numpages, pgprot_t prot); +void clflush_cache_range(void *addr, int size); #ifdef CONFIG_DEBUG_RODATA void mark_rodata_ro(void); -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls
Intel IOMMU driver needs memory during DMA map calls to setup its internal page tables and for other data structures. As we all know that these DMA map calls are mostly called in the interrupt context or with the spinlock held by the upper level drivers(network/storage drivers), so in order to avoid any memory allocation failure due to low memory issues, this patch makes memory allocation by temporarily setting PF_MEMALLOC flags for the current task before making memory allocation calls. We evaluated mempools as a backup when kmem_cache_alloc() fails and found that mempools are really not useful here because 1) We don;t know for sure how much to reserve in advance 2) And mempools are not useful for GFP_ATOMIC case (as we call memory alloc functions with GFP_ATOMIC) With PF_MEMALLOC flag set in the current->flags, the VM subsystem avoids any watermark checks before allocating memory thus guarantee'ing the memory till the last free page. Further, looking at the code in mm/page_alloc.c in __alloc_pages() function, looks like this flag is useful only in the non-interrupt context. If we are in the interrupt context and memory allocation in IOMMU driver fails for some reason, then the DMA map api's will return failure and it is up to the higher level drivers to retry. Suppose, if upper level driver programs the controller with the buggy DMA virtual address, the IOMMU will block that DMA transaction when that happens thus preventing any corruption to main memory. So far in our test scenario, we were unable to create any memory allocation failure inside dma map api calls. Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- drivers/pci/intel-iommu.c | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:46.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-19 13:10:29.0 -0700 @@ -84,9 +84,31 @@ static struct kmem_cache *iommu_devinfo_cache; static struct kmem_cache *iommu_iova_cache; +static inline void *iommu_kmem_cache_alloc(struct kmem_cache *cachep) +{ + unsigned int flags; + void *vaddr; + + /* trying to avoid low memory issues */ + flags = current->flags & PF_MEMALLOC; + current->flags |= PF_MEMALLOC; + vaddr = kmem_cache_alloc(cachep, GFP_ATOMIC); + current->flags &= (~PF_MEMALLOC | flags); + return vaddr; +} + + static inline void *alloc_pgtable_page(void) { - return (void *)get_zeroed_page(GFP_ATOMIC); + unsigned int flags; + void *vaddr; + + /* trying to avoid low memory issues */ + flags = current->flags & PF_MEMALLOC; + current->flags |= PF_MEMALLOC; + vaddr = (void *)get_zeroed_page(GFP_ATOMIC); + current->flags &= (~PF_MEMALLOC | flags); + return vaddr; } static inline void free_pgtable_page(void *vaddr) @@ -96,7 +118,7 @@ static inline void *alloc_domain_mem(void) { - return kmem_cache_alloc(iommu_domain_cache, GFP_ATOMIC); + return iommu_kmem_cache_alloc(iommu_domain_cache); } static inline void free_domain_mem(void *vaddr) @@ -106,7 +128,7 @@ static inline void * alloc_devinfo_mem(void) { - return kmem_cache_alloc(iommu_devinfo_cache, GFP_ATOMIC); + return iommu_kmem_cache_alloc(iommu_devinfo_cache); } static inline void free_devinfo_mem(void *vaddr) @@ -116,7 +138,7 @@ struct iova *alloc_iova_mem(void) { - return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC); + return iommu_kmem_cache_alloc(iommu_iova_cache); } void free_iova_mem(struct iova *iova) -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 01/10] DMAR detection and parsing logic
This patch adds support for early detection and parsing of DMAR's (DMA Remapping ) reported to OS via ACPI tables. DMA remapping(DMAR) devices support enables independent address translations for Direct Memory Access(DMA) from Devices. These DMA remapping devices are reported via ACPI tables and includes pci device scope covered by these DMA remapping device. For detailed info on the specification of "Intel(R) Virtualization Technology for Directed I/O Architecture" please see http://www.intel.com/technology/virtualization/index.htm Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- arch/x86_64/Kconfig | 11 + drivers/pci/Makefile |3 drivers/pci/dmar.c| 327 ++ include/acpi/actbl1.h | 27 +++- include/linux/dmar.h | 52 +++ 5 files changed, 413 insertions(+), 7 deletions(-) Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-18 15:45:39.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-19 13:05:03.0 -0700 @@ -730,6 +730,17 @@ bool "Support mmconfig PCI config space access" depends on PCI && ACPI +config DMAR + bool "Support for DMA Remapping Devices (EXPERIMENTAL)" + depends on PCI_MSI && ACPI && EXPERIMENTAL + default y + help + DMA remapping(DMAR) devices support enables independent address + translations for Direct Memory Access(DMA) from Devices. + These DMA remapping devices are reported via ACPI tables + and includes pci device scope covered by these DMA + remapping device. + source "drivers/pci/pcie/Kconfig" source "drivers/pci/Kconfig" Index: linux-2.6.22-rc4-mm2/drivers/pci/Makefile === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/Makefile 2007-06-18 15:45:39.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/Makefile 2007-06-19 13:43:14.0 -0700 @@ -20,6 +20,9 @@ # Build the Hypertransport interrupt support obj-$(CONFIG_HT_IRQ) += htirq.o +# Build Intel IOMMU support +obj-$(CONFIG_DMAR) += dmar.o + # # Some architectures use the generic PCI setup functions # Index: linux-2.6.22-rc4-mm2/drivers/pci/dmar.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-18 15:45:46.0 -0700 @@ -0,0 +1,327 @@ +/* + * Copyright (c) 2006, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + * Copyright (C) Ashok Raj <[EMAIL PROTECTED]> + * Copyright (C) Shaohua Li <[EMAIL PROTECTED]> + * Copyright (C) Anil S Keshavamurthy <[EMAIL PROTECTED]> + * + * This file implements early detection/parsing of DMA Remapping Devices + * reported to OS through BIOS via DMA remapping reporting (DMAR) ACPI + * tables. + */ + +#include +#include + +#undef PREFIX +#define PREFIX "DMAR:" + +/* No locks are needed as DMA remapping hardware unit + * list is constructed at boot time and hotplug of + * these units are not supported by the architecture. + */ +LIST_HEAD(dmar_drhd_units); +LIST_HEAD(dmar_rmrr_units); + +static struct acpi_table_header * __initdata dmar_tbl; + +static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd) +{ + /* +* add INCLUDE_ALL at the tail, so scan the list will find it at +* the very end. +*/ + if (drhd->include_all) + list_add_tail(>list, _drhd_units); + else + list_add(>list, _drhd_units); +} + +static void __init dmar_register_rmrr_unit(struct dmar_rmrr_unit *rmrr) +{ + list_add(>list, _rmrr_units); +} + +static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope *scope, + struct pci_dev **dev, u16 segment) +{ + struct pci_bus *bus; + struct pci_dev *pdev = NULL; + struct acpi_dmar_pci_path *path; + int count; + + bus = pci_find_bus(segment, scope->bus); + path = (struct acpi_dmar_pci_path *)(scope + 1); + count = (scope->length - sizeof(struct acpi_dmar_device_scope)) + / sizeof(struct acpi_dmar_pci_path); +
[Intel IOMMU 02/10] PCI generic helper function
When devices are under a p2p bridge, upstream transactions get replaced by the device id of the bridge as it owns the PCIE transaction. Hence its necessary to setup translations on behalf of the bridge as well. Due to this limitation all devices under a p2p share the same domain in a DMAR. We just cache the type of device, if its a native PCIe device or not for later use. changes from previous posting 1) fixed mostly coding style issues nothing major Signed-off-by: Anil S Keshavamurthy <[EMAIL PROTECTED]> --- drivers/pci/pci.h|1 + drivers/pci/probe.c | 14 ++ drivers/pci/search.c | 30 ++ include/linux/pci.h |2 ++ 4 files changed, 47 insertions(+) Index: linux-2.6.22-rc4-mm2/drivers/pci/pci.h === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/pci.h 2007-06-18 15:44:45.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/pci.h 2007-06-18 15:45:07.0 -0700 @@ -92,3 +92,4 @@ return NULL; } +struct pci_dev *pci_find_upstream_pcie_bridge(struct pci_dev *pdev); Index: linux-2.6.22-rc4-mm2/drivers/pci/probe.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/probe.c 2007-06-18 15:44:45.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/probe.c2007-06-18 15:45:07.0 -0700 @@ -851,6 +851,19 @@ kfree(pci_dev); } +static void set_pcie_port_type(struct pci_dev *pdev) +{ + int pos; + u16 reg16; + + pos = pci_find_capability(pdev, PCI_CAP_ID_EXP); + if (!pos) + return; + pdev->is_pcie = 1; + pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, ); + pdev->pcie_type = (reg16 & PCI_EXP_FLAGS_TYPE) >> 4; +} + /** * pci_cfg_space_size - get the configuration space size of the PCI device. * @dev: PCI device @@ -965,6 +978,7 @@ dev->device = (l >> 16) & 0x; dev->cfg_size = pci_cfg_space_size(dev); dev->error_state = pci_channel_io_normal; + set_pcie_port_type(dev); /* Assume 32-bit PCI; let 64-bit PCI cards (which are far rarer) set this higher, assuming the system even supports it. */ Index: linux-2.6.22-rc4-mm2/drivers/pci/search.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/search.c 2007-06-18 15:44:45.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/search.c 2007-06-18 15:45:07.0 -0700 @@ -14,6 +14,36 @@ #include "pci.h" DECLARE_RWSEM(pci_bus_sem); +/* + * find the upstream PCIE-to-PCI bridge of a PCI device + * if the device is PCIE, return NULL + * if the device isn't connected to a PCIE bridge (that is its parent is a + * legacy PCI bridge and the bridge is directly connected to bus 0), return its + * parent + */ +struct pci_dev * +pci_find_upstream_pcie_bridge(struct pci_dev *pdev) +{ + struct pci_dev *tmp = NULL; + + if (pdev->is_pcie) + return NULL; + while (1) { + if (!pdev->bus->self) + break; + pdev = pdev->bus->self; + /* a p2p bridge */ + if (!pdev->is_pcie) { + tmp = pdev; + continue; + } + /* PCI device should connect to a PCIE bridge */ + BUG_ON(pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE); + return pdev; + } + + return tmp; +} static struct pci_bus *pci_do_find_bus(struct pci_bus *bus, unsigned char busnr) { Index: linux-2.6.22-rc4-mm2/include/linux/pci.h === --- linux-2.6.22-rc4-mm2.orig/include/linux/pci.h 2007-06-18 15:44:45.0 -0700 +++ linux-2.6.22-rc4-mm2/include/linux/pci.h2007-06-18 15:45:07.0 -0700 @@ -140,6 +140,7 @@ unsigned short subsystem_device; unsigned intclass; /* 3 bytes: (base,sub,prog-if) */ u8 hdr_type; /* PCI header type (`multi' flag masked out) */ + u8 pcie_type; /* PCI-E device/port type */ u8 rom_base_reg; /* which config register controls the ROM */ u8 pin;/* which interrupt pin this device uses */ @@ -182,6 +183,7 @@ unsigned intmsi_enabled:1; unsigned intmsix_enabled:1; unsigned intis_managed:1; + unsigned intis_pcie:1; atomic_tenable_cnt; /* pci_enable_device has been called */ u32 saved_config_space[16]; /* config space saved at suspend time */ -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 00/10] Intel IOMMU support, take #2
Hi All, This patch supports the upcomming Intel IOMMU hardware a.k.a. Intel(R) Virtualization Technology for Directed I/O Architecture and the hardware spec for the same can be found here http://www.intel.com/technology/virtualization/index.htm This version of the patches incorporates several feedback obtained from previous postings. Some of the major changes are 1) Removed resource pool (a.k.a. pre-allocate pool) patch 2) For memory allocation in the DMA map api calls we now use kmem_cache_alloc() and get_zeroedpage() functions to allocate memory for internal data structures and for page table setup memory. 3) The allocation of memory in the DMA map api calls is very critical and to avoid failures during memory allocation in the DMA map api calls we evaluated several technique a) mempool - We found that mempool is pretty much useless if we try to allocate memory with GFP_ATOMIC which is our case. Also we found that it is difficult to judge how much to reserver during the creation of mempool. b) PF_MEMALLOC - When a task flags (current->flags) are set with PF_MEMALLOC then watermark checks are avoided during the memory allocation. We choose to use the latter (option b) and make this as a separate patch which can be debated further. Please see patch 6/10. Other minor changes are mostly coding style fixes and making sure that checkpatch.pl passes the patches. Please include this set of patches for next MM release. Thanks and regards, -Anil S Keshavamurthy E-mail: [EMAIL PROTECTED] -- - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 00/10] Intel IOMMU support, take #2
Hi All, This patch supports the upcomming Intel IOMMU hardware a.k.a. Intel(R) Virtualization Technology for Directed I/O Architecture and the hardware spec for the same can be found here http://www.intel.com/technology/virtualization/index.htm This version of the patches incorporates several feedback obtained from previous postings. Some of the major changes are 1) Removed resource pool (a.k.a. pre-allocate pool) patch 2) For memory allocation in the DMA map api calls we now use kmem_cache_alloc() and get_zeroedpage() functions to allocate memory for internal data structures and for page table setup memory. 3) The allocation of memory in the DMA map api calls is very critical and to avoid failures during memory allocation in the DMA map api calls we evaluated several technique a) mempool - We found that mempool is pretty much useless if we try to allocate memory with GFP_ATOMIC which is our case. Also we found that it is difficult to judge how much to reserver during the creation of mempool. b) PF_MEMALLOC - When a task flags (current-flags) are set with PF_MEMALLOC then watermark checks are avoided during the memory allocation. We choose to use the latter (option b) and make this as a separate patch which can be debated further. Please see patch 6/10. Other minor changes are mostly coding style fixes and making sure that checkpatch.pl passes the patches. Please include this set of patches for next MM release. Thanks and regards, -Anil S Keshavamurthy E-mail: [EMAIL PROTECTED] -- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 01/10] DMAR detection and parsing logic
This patch adds support for early detection and parsing of DMAR's (DMA Remapping ) reported to OS via ACPI tables. DMA remapping(DMAR) devices support enables independent address translations for Direct Memory Access(DMA) from Devices. These DMA remapping devices are reported via ACPI tables and includes pci device scope covered by these DMA remapping device. For detailed info on the specification of Intel(R) Virtualization Technology for Directed I/O Architecture please see http://www.intel.com/technology/virtualization/index.htm Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- arch/x86_64/Kconfig | 11 + drivers/pci/Makefile |3 drivers/pci/dmar.c| 327 ++ include/acpi/actbl1.h | 27 +++- include/linux/dmar.h | 52 +++ 5 files changed, 413 insertions(+), 7 deletions(-) Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-18 15:45:39.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-19 13:05:03.0 -0700 @@ -730,6 +730,17 @@ bool Support mmconfig PCI config space access depends on PCI ACPI +config DMAR + bool Support for DMA Remapping Devices (EXPERIMENTAL) + depends on PCI_MSI ACPI EXPERIMENTAL + default y + help + DMA remapping(DMAR) devices support enables independent address + translations for Direct Memory Access(DMA) from Devices. + These DMA remapping devices are reported via ACPI tables + and includes pci device scope covered by these DMA + remapping device. + source drivers/pci/pcie/Kconfig source drivers/pci/Kconfig Index: linux-2.6.22-rc4-mm2/drivers/pci/Makefile === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/Makefile 2007-06-18 15:45:39.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/Makefile 2007-06-19 13:43:14.0 -0700 @@ -20,6 +20,9 @@ # Build the Hypertransport interrupt support obj-$(CONFIG_HT_IRQ) += htirq.o +# Build Intel IOMMU support +obj-$(CONFIG_DMAR) += dmar.o + # # Some architectures use the generic PCI setup functions # Index: linux-2.6.22-rc4-mm2/drivers/pci/dmar.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.22-rc4-mm2/drivers/pci/dmar.c 2007-06-18 15:45:46.0 -0700 @@ -0,0 +1,327 @@ +/* + * Copyright (c) 2006, Intel Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + * + * Copyright (C) Ashok Raj [EMAIL PROTECTED] + * Copyright (C) Shaohua Li [EMAIL PROTECTED] + * Copyright (C) Anil S Keshavamurthy [EMAIL PROTECTED] + * + * This file implements early detection/parsing of DMA Remapping Devices + * reported to OS through BIOS via DMA remapping reporting (DMAR) ACPI + * tables. + */ + +#include linux/pci.h +#include linux/dmar.h + +#undef PREFIX +#define PREFIX DMAR: + +/* No locks are needed as DMA remapping hardware unit + * list is constructed at boot time and hotplug of + * these units are not supported by the architecture. + */ +LIST_HEAD(dmar_drhd_units); +LIST_HEAD(dmar_rmrr_units); + +static struct acpi_table_header * __initdata dmar_tbl; + +static void __init dmar_register_drhd_unit(struct dmar_drhd_unit *drhd) +{ + /* +* add INCLUDE_ALL at the tail, so scan the list will find it at +* the very end. +*/ + if (drhd-include_all) + list_add_tail(drhd-list, dmar_drhd_units); + else + list_add(drhd-list, dmar_drhd_units); +} + +static void __init dmar_register_rmrr_unit(struct dmar_rmrr_unit *rmrr) +{ + list_add(rmrr-list, dmar_rmrr_units); +} + +static int __init dmar_parse_one_dev_scope(struct acpi_dmar_device_scope *scope, + struct pci_dev **dev, u16 segment) +{ + struct pci_bus *bus; + struct pci_dev *pdev = NULL; + struct acpi_dmar_pci_path *path; + int count; + + bus = pci_find_bus(segment, scope-bus); + path = (struct acpi_dmar_pci_path *)(scope + 1); + count = (scope-length - sizeof(struct acpi_dmar_device_scope)) + / sizeof(struct
[Intel IOMMU 02/10] PCI generic helper function
When devices are under a p2p bridge, upstream transactions get replaced by the device id of the bridge as it owns the PCIE transaction. Hence its necessary to setup translations on behalf of the bridge as well. Due to this limitation all devices under a p2p share the same domain in a DMAR. We just cache the type of device, if its a native PCIe device or not for later use. changes from previous posting 1) fixed mostly coding style issues nothing major Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/pci.h|1 + drivers/pci/probe.c | 14 ++ drivers/pci/search.c | 30 ++ include/linux/pci.h |2 ++ 4 files changed, 47 insertions(+) Index: linux-2.6.22-rc4-mm2/drivers/pci/pci.h === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/pci.h 2007-06-18 15:44:45.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/pci.h 2007-06-18 15:45:07.0 -0700 @@ -92,3 +92,4 @@ return NULL; } +struct pci_dev *pci_find_upstream_pcie_bridge(struct pci_dev *pdev); Index: linux-2.6.22-rc4-mm2/drivers/pci/probe.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/probe.c 2007-06-18 15:44:45.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/probe.c2007-06-18 15:45:07.0 -0700 @@ -851,6 +851,19 @@ kfree(pci_dev); } +static void set_pcie_port_type(struct pci_dev *pdev) +{ + int pos; + u16 reg16; + + pos = pci_find_capability(pdev, PCI_CAP_ID_EXP); + if (!pos) + return; + pdev-is_pcie = 1; + pci_read_config_word(pdev, pos + PCI_EXP_FLAGS, reg16); + pdev-pcie_type = (reg16 PCI_EXP_FLAGS_TYPE) 4; +} + /** * pci_cfg_space_size - get the configuration space size of the PCI device. * @dev: PCI device @@ -965,6 +978,7 @@ dev-device = (l 16) 0x; dev-cfg_size = pci_cfg_space_size(dev); dev-error_state = pci_channel_io_normal; + set_pcie_port_type(dev); /* Assume 32-bit PCI; let 64-bit PCI cards (which are far rarer) set this higher, assuming the system even supports it. */ Index: linux-2.6.22-rc4-mm2/drivers/pci/search.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/search.c 2007-06-18 15:44:45.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/search.c 2007-06-18 15:45:07.0 -0700 @@ -14,6 +14,36 @@ #include pci.h DECLARE_RWSEM(pci_bus_sem); +/* + * find the upstream PCIE-to-PCI bridge of a PCI device + * if the device is PCIE, return NULL + * if the device isn't connected to a PCIE bridge (that is its parent is a + * legacy PCI bridge and the bridge is directly connected to bus 0), return its + * parent + */ +struct pci_dev * +pci_find_upstream_pcie_bridge(struct pci_dev *pdev) +{ + struct pci_dev *tmp = NULL; + + if (pdev-is_pcie) + return NULL; + while (1) { + if (!pdev-bus-self) + break; + pdev = pdev-bus-self; + /* a p2p bridge */ + if (!pdev-is_pcie) { + tmp = pdev; + continue; + } + /* PCI device should connect to a PCIE bridge */ + BUG_ON(pdev-pcie_type != PCI_EXP_TYPE_PCI_BRIDGE); + return pdev; + } + + return tmp; +} static struct pci_bus *pci_do_find_bus(struct pci_bus *bus, unsigned char busnr) { Index: linux-2.6.22-rc4-mm2/include/linux/pci.h === --- linux-2.6.22-rc4-mm2.orig/include/linux/pci.h 2007-06-18 15:44:45.0 -0700 +++ linux-2.6.22-rc4-mm2/include/linux/pci.h2007-06-18 15:45:07.0 -0700 @@ -140,6 +140,7 @@ unsigned short subsystem_device; unsigned intclass; /* 3 bytes: (base,sub,prog-if) */ u8 hdr_type; /* PCI header type (`multi' flag masked out) */ + u8 pcie_type; /* PCI-E device/port type */ u8 rom_base_reg; /* which config register controls the ROM */ u8 pin;/* which interrupt pin this device uses */ @@ -182,6 +183,7 @@ unsigned intmsi_enabled:1; unsigned intmsix_enabled:1; unsigned intis_managed:1; + unsigned intis_pcie:1; atomic_tenable_cnt; /* pci_enable_device has been called */ u32 saved_config_space[16]; /* config space saved at suspend time */ -- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 06/10] Avoid memory allocation failures in dma map api calls
Intel IOMMU driver needs memory during DMA map calls to setup its internal page tables and for other data structures. As we all know that these DMA map calls are mostly called in the interrupt context or with the spinlock held by the upper level drivers(network/storage drivers), so in order to avoid any memory allocation failure due to low memory issues, this patch makes memory allocation by temporarily setting PF_MEMALLOC flags for the current task before making memory allocation calls. We evaluated mempools as a backup when kmem_cache_alloc() fails and found that mempools are really not useful here because 1) We don;t know for sure how much to reserve in advance 2) And mempools are not useful for GFP_ATOMIC case (as we call memory alloc functions with GFP_ATOMIC) With PF_MEMALLOC flag set in the current-flags, the VM subsystem avoids any watermark checks before allocating memory thus guarantee'ing the memory till the last free page. Further, looking at the code in mm/page_alloc.c in __alloc_pages() function, looks like this flag is useful only in the non-interrupt context. If we are in the interrupt context and memory allocation in IOMMU driver fails for some reason, then the DMA map api's will return failure and it is up to the higher level drivers to retry. Suppose, if upper level driver programs the controller with the buggy DMA virtual address, the IOMMU will block that DMA transaction when that happens thus preventing any corruption to main memory. So far in our test scenario, we were unable to create any memory allocation failure inside dma map api calls. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/intel-iommu.c | 30 ++ 1 file changed, 26 insertions(+), 4 deletions(-) Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:46.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-19 13:10:29.0 -0700 @@ -84,9 +84,31 @@ static struct kmem_cache *iommu_devinfo_cache; static struct kmem_cache *iommu_iova_cache; +static inline void *iommu_kmem_cache_alloc(struct kmem_cache *cachep) +{ + unsigned int flags; + void *vaddr; + + /* trying to avoid low memory issues */ + flags = current-flags PF_MEMALLOC; + current-flags |= PF_MEMALLOC; + vaddr = kmem_cache_alloc(cachep, GFP_ATOMIC); + current-flags = (~PF_MEMALLOC | flags); + return vaddr; +} + + static inline void *alloc_pgtable_page(void) { - return (void *)get_zeroed_page(GFP_ATOMIC); + unsigned int flags; + void *vaddr; + + /* trying to avoid low memory issues */ + flags = current-flags PF_MEMALLOC; + current-flags |= PF_MEMALLOC; + vaddr = (void *)get_zeroed_page(GFP_ATOMIC); + current-flags = (~PF_MEMALLOC | flags); + return vaddr; } static inline void free_pgtable_page(void *vaddr) @@ -96,7 +118,7 @@ static inline void *alloc_domain_mem(void) { - return kmem_cache_alloc(iommu_domain_cache, GFP_ATOMIC); + return iommu_kmem_cache_alloc(iommu_domain_cache); } static inline void free_domain_mem(void *vaddr) @@ -106,7 +128,7 @@ static inline void * alloc_devinfo_mem(void) { - return kmem_cache_alloc(iommu_devinfo_cache, GFP_ATOMIC); + return iommu_kmem_cache_alloc(iommu_devinfo_cache); } static inline void free_devinfo_mem(void *vaddr) @@ -116,7 +138,7 @@ struct iova *alloc_iova_mem(void) { - return kmem_cache_alloc(iommu_iova_cache, GFP_ATOMIC); + return iommu_kmem_cache_alloc(iommu_iova_cache); } void free_iova_mem(struct iova *iova) -- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 07/10] Intel iommu cmdline option - forcedac
Introduce intel_iommu=forcedac commandline option. This option is helpful to verify the pci device capability of handling physical dma'able address greater than 4G. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- Documentation/kernel-parameters.txt |7 +++ drivers/pci/intel-iommu.c |7 ++- 2 files changed, 13 insertions(+), 1 deletion(-) Index: linux-2.6.22-rc4-mm2/Documentation/kernel-parameters.txt === --- linux-2.6.22-rc4-mm2.orig/Documentation/kernel-parameters.txt 2007-06-18 15:45:46.0 -0700 +++ linux-2.6.22-rc4-mm2/Documentation/kernel-parameters.txt2007-06-18 15:45:46.0 -0700 @@ -788,6 +788,13 @@ bypassed by not enabling DMAR with this option. In this case, gfx device will use physical address for DMA. + forcedac [x86_64] + With this option iommu will not optimize to look + for io virtual address below 32 bit forcing dual + address cycle on pci bus for cards supporting greater + than 32 bit addressing. The default is to look + for translation below 32 bit and if not available + then look in the higher range. io7=[HW] IO7 for Marvel based alpha systems See comment before marvel_specify_io7 in Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:46.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-19 13:09:06.0 -0700 @@ -53,6 +53,7 @@ static int dmar_disabled; static int __initdata dmar_map_gfx = 1; +static int dmar_forcedac; #define DUMMY_DEVICE_DOMAIN_INFO ((struct device_domain_info *)(-1)) static DEFINE_SPINLOCK(device_domain_lock); @@ -70,6 +71,10 @@ dmar_map_gfx = 0; printk(KERN_INFO Intel-IOMMU: disable GFX device mapping\n); + } else if (!strncmp(str, forcedac, 8)) { + printk (KERN_INFO + Intel-IOMMU: Forcing DAC for PCI devices\n); + dmar_forcedac = 1; } str += strcspn(str, ,); @@ -1557,7 +1562,7 @@ start_addr = IOVA_START_ADDR; - if (pdev-dma_mask = DMA_32BIT_MASK) { + if ((pdev-dma_mask = DMA_32BIT_MASK) || (dmar_forcedac)) { iova = iommu_alloc_iova(domain, addr, size, start_addr, pdev-dma_mask); } else { -- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 03/10] clflush_cache_range now takes size param
Introduce the size param for clflush_cache_range(). Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- arch/x86_64/mm/pageattr.c |6 +++--- include/asm-x86_64/cacheflush.h |1 + 2 files changed, 4 insertions(+), 3 deletions(-) Index: linux-2.6.22-rc4-mm2/arch/x86_64/mm/pageattr.c === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/mm/pageattr.c 2007-06-18 15:45:39.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/mm/pageattr.c 2007-06-18 15:45:46.0 -0700 @@ -61,10 +61,10 @@ return base; } -static void cache_flush_page(void *adr) +void clflush_cache_range(void *adr, int size) { int i; - for (i = 0; i PAGE_SIZE; i += boot_cpu_data.x86_clflush_size) + for (i = 0; i size; i += boot_cpu_data.x86_clflush_size) asm volatile(clflush (%0) :: r (adr + i)); } @@ -80,7 +80,7 @@ list_for_each_entry(pg, l, lru) { void *adr = page_address(pg); if (cpu_has_clflush) - cache_flush_page(adr); + clflush_cache_range(adr, PAGE_SIZE); } __flush_tlb_all(); } Index: linux-2.6.22-rc4-mm2/include/asm-x86_64/cacheflush.h === --- linux-2.6.22-rc4-mm2.orig/include/asm-x86_64/cacheflush.h 2007-06-18 15:45:39.0 -0700 +++ linux-2.6.22-rc4-mm2/include/asm-x86_64/cacheflush.h2007-06-18 15:45:46.0 -0700 @@ -27,6 +27,7 @@ void global_flush_tlb(void); int change_page_attr(struct page *page, int numpages, pgprot_t prot); int change_page_attr_addr(unsigned long addr, int numpages, pgprot_t prot); +void clflush_cache_range(void *addr, int size); #ifdef CONFIG_DEBUG_RODATA void mark_rodata_ro(void); -- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 09/10] Iommu Gfx workaround
When we fix all the opensource gfx drivers to use the DMA api's, at that time we can yank this config options out. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- Documentation/Intel-IOMMU.txt |5 + arch/x86_64/Kconfig | 11 +++ arch/x86_64/kernel/e820.c | 19 +++ drivers/pci/intel-iommu.c | 33 + drivers/pci/intel-iommu.h |7 +++ 5 files changed, 75 insertions(+) Index: linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt === --- linux-2.6.22-rc4-mm2.orig/Documentation/Intel-IOMMU.txt 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt 2007-06-18 15:45:08.0 -0700 @@ -57,6 +57,11 @@ If you encounter issues with graphics devices, you can try adding option intel_iommu=igfx_off to turn off the integrated graphics engine. +If it happens to be a PCI device included in the INCLUDE_ALL Engine, +then try enabling CONFIG_DMAR_GFX_WA to setup a 1-1 map. We hear +graphics drivers may be in process of using DMA api's in the near +future and at that time this option can be yanked out. + Some exceptions to IOVA --- Interrupt ranges are not address translated, (0xfee0 - 0xfeef). Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-18 15:45:07.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-18 15:45:08.0 -0700 @@ -741,6 +741,17 @@ and includes pci device scope covered by these DMA remapping device. +config DMAR_GFX_WA + bool Support for Graphics workaround + depends on DMAR + default y + help +Current Graphics drivers tend to use physical address +for DMA and avoid using DMA api's. Setting this config +option permits the IOMMU driver to set a unity map for +all the OS visible memory. Hence the driver can continue +to use physical addresses for DMA. + source drivers/pci/pcie/Kconfig source drivers/pci/Kconfig Index: linux-2.6.22-rc4-mm2/arch/x86_64/kernel/e820.c === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/kernel/e820.c 2007-06-18 15:44:44.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/kernel/e820.c 2007-06-18 15:45:08.0 -0700 @@ -723,3 +723,22 @@ printk(KERN_INFO Allocating PCI resources starting at %lx (gap: %lx:%lx)\n, pci_mem_start, gapstart, gapsize); } + +int __init arch_get_ram_range(int slot, u64 *addr, u64 *size) +{ + int i; + + if (slot 0 || slot = e820.nr_map) + return -1; + for (i = slot; i e820.nr_map; i++) { + if (e820.map[i].type != E820_RAM) + continue; + break; + } + if (i == e820.nr_map || e820.map[i].addr (max_pfn PAGE_SHIFT)) + return -1; + *addr = e820.map[i].addr; + *size = min_t(u64, e820.map[i].size + e820.map[i].addr, + max_pfn PAGE_SHIFT) - *addr; + return i + 1; +} Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-18 15:45:08.0 -0700 @@ -1601,6 +1601,36 @@ rmrr-end_address + 1); } +#ifdef CONFIG_DMAR_GFX_WA +extern int arch_get_ram_range(int slot, u64 *addr, u64 *size); +static void __init iommu_prepare_gfx_mapping(void) +{ + struct pci_dev *pdev = NULL; + u64 base, size; + int slot; + int ret; + + for_each_pci_dev(pdev) { + if (pdev-sysdata == DUMMY_DEVICE_DOMAIN_INFO || + !IS_GFX_DEVICE(pdev)) + continue; + printk(KERN_INFO IOMMU: gfx device %s 1-1 mapping\n, + pci_name(pdev)); + slot = arch_get_ram_range(0, base, size); + while (slot = 0) { + ret = iommu_prepare_identity_map(pdev, + base, base + size); + if (ret) + goto error; + slot = arch_get_ram_range(slot, base, size); + } + continue; +error: + printk(KERN_ERR IOMMU: mapping reserved region failed\n); + } +} +#endif + int __init init_dmars(void) { struct dmar_drhd_unit *drhd; @@ -1664,6 +1694,8 @@ } } + iommu_prepare_gfx_mapping(); + /* * for each drhd * enable fault log @@ -2175,3 +2207,4 @@ dma_ops = intel_dma_ops;
[Intel IOMMU 04/10] IOVA allocation and management routines
This code implements a generic IOVA allocation and management. As per Dave's suggestion we are now allocating IO virtual address from Higher DMA limit address rather than lower end address and this eliminated the need to preserve the IO virtual address for multiple devices sharing the same domain virtual address. Also this code uses red black trees to store the allocated and reserved iova nodes. This showed a good performance improvements over previous linear linked list. Changes from previous posting: 1) Fixed mostly coding style issues Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- drivers/pci/iova.c | 351 + drivers/pci/iova.h | 62 + 2 files changed, 413 insertions(+) Index: linux-2.6.22-rc4-mm2/drivers/pci/iova.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.22-rc4-mm2/drivers/pci/iova.c 2007-06-18 15:45:46.0 -0700 @@ -0,0 +1,351 @@ +/* + * Copyright (c) 2006, Intel Corporation. + * + * This file is released under the GPLv2. + * + * Copyright (C) 2006 Anil S Keshavamurthy [EMAIL PROTECTED] + */ + +#include iova.h + +void +init_iova_domain(struct iova_domain *iovad) +{ + spin_lock_init(iovad-iova_alloc_lock); + spin_lock_init(iovad-iova_rbtree_lock); + iovad-rbroot = RB_ROOT; + iovad-cached32_node = NULL; + +} + +static struct rb_node * +__get_cached_rbnode(struct iova_domain *iovad, unsigned long *limit_pfn) +{ + if ((*limit_pfn != DMA_32BIT_PFN) || + (iovad-cached32_node == NULL)) + return rb_last(iovad-rbroot); + else { + struct rb_node *prev_node = rb_prev(iovad-cached32_node); + struct iova *curr_iova = + container_of(iovad-cached32_node, struct iova, node); + *limit_pfn = curr_iova-pfn_lo - 1; + return prev_node; + } +} + +static inline void +__cached_rbnode_insert_update(struct iova_domain *iovad, + unsigned long limit_pfn, struct iova *new) +{ + if (limit_pfn != DMA_32BIT_PFN) + return; + iovad-cached32_node = new-node; +} + +static inline void +__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free) +{ + struct iova *cached_iova; + struct rb_node *curr; + + if (!iovad-cached32_node) + return; + curr = iovad-cached32_node; + cached_iova = container_of(curr, struct iova, node); + + if (free-pfn_lo = cached_iova-pfn_lo) + iovad-cached32_node = rb_next(free-node); +} + +static inline int __alloc_iova_range(struct iova_domain *iovad, + unsigned long size, unsigned long limit_pfn, struct iova *new) +{ + struct rb_node *curr = NULL; + unsigned long flags; + unsigned long saved_pfn; + + /* Walk the tree backwards */ + spin_lock_irqsave(iovad-iova_rbtree_lock, flags); + saved_pfn = limit_pfn; + curr = __get_cached_rbnode(iovad, limit_pfn); + while (curr) { + struct iova *curr_iova = container_of(curr, struct iova, node); + if (limit_pfn curr_iova-pfn_lo) + goto move_left; + if (limit_pfn curr_iova-pfn_hi) + goto adjust_limit_pfn; + if ((curr_iova-pfn_hi + size) = limit_pfn) + break; /* found a free slot */ +adjust_limit_pfn: + limit_pfn = curr_iova-pfn_lo - 1; +move_left: + curr = rb_prev(curr); + } + + if ((!curr) !(IOVA_START_PFN + size = limit_pfn)) { + spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags); + return -ENOMEM; + } + new-pfn_hi = limit_pfn; + new-pfn_lo = limit_pfn - size + 1; + + spin_unlock_irqrestore(iovad-iova_rbtree_lock, flags); + return 0; +} + +static void +iova_insert_rbtree(struct rb_root *root, struct iova *iova) +{ + struct rb_node **new = (root-rb_node), *parent = NULL; + /* Figure out where to put new node */ + while (*new) { + struct iova *this = container_of(*new, struct iova, node); + parent = *new; + + if (iova-pfn_lo this-pfn_lo) + new = ((*new)-rb_left); + else if (iova-pfn_lo this-pfn_lo) + new = ((*new)-rb_right); + else + BUG(); /* this should not happen */ + } + /* Add new node and rebalance tree. */ + rb_link_node(iova-node, parent, new); + rb_insert_color(iova-node, root); +} + +/** + * alloc_iova - allocates an iova + * @iovad - iova domain in question + * @size - size of page frames to allocate + * @limit_pfn - max limit address + * This function allocates an iova in the range limit_pfn to IOVA_START_PFN + * looking from limit_pfn instead from IOVA_START_PFN. + */
[Intel IOMMU 10/10] Iommu floppy workaround
This config option (DMAR_FLPY_WA) sets up 1:1 mapping for the floppy device so that the floppy device which does not use DMA api's will continue to work. Once the floppy driver starts using DMA api's this config option can be turn off or this patch can be yanked out of kernel at that time. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- arch/x86_64/Kconfig | 10 ++ drivers/pci/intel-iommu.c | 22 ++ drivers/pci/intel-iommu.h |7 +++ 3 files changed, 39 insertions(+) Index: linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/Kconfig 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/Kconfig2007-06-18 15:45:09.0 -0700 @@ -752,6 +752,16 @@ all the OS visible memory. Hence the driver can continue to use physical addresses for DMA. +config DMAR_FLPY_WA + bool Support for Floppy disk workaround + depends on DMAR + default y + help +Floppy disk drivers are know to by pass dma api calls +their by failing to work when IOMMU is enabled. This +work around will setup a 1 to 1 mappings for the first +16M to make floppy(isa device) work. + source drivers/pci/pcie/Kconfig source drivers/pci/Kconfig Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-18 15:45:09.0 -0700 @@ -1631,6 +1631,26 @@ } #endif +#ifdef CONFIG_DMAR_FLPY_WA +static inline void iommu_prepare_isa(void) +{ + struct pci_dev *pdev = NULL; + int ret; + + pdev = pci_get_class (PCI_CLASS_BRIDGE_ISA 8, NULL); + if (!pdev) + return; + + printk (KERN_INFO IOMMU: Prepare 0-16M unity mapping for LPC\n); + ret = iommu_prepare_identity_map(pdev, 0, 16*1024*1024); + + if (ret) + printk (IOMMU: Failed to create 0-64M identity map, \ + Floppy might not work\n); + +} +#endif + int __init init_dmars(void) { struct dmar_drhd_unit *drhd; @@ -1696,6 +1716,8 @@ iommu_prepare_gfx_mapping(); + iommu_prepare_isa(); + /* * for each drhd * enable fault log Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.h === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.h 2007-06-18 15:45:08.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.h 2007-06-18 15:45:09.0 -0700 @@ -322,4 +322,11 @@ } #endif /* !CONFIG_DMAR_GFX_WA */ +#ifndef CONFIG_DMAR_FLPY_WA +static inline void iommu_prepare_isa(void) +{ + return; +} +#endif /* !CONFIG_DMAR_FLPY_WA */ + #endif -- - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[Intel IOMMU 08/10] DMAR fault handling support
MSI interrupt handler registrations and fault handling support for Intel-IOMMU hadrware. This patch enables the MSI interrupts for the DMA remapping units and in the interrupt handler read the fault cause and outputs the same on to the console. Signed-off-by: Anil S Keshavamurthy [EMAIL PROTECTED] --- Documentation/Intel-IOMMU.txt | 17 +++ arch/x86_64/kernel/io_apic.c | 59 drivers/pci/intel-iommu.c | 194 ++ include/linux/dmar.h | 12 ++ 4 files changed, 281 insertions(+), 1 deletion(-) Index: linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt === --- linux-2.6.22-rc4-mm2.orig/Documentation/Intel-IOMMU.txt 2007-06-18 15:45:46.0 -0700 +++ linux-2.6.22-rc4-mm2/Documentation/Intel-IOMMU.txt 2007-06-19 13:05:03.0 -0700 @@ -63,6 +63,15 @@ The same is true for peer to peer transactions. Hence we reserve the address from PCI MMIO ranges so they are not allocated for IOVA addresses. + +Fault reporting +--- +When errors are reported, the DMA engine signals via an interrupt. The fault +reason and device that caused it with fault reason is printed on console. + +See below for sample. + + Boot Message Sample --- @@ -85,6 +94,14 @@ PCI-DMA: Using DMAR IOMMU +Fault reporting +--- + +DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000 +DMAR:[fault reason 05] PTE Write access is not set +DMAR:[DMA Write] Request device [00:02.0] fault addr 6df084000 +DMAR:[fault reason 05] PTE Write access is not set + TBD Index: linux-2.6.22-rc4-mm2/arch/x86_64/kernel/io_apic.c === --- linux-2.6.22-rc4-mm2.orig/arch/x86_64/kernel/io_apic.c 2007-06-18 15:45:38.0 -0700 +++ linux-2.6.22-rc4-mm2/arch/x86_64/kernel/io_apic.c 2007-06-18 15:45:46.0 -0700 @@ -31,6 +31,7 @@ #include linux/sysdev.h #include linux/msi.h #include linux/htirq.h +#include linux/dmar.h #ifdef CONFIG_ACPI #include acpi/acpi_bus.h #endif @@ -2026,8 +2027,64 @@ destroy_irq(irq); } -#endif /* CONFIG_PCI_MSI */ +#ifdef CONFIG_DMAR +#ifdef CONFIG_SMP +static void dmar_msi_set_affinity(unsigned int irq, cpumask_t mask) +{ + struct irq_cfg *cfg = irq_cfg + irq; + struct msi_msg msg; + unsigned int dest; + cpumask_t tmp; + + cpus_and(tmp, mask, cpu_online_map); + if (cpus_empty(tmp)) + return; + + if (assign_irq_vector(irq, mask)) + return; + + cpus_and(tmp, cfg-domain, mask); + dest = cpu_mask_to_apicid(tmp); + + dmar_msi_read(irq, msg); + + msg.data = ~MSI_DATA_VECTOR_MASK; + msg.data |= MSI_DATA_VECTOR(cfg-vector); + msg.address_lo = ~MSI_ADDR_DEST_ID_MASK; + msg.address_lo |= MSI_ADDR_DEST_ID(dest); + + dmar_msi_write(irq, msg); + irq_desc[irq].affinity = mask; +} +#endif /* CONFIG_SMP */ + +struct irq_chip dmar_msi_type = { + .name = DMAR_MSI, + .unmask = dmar_msi_unmask, + .mask = dmar_msi_mask, + .ack = ack_apic_edge, +#ifdef CONFIG_SMP + .set_affinity = dmar_msi_set_affinity, +#endif + .retrigger = ioapic_retrigger_irq, +}; + +int arch_setup_dmar_msi(unsigned int irq) +{ + int ret; + struct msi_msg msg; + + ret = msi_compose_msg(NULL, irq, msg); + if (ret 0) + return ret; + dmar_msi_write(irq, msg); + set_irq_chip_and_handler_name(irq, dmar_msi_type, handle_edge_irq, + edge); + return 0; +} +#endif +#endif /* CONFIG_PCI_MSI */ /* * Hypertransport interrupt support */ Index: linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c === --- linux-2.6.22-rc4-mm2.orig/drivers/pci/intel-iommu.c 2007-06-18 15:45:46.0 -0700 +++ linux-2.6.22-rc4-mm2/drivers/pci/intel-iommu.c 2007-06-19 13:05:03.0 -0700 @@ -742,6 +742,196 @@ return 0; } +/* iommu interrupt handling. Most stuff are MSI-like. */ + +static char *fault_reason_strings[] = +{ + Software, + Present bit in root entry is clear, + Present bit in context entry is clear, + Invalid context entry, + Access beyond MGAW, + PTE Write access is not set, + PTE Read access is not set, + Next page table ptr is invalid, + Root table address invalid, + Context table ptr is invalid, + non-zero reserved fields in RTP, + non-zero reserved fields in CTP, + non-zero reserved fields in PTE, + Unknown +}; +#define MAX_FAULT_REASON_IDX ARRAY_SIZE(fault_reason_strings) + +char *dmar_get_fault_reason(u8 fault_reason) +{ + if (fault_reason MAX_FAULT_REASON_IDX) + return fault_reason_strings[MAX_FAULT_REASON_IDX]; + else + return
Re: [Intel IOMMU 05/10] Intel IOMMU driver
On Tue, Jun 19, 2007 at 04:32:23PM -0700, Christoph Lameter wrote: On Tue, 19 Jun 2007, Keshavamurthy, Anil S wrote: +static inline void *alloc_pgtable_page(void) +{ + return (void *)get_zeroed_page(GFP_ATOMIC); +} Need to pass gfp_t parameter. Repeates a couple of times. + addr = (((u64)1) addr_width) - 1; + parent = domain-pgd; + + spin_lock_irqsave(domain-mapping_lock, flags); + while (level 0) { + void *tmp_page; + + offset = address_level_offset(addr, level); + pte = parent[offset]; + if (level == 1) + break; + + if (!dma_pte_present(*pte)) { + tmp_page = alloc_pgtable_page(); Is it not possible here to drop the lock and do the alloc with GFP_KERNEL and deal with the resulting race? That is done in other parts of the kernel. +/* iommu handling */ +static int iommu_alloc_root_entry(struct intel_iommu *iommu) +{ + struct root_entry *root; + unsigned long flags; + + root = (struct root_entry *)alloc_pgtable_page(); This may be able to become a GFP_KERNEL alloc since interrupts are enabled at this point? Memory allocated during driver init is very less and not much benefit with the suggested changes I think. Please correct me If I am wrong. The biggest benifit will be when we can figure out GPF_ flags during runtime when DMA map api's are called. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel-IOMMU 06/10] Intel IOMMU driver
On Thu, Jun 07, 2007 at 04:57:39PM -0700, Andrew Morton wrote: > On Wed, 06 Jun 2007 11:57:04 -0700 > [EMAIL PROTECTED] wrote: Andrew, Most of your comments make sense, I will fix all of them and resend soon. Thanks for your time taking a look at my patch. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel-IOMMU 06/10] Intel IOMMU driver
On Thu, Jun 07, 2007 at 04:57:39PM -0700, Andrew Morton wrote: On Wed, 06 Jun 2007 11:57:04 -0700 [EMAIL PROTECTED] wrote: Andrew, Most of your comments make sense, I will fix all of them and resend soon. Thanks for your time taking a look at my patch. -Anil - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling
On Mon, Jun 11, 2007 at 02:14:49PM -0700, Andrew Morton wrote: > On Mon, 11 Jun 2007 13:44:42 -0700 > "Keshavamurthy, Anil S" <[EMAIL PROTECTED]> wrote: > > > In the first implementation of ours, we had used mempools api's to > > allocate memory and we were told that mempools with GFP_ATOMIC is > > useless and hence in the second implementation we came up with > > resource pools ( which is preallocate pools) and again as I understand > > the argument is why create another when we have slab allocation which > > is similar to this resource pools. > > Odd. mempool with GFP_ATOMIC is basically equivalent to your > resource-pools, isn't it?: we'll try the slab allocator and if that failed, > fall back to the reserves. slab allocators don;t reserve the memory, in other words this memory can be consumed by VM under memory pressure which we don;t want in IOMMU case. Nope,they both are exactly opposite. mempool with GFP_ATOMIC, first tries to get memory from OS and if that fails, it looks for the object in the pool and returns. Where as resource pool is exactly opposite of mempool, where each time it looks for an object in the pool and if it exist then we return that object else we try to get the memory for OS while scheduling the work to grow the pool objects. In fact, the work is schedule to grow the pool when the low threshold point is hit. Hence, I still feel resource pool implementation is best choice for IOMMU. -Anil - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Intel-IOMMU 02/10] Library routine for pre-allocat pool handling
On Tue, Jun 12, 2007 at 12:25:57AM +0200, Andi Kleen wrote: > > > Please advice. > > I think the short term only safe option would be to fully preallocate an > aperture. > If it is too small you can try GFP_ATOMIC but it would be just > a unreliable fallback. For safety you could perhaps have some kernel thread > that tries to enlarge it in the background depending on current > use. That would be not 100% guaranteed to keep up with load, > but would at least keep up if the system is not too busy. > > That is basically what your resource pools do, but they seem > to be unnecessarily convoluted for the task :- after all you > could just preallocate the page tables and rewrite/flush them without > having some kind of allocator inbetween, can't you? Nope, it is not convoluted for the task. If you see carefully how the IO virtual address is obtained, I am basically reusing the the previous translated virutal address once it is freed instead of going for continious IO virtural address. Because of this reuse of IO virtual address, these address tend to map to the same PAGE tables and hence the memory for page tables itself does not grow unless there is that much IO going on in the System where all entries in the page tables are full(which means that much IO is in flight). The only defect I see in the current resource pool is that I am queuing the work on Keventd to grow the pool which could be a problem as many other subsystem in the kernel depends on keventd and as Anderew pointed out we could dead lock. If we have a separate worker thread to grow the pool then the deadlock issues is taken care. I would still love to get my current resource pool (just fix the keventd to separate thread to grow the pool) implementations to get into linus kernrl compared to kmem_cache_alloc implementation as I don;t see any benifit in moving to kmem_cache_alloc. But if people want I can provide kmem_cache_alloc implementation too just for comparisions. But this does not solve the fundamental problem that we have today. So ideally for IOMMU's we should have some preallocated buffers and if the buffers reach certain min_threshould the pool should grow in the background and all of these features is in resource pool implementation. Since we did not see any problems, can we atleat try this resource pool implementation in the linux MM kernels? If it is too bad, then I will change to kmem_cache_alloc() version. If this testing is OKAY, then I will refresh my patch for the coding styles etc and resubmit with resource pool implementation. Andrew?? > > If you make the start value large enough (256+MB?) that might reasonably > work. How much memory in page tables would that take? Or perhaps scale > it with available memory or available devices. What you are suggesting is to prealloacate and setup the page tables at the begining. But this would waste lot of memory because we don't know ahead of time how large the page table setup should be and in future our hardware can support 64K domains where each domain can dish out independent address from its start to end address range. And pre-setup of tables for all of the 64K domains is not feasible. > > In theory it could also be precomputed from the block/network device queue > lengths etc.; the trouble is just such checks would need to be added to all > kinds of > other odd subsystems that manage devices too. That would be much more work. > > Some investigation how to do sleeping block/network submit would be > also interesting (e.g. replace the spinlocks there with mutexes and see how > much it affects performance). For networking you would need to keep > at least a non sleeping path though because packets can be legally > submitted from interrupt context. If it works out then sleeping > interfaces to the IOMMU code could be added. Yup, these investigations needs to happen and sooner the better for all and for general linux community. > > -Andi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/